---
title: "Negative Control Diagnostics in causaldef"
author: "Deniz Akdemir"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Negative Control Diagnostics in causaldef}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
```

## Introduction

This vignette demonstrates how to use **negative control outcomes** to screen for residual confounding and to compute a corresponding sensitivity bound with the `causaldef` package. Negative controls provide an empirical diagnostic for whether your adjustment strategy may have failed to remove confounding.

## Theoretical Background

### What is a Negative Control Outcome?

A **negative control outcome** ($Y'$) is a variable that:

1. **Shares confounders with the true outcome $Y$** — it is affected by the same unmeasured variables $U$ that confound the treatment-outcome relationship.
2. **Is NOT causally affected by treatment $A$** — the true causal effect of $A$ on $Y'$ is zero.

### The Diagnostic Logic

The key insight is:

> If your adjustment strategy correctly removes confounding, then the
> residual association between $A$ and $Y'$ should be zero.

If you observe a non-zero association between $A$ and $Y'$ after adjustment, this indicates that **confounding remains** and your causal estimates may be biased.

### Negative Control Sensitivity Bound (manuscript `thm:nc_bound`)

The `causaldef` package combines two ingredients:

1. a screening test for residual association between treatment and the negative control after adjustment, and
2.
the manuscript's negative control sensitivity bound (`thm:nc_bound`):

$$\delta(\hat{K}) \leq \kappa \cdot \delta_{NC}(\hat{K})$$

where:

- $\delta(\hat{K})$ is the true deficiency (what we want to know)
- $\delta_{NC}(\hat{K})$ is a negative-control association proxy (what we can measure)
- $\kappa$ is an alignment constant reflecting how well $Y'$ proxies for $Y$'s confounding

## Practical Example

### Simulating Data with a Negative Control

Let's create a dataset where we have:

- An unmeasured confounder $U$
- An observed covariate $W$ (correlated with $U$)
- A binary treatment $A$
- An outcome $Y$ affected by $A$ and $U$
- A negative control $Y'$ affected only by $U$ (not $A$)

```{r simulate-data}
library(causaldef)
set.seed(42)

n <- 500

# Unmeasured confounder
U <- rnorm(n)

# Observed covariate (partially captures U)
W <- 0.7 * U + rnorm(n, sd = 0.5)

# Treatment assignment (confounded by U directly; W only partially captures U)
ps_true <- plogis(0.3 + 0.8 * U)
A <- rbinom(n, 1, ps_true)

# True causal effect
beta_true <- 2.0

# Outcome (affected by A and U)
Y <- 1 + beta_true * A + 1.5 * U + rnorm(n)

# Negative control outcome (affected by U only, NOT by A)
Y_nc <- 0.5 + 1.2 * U + rnorm(n, sd = 0.8)

# Create data frame
df <- data.frame(W = W, A = A, Y = Y, Y_nc = Y_nc)
```

### Creating the Causal Specification

We specify the causal problem, including the negative control:

```{r create-spec}
spec <- causal_spec(
  data = df,
  treatment = "A",
  outcome = "Y",
  covariates = "W",
  negative_control = "Y_nc"
)

print(spec)
```

### Running the Negative Control Diagnostic

Now we test whether our IPTW adjustment successfully removes confounding:

```{r nc-diagnostic, eval=FALSE}
nc_result <- nc_diagnostic(
  spec,
  method = "iptw",
  alpha = 0.05,
  n_boot = 200
)

print(nc_result)
```

### Interpreting the Results

The diagnostic returns:

- **`screening$statistic`**: Weighted residual association between $A$ and $Y'$ after adjustment
- **`p_value`**: Permutation p-value for that residual association
- **`delta_nc`**: The observed
negative-control association proxy
- **`delta_bound`**: Upper bound on the true deficiency ($\kappa \times \delta_{NC}$)
- **`falsified`**: Whether the residual-association screening test rejects

### Scenarios

#### Scenario 1: Adjustment Succeeds

If $W$ fully captures $U$, the negative control test will NOT falsify:

```{r scenario-success, eval=FALSE}
# When W = U (no unmeasured confounding)
df_full <- df
df_full$W <- U  # Perfect proxy

spec_full <- causal_spec(
  df_full, "A", "Y", "W",
  negative_control = "Y_nc"
)

nc_full <- nc_diagnostic(spec_full, method = "iptw", n_boot = 100)
print(nc_full)  # Expect: falsified = FALSE
```

#### Scenario 2: Adjustment Fails

When $W$ is a poor proxy for $U$, falsification occurs:

```{r scenario-fail, eval=FALSE}
# When W is noise (no information about U)
df_bad <- df
df_bad$W <- rnorm(n)  # Useless proxy

spec_bad <- causal_spec(
  df_bad, "A", "Y", "W",
  negative_control = "Y_nc"
)

nc_bad <- nc_diagnostic(spec_bad, method = "iptw", n_boot = 100)
print(nc_bad)  # Expect: falsified = TRUE
```

## Choosing Good Negative Control Outcomes

### Ideal Properties

The best negative control outcomes have:

1. **Strong confounding alignment**: $Y'$ shares the same unmeasured confounders as $Y$
2. **Zero treatment effect**: No plausible mechanism by which $A$ affects $Y'$
3.
**Measurable**: Available in your dataset

### Examples by Domain

| Domain | Treatment | Outcome | Possible Negative Control |
|--------|-----------|---------|---------------------------|
| Cardiovascular | Statin use | CVD events | Accidental injuries |
| Oncology | Chemotherapy | Tumor response | Hospital-acquired infections |
| Economics | Job training | Earnings in 1978 | Earnings in 1974 (pre-treatment) |
| Epidemiology | Vaccination | Flu incidence | Unrelated disease incidence |

## Combining with Deficiency Estimation

The negative control diagnostic complements deficiency estimation:

```{r combined-workflow, eval=FALSE}
# Step 1: Estimate deficiency
def_results <- estimate_deficiency(
  spec,
  methods = c("unadjusted", "iptw", "aipw"),
  n_boot = 100
)
print(def_results)

# Step 2: Run the negative control diagnostic on the best method
best_method <- names(which.min(def_results$estimates))
nc_check <- nc_diagnostic(spec, method = best_method, n_boot = 100)

# Step 3: Compute policy bounds if assumptions are not falsified
if (!nc_check$falsified) {
  bounds <- policy_regret_bound(
    def_results,
    utility_range = c(-5, 10),
    method = best_method
  )
  print(bounds)
} else {
  warning("Causal assumptions falsified. Consider additional covariates.")
}
```

## Advanced: Estimating Kappa

The alignment constant $\kappa$ affects the bound's tightness. The default $\kappa = 1$ is conservative.
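To make the bound's arithmetic concrete, here is a minimal sketch of the `thm:nc_bound` computation, assuming `delta_nc` has already been estimated by `nc_diagnostic()`. The numeric values below are illustrative stand-ins, not package output:

```{r bound-arithmetic}
# Illustrative bound arithmetic from thm:nc_bound (values are made up)
delta_nc <- 0.35  # estimated negative-control association proxy
kappa <- 1.0      # conservative default alignment constant

# Upper bound on the true deficiency: delta(K) <= kappa * delta_nc
delta_bound <- kappa * delta_nc
delta_bound
#> [1] 0.35

# A smaller kappa (stronger belief that Y' mirrors Y's confounding)
# tightens the bound
0.8 * delta_nc
#> [1] 0.28
```

A smaller $\kappa$ therefore buys a tighter bound, but only at the price of a stronger alignment assumption about $Y'$.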
You can estimate $\kappa$ from domain knowledge:

```{r kappa-estimation, eval=FALSE}
# If you believe Y' has 80% of Y's confounding structure:
nc_tight <- nc_diagnostic(
  spec,
  method = "iptw",
  kappa = 0.8,
  n_boot = 100
)

print(nc_tight)
```

## Summary

| Function | Purpose |
|----------|---------|
| `nc_diagnostic()` | Screen for residual association and compute a sensitivity bound |
| `delta_nc` | Observable negative-control association proxy |
| `delta_bound` | Upper bound on the true deficiency |
| `falsified` | Screening rejection of residual association |

Negative control diagnostics provide a **data-driven** way to assess causal assumptions. Use them alongside deficiency estimation for robust causal inference.

## References

1. Akdemir, D. (2026). Constraints on Causal Inference as Experiment Comparison. DOI: 10.5281/zenodo.18367347. See `thm:nc_bound` (Negative Control Sensitivity Bound).
2. Lipsitch, M., Tchetgen Tchetgen, E., & Cohen, T. (2010). Negative controls: A tool for detecting confounding and bias in observational studies. *Epidemiology*, 21(3), 383-388.
3. Shi, X., Miao, W., & Tchetgen Tchetgen, E. (2020). A selective review of negative control methods in epidemiology. *Current Epidemiology Reports*, 7, 190-202.