---
title: "Negative Control Diagnostics in causaldef"
author: "Deniz Akdemir"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Negative Control Diagnostics in causaldef}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
```

## Introduction

This vignette demonstrates how to use **negative control outcomes** to screen for residual confounding and to compute a corresponding sensitivity bound with the `causaldef` package. Negative controls provide an empirical diagnostic for whether your adjustment strategy may have failed to remove confounding.

## Theoretical Background

### What is a Negative Control Outcome?

A **negative control outcome** ($Y'$) is a variable that:

1. **Shares confounders with the true outcome $Y$** — it is affected by the same unmeasured variables $U$ that confound the treatment-outcome relationship.
2. **Is NOT causally affected by treatment $A$** — the true causal effect of $A$ on $Y'$ is zero.

### The Diagnostic Logic

The key insight is:

> If your adjustment strategy correctly removes confounding, then the
> residual association between $A$ and $Y'$ should be zero.

If you observe a non-zero association between $A$ and $Y'$ after adjustment, this indicates that **confounding remains** and your causal estimates may be biased.

### Negative Control Sensitivity Bound (manuscript `thm:nc_bound`)

The `causaldef` package combines two ingredients:

1. a screening test for residual association between treatment and the negative control after adjustment, and
2.
the manuscript's negative control sensitivity bound (`thm:nc_bound`):

$$\delta(\hat{K}) \leq \kappa \cdot \delta_{NC}(\hat{K})$$

where:

- $\delta(\hat{K})$ is the true deficiency (what we want to know)
- $\delta_{NC}(\hat{K})$ is a negative-control association proxy (what we can measure)
- $\kappa$ is an alignment constant reflecting how well $Y'$ proxies for $Y$'s confounding

## Practical Example

### Simulating Data with a Negative Control

Let's create a dataset where we have:

- An unmeasured confounder $U$
- An observed covariate $W$ (correlated with $U$)
- A binary treatment $A$
- An outcome $Y$ affected by $A$ and $U$
- A negative control $Y'$ affected only by $U$ (not $A$)

```{r simulate-data}
library(causaldef)
set.seed(42)

n <- 500

# Unmeasured confounder
U <- rnorm(n)

# Observed covariate (partially captures U)
W <- 0.7 * U + rnorm(n, sd = 0.5)

# Treatment assignment (confounded by U directly; W only partially captures U)
ps_true <- plogis(0.3 + 0.8 * U)
A <- rbinom(n, 1, ps_true)

# True causal effect
beta_true <- 2.0

# Outcome (affected by A and U)
Y <- 1 + beta_true * A + 1.5 * U + rnorm(n)

# Negative control outcome (affected by U only, NOT by A)
Y_nc <- 0.5 + 1.2 * U + rnorm(n, sd = 0.8)

# Create data frame
df <- data.frame(W = W, A = A, Y = Y, Y_nc = Y_nc)
```

### Creating the Causal Specification

We specify the causal problem, including the negative control:

```{r create-spec}
spec <- causal_spec(
  data = df,
  treatment = "A",
  outcome = "Y",
  covariates = "W",
  negative_control = "Y_nc"
)

print(spec)
```

### Running the Negative Control Diagnostic

Now we test whether our IPTW adjustment successfully removes confounding:

```{r nc-diagnostic, eval=FALSE}
nc_result <- nc_diagnostic(
  spec,
  method = "iptw",
  alpha = 0.05,
  n_boot = 200
)

print(nc_result)
```

### Interpreting the Results

The diagnostic returns:

- **`screening$statistic`**: Weighted residual association between $A$ and $Y'$ after adjustment
- **`p_value`**: Permutation p-value for that residual association
- **`delta_nc`**: The observed
negative-control association proxy
- **`delta_bound`**: Upper bound on the true deficiency ($\kappa \times \delta_{NC}$)
- **`falsified`**: Whether the residual-association screening test rejects

### Scenarios

#### Scenario 1: Adjustment Succeeds

If $W$ fully captures $U$, the negative control test will NOT falsify:

```{r scenario-success, eval=FALSE}
# When W = U (no unmeasured confounding)
df_full <- df
df_full$W <- U  # Perfect proxy

spec_full <- causal_spec(
  df_full, "A", "Y", "W",
  negative_control = "Y_nc"
)

nc_full <- nc_diagnostic(spec_full, method = "iptw", n_boot = 100)
print(nc_full)  # Expect: falsified = FALSE
```

#### Scenario 2: Adjustment Fails

When $W$ is a poor proxy for $U$, falsification occurs:

```{r scenario-fail, eval=FALSE}
# When W is noise (no information about U)
df_bad <- df
df_bad$W <- rnorm(n)  # Useless proxy

spec_bad <- causal_spec(
  df_bad, "A", "Y", "W",
  negative_control = "Y_nc"
)

nc_bad <- nc_diagnostic(spec_bad, method = "iptw", n_boot = 100)
print(nc_bad)  # Expect: falsified = TRUE
```

## Choosing Good Negative Control Outcomes

### Ideal Properties

The best negative control outcomes have:

1. **Strong confounding alignment**: $Y'$ shares the same unmeasured confounders as $Y$
2. **Zero treatment effect**: No plausible mechanism by which $A$ affects $Y'$
3.
**Measurable**: Available in your dataset

### Examples by Domain

| Domain | Treatment | Outcome | Possible Negative Control |
|--------|-----------|---------|---------------------------|
| Cardiovascular | Statin use | CVD events | Accidental injuries |
| Oncology | Chemotherapy | Tumor response | Hospital-acquired infections |
| Economics | Job training | Earnings in 1978 | Earnings in 1974 (pre-treatment) |
| Epidemiology | Vaccination | Flu incidence | Unrelated disease incidence |

## Combining with Deficiency Estimation

The negative control diagnostic complements deficiency estimation:

```{r combined-workflow, eval=FALSE}
# Step 1: Estimate deficiency
def_results <- estimate_deficiency(
  spec,
  methods = c("unadjusted", "iptw", "aipw"),
  n_boot = 100
)
print(def_results)

# Step 2: Run the negative control diagnostic on the best method
best_method <- names(which.min(def_results$estimates))
nc_check <- nc_diagnostic(spec, method = best_method, n_boot = 100)

# Step 3: Compute policy bounds if assumptions are not falsified
if (!nc_check$falsified) {
  bounds <- policy_regret_bound(
    def_results,
    utility_range = c(-5, 10),
    method = best_method
  )
  print(bounds)
} else {
  warning("Causal assumptions falsified. Consider additional covariates.")
}
```

## Advanced: Estimating Kappa

The alignment constant $\kappa$ affects the bound's tightness. The default $\kappa = 1$ is conservative.
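To make the bound's arithmetic concrete, here is a minimal sketch of the `thm:nc_bound` computation, assuming `delta_nc` has already been estimated by `nc_diagnostic()`. The numeric values below are illustrative stand-ins, not package output:

```{r bound-arithmetic}
# Illustrative bound arithmetic from thm:nc_bound (values are made up)
delta_nc <- 0.35  # estimated negative-control association proxy
kappa <- 1.0      # conservative default alignment constant

# Upper bound on the true deficiency: delta(K) <= kappa * delta_nc
delta_bound <- kappa * delta_nc
delta_bound
#> [1] 0.35

# A smaller kappa (stronger belief that Y' mirrors Y's confounding)
# tightens the bound
0.8 * delta_nc
#> [1] 0.28
```

A smaller $\kappa$ therefore buys a tighter bound, but only at the price of a stronger alignment assumption about $Y'$.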
You can estimate $\kappa$ from domain knowledge:

```{r kappa-estimation, eval=FALSE}
# If you believe Y' has 80% of Y's confounding structure:
nc_tight <- nc_diagnostic(
  spec,
  method = "iptw",
  kappa = 0.8,
  n_boot = 100
)

print(nc_tight)
```

## Summary

| Function | Purpose |
|----------|---------|
| `nc_diagnostic()` | Screen for residual association and compute a sensitivity bound |
| `delta_nc` | Observable negative-control association proxy |
| `delta_bound` | Upper bound on the true deficiency |
| `falsified` | Screening rejection of residual association |

Negative control diagnostics provide a **data-driven** way to assess causal assumptions. Use them alongside deficiency estimation for robust causal inference.

## References

1. Akdemir, D. (2026). Constraints on Causal Inference as Experiment Comparison. DOI: 10.5281/zenodo.18367347. See `thm:nc_bound` (Negative Control Sensitivity Bound).
2. Lipsitch, M., Tchetgen Tchetgen, E., & Cohen, T. (2010). Negative controls: A tool for detecting confounding and bias in observational studies. *Epidemiology*, 21(3), 383-388.
3. Shi, X., Miao, W., & Tchetgen Tchetgen, E. (2020). A selective review of negative control methods in epidemiology. *Current Epidemiology Reports*, 7, 190-202.