---
title: "Introduction to confoundvis"
author: "Subir Hait"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to confoundvis}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>",
  fig.width  = 6,
  fig.height = 4
)
library(confoundvis)
```

## Overview

`confoundvis` provides a unified, ggplot2-native visualization toolkit for
causal sensitivity analysis under unmeasured confounding. It supports multiple
frameworks — the Impact Threshold for a Confounding Variable (ITCV; Frank 2000),
partial R²-based methods (Cinelli & Hazlett 2020), and the multilevel mITCV
extension — and introduces the **Sensitivity Love Plot**, a novel diagnostic
that benchmarks sensitivity thresholds against observed covariate impacts in
a format familiar from propensity score analysis.

## The `confoundsens` object

All plotting functions in `confoundvis` accept a `confoundsens` object, which
stores a sensitivity path $\theta(\lambda)$ along with optional standard errors,
test statistics, and level labels (for multilevel decompositions).

```{r create-object}
x <- new_confoundsens(
  lambda = seq(0, 0.25, length.out = 20),
  theta  = 0.42 - 1.8 * seq(0, 0.25, length.out = 20),
  se     = rep(0.07, 20)
)
x
```

You can also construct a `confoundsens` object from an existing data frame:

```{r from-dataframe}
df <- data.frame(
  lambda = seq(0, 0.25, length.out = 20),
  theta  = 0.42 - 1.8 * seq(0, 0.25, length.out = 20),
  se     = rep(0.07, 20),
  level  = rep(c("within", "between"), each = 10)
)
x_ml <- as_confoundsens(df, level = "level")
summary(x_ml)
```

## Robustness curve

`plot_robustness_curve()` visualizes the full sensitivity path $\theta(\lambda)$
as confounding strength $\lambda$ grows from zero. This is the functional profile
behind the scalar ITCV threshold — it shows not just *where* the effect crosses
zero, but *how quickly*.

```{r robustness-curve}
plot_robustness_curve(x, bands = TRUE, points = FALSE) +
  ggplot2::geom_hline(yintercept = 0, linetype = "dotted") +
  ggplot2::labs(title = "Robustness curve: effect path under growing confounding")
```

For multilevel data, set `facet_level = TRUE` to see within- and between-cluster
paths side by side:

```{r robustness-curve-ml}
x_ml2 <- new_confoundsens(
  lambda = rep(seq(0, 0.25, length.out = 15), 2),
  theta  = c(0.50 - 2.0 * seq(0, 0.25, length.out = 15),
             0.35 - 1.2 * seq(0, 0.25, length.out = 15)),
  se     = rep(0.06, 30),
  level  = rep(c("within", "between"), each = 15)
)
plot_robustness_curve(x_ml2, facet_level = TRUE, bands = TRUE, points = FALSE) +
  ggplot2::labs(title = "Multilevel robustness curves")
```

## Sensitivity contour plot

`plot_sensitivity_contour()` draws the ITCV hyperbolic boundary in
$(r_{YU}, r_{DU})$ space. Observed covariate benchmarks can be overlaid as
labeled points, allowing readers to compare the threshold against the empirical
distribution of measured covariate correlations.

```{r contour-plot}
benchmarks <- data.frame(
  r_yu  = c(0.12, 0.09, 0.18, 0.05),
  r_du  = c(0.21, 0.14, 0.08, 0.31),
  label = c("SES", "Gender", "Race", "Prior achievement")
)
plot_sensitivity_contour(threshold = 0.025, benchmarks = benchmarks) +
  ggplot2::labs(
    title   = "Sensitivity contour plot",
    subtitle = "Benchmarks: observed covariate correlations"
  )
```

## Sensitivity Love plot

The **Sensitivity Love Plot** is the key novel contribution of `confoundvis`.
It adapts the Love plot — widely used to assess covariate balance in propensity
score analysis — to the sensitivity analysis context. Each observed covariate
appears as a point on a horizontal impact axis. The vertical dashed line marks
the sensitivity threshold: covariates to its left are weaker than the threshold,
while those to its right represent the strength an unmeasured confounder would
need to exceed to overturn inference.

```{r love-plot}
covariate_impacts <- data.frame(
  covariate = c("SES", "Gender", "Race", "Prior achievement",
                "School type", "Region", "Age"),
  impact    = c(0.025, 0.008, 0.019, 0.041, 0.011, 0.006, 0.014)
)

plot_sensitivity_love(covariate_impacts, threshold = 0.022) +
  ggplot2::labs(
    title    = "Sensitivity Love plot",
    subtitle = "Impact of observed covariates vs. sensitivity threshold"
  )
```

Covariates to the right of the dashed line (here, SES and Prior achievement)
are stronger than the threshold, indicating that an unmeasured confounder of
similar strength could overturn the conclusion.

## Local Taylor diagnostic

`plot_local_taylor()` visualizes the local quadratic approximation of a
sensitivity path. The observed path, the tangent (first-order) approximation,
and the quadratic (second-order) approximation are distinguished by color and
linetype.

```{r taylor-plot}
d <- seq(0, 0.20, length.out = 40)
taylor_df <- data.frame(
  delta  = rep(d, 3),
  series = rep(c("Observed path", "Tangent (1st order)",
                  "Quadratic (2nd order)"), each = 40),
  value  = c(
    0.4 - 0.7*d - 0.6*d^2,   # concave-down path
    0.4 - 0.7*d,              # tangent
    0.4 - 0.7*d - 0.3*d^2    # local quadratic fit
  )
)

plot_local_taylor(taylor_df) +
  ggplot2::geom_hline(yintercept = 0, linetype = "dotted") +
  ggplot2::labs(
    title    = "Local Taylor diagnostic",
    subtitle = "Concave-down path: tangent overestimates robustness"
  )
```

The key insight is that when the path is concave-down, the tangent
*overestimates* how far the effect is from zero — the first-order threshold
is optimistic. The local quadratic corrects this bias.

## Fitting a local quadratic

`fit_local_quadratic()` estimates the quadratic coefficients directly from
numerical sensitivity path data:

```{r fit-quadratic}
path_df <- data.frame(
  delta = seq(0, 0.5, length.out = 100),
  theta = 0.4 - 0.7 * seq(0, 0.5, length.out = 100) -
          0.5 * seq(0, 0.5, length.out = 100)^2
)

fit <- fit_local_quadratic(path_df, local_max_delta = 0.2)
cat("Intercept:", round(fit$intercept, 4), "\n")
cat("Slope    :", round(fit$slope,     4), "\n")
cat("Curvature:", round(fit$quad,      4), "\n")
```

## Simulating curvature regimes

`simulate_taylor_demo()` generates three synthetic paths with identical
baseline and slope but different curvature, useful for teaching and for
illustrating why first-order thresholds can mislead:

```{r simulate}
sims <- simulate_taylor_demo(delta_max = 1, step = 0.05,
                             theta0 = 0.4, slope = -0.5, kappa = 0.4)

# Plot all three on one figure
library(ggplot2)
paths <- rbind(
  data.frame(delta = sims$linear$lambda,
             theta = sims$linear$theta,  regime = "Linear"),
  data.frame(delta = sims$concave$lambda,
             theta = sims$concave$theta, regime = "Concave-down"),
  data.frame(delta = sims$convex$lambda,
             theta = sims$convex$theta,  regime = "Convex-up")
)

ggplot(paths, aes(x = delta, y = theta, colour = regime, linetype = regime)) +
  geom_line(linewidth = 0.8) +
  geom_hline(yintercept = 0, linetype = "dotted") +
  labs(x = expression(delta), y = expression(theta(delta)),
       title   = "Three curvature regimes",
       colour  = "Regime", linetype = "Regime")
```

## References

Frank, K. A. (2000). Impact of a confounding variable on a regression
coefficient. *Sociological Methods & Research*, 29(2), 147–194.

Cinelli, C., & Hazlett, C. (2020). Making sense of sensitivity: Extending
omitted variable bias. *Journal of the Royal Statistical Society: Series B*,
82(1), 39–67.

Austin, P. C., & Stuart, E. A. (2015). Moving towards best practice when using
inverse probability of treatment weighting. *Statistics in Medicine*, 34(28),
3661–3679.