--- title: "Introduction to confoundvis" author: "Subir Hait" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to confoundvis} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 6, fig.height = 4 ) library(confoundvis) ``` ## Overview `confoundvis` provides a unified, ggplot2-native visualization toolkit for causal sensitivity analysis under unmeasured confounding. It supports multiple frameworks — the Impact Threshold for a Confounding Variable (ITCV; Frank 2000), partial R²-based methods (Cinelli & Hazlett 2020), and the multilevel mITCV extension — and introduces the **Sensitivity Love Plot**, a novel diagnostic that benchmarks sensitivity thresholds against observed covariate impacts in a format familiar from propensity score analysis. ## The `confoundsens` object All plotting functions in `confoundvis` accept a `confoundsens` object, which stores a sensitivity path $\theta(\lambda)$ along with optional standard errors, test statistics, and level labels (for multilevel decompositions). ```{r create-object} x <- new_confoundsens( lambda = seq(0, 0.25, length.out = 20), theta = 0.42 - 1.8 * seq(0, 0.25, length.out = 20), se = rep(0.07, 20) ) x ``` You can also construct a `confoundsens` object from an existing data frame: ```{r from-dataframe} df <- data.frame( lambda = seq(0, 0.25, length.out = 20), theta = 0.42 - 1.8 * seq(0, 0.25, length.out = 20), se = rep(0.07, 20), level = rep(c("within", "between"), each = 10) ) x_ml <- as_confoundsens(df, level = "level") summary(x_ml) ``` ## Robustness curve `plot_robustness_curve()` visualizes the full sensitivity path $\theta(\lambda)$ as confounding strength $\lambda$ grows from zero. This is the functional profile behind the scalar ITCV threshold — it shows not just *where* the effect crosses zero, but *how quickly*. ```{r robustness-curve} plot_robustness_curve(x, bands = TRUE, points = FALSE) + ggplot2::geom_hline(yintercept = 0, linetype = "dotted") + ggplot2::labs(title = "Robustness curve: effect path under growing confounding") ``` For multilevel data, set `facet_level = TRUE` to see within- and between-cluster paths side by side: ```{r robustness-curve-ml} x_ml2 <- new_confoundsens( lambda = rep(seq(0, 0.25, length.out = 15), 2), theta = c(0.50 - 2.0 * seq(0, 0.25, length.out = 15), 0.35 - 1.2 * seq(0, 0.25, length.out = 15)), se = rep(0.06, 30), level = rep(c("within", "between"), each = 15) ) plot_robustness_curve(x_ml2, facet_level = TRUE, bands = TRUE, points = FALSE) + ggplot2::labs(title = "Multilevel robustness curves") ``` ## Sensitivity contour plot `plot_sensitivity_contour()` draws the ITCV hyperbolic boundary in $(r_{YU}, r_{DU})$ space. Observed covariate benchmarks can be overlaid as labeled points, allowing readers to compare the threshold against the empirical distribution of measured covariate correlations. ```{r contour-plot} benchmarks <- data.frame( r_yu = c(0.12, 0.09, 0.18, 0.05), r_du = c(0.21, 0.14, 0.08, 0.31), label = c("SES", "Gender", "Race", "Prior achievement") ) plot_sensitivity_contour(threshold = 0.025, benchmarks = benchmarks) + ggplot2::labs( title = "Sensitivity contour plot", subtitle = "Benchmarks: observed covariate correlations" ) ``` ## Sensitivity Love plot The **Sensitivity Love Plot** is the key novel contribution of `confoundvis`. It adapts the Love plot — widely used to assess covariate balance in propensity score analysis — to the sensitivity analysis context. Each observed covariate appears as a point on a horizontal impact axis. The vertical dashed line marks the sensitivity threshold: covariates to its left are weaker than the threshold, while those to its right represent the strength an unmeasured confounder would need to exceed to overturn inference. ```{r love-plot} covariate_impacts <- data.frame( covariate = c("SES", "Gender", "Race", "Prior achievement", "School type", "Region", "Age"), impact = c(0.025, 0.008, 0.019, 0.041, 0.011, 0.006, 0.014) ) plot_sensitivity_love(covariate_impacts, threshold = 0.022) + ggplot2::labs( title = "Sensitivity Love plot", subtitle = "Impact of observed covariates vs. sensitivity threshold" ) ``` Covariates to the right of the dashed line (here, SES and Prior achievement) are stronger than the threshold, indicating that an unmeasured confounder of similar strength could overturn the conclusion. ## Local Taylor diagnostic `plot_local_taylor()` visualizes the local quadratic approximation of a sensitivity path. The observed path, the tangent (first-order) approximation, and the quadratic (second-order) approximation are distinguished by color and linetype. ```{r taylor-plot} d <- seq(0, 0.20, length.out = 40) taylor_df <- data.frame( delta = rep(d, 3), series = rep(c("Observed path", "Tangent (1st order)", "Quadratic (2nd order)"), each = 40), value = c( 0.4 - 0.7*d - 0.6*d^2, # concave-down path 0.4 - 0.7*d, # tangent 0.4 - 0.7*d - 0.3*d^2 # local quadratic fit ) ) plot_local_taylor(taylor_df) + ggplot2::geom_hline(yintercept = 0, linetype = "dotted") + ggplot2::labs( title = "Local Taylor diagnostic", subtitle = "Concave-down path: tangent overestimates robustness" ) ``` The key insight is that when the path is concave-down, the tangent *overestimates* how far the effect is from zero — the first-order threshold is optimistic. The local quadratic corrects this bias. ## Fitting a local quadratic `fit_local_quadratic()` estimates the quadratic coefficients directly from numerical sensitivity path data: ```{r fit-quadratic} path_df <- data.frame( delta = seq(0, 0.5, length.out = 100), theta = 0.4 - 0.7 * seq(0, 0.5, length.out = 100) - 0.5 * seq(0, 0.5, length.out = 100)^2 ) fit <- fit_local_quadratic(path_df, local_max_delta = 0.2) cat("Intercept:", round(fit$intercept, 4), "\n") cat("Slope :", round(fit$slope, 4), "\n") cat("Curvature:", round(fit$quad, 4), "\n") ``` ## Simulating curvature regimes `simulate_taylor_demo()` generates three synthetic paths with identical baseline and slope but different curvature, useful for teaching and for illustrating why first-order thresholds can mislead: ```{r simulate} sims <- simulate_taylor_demo(delta_max = 1, step = 0.05, theta0 = 0.4, slope = -0.5, kappa = 0.4) # Plot all three on one figure library(ggplot2) paths <- rbind( data.frame(delta = sims$linear$lambda, theta = sims$linear$theta, regime = "Linear"), data.frame(delta = sims$concave$lambda, theta = sims$concave$theta, regime = "Concave-down"), data.frame(delta = sims$convex$lambda, theta = sims$convex$theta, regime = "Convex-up") ) ggplot(paths, aes(x = delta, y = theta, colour = regime, linetype = regime)) + geom_line(linewidth = 0.8) + geom_hline(yintercept = 0, linetype = "dotted") + labs(x = expression(delta), y = expression(theta(delta)), title = "Three curvature regimes", colour = "Regime", linetype = "Regime") ``` ## References Frank, K. A. (2000). Impact of a confounding variable on a regression coefficient. *Sociological Methods & Research*, 29(2), 147–194. Cinelli, C., & Hazlett, C. (2020). Making sense of sensitivity: Extending omitted variable bias. *Journal of the Royal Statistical Society: Series B*, 82(1), 39–67. Austin, P. C., & Stuart, E. A. (2015). Moving towards best practice when using inverse probability of treatment weighting. *Statistics in Medicine*, 34(28), 3661–3679.