---
title: "Doubly Robust MAIC for HTA: A Complete Worked Example in Advanced NSCLC"
author: "drMAIC Package Authors"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
    number_sections: true
vignette: >
  %\VignetteIndexEntry{Doubly Robust MAIC for HTA: A Complete Worked Example}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
#bibliography: references.bib
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>",
  fig.width  = 7,
  fig.height = 5,
  warning  = FALSE,
  message  = FALSE
)
```

# Introduction

## Background

In health technology appraisal (HTA), HTA agencies and payers frequently require
indirect treatment comparisons (ITC) when no head-to-head randomised controlled
trial (RCT) exists between treatments of interest. A common scenario, particularly
in oncology, is the **unanchored indirect comparison**: two single-arm trials, each
testing a different treatment, with no common comparator arm.

Matching-Adjusted Indirect Comparison (MAIC) [@signorovitch2010] addresses this
by reweighting the individual patient data (IPD) from one trial to match the
aggregate baseline characteristics of the comparator. However, standard MAIC
depends entirely on the weighting model being correctly specified.

The **Doubly Robust MAIC (DR-MAIC)** implemented in this package combines two
components into a single estimator:

1. **Inverse probability weighting** (standard MAIC)
2. **Outcome regression** (Simulated Treatment Comparison, STC / g-computation)

The resulting estimator is consistent if **either** component is correctly
specified [@remiroazocar2022; @lunceford2004; @tan2010].

## Package scope

The `drMAIC` package is aligned with:

- **NICE DSU Technical Support Document 18** [@phillippo2016]
- **Cochrane Handbook Chapter 23** (Dias et al.)
- **ISPOR Task Force** guidance on indirect comparisons
- **Remiro-Azócar et al. (2022)** DR estimation framework

---

# Statistical Background

## Standard MAIC

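Given IPD from study A with covariates $X_i$ and outcomes $Y_i$, and target
aggregate statistics $\bar{X}_B$ from study B, MAIC weights are:

$$w_i = \exp(X_i^\top \hat\lambda)$$

where $\hat\lambda$ solves the moment-matching conditions, so that the weighted
covariate means in study A equal the study B means:

$$\frac{\sum_{i=1}^n w_i X_i}{\sum_{i=1}^n w_i} = \bar{X}_B \quad\Longleftrightarrow\quad \sum_{i=1}^n w_i \left(X_i - \bar{X}_B\right) = 0$$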

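The **Effective Sample Size (ESS)** quantifies information loss from reweighting:
$$\mathrm{ESS} = \frac{\left(\sum_i w_i\right)^2}{\sum_i w_i^2}$$

A low ESS relative to $n$ indicates limited population overlap and a small
number of highly influential patients. NICE TSD 18 asks for the ESS to be
reported; this vignette treats an ESS below 30% of $n$ as a working flag for
concern rather than a formal cut-off.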
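
The weight estimation and ESS calculation can be sketched in base R. This is a minimal illustration on simulated data, not the `drMAIC` implementation: all object names (`sim_x`, `sim_target`, `sim_objective`, and so on) and the target values are hypothetical. Covariates are centred on the target means and rescaled so that the convex objective $\sum_i \exp(X_i^\top \lambda)$ is well conditioned.

```{r sketch-maic-weights}
# Sketch only: simulated data and hypothetical names, not the drMAIC internals.
set.seed(1)
sim_n      <- 300
sim_x      <- cbind(age = rnorm(sim_n, 60, 8), ecog = rbinom(sim_n, 1, 0.4))
sim_target <- c(age = 63, ecog = 0.5)          # hypothetical Study B means

# Centre on the target (moment condition becomes sum(w * xc) = 0), then rescale
sim_xc <- sweep(sim_x, 2, sim_target)
sim_xc <- sweep(sim_xc, 2, apply(sim_x, 2, sd), "/")

# Convex objective; its gradient is exactly the moment condition
sim_objective <- function(lambda) sum(exp(sim_xc %*% lambda))
sim_gradient  <- function(lambda) colSums(sim_xc * as.vector(exp(sim_xc %*% lambda)))

sim_fit <- optim(c(0, 0), sim_objective, sim_gradient,
                 method = "BFGS", control = list(reltol = 1e-12))
sim_w   <- as.vector(exp(sim_xc %*% sim_fit$par))

# Weighted means reproduce the targets (up to optimiser tolerance)
sim_wmeans <- colSums(sim_w * sim_x) / sum(sim_w)
sim_wmeans

# Effective sample size, absolute and as a share of n
sim_ess <- sum(sim_w)^2 / sum(sim_w^2)
c(ESS = sim_ess, pct_of_n = 100 * sim_ess / sim_n)
```

The further the target means sit from the IPD means, the more the weights concentrate on a few patients and the lower the ESS.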

## Doubly Robust Estimator

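The DR-MAIC estimator of the mean outcome on treatment A in the Study B
population is:

$$\hat\theta_{DR} = \underbrace{\hat{E}_B\left[\hat m(X)\right]}_{\text{STC (g-computation)}} + \underbrace{\sum_i \omega_i \left(Y_i - \hat m(X_i)\right)}_{\text{IPW bias correction}}$$

where $\omega_i = w_i / \sum_j w_j$ are normalised weights, $\hat m(X_i)$ is
the predicted outcome from an outcome regression model fitted to the IPD, and
$\hat{E}_B[\hat m(X)]$ is that prediction averaged over the Study B covariate
distribution (approximated from the available aggregate data). The first term
must be evaluated in the target population: averaging $\hat m(X_i)$ over the
IPD with the same weights $\omega_i$ would cancel against the correction term
and return the plain MAIC estimate $\sum_i \omega_i Y_i$.

**Double robustness:** The estimator is consistent if either:

- The weights correctly balance $X$ between populations (even if $\hat m$ is wrong), **or**
- The outcome model $\hat m$ is correctly specified (even if weights are imperfect)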
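
The second case can be illustrated with a small simulation. This is a sketch under stated assumptions, not the package's code: the data are simulated, all `dr_*` names are hypothetical, the outcome is continuous and linear (so predicting at the Study B means equals averaging predictions over Study B), and the weights are deliberately incomplete because they balance `x1` only.

```{r sketch-dr-estimator}
# Sketch only: simulated continuous outcome, hypothetical names.
set.seed(2)
dr_n   <- 2000
dr_dat <- data.frame(x1 = rnorm(dr_n), x2 = rnorm(dr_n))
dr_dat$y  <- 1 + 0.5 * dr_dat$x1 + 1.0 * dr_dat$x2 + rnorm(dr_n)
dr_target <- c(x1 = 1, x2 = 0.5)               # hypothetical Study B means
dr_truth  <- 1 + 0.5 * 1 + 1.0 * 0.5           # true mean outcome in B

# Deliberately incomplete weights: balance x1 only, ignore x2
dr_xc  <- dr_dat$x1 - dr_target[["x1"]]
dr_lam <- optimize(function(l) sum(exp(l * dr_xc)),
                   interval = c(-5, 5), tol = 1e-8)$minimum
dr_w   <- exp(dr_lam * dr_xc)
dr_om  <- dr_w / sum(dr_w)                     # normalised weights

# Correctly specified outcome model, predicted at the target means
dr_fit <- lm(y ~ x1 + x2, data = dr_dat)
dr_stc <- unname(predict(dr_fit, newdata = as.data.frame(t(dr_target))))
dr_aug <- sum(dr_om * residuals(dr_fit))       # weighted mean residual

dr_est <- c(truth = dr_truth,
            MAIC  = sum(dr_om * dr_dat$y),
            STC   = dr_stc,
            DR    = dr_stc + dr_aug)
round(dr_est, 3)
```

Because `x2` is left unbalanced, the weighted mean alone misses the target, while the DR combination stays close to the truth because the outcome model is correct. For non-linear outcome models, predicting at the covariate means is only an approximation to averaging over the target population.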

---

# Worked Example: Advanced NSCLC

## Data

```{r load-data}
library(drMAIC)

data(nsclc_ipd)
data(nsclc_agd)

# IPD from Study A (index trial — immunotherapy)
cat("=== Study A: IPD Summary ===\n")
cat(sprintf("n = %d patients\n", nrow(nsclc_ipd)))
cat(sprintf("Response rate: %.1f%%\n", 100 * mean(nsclc_ipd$response)))
cat(sprintf("Mean age: %.1f years\n", mean(nsclc_ipd$age)))
cat(sprintf("%% ECOG 1/2: %.1f%%\n", 100 * mean(nsclc_ipd$ecog)))
cat(sprintf("%% Ever-smoker: %.1f%%\n", 100 * mean(nsclc_ipd$smoker)))

# AgD from Study B (comparator trial)
cat("\n=== Study B: AgD ===\n")
cat(sprintf("n = %d patients\n", nsclc_agd$n_agd))
cat(sprintf("Response rate: %.1f%%\n", 100 * nsclc_agd$response_rate))
cat(sprintf("Mean age: %.1f years\n", nsclc_agd$mean_age))
cat(sprintf("%% ECOG 1/2: %.1f%%\n", 100 * nsclc_agd$prop_ecog1))
cat(sprintf("%% Ever-smoker: %.1f%%\n", 100 * nsclc_agd$prop_smoker))
```

Study B has an older, sicker population. MAIC adjusts for this **population
imbalance** in the observed covariates; differences in unmeasured
characteristics are not corrected.

## Step 1: Compute MAIC Weights

```{r compute-weights}
# Define target moments from Study B
target_moments <- c(
  age    = nsclc_agd$mean_age,
  ecog   = nsclc_agd$prop_ecog1,
  smoker = nsclc_agd$prop_smoker
)

# Compute entropy-balancing weights
w <- compute_weights(
  ipd            = nsclc_ipd,
  target_moments = target_moments,
  match_vars     = c("age", "ecog", "smoker"),
  verbose        = TRUE
)
```

## Step 2: Covariate Balance Diagnostics

```{r diagnostics, fig.cap="Love plot: covariate balance before and after MAIC weighting"}
diag <- maic_diagnostics(w, plot_type = "all")
```

```{r love-plot, fig.cap="Love Plot — covariate balance"}
diag$love_plot
```

```{r weight-plot, fig.cap="Weight distribution"}
diag$weight_plot
```

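The Love plot should show |SMD| < 0.10 for every matched covariate after
weighting (a threshold commonly used in the propensity-score literature).
Because MAIC matches the weighted means to the Study B targets by construction,
the post-weighting SMDs for matched means should be essentially zero; larger
values point to a failure of the weight optimisation.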
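
The quantity plotted in a Love plot can be sketched as follows. This is an illustration on simulated data with hypothetical names (`smd_vs_target()`, `smd_x`, `smd_target`); it scales the difference by the unweighted IPD standard deviation, which is one common convention and not necessarily the one `maic_diagnostics()` uses.

```{r sketch-smd}
# Sketch only: simulated ages and a hypothetical helper.
set.seed(3)
smd_x      <- rnorm(400, mean = 60, sd = 8)    # simulated IPD ages
smd_target <- 63                               # hypothetical Study B mean age

# Difference between the (weighted) IPD mean and the target, in IPD SD units
smd_vs_target <- function(x, target, w = rep(1, length(x))) {
  (weighted.mean(x, w) - target) / sd(x)
}

# One-covariate MAIC weights via the centred convex objective
smd_xc  <- smd_x - smd_target
smd_lam <- optimize(function(l) sum(exp(l * smd_xc)),
                    interval = c(-1, 1), tol = 1e-8)$minimum
smd_w   <- exp(smd_lam * smd_xc)

smd_before <- smd_vs_target(smd_x, smd_target)
smd_after  <- smd_vs_target(smd_x, smd_target, smd_w)
round(c(before = smd_before, after = smd_after), 4)
```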

## Step 3: Check Assumptions

```{r check-assumptions}
check_assumptions(w, ess_threshold = 30, smd_threshold = 0.10)
```

## Step 4: DR-MAIC Estimation

```{r dr-maic}
result <- dr_maic(
  maic_weights        = w,
  outcome_var         = "response",
  outcome_type        = "binary",
  comparator_estimate = nsclc_agd$response_rate,
  comparator_se       = nsclc_agd$response_se,
  effect_measure      = "OR"
)

print(result)
```

### Interpreting the three estimators

| Estimator | Description | Robust to |
|-----------|-------------|-----------|
| **MAIC (IPW)** | Re-weighted outcome mean | Outcome model misspecification |
| **STC (g-comp)** | Outcome model prediction | Weight misspecification |
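| **DR-MAIC** | Augmented combination | Misspecification of **either** component (but not both) |

The DR augmentation term (the difference between the DR-MAIC and STC estimates)
is the weighted mean of the outcome-model residuals. A value close to zero
suggests that the weights and the outcome model agree; a large value signals
that at least one of them is misspecified.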
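
With `effect_measure = "OR"`, an adjusted response probability for treatment A is contrasted with the response rate reported for Study B. A minimal sketch of that contrast follows, assuming both standard errors are on the probability scale and using a delta-method standard error on the log-odds scale. The helper `or_from_probs()` and its inputs are hypothetical, and the package may compute the interval differently.

```{r sketch-odds-ratio}
# Sketch only: hypothetical helper and inputs, not the drMAIC implementation.
or_from_probs <- function(p_a, se_a, p_b, se_b, level = 0.95) {
  log_or <- qlogis(p_a) - qlogis(p_b)
  # Delta method: SE of logit(p) is approximately se / (p * (1 - p))
  se_log_or <- sqrt((se_a / (p_a * (1 - p_a)))^2 +
                    (se_b / (p_b * (1 - p_b)))^2)
  z <- qnorm(1 - (1 - level) / 2)
  c(OR    = exp(log_or),
    lower = exp(log_or - z * se_log_or),
    upper = exp(log_or + z * se_log_or))
}

# Adjusted response 45% (SE 5 points) versus comparator 30% (SE 4 points)
or_demo <- or_from_probs(p_a = 0.45, se_a = 0.05, p_b = 0.30, se_b = 0.04)
round(or_demo, 2)
```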

## Step 5: Bootstrap Confidence Intervals

```{r bootstrap, eval=FALSE}
# Run 1000 bootstrap replicates with bias-corrected and accelerated (BCa) intervals
boot_res <- bootstrap_ci(
  dr_maic_result = result,
  R              = 1000,
  ci_type        = "bca",
  seed           = 2024
)
print(boot_res)
boot_res$boot_plot
```

The output below was generated with 200 percentile-bootstrap replicates to keep
the vignette build fast; use the settings above for a reported analysis.

```{r bootstrap-demo, echo=FALSE}
# Demonstration with fewer replicates for vignette build speed
boot_res <- bootstrap_ci(
  dr_maic_result = result,
  R              = 200,
  ci_type        = "perc",
  seed           = 2024,
  verbose        = FALSE
)
print(boot_res)
```
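
The key point of a MAIC bootstrap is that the weights are re-estimated inside every replicate, so the interval reflects uncertainty in the weighting step as well as in the outcome. A minimal percentile-bootstrap sketch on simulated data follows; the `bs_*` names are hypothetical and this is not how `bootstrap_ci()` is necessarily implemented.

```{r sketch-bootstrap}
# Sketch only: simulated IPD, one matching covariate, hypothetical names.
set.seed(4)
bs_n   <- 300
bs_dat <- data.frame(age = rnorm(bs_n, 60, 8))
bs_dat$resp <- rbinom(bs_n, 1, plogis(-0.5 + 0.03 * (bs_dat$age - 60)))
bs_target   <- 63                              # hypothetical Study B mean age

# Estimate weights and the weighted response rate in one step
bs_weighted_rate <- function(dat, target) {
  xc  <- dat$age - target
  lam <- optimize(function(l) sum(exp(l * xc)),
                  interval = c(-1, 1), tol = 1e-8)$minimum
  weighted.mean(dat$resp, exp(lam * xc))
}

bs_point <- bs_weighted_rate(bs_dat, bs_target)

# Resample patients with replacement and repeat the whole procedure
bs_reps <- replicate(200, {
  idx <- sample.int(bs_n, replace = TRUE)
  bs_weighted_rate(bs_dat[idx, , drop = FALSE], bs_target)
})

bs_ci <- quantile(bs_reps, c(0.025, 0.975))
c(estimate = bs_point, bs_ci)
```

Only the IPD is resampled here. Uncertainty in the Study B aggregate estimate has to be added separately, for example through its reported standard error.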

## Step 6: Sensitivity Analysis

```{r sensitivity}
sa <- sensitivity_analysis(
  dr_maic_result   = result,
  trim_percentiles = c(0.90, 0.95, 0.99),
  lovo             = TRUE
)
```

```{r trim-plot, fig.cap="Weight trimming sensitivity"}
if (!is.null(sa$trim_plot)) sa$trim_plot
```

```{r lovo-plot, fig.cap="Leave-one-variable-out sensitivity"}
if (!is.null(sa$lovo_plot)) sa$lovo_plot
```
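
Weight trimming can be sketched as capping weights at a chosen percentile and recomputing the ESS. The example below assumes that `trim_percentiles` denotes such caps, which is an interpretation rather than documented behaviour; the weights are simulated stand-ins and the `tr_*` names are hypothetical.

```{r sketch-trimming}
# Sketch only: log-normal weights stand in for skewed MAIC weights.
set.seed(5)
tr_w <- rlnorm(300, meanlog = 0, sdlog = 1)

tr_ess  <- function(w) sum(w)^2 / sum(w^2)
# Cap every weight above the p-th percentile at that percentile
tr_trim <- function(w, p) pmin(w, quantile(w, p))

tr_tab <- sapply(c(untrimmed = 1, p99 = 0.99, p95 = 0.95, p90 = 0.90),
                 function(p) tr_ess(tr_trim(tr_w, p)))
round(tr_tab, 1)
```

Lowering the cap can only increase the ESS, but trimmed weights no longer satisfy the moment-matching conditions exactly, so balance should be rechecked after trimming.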

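**E-value interpretation:**
An unmeasured confounder would need to be associated with both treatment and
outcome by a risk ratio of at least `r round(sa$evalue, 2)` each, beyond the
matched covariates, to fully explain away the observed treatment effect. There
is no universal cut-off: larger E-values indicate greater robustness, and the
value should be judged against the strength of plausible unmeasured prognostic
factors.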
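
For reference, the standard E-value formula for a risk ratio is $RR + \sqrt{RR\,(RR - 1)}$, applied to $1/RR$ when $RR < 1$. The sketch below uses hypothetical helper names and the usual square-root approximation for an odds ratio with a common outcome; how `sensitivity_analysis()` derives `sa$evalue` is not assumed here.

```{r sketch-evalue}
# Sketch only: hypothetical helpers illustrating the E-value formula.
evalue_rr <- function(rr) {
  rr <- ifelse(rr < 1, 1 / rr, rr)             # work on the scale where RR >= 1
  rr + sqrt(rr * (rr - 1))
}

# For a common outcome, sqrt(OR) is used as an approximate risk ratio
evalue_or_common <- function(or) evalue_rr(sqrt(ifelse(or < 1, 1 / or, or)))

evalue_rr(2)            # equals 2 + sqrt(2)
evalue_or_common(1.9)   # hypothetical odds ratio
```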

## Step 7: NICE Report

```{r nice-report}
nice_report(
  dr_maic_result   = result,
  bootstrap_result = boot_res,
  sensitivity_result = sa,
  study_a_name     = "KEYNOTE-024 (simulated)",
  study_b_name     = "IMpower150 (simulated)",
  indication       = "Advanced / Metastatic NSCLC",
  treatment_a      = "Pembrolizumab (simulated)",
  treatment_b      = "Atezo + Bev + Chemo (simulated)"
)
```

---

# Advanced Usage

## Adding second-moment matching (mean + SD)

For continuous variables, you can match on both mean and standard deviation:

```{r second-moment, eval=FALSE}
w2 <- compute_weights(
  ipd            = nsclc_ipd,
  target_moments = c(age    = nsclc_agd$mean_age,
                     age_sd = nsclc_agd$sd_age,
                     ecog   = nsclc_agd$prop_ecog1,
                     smoker = nsclc_agd$prop_smoker),
  match_vars      = c("age", "ecog", "smoker"),
  match_var_types = c(age = "mean_sd", ecog = "proportion", smoker = "proportion")
)
```
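
Matching a mean and a standard deviation amounts to matching the first two moments, since $E[X^2] = \mathrm{SD}^2 + \mathrm{mean}^2$. A minimal sketch on simulated ages follows; the `sm_*` names and target values are hypothetical, and the code standardises age on the target scale so that the two conditions become $E[z] = 0$ and $E[z^2] = 1$. It does not show how `match_var_types = "mean_sd"` is implemented internally.

```{r sketch-second-moment}
# Sketch only: simulated ages and hypothetical Study B targets.
set.seed(6)
sm_age  <- rnorm(500, mean = 60, sd = 8)
sm_mean <- 62
sm_sd   <- 7

# Standardise on the target scale; conditions are E[z] = 0 and E[z^2] = 1
sm_z  <- (sm_age - sm_mean) / sm_sd
sm_xc <- cbind(sm_z, sm_z^2 - 1)

sm_obj  <- function(l) sum(exp(sm_xc %*% l))
sm_grad <- function(l) colSums(sm_xc * as.vector(exp(sm_xc %*% l)))
sm_fit  <- optim(c(0, 0), sm_obj, sm_grad,
                 method = "BFGS", control = list(reltol = 1e-12))
sm_w    <- as.vector(exp(sm_xc %*% sm_fit$par))

# Weighted mean and (population-form) weighted SD after matching
sm_wmean <- weighted.mean(sm_age, sm_w)
sm_wsd   <- sqrt(weighted.mean((sm_age - sm_wmean)^2, sm_w))
round(c(mean = sm_wmean, sd = sm_wsd), 3)
```

Each additional matched moment adds a constraint and typically reduces the ESS, so second-moment matching is best reserved for covariates where the spread is clinically relevant.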

## Additional prognostic covariates in outcome model

Including additional prognostic variables in the outcome model can improve
efficiency of the DR estimator (even without including them in matching):

```{r additional-covariates, eval=FALSE}
result2 <- dr_maic(
  maic_weights          = w,
  outcome_var           = "response",
  outcome_type          = "binary",
  comparator_estimate   = nsclc_agd$response_rate,
  comparator_se         = nsclc_agd$response_se,
  additional_covariates = c("pdl1_high", "prior_lines"),  # efficiency gain
  effect_measure        = "OR"
)
```

## Time-to-event outcomes

```{r tte, eval=FALSE}
result_os <- dr_maic(
  maic_weights        = w,
  outcome_var         = "os_event",
  outcome_type        = "tte",
  time_var            = "os_time",
  comparator_estimate = log(0.78),  # log-HR from comparator
  comparator_se       = 0.12,
  effect_measure      = "HR"
)
```

---

# Reporting Checklist

Per NICE DSU TSD 18 and ISPOR guidance, a complete DR-MAIC submission should include:

- [ ] Justification for choice of matching variables (clinical rationale)
- [ ] ESS and % of original n
- [ ] Love plot of SMDs before and after weighting
- [ ] Primary DR-MAIC estimate with bootstrap 95% CI (BCa)
- [ ] Comparison of MAIC, STC, and DR-MAIC estimates
- [ ] DR augmentation term (evidence of model concordance)
- [ ] E-value for unmeasured confounding
- [ ] Weight trimming sensitivity analysis
- [ ] Leave-one-variable-out sensitivity analysis
- [ ] Clear statement of assumptions and limitations

---

# References

<div id="refs"></div>