lineagefreq provides multiple modeling engines through the unified
fit_model() interface. This vignette shows how to compare
them using the built-in backtesting framework.
Three engines are available in v0.1.0: mlr (the default), piantham, and hier_mlr. A fourth, garw, is planned for v0.2.
The default engine fits a multinomial logistic regression with one growth rate parameter per non-reference lineage.
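Concretely, the model puts lineage frequencies through a softmax of lineage-specific intercepts and growth rates, with the reference lineage fixed at zero. The following sketch uses hypothetical coefficients (not fitted values) to show the mechanics:

```r
# Multinomial logistic model: p_l(t) = exp(a_l + b_l * t) / sum_k exp(a_k + b_k * t),
# with the reference lineage pinned at a = b = 0.
mlr_freqs <- function(t, intercepts, growth_rates) {
  eta <- intercepts + growth_rates * t
  exp(eta) / sum(exp(eta))
}

# Hypothetical coefficients for three lineages (reference first)
a <- c(ref = 0, B = -2, C = -4)
b <- c(ref = 0, B = 0.2, C = 0.4)
round(mlr_freqs(0, a, b), 3)   # frequencies at t = 0
round(mlr_freqs(20, a, b), 3)  # the faster-growing lineages take over
```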
```r
data(sarscov2_us_2022)
x <- lfq_data(sarscov2_us_2022,
  lineage = variant, date = date,
  count = count, total = total)

fit_mlr <- fit_model(x, engine = "mlr")
growth_advantage(fit_mlr, type = "growth_rate")
#> # A tibble: 5 × 6
#>   lineage estimate lower upper type        pivot
#>   <chr>      <dbl> <dbl> <dbl> <chr>       <chr>
#> 1 BA.1       0     0     0     growth_rate BA.1
#> 2 BA.2       0.231 0.229 0.232 growth_rate BA.1
#> 3 BA.4/5     0.400 0.398 0.402 growth_rate BA.1
#> 4 BQ.1       0.352 0.346 0.357 growth_rate BA.1
#> 5 Other      0.151 0.149 0.152 growth_rate BA.1
```

The Piantham engine wraps MLR and translates growth rates to relative effective reproduction numbers using a specified mean generation time.
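If the growth rates above are per week (an assumption about the internal convention, not documented API), the translation for a 5-day generation time T is exp(r * T / 7). A quick check reproduces the relative_Rt estimates that the piantham engine reports below:

```r
# Assuming weekly growth rates r and generation time T in days,
# relative Rt = exp(r * T / 7). Check against the growth rates above:
r <- c(BA.2 = 0.231, `BA.4/5` = 0.400, BQ.1 = 0.352, Other = 0.151)
round(exp(r * 5 / 7), 2)
#>   BA.2 BA.4/5   BQ.1  Other
#>   1.18   1.33   1.29   1.11
```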
```r
fit_pian <- fit_model(x, engine = "piantham",
  generation_time = 5)
growth_advantage(fit_pian, type = "relative_Rt",
  generation_time = 5)
#> # A tibble: 5 × 6
#>   lineage estimate lower upper type        pivot
#>   <chr>      <dbl> <dbl> <dbl> <chr>       <chr>
#> 1 BA.1        1     1     1    relative_Rt BA.1
#> 2 BA.2        1.18  1.18  1.18 relative_Rt BA.1
#> 3 BA.4/5      1.33  1.33  1.33 relative_Rt BA.1
#> 4 BQ.1        1.29  1.28  1.29 relative_Rt BA.1
#> 5 Other       1.11  1.11  1.11 relative_Rt BA.1
```

glance() returns a one-row summary for each model. Since Piantham is a wrapper around MLR, the log-likelihood and AIC are identical.
```r
dplyr::bind_rows(
  glance(fit_mlr),
  glance(fit_pian)
)
#> # A tibble: 2 × 10
#>   engine   n_lineages n_timepoints   nobs    df   logLik     AIC     BIC pivot
#>   <chr>         <int>        <int>  <int> <int>    <dbl>   <dbl>   <dbl> <chr>
#> 1 mlr               5           40 461424     8 -465911. 931838. 931852. BA.1
#> 2 piantham          5           40 461424     8 -465911. 931838. 931852. BA.1
#> # ℹ 1 more variable: convergence <int>
```

The backtest() function implements rolling-origin evaluation. At each origin date, the model is fit on past data and forecasts are compared to held-out future observations.
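The rolling-origin scheme itself is simple to sketch in base R (a conceptual illustration, not the package internals):

```r
# Rolling-origin splits: each origin trains on all data up to that date;
# targets lie horizon days ahead of the origin.
dates <- seq(as.Date("2022-01-01"), as.Date("2022-10-01"), by = "week")
min_train <- 56
horizons <- c(7, 14, 21)

origins <- dates[dates >= min(dates) + min_train]
splits <- expand.grid(origin_date = origins, horizon = horizons)
splits$target_date <- splits$origin_date + splits$horizon
head(splits, 3)
```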
```r
bt <- backtest(x,
  engines = c("mlr", "piantham"),
  horizons = c(7, 14, 21),
  min_train = 56,
  generation_time = 5
)
```
```r
bt
#>
#> ── Backtest results
#> 900 predictions across 31 origins
#> Engines: "mlr, piantham"
#> Horizons: 7, 14, 21 days
#>
#> # A tibble: 900 × 9
#>    origin_date target_date horizon engine lineage predicted lower upper observed
#>  * <date>      <date>        <int> <chr>  <chr>       <dbl> <dbl> <dbl>    <dbl>
#>  1 2022-03-05  2022-03-12         7 mlr    BA.1         0.44 0.35  0.53  0.439
#>  2 2022-03-05  2022-03-12         7 mlr    BA.2         0.18 0.115 0.26  0.172
#>  3 2022-03-05  2022-03-12         7 mlr    BA.4/5       0    0     0.02  0.00577
#>  4 2022-03-05  2022-03-12         7 mlr    BQ.1         0    0     0.01  0.000406
#>  5 2022-03-05  2022-03-12         7 mlr    Other        0.37 0.28  0.46  0.382
#>  6 2022-03-05  2022-03-19        14 mlr    BA.1         0.39 0.31  0.49  0.399
#>  7 2022-03-05  2022-03-19        14 mlr    BA.2         0.21 0.13  0.28  0.199
#>  8 2022-03-05  2022-03-19        14 mlr    BA.4/5       0.01 0     0.03  0.00799
#>  9 2022-03-05  2022-03-19        14 mlr    BQ.1         0    0     0.01  0.000543
#> 10 2022-03-05  2022-03-19        14 mlr    Other        0.39 0.29  0.48  0.394
#> # ℹ 890 more rows
```

score_forecasts() computes standardized accuracy metrics.
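What the two metrics measure can be made concrete by computing them by hand from the backtest tibble (a plain dplyr sketch, not the package implementation):

```r
library(dplyr)

bt |>
  group_by(engine, horizon) |>
  summarise(
    mae      = mean(abs(predicted - observed)),              # mean absolute error
    coverage = mean(observed >= lower & observed <= upper),  # interval coverage
    .groups = "drop"
  )
```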
```r
sc <- score_forecasts(bt,
  metrics = c("mae", "coverage"))
sc
#> # A tibble: 12 × 4
#>    engine   horizon metric   value
#>    <chr>      <int> <chr>    <dbl>
#>  1 mlr            7 mae      0.00432
#>  2 mlr            7 coverage 1
#>  3 mlr           14 mae      0.00398
#>  4 mlr           14 coverage 1
#>  5 mlr           21 mae      0.00452
#>  6 mlr           21 coverage 1
#>  7 piantham       7 mae      0.00418
#>  8 piantham       7 coverage 1
#>  9 piantham      14 mae      0.00421
#> 10 piantham      14 coverage 1
#> 11 piantham      21 mae      0.00432
#> 12 piantham      21 coverage 1
```

compare_models() summarizes scores per engine, sorted by MAE.
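The exact signature of compare_models() is not shown in this vignette, but an equivalent summary can be reproduced from the scored tibble with plain dplyr and tidyr:

```r
library(dplyr)
library(tidyr)

# One row per (engine, horizon, metric) -> one row per engine, sorted by MAE
sc |>
  pivot_wider(names_from = metric, values_from = value) |>
  group_by(engine) |>
  summarise(mae = mean(mae), coverage = mean(coverage)) |>
  arrange(mae)
```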
| Scenario | Recommended engine |
|---|---|
| Single location, quick estimate | mlr |
| Need relative Rt interpretation | piantham |
| Multiple locations, sparse data | hier_mlr |
| Time-varying fitness (v0.2) | garw |
When data spans multiple locations with unequal sequencing depth,
hier_mlr shrinks location-specific estimates toward the
global mean. This stabilizes estimates for low-data locations.
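The shrinkage mechanism can be illustrated in isolation (a generic precision-weighted partial-pooling formula, not the package's estimator): each location's estimate keeps a weight proportional to its precision, and the remainder goes to the global mean.

```r
# Locations with noisier estimates (larger standard errors) are shrunk
# harder toward the precision-weighted global mean.
shrink <- function(est, se, tau = 0.1) {
  global <- weighted.mean(est, 1 / se^2)
  w <- tau^2 / (tau^2 + se^2)   # weight retained by the local estimate
  w * est + (1 - w) * global
}

est <- c(loc_A = 0.40, loc_B = 0.10)   # loc_B has little data
se  <- c(loc_A = 0.02, loc_B = 0.20)
round(shrink(est, se), 3)  # loc_B moves strongly toward the global mean
```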
A demonstration requires multi-location data, which the built-in
single-location dataset does not provide. See ?fit_model
for an example with simulated multi-location data.