---
title: "Comparing modeling engines"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Comparing modeling engines}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4
)
```

## Overview

lineagefreq provides multiple modeling engines through the unified `fit_model()` interface. This vignette shows how to compare them using the built-in backtesting framework.

Three engines are available in v0.1.0:

- **mlr**: Multinomial logistic regression (frequentist, fast).
- **piantham**: Converts MLR growth rates to relative reproduction numbers using a generation-time approximation.
- **hier_mlr**: Hierarchical MLR with partial pooling across locations via empirical Bayes shrinkage.

## Setup

```{r setup}
library(lineagefreq)
```

## Engine 1: MLR

The default engine fits a multinomial logistic regression with one growth rate parameter per non-reference lineage.

```{r mlr}
data(sarscov2_us_2022)

x <- lfq_data(sarscov2_us_2022,
              lineage = variant, date = date,
              count = count, total = total)

fit_mlr <- fit_model(x, engine = "mlr")
growth_advantage(fit_mlr, type = "growth_rate")
```

## Engine 2: Piantham

The Piantham engine wraps MLR and translates growth rates to relative effective reproduction numbers using a specified mean generation time.

```{r piantham}
fit_pian <- fit_model(x, engine = "piantham", generation_time = 5)
growth_advantage(fit_pian, type = "relative_Rt", generation_time = 5)
```

## Comparing fit statistics

`glance()` returns a one-row summary for each model. Since Piantham is a wrapper around MLR, the log-likelihood and AIC are identical.

```{r glance}
dplyr::bind_rows(
  glance(fit_mlr),
  glance(fit_pian)
)
```

## Backtesting

The `backtest()` function implements rolling-origin evaluation. At each origin date, the model is fit on past data and forecasts are compared to held-out future observations.
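The splitting scheme can be sketched in base R. This is an illustration of the rolling-origin idea, not `backtest()`'s actual internals; the function name `rolling_origin_splits` and the date arithmetic are assumptions for the sketch.

```{r rolling-origin-sketch}
# Each origin trains on all earlier observations and is scored at
# fixed horizons beyond the origin.
rolling_origin_splits <- function(dates, min_train, horizons) {
  start   <- min(dates) + min_train        # earliest usable origin
  end     <- max(dates) - max(horizons)    # latest origin with full test data
  origins <- dates[dates >= start & dates <= end]
  lapply(origins, function(o) {
    list(origin = o,
         train  = dates[dates <= o],       # past data only
         test   = o + horizons)            # held-out future dates
  })
}

days   <- as.Date("2022-01-01") + 0:90
splits <- rolling_origin_splits(days, min_train = 56, horizons = c(7, 14, 21))
length(splits)       # number of forecast origins
splits[[1]]$test     # target dates for the first origin
```

Note that training data never overlaps the test dates, which is what makes the scores an honest measure of out-of-sample accuracy.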
```{r backtest}
bt <- backtest(x,
               engines = c("mlr", "piantham"),
               horizons = c(7, 14, 21),
               min_train = 56,
               generation_time = 5)
bt
```

## Scoring

`score_forecasts()` computes standardized accuracy metrics.

```{r score}
sc <- score_forecasts(bt, metrics = c("mae", "coverage"))
sc
```

## Model ranking

`compare_models()` summarizes scores per engine, sorted by MAE.

```{r compare}
compare_models(sc, by = c("engine", "horizon"))
```

## Visualization

```{r plot-backtest}
plot_backtest(sc)
```

## When to use which engine

| Scenario | Recommended engine |
|----------|--------------------|
| Single location, quick estimate | `mlr` |
| Need relative Rt interpretation | `piantham` |
| Multiple locations, sparse data | `hier_mlr` |
| Time-varying fitness (v0.2) | `garw` |

## Hierarchical MLR

When data span multiple locations with unequal sequencing depth, `hier_mlr` shrinks location-specific estimates toward the global mean. This stabilizes estimates for low-data locations. A demonstration requires multi-location data, which the built-in single-location dataset does not provide. See `?fit_model` for an example with simulated multi-location data.
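The pooling idea can be illustrated with a toy shrinkage rule. The precision-style weight `n / (n + k)` below, and the constant `k`, are simplifications for illustration only, not `hier_mlr`'s actual estimator.

```{r shrinkage-sketch}
# Shrink a location's growth-advantage estimate toward the global mean,
# with the pull strongest where sequencing depth (n_seq) is low.
shrink <- function(local_est, n_seq, global_est, k = 100) {
  w <- n_seq / (n_seq + k)   # data-rich locations keep their own estimate
  w * local_est + (1 - w) * global_est
}

shrink(local_est = 0.12, n_seq = 2000, global_est = 0.08)  # barely moved
shrink(local_est = 0.12, n_seq = 20,   global_est = 0.08)  # pulled toward 0.08
```

The qualitative behavior matches what partial pooling provides: sparse locations borrow strength from the global estimate, while well-sampled locations are left essentially unchanged.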