--- title: "Analyzing real CDC surveillance data" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Analyzing real CDC surveillance data} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4 ) ``` ## Overview This vignette demonstrates the complete lineagefreq workflow on **real surveillance data** from the U.S. CDC. The built-in dataset `cdc_sarscov2_jn1` contains actual weighted variant proportion estimates from CDC's national genomic surveillance program, covering the JN.1 emergence wave (October 2023 to June 2024). ## Load data ```{r load} library(lineagefreq) data(cdc_sarscov2_jn1) str(cdc_sarscov2_jn1) ``` ```{r lfq-data} vd <- lfq_data(cdc_sarscov2_jn1, lineage = lineage, date = date, count = count) vd ``` ## Collapse rare lineages During JN.1's rise, several lineages circulated at low frequency. We collapse those below 5% peak frequency into "Other". ```{r collapse} vd_clean <- collapse_lineages(vd, min_freq = 0.05) attr(vd_clean, "lineages") ``` ## Fit MLR model ```{r fit} fit <- fit_model(vd_clean, engine = "mlr") fit ``` ## Growth advantages ```{r ga} ga <- growth_advantage(fit, type = "relative_Rt", generation_time = 5) ga ``` ```{r plot-advantage} autoplot(fit, type = "advantage", generation_time = 5) ``` JN.1 shows a strong growth advantage over previously circulating XBB-derived lineages, consistent with published CDC estimates. ## Frequency trajectories ```{r plot-trajectory} autoplot(fit, type = "trajectory") ``` ## Forecast ```{r forecast} fc <- forecast(fit, horizon = 28) autoplot(fc) ``` ## Emergence detection ```{r emergence} summarize_emerging(vd_clean) ``` ## Sequencing power How many sequences per week are needed to detect a variant at 1%? ```{r power} sequencing_power( target_precision = 0.05, current_freq = c(0.01, 0.02, 0.05) ) ``` ## Session info ```{r session-info} sessionInfo() ```