---
title: "YAML Configuration for metaRVM"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{YAML Configuration for metaRVM}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

The `metaRVM` package uses a YAML file to configure the model parameters. This vignette describes the structure of the YAML configuration file, starting with a simple example and progressively introducing more advanced features.

## Basic Configuration

A minimal configuration file specifies the data sources, simulation settings, and disease parameters with fixed scalar values.

```yaml
run_id: SimpleRun
population_data:
  initialization: data/population_init.csv
  vaccination: data/vaccination.csv
mixing_matrix:
  weekday_day: data/m_weekday_day.csv
  weekday_night: data/m_weekday_night.csv
  weekend_day: data/m_weekend_day.csv
  weekend_night: data/m_weekend_night.csv
disease_params:
  ts: 0.5
  ve: 0.4
  dv: 180
  dp: 1
  de: 3
  da: 5
  ds: 6
  dh: 8
  dr: 180
  pea: 0.3
  psr: 0.95
  phr: 0.97
simulation_config:
  start_date: 01/01/2025 # m/d/Y
  length: 90
  nsim: 1
  nrep: 1
  simulation_mode: deterministic
  random_seed: 42
```

### Configuration Sections

- **`run_id`**: A unique name for the simulation.
- **`population_data`**: Paths to CSV files for population demographics, initial state, and vaccination schedules.
- **`mixing_matrix`**: Paths to CSV files defining contact patterns for different times of the week.
- **`disease_params`**: Disease characteristics. In this example, all parameters are single, fixed values.
- **`simulation_config`**: Settings for the simulation run, such as start date, duration, and number of simulations.

### Input File Structures

The `metaRVM` package requires several CSV files to be structured in a specific way. Below are the descriptions for each of the required input files, along with examples of what they should look like.
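The requirements for each input file are spelled out in the descriptions that follow. As an optional pre-flight check, they can be encoded in a few lines of base R. This is a minimal sketch rather than part of the `metaRVM` API: `check_inputs()` is a hypothetical helper that takes data frames already loaded with `read.csv()`, and the two-subpopulation data set at the end is made up purely for illustration.

```r
# Hypothetical pre-flight checks for metaRVM input data (not a package function).
# `init`, `vac`, and `mix` are data frames already read with read.csv();
# the mixing matrix is assumed to be read with header = FALSE.
check_inputs <- function(init, vac, mix, tol = 1e-8) {
  required <- c("population_id", "N", "S0", "I0", "V0", "R0")
  missing <- setdiff(required, names(init))
  if (length(missing) > 0) {
    stop("initialization file is missing columns: ", paste(missing, collapse = ", "))
  }
  n_pop <- nrow(init)
  # population_id must be the sequence 1, 2, ..., n
  if (!identical(as.integer(init$population_id), seq_len(n_pop))) {
    stop("population_id must be 1, 2, ..., ", n_pop)
  }
  # Vaccination: first column `date` (MM/DD/YYYY), then one column per subpopulation
  if (names(vac)[1] != "date") stop("first vaccination column must be 'date'")
  if (anyNA(as.Date(vac$date, format = "%m/%d/%Y"))) {
    stop("vaccination dates must use MM/DD/YYYY")
  }
  if (ncol(vac) - 1 != n_pop) stop("vaccination file needs one column per subpopulation")
  # Mixing matrix: n x n, with every row summing to 1
  mix <- as.matrix(mix)
  if (!all(dim(mix) == n_pop)) stop("mixing matrix must be ", n_pop, " x ", n_pop)
  if (any(abs(rowSums(mix) - 1) > tol)) stop("each mixing-matrix row must sum to 1")
  # Any remaining columns are the user-defined demographic categories
  setdiff(names(init), required)
}

# Illustrative (made-up) inputs for two subpopulations
init <- data.frame(population_id = 1:2, N = c(100, 200), S0 = c(95, 190),
                   I0 = c(5, 10), V0 = 0, R0 = 0, age = c("0-17", "18-64"))
vac <- data.frame(date = c("01/01/2025", "01/02/2025"), p1 = c(0, 3), p2 = c(1, 2))
mix <- data.frame(V1 = c(0.8, 0.3), V2 = c(0.2, 0.7))
check_inputs(init, vac, mix)  # returns the demographic columns: "age"
```

Returning the extra column names mirrors the rule that columns beyond the required ones are treated as demographic categories; those names are the ones that can later appear under `sub_disease_params`.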
#### Population Data Files

- **`initialization`**: This file specifies both the demographic structure and initial state of the population for the simulation. It must contain the following **required columns**:
    - `population_id`: A unique identifier for each subpopulation (sequential natural numbers: 1, 2, 3, ...)
    - `N`: The total number of individuals in each subpopulation
    - `S0`: The initial number of susceptible individuals
    - `I0`: The initial number of symptomatic infected individuals
    - `V0`: The initial number of vaccinated individuals
    - `R0`: The initial number of recovered individuals

    **User-defined demographic category columns**: Any additional columns beyond the required ones are automatically detected as demographic categories. These can be used for stratification and in `sub_disease_params` for category-specific parameters. Common examples include:

    - `age`: Age groups (e.g., "0-17", "18-64", "65+")
    - `race`: Race or ethnicity categories
    - `zone`: Healthcare zones or geographic regions
    - Or any custom categories like `income_level`, `occupation`, `risk_group`, etc.

**Example of a population initialization file:**

```{r init_example, echo=FALSE}
init_file <- system.file("extdata", "population_init_n24.csv", package = "MetaRVM")
init_data <- read.csv(init_file)
cat("First 10 rows of population_init_n24.csv:\n\n")
print(head(init_data, 10))
cat("\n... (", nrow(init_data), " total rows)\n", sep = "")
```

- **`vaccination`**: The vaccination schedule file contains the number of vaccinations administered over time. The first column must be `date` in `MM/DD/YYYY` format, followed by columns for each subpopulation in the same order as `population_id` in the initialization file.

**Example of a vaccination schedule file:**

```{r vac_example, echo=FALSE}
vac_file <- system.file("extdata", "vaccination_n24.csv", package = "MetaRVM")
vac_data <- read.csv(vac_file)
cat("First 10 rows of vaccination_n24.csv:\n\n")
print(head(vac_data, 10))
cat("\n... (", nrow(vac_data), " total rows)\n", sep = "")
cat("\nNote: Columns represent vaccination counts for each population_id (1-24)\n")
```

#### Mixing Matrix Files

The mixing matrix files define the contact patterns between different subpopulations. Each file should be a CSV without a header, where the rows and columns correspond to the subpopulations in the same order as `population_id` in the initialization file. The values in the matrix represent the proportion of time that individuals from one subpopulation spend with individuals from another. The sum of each row must equal 1.

**Example of a mixing matrix file (weekday day):**

```{r mixing_example, echo=FALSE}
mixing_file <- system.file("extdata", "m_weekday_day.csv", package = "MetaRVM")
mixing_data <- read.csv(mixing_file, header = FALSE)
cat("First 10 rows and 10 columns of m_weekday_day.csv:\n\n")
print(head(mixing_data[, 1:10], 10))
cat("\nMatrix dimensions:", nrow(mixing_data), "x", ncol(mixing_data), "\n")
cat("Row sums (should all equal 1):\n")
row_sums <- rowSums(mixing_data)
print(head(row_sums, 10))
```

### Disease Parameter Descriptions

Below is a list of the disease parameters used in `metaRVM`:

- `ts`: Transmission rate for symptomatic individuals in the susceptible population.
- `ve`: Vaccine effectiveness (proportion, range: [0, 1]).
- `dv`: Mean duration (in days) in the vaccinated state before immunity wanes.
- `dp`: Mean duration (in days) in the presymptomatic infectious state.
- `de`: Mean duration (in days) in the exposed state.
- `da`: Mean duration (in days) in the asymptomatic infectious state.
- `ds`: Mean duration (in days) in the symptomatic infectious state.
- `dh`: Mean duration (in days) in the hospitalized state.
- `dr`: Mean duration (in days) of immunity in the recovered state.
- `pea`: Proportion of exposed individuals who become asymptomatic (vs. presymptomatic) (range: 0-1).
- `psr`: Proportion of symptomatic individuals who recover directly (vs.
  requiring hospitalization) (range: 0-1).
- `phr`: Proportion of hospitalized individuals who recover (vs. die) (range: 0-1).

## Defining Parameters with Distributions

Instead of fixed values, disease parameters can be defined using statistical distributions. This is useful for capturing uncertainty in the parameters. `metaRVM` supports `uniform` and `lognormal` distributions. Here is an example of defining `ve`, `da`, `ds`, and `dh` with distributions:

```yaml
disease_params:
  ts: 0.5
  ve:
    dist: uniform
    min: 0.29
    max: 0.53
  dv: 158
  dp: 1
  de: 3
  da:
    dist: uniform
    min: 3
    max: 7
  ds:
    dist: uniform
    min: 5
    max: 7
  dh:
    dist: lognormal
    mu: 2.0
    sd: 0.5
  dr: 187
  pea: 0.333
  psr: 0.95
  phr: 0.97
```

- For a `uniform` distribution, `min` and `max` values must be specified.
- For a `lognormal` distribution, `mu` and `sd` (mean and standard deviation on the log scale) must be specified.

## Specifying Subgroup Parameters

`metaRVM` allows different disease parameters to be specified for demographic subgroups using the `sub_disease_params` section. These subgroup-specific parameters override the global parameters defined in `disease_params`.

The demographic categories used in this section must match the user-defined category column names in the initialization CSV file specified under `population_data`. For example, if the initialization file has columns named `age`, `income_level`, and `occupation`, any of these categories can be used in `sub_disease_params`. The specific values (e.g., `"0-4"`, `"low"`, `"healthcare"`) must exactly match the values in those columns.

The following example defines different parameters for different age groups:

```yaml
sub_disease_params:
  age:
    0-17:
      dh: 4
      pea: 0.08
      psr: 0.9303
      phr: 0.9920
    18-64:
      dh: 4
      pea: 0.08
      psr: 0.9726
      phr: 0.9920
    65+:
      dh: 7
      pea: 0.05
      psr: 0.9091
      phr: 0.9227
```

In this configuration, individuals in the "0-17" age group have a `dh` (duration of hospitalization) of 4, overriding any global `dh` value.
Similarly, the "65+" group is assigned a longer hospitalization duration (`dh: 7`) and lower recovery proportions (`psr: 0.9091`, `phr: 0.9227`) than the younger groups. Parameters that a group does not list, such as `ts`, keep their global values from `disease_params`.

## Stochastic Simulation with Distributional Parameters

To represent both parameter uncertainty and stochastic disease transitions, set `simulation_mode: stochastic` and define one or more disease parameters as distributions.

- `nsim`: number of sampled parameter sets
- `nrep`: number of stochastic replicates per parameter set
- total runs = `nsim * nrep`

Example:

```yaml
run_id: StochasticDistRun
population_data:
  initialization: data/population_init.csv
  vaccination: data/vaccination.csv
mixing_matrix:
  weekday_day: data/m_weekday_day.csv
  weekday_night: data/m_weekday_night.csv
  weekend_day: data/m_weekend_day.csv
  weekend_night: data/m_weekend_night.csv
disease_params:
  ts: 0.5
  ve:
    dist: uniform
    min: 0.29
    max: 0.53
  dv: 158
  dp: 1
  de: 3
  da:
    dist: uniform
    min: 3
    max: 7
  ds:
    dist: uniform
    min: 5
    max: 7
  dh:
    dist: lognormal
    mu: 2.0
    sd: 0.5
  dr: 187
  pea: 0.333
  psr: 0.95
  phr: 0.97
simulation_config:
  start_date: 01/01/2025
  length: 90
  nsim: 20
  nrep: 5
  simulation_mode: stochastic
  random_seed: 42
```

With `nsim: 20` and `nrep: 5`, this configuration produces 20 × 5 = 100 runs.

For reproducibility, set `random_seed`. The same seed reproduces both the parameter draws (for distributional parameters) and the stochastic model replicates.

## Checkpointing and Restoring Simulations

For long-running simulations, it is useful to save the state of the model at intermediate points. This is known as checkpointing. `metaRVM` allows checkpoints to be saved and simulations to be restored from a saved state.

### Enabling Checkpointing

To enable checkpointing, add `checkpoint_dir` and, optionally, `checkpoint_dates` to the `simulation_config` section of the YAML file.

- `checkpoint_dir`: The directory where checkpoint files will be saved.
- `checkpoint_dates`: A list of dates (in `MM/DD/YYYY` format) on which to save a checkpoint. If this is not provided, a single checkpoint will be saved at the end of the simulation.
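Because checkpoints are tied to calendar dates, it can be worth confirming before a long run that each entry in `checkpoint_dates` is a valid `MM/DD/YYYY` date inside the simulated period. The base-R sketch below assumes the run covers `length` consecutive days starting on (and including) `start_date`; `checkpoint_dates_ok()` is a hypothetical helper, not a `metaRVM` function.

```r
# Hypothetical helper (not part of metaRVM): TRUE for each checkpoint date that
# parses as MM/DD/YYYY and lies inside the simulated period. Assumes the run
# covers `length` days, starting on (and including) `start_date`.
checkpoint_dates_ok <- function(start_date, length, checkpoint_dates) {
  fmt <- "%m/%d/%Y"
  start <- as.Date(start_date, format = fmt)
  end <- start + length - 1                        # last simulated day
  dates <- as.Date(checkpoint_dates, format = fmt) # unparseable dates become NA
  !is.na(dates) & dates >= start & dates <= end
}

# The third date falls after the end of a 90-day run starting 01/01/2025
checkpoint_dates_ok("01/01/2025", 90, c("01/15/2025", "01/30/2025", "04/15/2025"))
```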
Here is an example of how to configure checkpointing:

```yaml
simulation_config:
  start_date: 01/01/2025
  length: 90
  nsim: 10
  nrep: 1
  simulation_mode: deterministic
  random_seed: 42
  checkpoint_dir: "path/to/checkpoints"
  checkpoint_dates: ["01/15/2025", "01/30/2025"]
```

### Restoring from a Checkpoint

To restore a simulation from a checkpoint file, set the `restore_from` parameter in the `simulation_config` section. The model is then initialized with the state saved in the specified checkpoint file.

```yaml
simulation_config:
  start_date: 01/31/2025 # The day after the checkpoint date
  length: 60 # Remaining simulation length
  nsim: 10
  nrep: 1
  simulation_mode: deterministic
  random_seed: 42
  restore_from: "path/to/checkpoints/checkpoint_2025-01-30_instance_1.Rda"
```

When restoring, `start_date` should be the day after the checkpoint date, and `length` should be the remaining duration of the simulation. Note that each simulation instance must be restored individually.
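The restore settings follow from simple date arithmetic: resume on the day after the checkpoint and simulate only the days that remain, with one `restore_from` file per instance. The sketch below computes these values in base R. `restore_settings()` is a hypothetical helper, and it assumes two things that may not hold for every version of `metaRVM`: that `length` counts days inclusively from `start_date`, and that checkpoint files follow the `checkpoint_<YYYY-MM-DD>_instance_<k>.Rda` pattern seen in the `restore_from` path of the example.

```r
# Hypothetical helper (not part of metaRVM). Assumptions:
#  - the original run covers `length` days, including `start_date`;
#  - checkpoint files are named checkpoint_<YYYY-MM-DD>_instance_<k>.Rda.
restore_settings <- function(start_date, length, checkpoint_date,
                             checkpoint_dir, instances) {
  fmt <- "%m/%d/%Y"
  start <- as.Date(start_date, format = fmt)
  end <- start + length - 1                   # last day of the original run
  ckpt <- as.Date(checkpoint_date, format = fmt)
  restart <- ckpt + 1                         # resume the day after the checkpoint
  data.frame(
    instance = instances,
    start_date = format(restart, fmt),
    length = as.integer(end - restart) + 1L,  # days still to simulate
    restore_from = file.path(
      checkpoint_dir,
      sprintf("checkpoint_%s_instance_%d.Rda", format(ckpt, "%Y-%m-%d"), instances)
    ),
    stringsAsFactors = FALSE
  )
}

# One row of restore settings per simulation instance
restore_settings("01/01/2025", 90, "01/30/2025", "path/to/checkpoints", 1:2)
```

Under these assumptions, a 90-day run starting on 01/01/2025 and checkpointed on 01/30/2025 resumes on 01/31/2025 with a remaining `length` of 60 days, and each row supplies the values for one instance's `simulation_config`.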