--- title: "Frequently Asked Questions" output: rmarkdown::html_vignette: df_print: kable vignette: > %\VignetteIndexEntry{Frequently Asked Questions} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # Basic example Most subsequent examples build on this simple rifttable: ```{r basic, message = FALSE} library(rifttable) library(dplyr) # for data management, e.g., mutate() library(tibble) # for constructing a tibble, e.g. via tribble() data(breastcancer, package = "risks") design <- tribble( ~label, ~type, ~stratum, "**Overall**", "", "", " Deaths/N", "outcomes/total", c("Low", "High"), " Risk", "risk", c("Low", "High"), " Risk ratio (95% CI)", "rr", c("Low", "High"), " Risk difference (95% CI)", "rd", c("Low", "High"), "", "", "", "**Low hormone receptor**", "", "", " Deaths/N (Risk)", "outcomes/total (risk)", "Low", " Risk difference (95% CI)", "rd", "Low", "**High hormone receptor**", "", "", " Deaths/N (Risk)", "outcomes/total (risk)", "High", " Risk difference (95% CI)", "rd", "High" ) |> mutate( exposure = "stage", outcome = "death", effect_modifier = "receptor" ) rifttable( design = design, data = breastcancer ) |> rt_gt() # obtain formatted output ``` # Why do I get an error? R's error messages can be frustrating. When using rifttable, these are the typical sources of errors: * Clerical errors in variable names and arguments. There is no magic here except double-checking the code. * Missing data. See below: [How do I handle missing data?](#how-do-i-handle-missing-data). * Discrepancy between estimator (`type`) and the data. For example, `type = "mean"` will not work on a categorical (factor) variable. * Models that fail to converge. For example, one may be trying to estimate a risk ratio with 0 outcomes in the reference category, or adjusting for 20 covariates in a Cox model with 5 events overall. (Sometimes such attempts will "just" return warning messages -- it is still worth rethinking the modeling strategy.) To identify where an error is coming from, start simple. Comment out all but one line of the `design`, putting `#` at the beginning of the line. Start with a line that gives basic descriptive data, such as `type = "total"`, `type = "outcomes"` or `type = "events/time"`, and re-run `rifttable()`. Then add more lines with descriptive estimators, one by one. At the end, add lines that fit models, such as `type = "hr"`. # What is the `design`? The `design` that the `rifttable()` function takes as input is simply a dataset that defines how the table should look like when `rifttable()` has processed the `data`. The `design` can be constructed in many different ways. All lead to the same table: 1. **A dataset (tibble) defined using `tribble()`** ```{r} design1 <- tribble( ~label, ~exposure, ~outcome, ~type, "N", "stage", "death", "total", "Deaths", "stage", "death", "outcomes" ) design1 rifttable( design = design1, data = breastcancer ) |> rt_gt() ``` 2. **A dataset (tibble) defined using `tibble()`** ```{r} design2 <- tibble( label = c("N", "Deaths"), exposure = "stage", outcome = "death", type = c("total", "outcomes") ) design2 rifttable( design = design2, data = breastcancer ) |> rt_gt() ``` 3. **Concatenating tibbles, then editing with `mutate()`** ```{r} design3 <- bind_rows( tibble( # row 1 label = "N", type = "total" ), tibble( # row 2 label = "Deaths", type = "outcomes" ) ) |> mutate( # elements that are the same for all rows exposure = "stage", outcome = "death" ) design3 rifttable( design = design3, data = breastcancer ) |> rt_gt() ``` 4. **For descriptive tables: Use `table1_design()`** ```{r} design4 <- breastcancer |> table1_design( death, # the total count will automatically be included by = stage ) design4 rifttable( design = design4, data = breastcancer ) |> rt_gt() ``` See a separate overview about [a descriptive Table 1](table1.html). 5. **External datasets** The `design` could even be written out in an external dataset that can be loaded with `readr::read_csv()` (for CSV files) or `readxl::read_excel()` (for Excel sheets). # How do I handle missing data? rifttable tries to make as few assumptions as possible about how the user wants to treat missing data. * Missing values in **exposure**: By default, missing vales (`NA`) in the exposure are displayed as a separate exposure category for descriptive statistics (e.g., `type = "total"` or `type = "mean"`). They are omitted by comparative estimators (e.g., regression models). To change this behavior, call `rifttable()` with the argument `exposure_levels = "nona"`. * Missing values in **outcome**: By default, descriptive statistics will be missing (e.g., results will be `--` or `NA`), and regression models will use the non-missing observations. To exclude observations with missing outcome values altogether, add `na_rm = TRUE` to (specific rows of) the `design`. * Missing values in **confounders**: Applies only to regression models, which typically exclude observations with missing values. * Missing values in the **effect modifier**: Stratified and joint models are only shown for the specified `stratum` of the `effect_modifier`. To include observations with the missing effect modifier, add `NA` to the requested stratum in the `design`, e.g., `effect_modifier = "bmi", stratum = c("<25", NA)`. # How do I add overall statistics? Use the `overall` argument to show descriptive data for the entire `data` set. Inferential estimators showing comparisons between exposure categories will be blank there. ```{r overall} rifttable( design = design, data = breastcancer, overall = TRUE ) |> rt_gt() # obtain formatted output ``` # How do I test for trend? Instead of *testing* a null hypothesis about a trend, rifttable proposes *estimating* the difference in the outcome for a one-unit higher exposure. This is also called a linear slope. Here, we estimate the risk associated for stage that is one category higher. ```{r trend} rifttable( design = design |> mutate(trend = "stage_numeric"), data = breastcancer |> mutate(stage_numeric = as.numeric(stage)) ) |> rt_gt() # obtain formatted output ``` # How do I show multiple exposures in the same table? Our simple toy dataset just has one exposure variable. For demonstration, we just create a second variable, with two categories, "Level 1" and "Level 2," which is a simplified combination of the `stage` and `receptor` variables. We will flip the table `layout` from `"rows"` (the default) to `"cols"` and concatenate two rifttables. We also need to give our new `exposure2` variable the same label as `stage` to make sure results appear in the same column. ```{r multiple} breastcancer_2exposures <- breastcancer |> mutate( exposure2 = case_when( stage == "Stage I" | (stage == "Stage II" & receptor == "High") ~ "Level 1", stage == "Stage III" | (stage == "Stage II" & receptor == "Low") ~ "Level 2" ) ) attr(breastcancer_2exposures$exposure2, which = "label") <- "Exposure" attr(breastcancer_2exposures$stage, which = "label") <- "Exposure" bind_rows( design |> mutate(exposure = "exposure2") |> slice(2:5) |> rifttable( data = breastcancer_2exposures, layout = "cols" ), design |> slice(2:5) |> rifttable( data = breastcancer_2exposures, layout = "cols" ) ) |> rt_gt() # obtain formatted output ``` # How do I change how results are rounded? By default, difference measures are being rounded to 2 decimal digits (`0.01`), such as `type = "diff"`, the mean difference, or `type = "quantreg"`, the median difference. The same goes for risk measures, such as `type = "risk"`, unless shown as percentage points. Ratio measures are also shown with 2 decimal digits, such as `type = "hr"`, the hazard ratio, or `type = "fold"`, a ratio of arithmetic means. Rounding can be changed by setting the `rifttable()` arguments `diff_digits`, `risk_digits`, and `ratio_digits` globally for the entire table. ```{r rounding} design <- tribble( ~label, ~type, "Deaths/N", "outcomes/total", "Risk", "risk", "Risk ratio (95% CI)", "rr", "Odds ratio (95% CI)", "or", "Risk difference (95% CI)", "rd" ) |> mutate( exposure = "stage", outcome = "death" ) rifttable( design = design, data = breastcancer, ratio_digits = 3, # Many digits for ratios risk_digits = 1 ) |> # Fewer digits for risks rt_gt() # obtain formatted output ``` As can be seen, ratios > 3 are still shown with 1 fewer decimal, and ratios > 10 are shown with 2 fewer decimals ([Wilcox, *Epidemiology* 2004](https://doi.org/10.1097/01.ede.0000101026.08873.14) motivates why). To disable such additional rounding of extremely high ratios: ```{r rounding_decrease} rifttable( design = design, data = breastcancer, ratio_digits = 3, ratio_digits_decrease = NULL, # Do not round high ratios more risk_digits = 1 ) |> rt_gt() # obtain formatted output ``` Additionally, rounding can be changed for each row, adding a column `digits` to the rifttable `design`: ```{r rounding_digits} tribble( ~label, ~type, ~digits, "Deaths/N", "outcomes/total", NA, # Uses rifttable default "Risk", "risk", NA, # Uses risk_digits below "Risk ratio (95% CI)", "", NA, " Rounded to 1 digit", "rr", 1, " Rounded to 2 digits", "rr", 2, "Risk difference (95% CI)", "rd", 3 ) |> # Overrides risk_digits mutate( exposure = "stage", outcome = "death" ) |> rifttable( data = breastcancer, risk_digits = 1 ) |> # Fewer digits for risks, unless specified by "digits" rt_gt() # obtain formatted output ``` # How can I create joint models? By default, regression models will be fit separately for each stratum of the `effect_modifier`. Append `"_joint"` to `"hr"`, `"rr"`, `"rd"`, `"irr"`, `"irrrob"`, `"diff"`, `"fold"`, `"foldlog"`, `"quantreg"`, or `"or"` to obtain "joint" models for exposure and effect modifier that have a single reference category. Note that the joint model will be fit across all non-missing (`NA`) strata of the effect modifier, even if the `design` table does not request all strata be shown. Compare stratified models to joint models for risk differences (for simplicity of presentation, count data are omitted): ```{r joint} tribble( ~label, ~type, ~stratum, "**Overall**", "rd", c("Low", "High"), "", "", "", "**Stratified models**", "", "", " Low hormone receptor", "rd", "Low", " High hormone receptor", "rd", "High", "", "", "", "**Joint models**", "", "", " Low hormone receptor", "rd_joint", "Low", " High hormone receptor", "rd_joint", "High" ) |> mutate( exposure = "stage", outcome = "death", effect_modifier = "receptor" ) |> rifttable(data = breastcancer) |> rt_gt() ``` # How can I change the reference category? The reference categories for exposure and effect modifier are always their first factor levels. Compare the preceding example: `"High"` is, alphabetically, before `"Low"`. To change the reference category, use `forcats::fct_relevel()` or the base R alternative `relevel()` on variables in the `data` provided to `rifttable()`: ```{r joint_reorder} tribble( ~label, ~type, ~stratum, "**Joint models**", "", "", " Low hormone receptor", "rd_joint", "Low", " High hormone receptor", "rd_joint", "High" ) |> mutate( exposure = "stage", outcome = "death", effect_modifier = "receptor" ) |> rifttable( data = breastcancer |> mutate( receptor = relevel( factor(receptor), # Make "receptor" a factor in the first place ref = "Low" ) ) ) |> # Set new reference category rt_gt() ``` If a middle category of the exposure `stage` is desired as the reference: ```{r reorder_middle} result_reordered <- tibble( label = "**RD (95% CI)**", type = "rd", exposure = "stage", outcome = "death" ) |> rifttable( data = breastcancer |> mutate( stage = relevel( stage, ref = "Stage II" ) ) ) result_reordered |> rt_gt() ``` Using `forcats::fct_relevel()` may be preferable over `relevel()`, as it preserves the variable label: here, the variable `stage` lost its label, `"Stage"` starting with upper case S. That results for "Stage II" are now listed first is probably undesirable. Reorder the columns of the table that `rifttable()` produced to print results for "Stage I" first: ```{r reorder_middle2} result_reordered |> select(stage, "Stage I", everything()) |> rt_gt() ``` # How I do I change the level for confidence intervals? Add a `ci` column to the `design`: ```{r change_ci} tribble( ~label, ~type, ~ci, "Deaths/N (Risk)", "outcomes/total (risk)", NA, "Risk ratio", "", NA, " 80% CI", "rr", 0.8, " 95% CI", "rr", NA, # Defaults to 0.95 " 99% CI", "rr", 0.99 ) |> mutate( exposure = "stage", outcome = "death" ) |> rifttable( data = breastcancer, risk_percent = TRUE ) |> rt_gt() # obtain formatted output ``` # How do I make rifttable calculate an estimand that is not built-in? While the package provides a number of estimators commonly used in epidemiology, it will never be able to include all possible estimators. However, any custom estimate can be integrated into a rifttable by a defining custom estimation function. The subsequent example will reproduce the following basic rifttable, which shows the mean age by sex, stratified by ECOG performance status, in the `cancer` data set: ```{r custom, message = FALSE} data(cancer, package = "survival") cancer <- cancer |> tibble::as_tibble() |> mutate( sex = factor( sex, levels = 1:2, labels = c("Male", "Female") ) ) design <- tibble::tibble( type = "mean", exposure = "sex", outcome = "age", effect_modifier = "ph.ecog", stratum = 1:2, label = paste0("ECOG PS ", stratum, ": mean age") ) design |> rifttable( data = cancer, overall = TRUE ) |> rt_gt() ``` Instead of relying on rifttable's built-in estimator `type = "mean"`, we will define a custom function that calculates the mean: ```{r custom_fn} estimate_my_mean <- function(data, ...) { data |> group_by(.exposure) |> summarize( res = paste( round( mean(.outcome), digits = 3 ), "yrs" ) ) } ``` Use the custom function `my_mean` instead of the built-in `mean`: ```{r custom_use} design |> # Edit the previous design mutate( type = "my_mean", # Replace built-in "mean" by custom "my_mean" label = paste0(label, " (custom)") ) |> rifttable( data = cancer, overall = TRUE ) |> rt_gt() ``` Specifications for custom functions: * The function name must start with `estimate_`; this prefix is to be omitted when later calling the custom function within `rifttable()`. * The `data` provided to `rifttable()` will be available to the custom function as an argument named `data`. * The `data` are already subsetted to the `stratum` of the `effect_modifier`, if applicable. * Copies of key variables are accessible under the names, `.exposure`, `.outcome`, `.event`, `.time`, and `.time2`, as applicable. * The function must accept all elements of a rifttable `design` (e.g., `confounders`, `digits`, `na_rm`, etc.) and of the `rifttable()` function (e.g., `reference`, `risk_percent`) as arguments. Many may not be relevant and can be captured in the argument list, `...` (see example). * The function must return a tibble/data frame with one column for `.exposure`, with one row per exposure category, and one string column `res` for the estimate.