--- title: "splinemixmeta" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{splinemixmeta} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # Introduction Package `splinemixmeta` provides some basic capabilities in non-parametric meta-regression. If one has responses `y`, each with a standard error, from previous analyses, and wants to regress those against study-level explanatory variables `x` without assuming linearity, `splinemixmeta` allows an arbitrary smooth relationship to be estimated as a smoothing spline. It is also possible to include other explanatory variables (assumed to have a linear relationship to `y`) and random effects groupings. Such an approach may be referred to as "non-parametric meta-analysis", "spline meta-regression", "meta-splines", or other similar terminology mash-ups. `splinemixmeta` works by bringing together features of `mgcv` for building spline components and `mixmeta` for estimating general mixed-effects meta-analysis models. Specifically, the splines of `mgcv` can be used as random effects components, with the penalty matrix providing a correlation structure for spline coefficients, the unknown smoothness parameter replaced by an unknown variance parameter, and any unpenalized directions in the spline parameter space (e.g. spline parameters representing a linear relationship) moved into fixed effects. The resulting pieces are then set up as needed for a call to `mixmeta::mixmeta` for estimating the model. Variance parameters are always estimated by REML (restricted maximum likelihood). There are some important limits on what will work. At the moment, the only recommended spline basis functions are `bs="cr"`, `bs="cs"`, and `bs="cc"`. These are supported because the associated covariance matrices can be diagonalized. The `mgcv` default of `bs="tp"` does not currently produce good results. One can have multiple spline components, although this is rather limited because most of the choices in `mgcv` for bivariate splines are not among those supported. It is possible that `bs = "mrf"` also works, but it has not been tested. See `help(splinemixmeta)` for more about what is supported. The estimation machinery is not particularly efficient and so may be notably slow for large models. # Example Here we give a simulated example. Say we have `n=50` values, with 5 values from each of 10 groups, with a sinusoidal relationship between `y` and `x` to provide a simple nonlinear example. The groups will be treated as random effects, as will the spline. The simplest case would not have groups, but here we illustrate a case that does. ## Data simulation ```{r fig.height = 4, fig.width = 6} set.seed(1) n <- 50 num_groups <- 10 group <- rep(1:num_groups, each = 5) |> factor() group_effects <- rnorm(length(group), 0, 0.3) x <- runif(n, 0, 20) coef_x <- 0.6 se <- runif(n, 0.1, 0.3) fx <- 1.5 * sin(2 * pi * x / 12) y <- 10 + coef_x * x + fx + group_effects[group] + fx + rnorm(n, 0, 1) + rnorm(n, 0, se) data <- data.frame(y, x, se, group) plot(y ~ x, data = data, col = group, pch = 19, main = "Simulated data (color = group)") ``` This data simulation has the following pieces: - We have 10 groups of 5 data points each. (Since the `x` values are all drawn independently, the groups are independent from `x`.) - `x` is uniformly distributed from 0 to 20. - We assume each `y` was obtained separately from a previous study, and these studies are grouped in some relevant way (e.g. from the same study region or same lab). - Fixed effects for `y` include an intercept and linear term. - Random effects for `y` include the group effects. - The sinusoidal term for `y` will be estimated as a spline, treated as a random effect. - Residuals are normally distributed with `sd = 1`. - Measurement errors (i.e. estimation error from the previous studies from which `y` were obtained) are normally distributed with standard deviations that are the standard errors (`se`) from the previous studies. These standard errors are simulated as uniformly distributed between 0.1 and 0.3 to reflect heterogeneity in the precision of the previous studies. ## Estimation of a spline meta-regression model The spline meta-regression model can be estimated like this: ```{r} library(splinemixmeta) smm <- splinemixmeta( mgcv::s(x, bs = "cr"), y ~ x, se = se, manual_fixed = TRUE, data = data, random = ~ 1 | group) ``` The object `smm` will have class "splinemixmeta" and "mixmeta". First we can look at a summary of the model: ```{r} summary(smm) ``` The spline terms are labeled as `basisFxns_A`. Further splines would be `basixFxns_B`, etc. Two factors have been introduced. One is called `all`, which has a single level for every row of the data, which facilitates use of `mixmeta::mixmeta`. The other is called `ID`, which has a unique value for each row. In meta-regression, it can be tempting to think that the standard errors associated with `y` values serve as the only source of residual variation, but that is not typically the case. If each `y` came from a large sample size in the previous studies, the standard errors would be small, but we would still expect study-level variation around any regression line because the world is noisy. That study-level variation, i.e. additional residual variation, is set up as a random effect with one level for each `y`. By using `y ~ x`, we included a linear term for `x` directly (manually), and thus we needed to set `manual_fixed = TRUE`. Otherwise `splinemixmeta` would have obtained a linear term from the spline setup and included it in the model, and we don't want it duplicated. (Such a linear term, or more generally unpenalized dimensions of the spline coefficients, will always be removed from the smooth term when it is converted into the format of a random effect.) In the `summary` output, we see the full covariance matrix from the spline random effect, which is the vector of spline coefficients. This is shown as the standard deviation for each component (all the same) and their correlations (all 0), reflecting a covariance matrix that is a constant times an identity matrix. It has this structure because the random effects structure from the spline basis functions and penalty matrix are rotated in order to work with a diagonal covariance matrix. Finally we see the estimated standard deviation between groups and the estimated residual standard deviation, shown as the `~1 | ID` term. (If the default names `all` or `ID` are already used in the data set, `splinemixmeta` will choose different names.) ## Predictions Predictions can be obtained at multiple levels of the random effects. The `predict` function for `splinemixmeta` objects simplifies this by allowing the choices of including any spline terms (default `TRUE`), other random effects (default `FALSE` but possibly of interest) and residuals (default `FALSE` and typically not of interest). Predictions will come with columns for standard errors and variances (squared standard errors). The machinery for predictions is modified from `mixmeta::blup()`, where "blup" means "best linear unbiased predictor", a standard term in linear mixed effects modeling. Hence the column of predictions returned by `predict` is labeled "`blup`". The blups come from the conditional distribution of random effects, given the data and estimated variance parameters. Following `mixmeta::blup`, predictions can be of type "outcome", in which case variance from fixed effects terms is included in the prediction variance (and standard errors), or "residual", in which case only variance from random effects is included. The default is "outcome". See `help(predict.splinemixmeta)` for details. Any further fine-grained control (such as including one spline but not another), can be done by calling `splinemixmeta::blup()` directly. Here are two versions of predictions: ```{r} pred_spline_only <- predict(smm, include_smooths = TRUE, include_REs = FALSE, include_residuals = FALSE, type = "outcome") pred_spline_and_groups <- predict(smm, include_smooths = TRUE, include_REs = TRUE, include_residuals = FALSE, type = "outcome") # look at both together head(cbind(pred_spline_only, pred_spline_and_groups)) # look at only fixed effect, which could be done with `predict` # but here we illustrate a direct call to `blup` blup(smm, level=0, vcov = TRUE, se = TRUE) |> head() ``` ## Figures `splinemixmeta` provides a basic capability for figures, for which the suggested package `ggplot2` is needed. ```{r fig.height = 4, fig.width = 6} plot(smm, ylab = "y", title = "Spline meta-regression fit") ``` In this figure, the `y` values are shown with approximate 95% confidence intervals calculated as +/- 2 standard errors, from the standard errors that were input as the `se` argument to `splinemixmeta`. The fitted line by default includes fixed effects and spline terms (blups), with a 95% confidence envelope.