%\VignetteIndexEntry{Analysing interactions of fitted models}
\documentclass[english]{article}
\usepackage{float,graphicx,geometry}
\geometry{top=2.5cm,bottom=2.5cm,left=3cm,right=3cm}
\usepackage{amsmath}
\usepackage{Sweave}
\usepackage{babel}
\newcommand{\pkg}[1]{{\textbf{#1}}}
\newcommand{\R}{\textsf{R}}
\newcommand{\code}[1]{{\texttt{#1}}}
<>=
options(useFancyQuotes = FALSE, width=85)
car.citation <- citation("car")
car.citation$key <- "Fox-CAR2011"
nlme.citation <- citation("nlme")[1]
nlme.citation$key <- "Pinheiro-nlme"
lme4.citation <- citation("lme4")[1]
lme4.citation$key <- "Bates-lme4"
@
\begin{filecontents}{bibextra.bib}
<>=
toBibtex(car.citation)
toBibtex(nlme.citation)
toBibtex(lme4.citation)
@
\end{filecontents}
\begin{document}
\setkeys{Gin}{height=0.4\textheight, keepaspectratio=true}
\DefineVerbatimEnvironment{Sinput}{Verbatim}{fontsize=\small,fontshape=sl}
\DefineVerbatimEnvironment{Soutput}{Verbatim}{fontsize=\small,fontshape=n}
\DefineVerbatimEnvironment{Scode}{Verbatim}{fontsize=\small,fontshape=n}

\title{Analysing interactions of fitted models}
\author{Helios De Rosario Mart\'inez}
\maketitle

\begin{abstract}
This vignette presents a brief review of the existing approaches to the \emph{post-hoc} analysis of interactions in factorial experiments, and describes how to perform some of the cited calculations and tests with the functions of the package \pkg{phia} in \R{}. Those functions include the calculation and plotting of cell means, and the testing of simple effects, residual effects, and interaction contrasts, among other possibilities. They can be applied to linear and generalized linear models, with or without covariates, and to mixed or multivariate linear models for repeated measures experiments.
\end{abstract}

\section{Introduction}

The \emph{post-hoc} analysis of interactions in factorial ANOVA is a controversial issue that has generated many discussions and a variety of methods. Traditionally, the most frequent practice has been the analysis of \emph{simple main effects}, i.e. the main effect of one factor at fixed values of the other factors. This type of analysis, however, has been severely criticized for mixing main and interaction effects. Marascuilo and Levin stated in 1970 that analysing the simple effects of a significant interaction was a typical case of the so-called ``Type IV error'': a wrong interpretation of a correctly rejected null hypothesis, since that analysis does not investigate the hypothesis that is presumably being tested \cite{marascuilo_appropriate_1970}. Instead, they proposed the analysis of \emph{interaction contrasts} (crossed contrasts of different factors) or of the \emph{interaction effects} (the value of the interaction after removing lower-order effects). The latter option was also avidly supported by Rosnow and Rosenthal, who called that concept \emph{residual} or \emph{leftover contrasts} \cite{rosenthal_definition_1989,rosnow_things_1995}. However, such proposals have not produced an established ``correct'' practice at all. In fact, the analysis of interactions has been a heated field of debate for years; see Games' defence of simple effect tests \cite{games_type_1973} and Levin's and Marascuilo's response \cite{levin_type_1973}, or the criticisms of Meyer and of Petty et al. of Rosnow's and Rosenthal's proposals \cite{meyer_misinterpretation_1991,petty_understanding_1996}, as well as the answer to the latter \cite{rosnow_contrasts_1996}.
Although the theoretical issues of simple effects tests are generally acknowledged, a consensus about the ``best'' alternative method is probably difficult to achieve. A generally valid procedure is not possible in the first place, since the correct test depends on the specific problem addressed by the experiment. Unfortunately, many researchers do not choose a method depending on the question they want to answer. In spite of the criticism received by the analysis of simple effects, various reviews of published research in the last three decades have shown that it is still by far the most frequent practice \cite{rosenthal_definition_1989,ottenbacher_interpretation_1991,pardo_interaccion_2007}. According to the interpretation of Pardo et al., this is partly due to the limitations of commonplace software packages, which do not provide direct facilities for analysing the contrasts that isolate interaction effects \cite{pardo_interaccion_2007}.

On the other hand, the flexibility of \R{} does allow the analysis of any kind of contrast across the factors of fitted models, even beyond the two- or three-factorial designs that are normally discussed in the literature. After all, any contrast can be described as a linear combination of the model coefficients. Thus, since the mathematical details of fitted models (like their matrices of coefficients and covariances) are easily available in \R{}, the values and errors of those contrasts can be calculated without difficulty, and moreover there are contributed functions that facilitate the statistical tests based on such combinations of coefficients, like \code{linearHypothesis} in the package \pkg{car} \cite{Fox-CAR2011}, or \code{glht} in \pkg{multcomp} \cite{multcomp-Hothorn}, which is especially suited for testing main effects.

The package \pkg{phia} (Post-Hoc Interaction Analysis) provides a usable interface for calculating the different types of contrasts that are mentioned in the literature on the analysis of interactions, as well as other combinations of factors that could be of interest to the researcher. The functionality of this package also extends to more complex models, like generalized linear models, mixed-effects models, multivariate linear models for repeated measures designs, and models with covariates. The testing procedures provided by this package are the ones covered by \code{linearHypothesis}, i.e. tests based on \emph{F} and $\chi^{2}$ statistics, with adjusted \emph{p}\nobreakdash-values if needed. The following sections of this paper give a more detailed description of the main types of contrasts that can be used for analysing interactions, and examples of using the functions of this package for calculating and testing them.

\section{Mathematical formulation of interactions\label{sec:maths-interactions}}

Interactions are often described in terms of a linear, two-way factorial model, where the response variable $Y$ is a function of the factors $A$ and $B$. Interactions are said to exist when a change in the level of one factor has different effects on the response variable, depending on the value of the other factor \cite{dodge_concise_2008}.
This is often represented by the following formula:
\begin{equation}
Y_{ijk}=\mu+\alpha_{i}+\beta_{j}+\left(\alpha\beta\right)_{ij}+\varepsilon_{ijk},\end{equation}
where $\alpha_{i}$, $\beta_{j}$ represent the {}``main effect'' of the $i$\nobreakdash-th and $j$\nobreakdash-th levels of $A$ and $B$, respectively, $\left(\alpha\beta\right)_{ij}$ is the effect of the interaction in that combination of levels, and $\varepsilon_{ijk}$ is the error term of the $k$\nobreakdash-th observation in that combination.

\R{} analyses such models in the more general framework of linear models, defined by the following matrix equation:
\begin{equation}
\underset{(n\times m)}{\mathbf{Y}}=\underset{(n\times(r+1))}{\mathbf{X}}\underset{((r+1)\times m)}{\mathbf{B}}+\underset{(n\times m)}{\mathbf{E}},\label{eq:glm-matrix}\end{equation}
where $\mathbf{Y}$ contains the $n$ observations of the $m$\nobreakdash-dimensional response variable (with $m$ often equal to 1), $\mathbf{X}$ is the model matrix that only depends on the observed values of the predictor variables and the structure of the model, $\mathbf{B}$ is the coefficient matrix for that model structure and data (with $r$ degrees of freedom --- d.o.f.), and $\mathbf{E}$ contains the error term.

The structure of $\mathbf{X}$ and $\mathbf{B}$ is very simple for linear regression models, where all predictors are numeric variables. Let us take a regression model with a univariate response and two regressors ($X_{1}$ and $X_{2}$), including their interaction. In this case \eqref{eq:glm-matrix} would just be the formulation in matrix form of the regression equation:
\begin{equation}
\left(\begin{array}{c}
Y_{1}\\
\vdots\\
Y_{n}\end{array}\right)=\left(\begin{array}{cccc}
1 & X_{11} & X_{21} & X_{11}X_{21}\\
\vdots & \vdots & \vdots & \vdots\\
1 & X_{1n} & X_{2n} & X_{1n}X_{2n}\end{array}\right)\left(\begin{array}{c}
\beta_{0}\\
\beta_{1}\\
\beta_{2}\\
\beta_{12}\end{array}\right)+\left(\begin{array}{c}
\varepsilon_{1}\\
\vdots\\
\varepsilon_{n}\end{array}\right)\label{eq:regression-matrix}\end{equation}
This model has 3 terms, each with one d.o.f., that are represented by different columns of $\mathbf{X}$ and coefficients of $\mathbf{B}$ (besides the intercept represented by the column of ones and $\beta_{0}$). Two of these terms are the main effects of the regressors, represented by their values in $\mathbf{X}$ and the ``slopes'' $\beta_{1}$, $\beta_{2}$; the other one is the interaction term, represented by the product $X_{1i}X_{2i}$ and the coefficient $\beta_{12}$. For more complex regression models, there may be as many terms as possible products of regressors, such that if there are $k$ regressors, there may be up to $2^{k}-1$ terms.

If some or all of the predictor variables are factors, the same equations hold, but the terms would not be represented in the model matrix by single columns of scalar values like $X_{1i}$, $X_{2i}$, or $X_{1i}X_{2i}$. The main term of each factor would be represented by a set of ``dummy variables'', whose number would be equal to the d.o.f. of the factor (the number of levels minus 1), and the interactions would be represented by all the possible products of the corresponding dummy variables. Thus, for instance, if there are two factors $A$ and $B$ with $3$ and $4$ levels, respectively, the term of $A$ would be represented by $2$ dummy variables, $B$ by $3$ of them, and their interaction by $2\times3=6$ dummy variables.
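This can be checked directly in \R{}. The following minimal sketch (with two hypothetical, unordered factors \code{A} and \code{B}, unrelated to the data used later) lists the columns that the default treatment contrasts generate for such a design: one for the intercept, $2$ for $A$, $3$ for $B$, and $6$ for the interaction.
<<>>=
d <- expand.grid(A=factor(1:3), B=factor(1:4))
colnames(model.matrix(~ A*B, data=d))
@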
The problem is that the coefficients that define the interactions in this framework are not always useful for describing the model in practical terms. The products of regressors are usually meaningless variables, and this poses a difficulty in the interpretation of the associated coefficients (see section \ref{sec:covariates} below). This issue is further aggravated when there are factors involved in the interactions, since the meaning of the dummy variables may be even more opaque. That is one of the reasons that motivate the different ways of describing interactions, which are commented on next.

\section{Analysis of simple effects in factorial models}

\subsection{Calculation and plots of ``cell means''}

Let us take, for this and later sections, an example data set based on R.J. Boik's hypothetical data \cite{boik_interactions_1979}, which he used for demonstrating how to analyse interaction contrasts in linear models, although it will be used here for a larger variety of interaction analyses. It represents a hypothetical experiment, where people affected by hemophobia were treated with different fear reduction therapies and different doses of antianxiety medication, in a balanced factorial design, and the effect of these combined treatments was measured by their electrodermal response in an experimental session.

First we need to create the linear model from the data. We will use the data frame \code{Boik}, also included in the package \pkg{phia}, which has the response \code{edr} and two factors (\code{therapy}, with levels \code{control}, \code{T1}, and \code{T2}; and \code{medication}, with levels \code{placebo}, \code{D1}, \code{D2}, and \code{D3}). We use the function \code{some} of the package \pkg{car} (imported by \pkg{phia}) to see some cases:
<<>>=
library(phia)
some(Boik)
@
Before proceeding with detailed analyses of the interactions, we should first check if the factorial model is coherent with the data, and if the interaction between both factors is actually significant. We can do this by examining the residuals of the model (see figure \ref{fig:mod-boik-diagnos}) and the ANOVA table.
<>=
mod.boik <- lm(edr ~ therapy*medication, data=Boik)
par(mfcol=c(1,2))
plot(mod.boik, 1:2)  # Plot diagnostics for the model
Anova(mod.boik)
@
%
\begin{figure}
\begin{centering}
<>=
<>
@
\par\end{centering}
\caption{Residuals vs. fitted values and Q-Q plot of \code{mod.boik}\label{fig:mod-boik-diagnos}}
\end{figure}

Although the plots of figure \ref{fig:mod-boik-diagnos} show a minor departure from normality in the residuals, especially due to a couple of extremely low observations, for the sake of balance we will keep all the data and assume that the model assumptions hold. Then we see in the ANOVA table that the interaction between \code{therapy} and \code{medication} is significant, so it does make sense to investigate this effect.%
\footnote{The data set is based on the results reported in Boik's paper for the different tests, but not directly copied from his original work (which actually gives no data set). Thus, the residual plots bear no relation to Boik's paper, and due to rounding inaccuracies, the figures presented in this vignette and those in Boik's tables differ in the last decimals.
Regarding the ANOVA calculations, the \code{Anova} function from the package \pkg{car} has been used to be consistent with later sections, although for this set of balanced data the results would be the same if we had used \code{anova} from the base \R{} package.%
}

In factorial experiments like this one, the dependency between factor levels and the response variable is usually represented in a table of cell means, where the rows and columns correspond to the different levels of both treatments, and each cell contains the adjusted mean of the response for the corresponding combination of levels. When there is an interaction effect, the cell means are taken as the most straightforward way of representing this effect. These values and their standard errors can be obtained from the model coefficients with the function \code{interactionMeans} in the package \pkg{phia}, using the fitted model as its first (and in this case only) argument:
<<>>=
(boik.means <- interactionMeans(mod.boik))
@
This function calculates by default the cell means for the interactions of highest order between factors. To obtain means of lower-order interactions, the optional argument \code{factors} admits a character vector with the names of the factors that are included in the desired interaction.%
\footnote{But consider the contradiction of this approach with the marginality principle discussed in the next section.%
}
If this argument gives only one factor, the result will be the means of its zeroth-order interaction (i.e. the marginal means for that factor). Thus, for instance:
<<>>=
interactionMeans(mod.boik, factors="therapy")
@
The output of \code{interactionMeans} can be plotted via the generic \code{plot} method, which produces the set of plots shown in figure \ref{fig:mod-boik-plot}. The off-diagonal panels are the typical interaction plots, which can also be created with \code{interaction.plot} from the columns of \code{boik.means}, where the lack of parallelism between lines reveals how one factor changes the effect of the other one. In this case, we see that the control group hardly obtains any benefit from the medication, whereas with the other therapies (\code{T1} and \code{T2}) the fear of blood is reduced in proportion to the medication dose, and more markedly for the former. On the other hand, the diagonal panels represent the marginal means of each factor.
%
\begin{figure}
\begin{centering}
<>=
plot(boik.means)
@
\par\end{centering}
\caption{Result of \code{plot(boik.means)}\label{fig:mod-boik-plot}}
\end{figure}

If the interaction involved more than two factors, the graphical device would have as many rows and columns as factors, and the off-diagonal panels would show the first-order interaction means for each pair of factors. For interactions with many factors, the matrix of panels may be cluttered, so it would be more convenient to show them in separate figures. The argument \code{multiple} (\code{TRUE} by default) can be modified for this purpose:
<<>>=
plot(boik.means, multiple=FALSE)  # Not printed in this paper
@

\subsection{\emph{Caveat}: low-order interactions and the marginality principle}

The basic methods of statistical analysis in \R{} favour the so-called ``marginality principle'', whereby the main effects of factors with non-null interactions should not be interpreted or tested \cite{nelder_reformulation_1977}. Now, the same warning applies to interactions that are themselves contained in interactions of higher order.
Thus, although the plots described in the previous section are commonplace in the study of interactions, they are not necessarily meaningful in all circumstances. For instance, since the interaction between \code{therapy} and \code{medication} is significant, according to the marginality principle we should not be concerned with the main effects of those factors, so the diagonal panels of figure \ref{fig:mod-boik-plot} would be irrelevant for this model. Likewise, if a model has more than two factors and an interaction of second or higher order is significant, no plot containing those factors would actually be of interest, since the method \code{plot} on the result of \code{interactionMeans} only represents main effects and first-order interactions. (This is a limitation of the graphical representation alone, since the data frame can represent higher-order interactions.)

A suitable alternative for representing higher-order interactions is provided by the functions \code{effect} and\\
\code{allEffects} from the package \pkg{effects} \cite{Fox:Hong:2009:JSSOBK:v32i01}. Those functions are especially devised to analyse and plot the interactions of highest order in the model. For factorial models, their outputs contain the same type of data as \code{interactionMeans}, although they handle the numeric predictors of the model in another way, and the types of models that can be analysed are different (see sections \ref{sec:repeated-measures} and \ref{sec:covariates}).

\subsection{Testing simple effects}

The tabulation or graphical representation of cell means may give us a hint of the underlying structure of interactions, but they do not suffice to verify whether a specific change in the factors plays a significant role in an interaction. As commented on above, the most frequent approach for solving this issue consists in testing the simple effects, as an extension of the \emph{post-hoc} methods that are widely applied to the study of main factor effects.

The available methods for the \emph{post-hoc} analysis of main effects are manifold. The most basic procedure consists in evaluating multiple contrasts between factor levels, possibly with corrections of the \emph{p}\nobreakdash-value in order to protect the family-wise error rate. Pairwise comparisons between levels are usually a default strategy when the researcher has no previous plan, although this is inefficient when the factor has many levels. Tukey's method for testing pairwise contrasts, and Scheff\'e's method for all possible contrasts within a factor, are probably the most popular ones. The package \pkg{multcomp} provides useful tools for this kind of main-effect contrasts.

Testing simple main effects for interactions consists in evaluating contrasts across the levels of one factor, when the values of the other interaction factors are fixed at certain levels. This test is then repeated at other fixed levels, and the results are compared. For instance, we could test the effect of \code{medication} at the different levels of \code{therapy}.
This can be done with the function \code{testInteractions}, using the arguments \code{fixed} and \code{across} to define which factors are fixed and which are tested across their levels in each test:
<<>>=
testInteractions(mod.boik, fixed="therapy", across="medication")
@
The columns \code{medication1}, $\ldots$ \code{medication3} in the resulting table contain the values of the three orthogonal contrasts across the levels of \code{medication}, for each level of \code{therapy} (the only fixed factor in this example).%
\footnote{The specific contrasts that are calculated depend on various elements. In this case, since the original data frame defines \code{medication} as an ordered factor, polynomial contrasts are computed by default. For unordered factors they would have been {}``sum-to-zero contrasts''. This default behaviour can be overridden by setting other contrasts in the original data frame or the fitted model, or with additional arguments in \code{testInteractions}.%
}
The rest of the columns show the result of the test applied jointly to those three contrasts. These tests just quantify the qualitative interpretation that was made from the plots: the medication does not have a significant effect for the control therapy group, but its effect is remarkable for the other groups.

The criticism often levelled at this method is that interactions are mixed with main effects (or with lower-order interactions), so the tests are not really related to the term that is supposedly under investigation. Using this example, the \emph{post-hoc} analysis of the term \code{therapy:medication} is being performed because the ANOVA told us that it is significant; and this means that the coefficients of the matrix $\mathbf{B}$ related to this term are unlikely to be null. However, the tests of simple effects that have just been described do not only involve those coefficients, but also the coefficients related to the lower-order terms \code{therapy} and \code{medication}.%
\footnote{\code{testInteractions} makes multiple calls to the function \code{testFactors}, which in its turn defines a linear combination of the model coefficients and passes it down to \code{linearHypothesis} from \pkg{car}. The hypothesis matrices used in these tests can be looked at to see what coefficients are actually involved.%
}

On the other hand, many researchers like simple effects for their relatively straightforward interpretation. Moreover, the interference of lower-order coefficients may be regarded as a lesser issue when the marginality principle is considered. In this theoretical framework, the presence of a high-order interaction makes lower-order terms meaningless, so that their effects are absorbed by the interaction. Therefore, the coefficients of lower-order terms would partially be related to the interaction effect as well.

\section{Analysis of residual effects}

To address the conceptual problems of simple effects, Rosnow and Rosenthal encouraged the analysis of residual effects, by ``peeling away'' the lower-order effects from cell means \cite{rosenthal_definition_1989}.
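In the notation of the two-way model of section \ref{sec:maths-interactions}, this amounts to estimating the interaction term of each cell, i.e. what remains of its mean after removing the grand mean and both marginal means (a sketch of the calculation that is carried out below):
\begin{equation}
\widehat{\left(\alpha\beta\right)}_{ij}=\bar{Y}_{ij}-\bar{Y}_{i\cdot}-\bar{Y}_{\cdot j}+\bar{Y}_{\cdot\cdot},
\end{equation}
where $\bar{Y}_{i\cdot}$, $\bar{Y}_{\cdot j}$, and $\bar{Y}_{\cdot\cdot}$ are the (equally weighted) marginal and grand means of the adjusted cell means $\bar{Y}_{ij}$.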
For instance, let us see the cell means calculated in \code{boik.means}, in a table with the marginal means for both factors and the grand mean:
<<>>=
boik.mtable <- xtabs(boik.means$"adjusted mean" ~ therapy+medication, boik.means)
boik.mtable <- addmargins(boik.mtable, FUN=mean, quiet=TRUE)
print(boik.mtable, digits=4)
@
The ``corrected means'' would be obtained by subtracting the lowest-order effect (the grand mean) from the rest of the values of the table, and then sweeping out the corrected marginal means from the individual cells.
<<>>=
boik.resid <- boik.mtable - boik.mtable[4,5]  # Subtract the mean
boik.resid <- sweep(boik.resid, 1, boik.resid[,5])  # Subtract row means
boik.resid <- sweep(boik.resid, 2, boik.resid[4,])  # Subtract column means
print(boik.resid, digits=4)
@
These values can also be calculated (and moreover tested) by \code{testInteractions}, via the argument \code{residual} instead of \code{fixed} or \code{across}:
<<>>=
testInteractions(mod.boik, residual=c("therapy","medication"))
@
However, these results may cause confusion, and their actual interest may be dubious in many cases, including the analysis of this model. We can plot the corrected means (omitting the margins of the table) for a clearer inspection of what is happening; see figure \ref{fig:residual-plot}:
<>=
matplot(t(boik.resid[-4,-5]), type="b", xaxt="n", ylab="Interaction residuals")
axis(1, at=1:4, labels=levels(Boik$medication))
@
%
\begin{figure}
\begin{centering}
<>=
<>
@
\par\end{centering}
\caption{Residual effects of \code{mod.boik}\label{fig:residual-plot}}
\end{figure}

The lines of this plot (representing the three therapies) show the typical symmetry of residual effects. The line labelled with 1 (the control group without specific therapy) shows a negative residual effect of the placebo (a lower electrodermal response), which goes up to positive values as the medication dose increases. The residual effects of the group treated with therapy \code{T1} are the opposite, whereas the \code{T2} group has a trend similar to that of the controls, but much smaller.

The first problem is that a hasty interpretation of these values would lead to nonsense. Of course, they do not mean that the hemophobia of people who did not participate in any therapy worsened with the medication! The correct interpretation is that for these people, the effect of increasing medication was lower than the value expected \emph{by the average of marginal means}, and the opposite happened with the group treated with therapy \code{T1}. But this reasoned interpretation, albeit mathematically sound, only makes sense as long as the expectations based on the marginal averages mean anything, and this is contrary to the principle of marginality commented on above. Since the factors \code{therapy} and \code{medication} do interact, the marginal means depend on the distribution of the sample across the factors; they are not reliable information about the response that we can expect for the different factor levels, except for a population with the same distribution as the sample (in this case a balanced distribution). See, for instance, Games' and Meyer's discussion of this issue (although they do not explicitly refer to the ``principle of marginality'') \cite{games_type_1973,meyer_misinterpretation_1991}. That principle can be neglected if there is a good reason to study the marginal means that are obtained with a specific experimental design, but these circumstances are rare and special \cite{Venables00exegeseson}.
In the current example there is no compelling reason to do so, therefore the significant differences found for the placebo and the highest dose in different therapies are rather uninformative.

\section{Interaction contrasts}

Another alternative to simple effects is the study of interaction contrasts, which were in fact the subject of the paper from which our working data set is derived, although Boik used a slightly different procedure for their analysis. As in the analysis of interaction residuals, the hypotheses tested by interaction contrasts are not affected by the coefficients of main effects, but this approach overcomes the aforementioned issue of interaction residuals, because it does not make use of marginal means; it only uses the data of the cells \cite{levin_type_1973,meyer_misinterpretation_1991}.

Interaction contrasts are defined as ``differential effects'', or more descriptively as ``differences of differences'', or ``contrasts between contrasts''. They basically consist in calculating one or more contrasts across a factor, and then iterating on the results of that operation across the remaining factors. For instance, the test of simple effects previously calculated for \code{mod.boik} could be transformed into a test of interaction contrasts, if instead of fixing the levels of \code{therapy} for evaluating the contrasts across \code{medication}, we do pairwise contrasts between \code{therapy} levels. For this we must use the argument \code{pairwise} instead of \code{fixed}:
<<>>=
testInteractions(mod.boik, pairwise="therapy", across="medication")
@
These tests show how the contrasts across \code{medication} differ between pairs of \code{therapy} groups. We can see that the medication effect with therapy \code{T1} differs from the effect in the controls and with the other therapy; the medication effect with \code{T2} also differs from the effect in the controls, but this difference is not so remarkable. This conclusion was not so clear from the simple effects tests. Moreover, these results are not disturbed by the main effects of factors at all, because the calculation of contrasts has removed them for both factors, without having defined them explicitly.

The most basic interaction contrasts involve pairwise contrasts for all factors. That is what \code{testInteractions} does by default, when only the model is specified.
<<>>=
testInteractions(mod.boik)
@
If all the factors of the model had 2 levels, this would have been an optimal strategy for analysing the interaction, since the result would have been reduced to one test, corresponding to the single d.o.f. of such an interaction. But factors with more levels heavily increase the number of tests, so that for our $3\times4$ factorial design, with $2\times3=6$ interaction d.o.f., we obtain 18 partially redundant tests. Such a high number of tests is difficult to interpret, to say nothing of the reduced reliability of the \emph{p}\nobreakdash-values (with or without corrections, which can be set by the argument \code{adjustment} in \code{testInteractions}). A more sensible strategy consists in defining a small number of meaningful contrasts that can be of interest for the researcher. For instance, we might be interested in knowing the effect of crossing the following contrasts for each factor:
\begin{enumerate}
\item For \code{therapy}: controls vs. any therapy, and one therapy vs. the other.
\item For \code{medication}: placebo vs. any real dose, the minimum dose vs. the maximum, and the medium dose vs. the average of all doses.
\end{enumerate}
The function \code{testInteractions} also allows the definition of such custom contrasts, via the argument \code{custom}. This argument must be a named list of matrices, one per factor, with the vectors of coefficients that define the contrasts arranged in columns. The auxiliary function \code{contrastCoefficients} provides a convenient interface to generate that list, from symbolic formulas that represent the contrasts:%
\footnote{These matrices are transformed combinations of Helmert and polynomial contrasts, so they could have been defined by the functions \code{contr.helmert} and \code{contr.poly} as well.%
}
<<>>=
(custom.contr <- contrastCoefficients(
    therapy ~ control - (T1 + T2)/2,          # Control vs. both therapies
    therapy ~ T1 - T2,                        # Therapy T1 vs. T2
    medication ~ placebo - (D1 + D2 + D3)/3,  # Placebo vs. all doses
    medication ~ D1 - D3,                     # Min. dose vs. max
    medication ~ D2 - (D1 + D2 + D3)/3,       # Med. dose vs. average
    data=Boik, normalize=TRUE))  # Normalize to homogenize the scale
@
Then use this list to define the contrasts in \code{testInteractions} (after renaming the columns of the matrices for a clearer interpretation of the output):
<<>>=
colnames(custom.contr$therapy) <- c("cntrl.vs.all", "T1.vs.T2")
colnames(custom.contr$medication) <- c("plcb.vs.all", "D1.vs.D3", "D2.vs.avg")
testInteractions(mod.boik, custom=custom.contr)
@
This table of results is much clearer than the former. Moreover, all these contrasts are orthogonal to each other (none of them can be obtained by combination of the others), so the tests are independent, and the adjustment of \emph{p}\nobreakdash-values is reliable. Taking some care about the meaning of positive and negative figures in the \code{Value} column, we can obtain the following conclusions:
\begin{enumerate}
\item According to the first two tests, the benefit of taking medication (pooling over the three doses) is greater if the subject also receives some therapy, and this effect is especially marked for therapy \code{T1}.
\item And according to the next two tests, the therapies interact in the same manner with the benefit of increasing the medication from the minimum to the maximum. On the other hand, we cannot say that the therapy influences the difference between the medium dose and the average of all doses.
\end{enumerate}

\section{Multivariate models for repeated measures\label{sec:repeated-measures}}

Repeated-measures experiments are common in many disciplines, including psychology and agriculture, although in the latter they are usually found with the specific structure and name of ``split-plot'' designs. The classical approach for analysing this kind of experiment is via multi-strata ANOVA or univariate mixed-effects models, where the subjects or plots are introduced as factors with random effects, added to the error term (see section \ref{sec:mixed-models}). However, when the design is balanced and adequately sized, the multivariate approach is recommended if it is possible, since it does not depend on the sphericity assumption and the results are more robust \cite{keselman_testing_1998}. An example of such an analysis in \R{} is published in the web appendices to Fox's and Weisberg's \emph{R Companion to Applied Regression} \cite{Fox-appendix-mlm-car}. We will use that example, based on the \code{OBrienKaiser} data set from \pkg{car} \cite{obrien_manova_1985}.
<<>>=
some(OBrienKaiser, 6)
@
That data set has 16 rows with observations of people, classified by two between-subjects factors (\code{gender}, with levels \code{F}, \code{M}; and \code{treatment}, with levels \code{control}, \code{A}, and \code{B}), so that each subject has 15 measures distributed in columns. These 15 columns correspond to 5 consecutive observations in 3 different phases (pre-test, post-test, and follow-up); this within-subjects model may be coded in a data frame with two crossed factors:
<<>>=
(idata <- expand.grid(hour=ordered(1:5), phase=c("pre", "post", "fup")))
@
The between-subjects factor \code{treatment}, however, is not balanced in this case, as can be seen in the following frequency table:
<<>>=
addmargins(table(OBrienKaiser[c("gender","treatment")]))
@
We skip the exploration of the data that is already done in \cite{Fox-appendix-mlm-car}, and proceed to define the multivariate model.
<<>>=
mod.ok <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
                   post.1, post.2, post.3, post.4, post.5,
                   fup.1, fup.2, fup.3, fup.4, fup.5) ~ treatment*gender,
    data=OBrienKaiser)
@
The multivariate ANOVA with response transformation for repeated measures may be done with the function \code{Anova} in \pkg{car}, using the auxiliary data frame \code{idata}, and the formula \code{idesign} with the within-subjects design. For the sake of coherence with the published example, we report a type\nobreakdash-III test.
<<>>=
Anova(mod.ok, idata=idata, idesign=~phase*hour, type=3)
@
Besides the intercept, the only significant effects at $\alpha=0.05$ are the main effects of \code{phase} and \code{hour}. Nevertheless, let us suppose that we have reasons to be more liberal, and want to investigate the interaction \code{treatment:phase}, which is near the $\alpha$ level of significance. (The main effect of \code{treatment} is also near that level, but we may ignore it since we will focus on its interaction with another factor.)

The operations previously presented for univariate linear models can also be used in this case, as convenient wrappers of the procedures recommended for the \emph{post-hoc} analysis of multivariate models \cite{R-help:141760,R-help:235586}. First we may explore and plot the cell means of this interaction with \code{interactionMeans}, using the auxiliary data frame \code{idata} to specify the within-subjects model (\code{idesign} is not needed). We will use the argument \code{errorbar} to draw the asymptotic $95\%$ confidence intervals of the adjusted means, instead of their standard errors.%
\footnote{These confidence intervals are not exact, but an approximation that assumes that the parameters of the model are random variables, normally distributed around their ``true'' values. This assumption is met asymptotically, if the sample of data is large enough.%
}
The number in the string \code{"ci95"} might be changed to calculate other confidence intervals, like \code{"ci90"}, \code{"ci99"} for $90\%$, $99\%$, etc.
%
<>=
ok.means <- interactionMeans(mod.ok, c("treatment","phase"), idata=idata)
plot(ok.means, errorbar="ci95")
@
\begin{figure}
\begin{centering}
<>=
<>
@
\par\end{centering}
\caption{Means of the \code{treatment:phase} interaction for the O'Brien and Kaiser model\label{fig:ok-treatmentxphase}}
\end{figure}
The plot of figure \ref{fig:ok-treatmentxphase} shows that in the post-test and follow-up phases, the response of the control group more or less remains at the same level as in the pre-test phase, whereas the response for the other treatments increases.
However, the confidence intervals are relatively large, compared with the variations between adjusted means, and there is a lot of overlap. An analysis of all the possible pairwise interaction contrasts between \code{treatment} and \code{phase} helps us tell which differences are really significant:
<<>>=
testInteractions(mod.ok, pairwise=c("treatment", "phase"), idata=idata)
@
Although the interaction contrasts result in one-dimensional values, we have transformed a multivariate response to obtain them, so the rows of this table report Pillai tests of MANOVA. These tests show that the only significant contrast is between the pre-test and follow-up phases, when compared between the control group and treatment \code{B}. Given the similarity between the means of treatments \code{A} and \code{B}, we could have expected a significant difference between controls and treatment \code{A} as well, but the test does not reject the null hypothesis in that case, because the number of observations for treatment \code{A} is lower, and therefore the confidence intervals of the estimates are wider.

The main effect of \code{hour} could be analysed similarly. And there are other possible ways of analysing this interaction, using other options of \code{testInteractions}, or with other methods such as those proposed by Keselman \cite{keselman_testing_1998}. The reader is encouraged to try these alternatives.

\section{Linear models with covariates\label{sec:covariates}}

Data sets may include numeric predictors, as mentioned in section \ref{sec:maths-interactions}. When combined with factors, they are called \emph{covariates}, and the ANOVA is then called ANCOVA (Analysis of Covariance). The analysis of interactions that involve these variables is also different. A factorial model has a finite number of factor combinations where the adjusted mean of the response can be evaluated, but the possible values of a covariate are infinite. Therefore, the effects of covariates are usually represented as continuous functions within the range of their observed values (the model may allow the calculation of effects beyond that range, but such predictions would normally have little reliability).
% For instance, the functions \code{effect} and \code{allEffects} from the
% package \pkg{effects} evaluate the adjusted mean of the response
% at a sample of values of the covariates, either automatically calculated
% by the function or specified by the user.

We have seen that in factorial models, the effects of factors may be analysed by the contrasts between their levels. Obviously, the number of ``contrasts'' between possible values of a covariate would be infinite, although they are constrained by the d.o.f. of the model. Let us take a pure linear regression model without factors, and two covariates that do not interact:
\begin{equation}
Y_{i}=\beta_{0}+\beta_{1}X_{1i}+\beta_{2}X_{2i}+\varepsilon_{i}\end{equation}
For the effect of $X_{1}$, there are infinitely many pairs of values $X_{1a}\neq X_{1b}$ at which we could estimate such ``contrasts'', but the expected value of the result would always be proportional to the difference between $X_{1a}$ and $X_{1b}$.
The ratio between both differences would be equal to the derivative of $E\left(Y\right)$ with respect to $X_{1}$, which is equal to the model coefficient for $X_{1}$:
\begin{equation}
\frac{\Delta E\left(Y\right)}{\Delta X_{1}}=\frac{\partial E\left(Y\right)}{\partial X_{1}}=\beta_{1}\end{equation}
Thus, when covariates do not interact, their effects can just be described by the values of their corresponding model coefficients. They are a measure of the ``slope'' along the covariate, or the increment in the expected value of the response when the covariate increases by one unit. Thus, if the researcher also wants the adjusted value of the response for different values of the covariate,
% as given by the functions of \pkg{effects},
the only additional information that he or she needs is the adjusted mean at an arbitrary point.

The functions of the package \pkg{phia} can report the values of those slopes. Let us take the model for the prestige of Canadian occupations, defined in \cite{Fox-CAR2011}, p. 165. That model uses the data set \code{Prestige} from \pkg{car}, which contains several variables related to 102 different occupations:
<<>>=
str(Prestige)
@
In this model, the prestige score of each profession (\code{prestige}) is fitted to a linear model that depends on \code{education} (average years of education), \code{income} (average income), and the factor \code{type}, which has three levels: \code{bc} (blue collar), \code{prof} (professional), and \code{wc} (white collar). The former two variables are continuous covariates (\code{income} is transformed to a logarithmic scale to improve the normality of the residuals), with different responses for the three types of occupation (an interaction with \code{type}). We fit this model and do an ANOVA of it:
<<>>=
mod.prestige <- lm(prestige ~ (log2(income)+education)*type, data=Prestige)
Anova(mod.prestige)
@
This analysis detects significant main effects of all the predictors, plus a significant interaction between \code{log2(income)} and the factor \code{type}. This interaction can be explored with \code{interactionMeans} (see figure \ref{fig:interaction-covariates}) and tested with \code{testInteractions}, by just giving the name of the relevant covariate(s) in the argument \code{slope}. We also give the name of the factor plotted on the \emph{X}\nobreakdash-axis and for which the pairwise contrasts are calculated (\code{type}), although it might be omitted in this case because there is no other factor in the model.
<>=
plot(interactionMeans(mod.prestige, atx="type", slope="log2(income)"))
testInteractions(mod.prestige, pairwise="type", slope="log2(income)")
@
%
\begin{figure}
\begin{centering}
<>=
<>
@
\par\end{centering}
\caption{Plot of the interaction \code{log2(income):type}\label{fig:interaction-covariates}}
\end{figure}
The plot and ANOVA table show the adjusted values of the slope with respect to \code{log2(income)} (instead of the adjusted mean of the response), for the levels and contrasts of the factor \code{type}. That slope, i.e. the proportional relation between the occupation's income and its prestige, is greater for blue collar occupations, whereas the difference is smaller between the other two types. However, the tests only reveal significant differences between blue collar and professionals, due to the lack of balance in the data.
We can see in the ANOVA table that although the adjusted slope of \code{bc-prof} is relatively similar to that of \code{bc-wc}, the sums of squares are far greater in the former case, since there are many more cases of \code{prof}:
<<>>=
table(Prestige$type)  # Frequencies of occupation types
@
If there had been a significant interaction between the covariate \code{education} and \code{type}, we could have analysed it independently, since both covariates have an additive effect (they do not interact). Things may become more complicated if there are interactions between covariates. In that case, the slopes are not constant for given combinations of factors, but they are a function of the interacting covariates. For instance, in a regression with two interacting variables:
\begin{equation}
Y_{i}=\beta_{0}+\beta_{1}X_{1i}+\beta_{2}X_{2i}+\beta_{12}X_{1i}X_{2i}+\varepsilon_{i},\label{eq:model-covariates}\end{equation}
the slope with respect to $X_{1}$ is $\left(\beta_{1}+\beta_{12}X_{2}\right)$, and the slope for $X_{2}$ is $\left(\beta_{2}+\beta_{12}X_{1}\right)$. This means that the results of the previous calculations may depend on the values of the other covariates.

For instance, let us take another model for the \code{Prestige} data, where we consider that \code{log2(income)} and \code{education} can interact (the influence of the occupation's income may vary with the average years of education), and we discard the data of white collar occupations to have a more balanced design and simplify things.
<<>>=
mod.prestige2 <- update(mod.prestige, formula=.~.+(log2(income):education)*type,
    subset=(Prestige$type!="wc"))
Anova(mod.prestige2)
@
The significant effects shown by the ANOVA table are the same as for the previous model, including the interaction \code{log2(income):type}. And since \code{type} now has only two levels, the post-hoc test of that interaction gives virtually the same result as that table:
<<>>=
testInteractions(mod.prestige2, pairwise="type", slope="log2(income)")
@
However, the algorithm used by this test (the function \code{testFactors}) sets the covariates at definite values, and the interaction between covariates considered by the model makes the test sensitive to those values. The default values of the covariates are the averages of the cases observed in the model fit. Nevertheless, the user may choose other arbitrary values, using the extra argument \code{covariates} that is passed down to \code{testFactors}. This argument may contain a named numeric vector with the custom values of any covariate. For instance, we might want to set \code{education} at its 75th percentile, and see what happens with the interaction \code{log2(income):type}.
<<>>=
# Look at the quantiles of the model frame (a subset of the original data)
quantile(model.frame(mod.prestige2)$education)
testInteractions(mod.prestige2, pairwise="type", slope="log2(income)",
    covariates=c(education=14))
@
Altering the value of \code{education} has substantially changed the result of the test on a term that contains \code{log2(income)}, because of the interaction between both covariates. Note, however, that this interaction, and the other terms that contain it, are not significant according to the ANOVA of the model. So it could be wise to simplify the model (going back to the original one), and remove this needlessly problematic interaction. Now, if an interaction between covariates were really significant, the interest of the researcher should focus on it.
The highest-order interactions in linear models have a constant effect at any combination of the covariates, so the problem of the arbitrary values used by the tests disappears. If the argument \code{slope} of \code{interactionMeans} or \code{testInteractions} has the names of two or more covariates of the model, the calculations will be done on the values of the coefficients related to that interaction (and to higher-order terms that might contain it). As commented on at the beginning of this document, these coefficients may be more difficult to understand, since the {}``slope'' along a product of covariates may seem meaningless, but it can just be interpreted as an extension of the analogy previously used between factor contrasts and derivatives with respect to covariates. We have already seen that when two factors interact, their effect can be evaluated by means of {}``contrasts between contrasts''. Likewise, when two covariates interact, we can study a {}``derivative of the derivative'', i.e. the second-order partial derivative of the response, which is represented by the interaction coefficient. For the regression model of equation \eqref{eq:model-covariates}:
\begin{equation}
\frac{\partial^{2}E\left(Y\right)}{\partial X_{1}\partial X_{2}}=\beta_{12}\end{equation}
This interpretation can be extended to third-order or higher interactions between covariates as well, although models with that complexity are rarer, and the very meaning of such interactions will probably be more difficult to grasp than the coefficients used for representing them.

\section{Generalized linear models\label{sec:glm}}

Generalized linear models (GLM) are very much like classical (Gaussian) linear models in most aspects, aside from the distribution of the error term and the expected value of the response, $\mu=E\left(Y\right)$, which is related to the linear predictor by means of a \emph{link function} $\eta\left(\mu\right)$. Accordingly, the interactions of such models can be analysed by the methods explained in the previous sections, although the interpretation of the adjusted values of cell means and contrasts may be a bit more involved.

Let us take the example of the AMS survey about PhDs in the mathematical sciences. This example uses the data set \code{AMSsurvey} of the package \pkg{car}, which contains a cross-classification of all the PhDs awarded in the mathematical sciences in the US for different periods, assigned to 24 different categories depending on various characteristics of the doctoral students \cite{Phipps2010}.
<<>>=
str(AMSsurvey)
@
The data are structured in a data frame, with one row per category. The variable \code{count} gives the number of PhDs in each category in 2008--2009, and the other variables are factors that identify the category.%
\footnote{\code{AMSsurvey} also contains data from other periods, but we will focus on the period 2008--2009. The analysis will be based on a parsimonious model for those data, which is already fitted in \cite[pp. 253--255]{Fox-CAR2011}.%
}
One of these factors is \code{type}, the type of institution the doctorate was affiliated with, which has 6 levels: \code{I(Pr)}, \code{I(Pu)}, \code{II}, \code{III}, \code{IV}, and \code{Va}; \code{I} to \code{III} are math departments in universities of progressively lower quality (with \emph{pr}ivate and \emph{pu}blic institutions distinguished by the parenthetical abbreviations), \code{IV} are statistics and biostatistics departments, and \code{Va} are applied mathematics departments.
The other factors are \code{sex} (the gender of the degree recipient, a factor with levels \code{Female} and \code{Male}), and \code{citizen} (the citizenship status, a factor with levels \code{Non-US} and \code{US}). The contents of the data frame are clearer if shown as a contingency table:
<<>>=
ftable(xtabs(count ~ sex + citizen + type, AMSsurvey))
@
A saturated model of these data would represent \code{count} as a function of the three-way interaction of \code{type}, \code{sex}, and \code{citizen}. But the influences of \code{sex} and \code{citizen} are independent of each other, so we can use a simplified model \cite{Fox-CAR2011}. Since we are dealing with count data, the appropriate family would be a Poisson distribution. All in all, the model could be defined as follows:
<<>>=
mod.ams <- glm(count ~ type*(sex+citizen), family=poisson, data=AMSsurvey)
@
The ANOVA of this model confirms that the higher-order interaction terms that remain are significant:
<<>>=
Anova(mod.ams)
@
The terms of interest are the interactions of \code{type} with \code{sex} and \code{citizen}. We may have a look at the adjusted means with \code{interactionMeans}, as in other models. We specify that we want \code{type} on the \emph{X}\nobreakdash-axis and the other factors coded as lines, with the arguments \code{atx} and \code{traces}, respectively:
<>=
ams.means <- interactionMeans(mod.ams)
plot(ams.means, atx="type", traces=c("sex","citizen"))
@
%
\begin{figure}
\begin{centering}
<>=
<>
@
\par\end{centering}
\caption{Adjusted (geometric) means of the interactions in \code{mod.ams}\label{fig:plot-ams}}
\end{figure}
The resulting plots (see figure \ref{fig:plot-ams}) are very similar to the ones shown in previous sections, with the exception of the \emph{Y}\nobreakdash-axis scale, which is nonlinear in this case. The reason is that for GLM, \code{interactionMeans} does not average over values of the response variable (the counts of awarded PhDs); the calculations are actually done in the link function domain, and the plots are drawn in its scale, although the resulting averages and the \emph{Y}\nobreakdash-axis labels are eventually transformed to show values of the response. In this case the link function is $\eta=\log\left(\mu\right)$; therefore, the adjusted means are \emph{geometric} (not arithmetic) means of the response variable,%
\footnote{The adjusted link function \emph{is} an arithmetic mean, but then: $\mu\left[\sum\left(\eta_{i}\right)/n\right]=\exp\left[\sum\left(\log\mu_{i}\right)/n\right]=\prod\mu_{i}^{1/n}$.%
}
and the \emph{Y}\nobreakdash-axis is plotted on a logarithmic scale.

Nevertheless, the interpretation of the plot is not much affected by this issue. The mean number of male doctorates is higher in all institutions, but the difference seems larger in ``first-class'' universities (both public and private), and smaller in ``third-class'' universities or statistics departments (group \code{IV}). On the other hand, the influence of US citizenship seems negligible except for statistics departments, where there are more foreign doctorates. The tables of \code{testInteractions} just confirm that these differences are significant:
<<>>=
testInteractions(mod.ams, pairwise=c("type","sex"))      # test type:sex
testInteractions(mod.ams, pairwise=c("type","citizen"))  # test type:citizen
@
Notice, however, that the interpretation of the column \code{Value} in these tables requires considering the relation between the link function and the response variable.
We have combined pairwise contrasts, which result in differences in the domain of the link function; for instance, if we focus on the interaction contrast \code{I(Pu)-II : Female-Male}, the adjusted value of the link is:
\begin{equation}
\left(\eta_{I(Pu),F}-\eta_{I(Pu),M}\right)-\left(\eta_{II,F}-\eta_{II,M}\right)\end{equation}
But $\eta_{i,j}=\log\left(\mu_{i,j}\right)$, therefore the previous equation is equivalent to:
\begin{equation}
\log\left(\frac{\mu_{I(Pu),F}}{\mu_{I(Pu),M}}\div\frac{\mu_{II,F}}{\mu_{II,M}}\right)\end{equation}
And if we go back to the response variable domain, the logarithm is cancelled, so we see that the interaction contrast is, in this case, a ``ratio of ratios'', rather than a ``difference of differences''. In other words, the figure \code{0.47} in the column \code{Value} for the mentioned interaction contrast means that the proportion of females vs. males in {}``first-class'' public universities is $0.47$ times (less than half) the proportion in {}``second-class'' universities. For other families of GLM, the interpretation of the contrasts in the domain of the response variable may be more obscure; so if desired, the argument \code{link=TRUE} may be used in all the functions of \pkg{phia} to force the representation of the adjusted values in the domain of the link function.

GLM may include covariates, and the analysis of interactions may involve terms that contain them. In that case, the means and contrasts reported by \code{interactionMeans} or \code{testInteractions} are always in the link function domain, since the response variable has no linear relation with the predictors at all, and therefore there is no such thing as {}``slopes'' of that variable, except in very local, differential environments. Anyway, if there is some reason to do it, it is possible to obtain the derivatives of the response variable with respect to the covariates, via the {}``chain rule''. The slopes reported by the functions of \pkg{phia} are derivatives of the link function, $\partial\eta/\partial X_{i}$, and if \code{model} is the name of the object with the fitted GLM, we can obtain the derivative of the expected response with respect to the link, $d\mu/d\eta$, as a function:
<>=
dm.de <- family(model)$mu.eta
@
Now, to get the derivative of $\mu$ with respect to the covariate $X_{i}$ at a specific value $X$, we just have to evaluate \code{dm.de} at that value --- type \code{dm.de(X)}, with the desired value bound to \code{X} --- and then multiply:
\begin{equation}
\frac{\partial\mu}{\partial X_{i}}\left(X\right)=\frac{\partial\eta}{\partial X_{i}}\frac{d\mu}{d\eta}\left(X\right)\end{equation}
More generally, all the calculations that may be done with classical linear models can also be applied to GLM, but it is necessary to consider that the model is defined in terms of the link function, whereas the outcomes are usually reported in units of the response variable, and the calculations must be defined according to the domain where they operate.

\section{Mixed-effects models \label{sec:mixed-models}}

\subsection{Differences with fixed-effects models}

Linear and generalized linear models assume error independence between observations, i.e. that there is no correlation between the values of the residual error. However, this assumption may be violated in many experiments, where the observations are clustered in groups associated with noise factors, e.g. sets of measures taken in different environments, with different instruments, or in different subjects.
\section{Mixed-effects models \label{sec:mixed-models}}

\subsection{Differences with fixed-effects models}

Linear and generalized linear models assume independence of errors between observations, i.e. that there is no correlation between the values of the residual error. However, this assumption may be violated in many experiments, where the observations are clustered in groups associated with noise factors, e.g. sets of measures taken in different environments, with different instruments, or in different subjects. Such factors are normally uninteresting for the purposes of the study, but their influence may nonetheless be significant. This conflict is solved by including the influence of those factors in the model, so that \eqref{eq:glm-matrix} is expanded as follows (simplifying the model to a univariate response):
\begin{equation}
\underset{(n\times{}1)}{\mathbf{Y}}=\underset{(n\times{}(r+1))}{\mathbf{X}}\underset{((r+1)\times{}1)}{\mathbf{\beta}}+\underset{(n\times{}s)}{\mathbf{Z}}\underset{(s\times{}1)}{\mathbf{u}}+\underset{(n\times{}1)}{\mathbf{\varepsilon}},\label{eq:mixed-matrix}\end{equation}
where the additional coefficients $\mathbf{u}$ and their corresponding model matrix $\mathbf{Z}$ are related to the $s$ groups of observations associated with the noise factors (plus their possible interactions with other factors and covariates). Now, if it is possible to assume that the effects of the noise factors (i.e. the coefficients represented in $\mathbf{u}$) are random values from a normal distribution, the model may be simplified, so that instead of fitting the individual values of $\mathbf{u}$, it is only necessary to fit their covariance matrix $\mathbf{\Psi}$, which is usually constrained as a function of one or a few parameters. This is what is known as a \emph{mixed-effects model} (with a combination of both ``fixed'' and ``random'' effects).%
\footnote{The structure of the random part of the model is similar to the error term. In fact, in the limit case where all the groups of random factors only contain one observation, $\mathbf{Z}$ is equivalent to the identity matrix, and the coefficients $\mathbf{u}$ cannot be distinguished from the residual errors $\mathbf{\varepsilon}$. Obviously, in that case no special technique would be necessary for fitting the model, and the linear regression method would do the job (although with an increased residual error). It is also possible to work with \emph{generalized mixed-effects models}, where both the coefficients of random effects and the residual error follow a non-normal distribution.%
}

These models may be analysed in \R{} with the functions included in the packages \pkg{nlme} \cite{Pinheiro-nlme} and \pkg{lme4} \cite{Bates-lme4}. In the following examples we will use the latter, which is more flexible and efficient, although the procedure for analysing interactions is the same in both cases. The functions of \pkg{phia} for the post-hoc analysis of mixed-effects models are used just as in the case of linear and generalized linear models. Multivariate responses in mixed-effects models are not supported, however, although in some cases this may be worked around by a modification of the data structure, as will be shown in the examples. From a theoretical point of view, another difference is that the tests are not exact, so there may be concerns about the reliability of the reported \emph{p}\nobreakdash-values, as will be discussed in the examples as well.

Note, moreover, that the analysis is limited to the fixed effects, for good reasons. Random effects are normally noise that must be controlled, but they are not interesting for the conclusions of the study, and that is in fact why their effect is generally simplified as a set of random values. Accordingly, the parameters that are fitted in the model are not the specific values of the random effects, but their covariances or other parameters that define their distribution. In other words: the model does not really tell us anything about those specific values, so we should not interrogate it about them.
Although the level of abstraction needed to understand this principle may be high in a complex model, it is easier to grasp in a trivial one, defined just as the average of a set of random values plus the variance of the residual error. Performing a post-hoc analysis of a random effect in a mixed-effects model is like asking, in this case, whether two arbitrary observations are too far apart. That may be a legitimate diagnostic question at best, but a positive answer would just mean that the chosen model is inadequate for those data.

\subsection{Fitting and analysing a mixed-effects model}

As an example to illustrate the post-hoc analysis of mixed-effects models, we will use Snijders and Bosker's data on the language achievements of 2287 high-school students \cite{Snijders1999}, included in the package \pkg{nlme}. Following one of the examples worked by Snijders and Bosker, we will study the scores of a language test as a regression on the students' verbal IQ and their socio-economic status (SES), possibly influenced by their gender and by the average SES and IQ of their schoolmates. In addition, we will explore the strategy of analysing repeated measures with a mixed-effects model, by comparing the achievements in two repetitions of the language test: at the start and at the end of the school year. All in all, the variables of interest in the original data frame \code{bdf} are:
\begin{description}
\item[\code{langPRET}:] Score in the first repetition of the test.
\item[\code{langPOST}:] Score in the second repetition of the test.
\item[\code{pupilNR}:] Student code.
\item[\code{IQ.ver.cen}:] Student IQ (school-centered).
\item[\code{ses}:] Student socio-economic status (SES).
\item[\code{sex}:] Student gender.
\item[\code{schoolNR}:] School code.
\item[\code{schoolSES}:] School average SES.
\item[\code{avg.IQ.ver.cen}:] School average IQ.
\end{description}
We first create a subset of the data frame with those variables, taken from the namespace of \pkg{nlme} without attaching it (with the operator \code{::}).%
\footnote{Since the namespaces of \pkg{nlme} and \pkg{lme4} partially overlap, it is not advisable to load both packages in the same session.%
} We will also add meaningful labels to the \code{sex} factor, and simplify some variable names for the sake of clarity.
<<>>=
Snijders <- nlme::bdf[c("langPRET","langPOST",    # Outcomes
    "pupilNR", "IQ.ver.cen", "ses", "sex",        # Student-related variables
    "schoolNR", "schoolSES", "avg.IQ.ver.cen")]   # School-related variables
Snijders$sex <- factor(Snijders$sex, labels=c("F","M"))
names(Snijders) <- c("score.1","score.2","student","IQ","SES","sex","school","avgSES","avgIQ")
@
Following Snijders and Bosker, for a better model fit we will also include quadratic components of IQ around $0$ (remember that the \code{IQ} variable is centered) \cite[p.~113]{Snijders1999}.
<<>>=
Snijders$IQ2.pos <- with(Snijders, (IQ > 0)*IQ^2)
Snijders$IQ2.neg <- with(Snijders, (IQ < 0)*IQ^2)
@
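As a quick sanity check of these new variables (optional, and not part of Snijders and Bosker's procedure), note that the two quadratic terms partition the square of the centered IQ by its sign, so their sum recovers the square of \code{IQ}:
<<>>=
# The two terms are non-zero on disjoint subsets, so their sum equals IQ^2
with(Snijders, all.equal(IQ2.pos + IQ2.neg, IQ^2))
@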
The model proposed by Snijders and Bosker was focused on the result of one of the tests (say \code{score.2}), and contained the following fixed terms: the crossed linear effects of IQ and SES, both at student level (\code{IQ*SES}) and at school level (\code{avgIQ*avgSES}), the quadratic effects of IQ (\code{IQ2.pos}, \code{IQ2.neg}), and the effect of gender (\code{sex}). They also allowed the average scores and the linear effect of IQ to vary randomly across schools. Such a model could be fitted as follows:
<>=
library(lme4)
@
<>=
form1 <- score.2 ~ IQ * SES + IQ2.pos + IQ2.neg + sex + avgIQ * avgSES + # Fixed part
    (IQ | school)                                                        # Random part
mod.snijders.1 <- lmer(form1, data=Snijders)
@
Now, this is one of the cases where we cannot use the multivariate approach described in section \ref{sec:repeated-measures} for analysing the effect of repeating the test. We could attempt a similar analysis step-by-step, fitting two models with different dependent variables, corresponding to the response transformations that are used in the analysis of within-subjects effects. Both models would have the same right-hand side of the formula as \code{form1} above. The dependent variable in one of them would be the average of \code{score.1} and \code{score.2}; in the other one it would be their difference:
<>=
form2.1 <- update(form1, (score.1+score.2)/2~.)
form2.2 <- update(form1, (score.2-score.1)~.)
@
If we consider that \code{score.1} and \code{score.2} are two subsets of the same variable, distinguished by a two-level factor (say \code{repetition}), the model fitted with \code{form2.1} would tell us the effects of the terms defined in that formula on the pooled variable, whereas the model fitted with \code{form2.2} would define the effects of the interaction between \code{repetition} and those terms (i.e. \code{repetition:IQ}, \code{repetition:SES}, etc.).

On the other hand, we may fit a single, more flexible model that accounts for all the terms that we want to analyse. For that, we first have to transform the ``wide'' data frame, with \code{score.1} and \code{score.2} as different columns, into a ``long'' data frame with both of them in one column. We will also need the \code{repetition} factor explicitly defined in the data frame, and a variable that identifies each original row (the already existing variable \code{student} in the original frame can do that work). We can get all this with the \code{reshape} function:
<<>>=
Snijders.long <- reshape(Snijders, direction="long", idvar="student",
    varying=list(c("score.1","score.2")), v.names="score", timevar="repetition")
# The within-subjects factor must be coded as a factor
Snijders.long$repetition <- as.factor(Snijders.long$repetition)
# See the variables of the long data frame
str(Snijders.long)
@
Now we can use this data frame to fit a model based on the formula \code{form1}, adding the interactions of \code{repetition} with the terms we want (in the multivariate approach, all the interactions of between-subjects and within-subjects terms are always included). The price to pay for this flexibility is the addition of the random effects of \code{student}, since now we have more than one observation for each value of this factor. We will consider that the repetition only interacts with the student-related terms. Regarding the random effects, the \code{student} factor can only influence the \emph{intercept} term, since all the other terms of the model are constant for the two measures of each student. Thus, the new formula and the model fitted with it will be:
<<>>=
form3 <- score ~ repetition * (IQ * SES + IQ2.pos + IQ2.neg + sex) + # Student-related
    avgIQ * avgSES +                                                 # School-related
    (IQ | school) + (1 | student)                                    # Random part
mod.snijders.3 <- lmer(form3, data=Snijders.long)
# See the main parameters of the model (omit the correlation table)
print(mod.snijders.3, correlation=FALSE)
@
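Before analysing this model, it may help to relate the fitted object back to the matrix formulation of \eqref{eq:mixed-matrix}. The following small sketch, shown only as an illustration, uses \code{getME}, the accessor function of \pkg{lme4}, to inspect the dimensions of the fixed-effects and random-effects model matrices:
<<>>=
dim(getME(mod.snijders.3, "X"))  # fixed-effects model matrix (X in the equation above)
dim(getME(mod.snijders.3, "Z"))  # random-effects model matrix (Z in the equation above)
@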
As said above, the methods for analysing a model like this are similar to the ones used in fixed-effects models. However, there are important \emph{caveats} about the results that must be considered. The main objective of many researchers when using statistical techniques for data analysis is to obtain \emph{p}\nobreakdash-values with which to accept or reject their hypotheses, but the calculation of those values in mixed-effects models is problematic. That is the reason why the table of coefficients of a model fitted by \code{lmer} does not show \emph{p}\nobreakdash-values, as may be seen above, nor are they presented in the ANOVA table given by the standard \code{anova} method \cite{R-help:094765}.%
\footnote{Mixed models fitted with \code{lme} from the package \pkg{nlme} do show \emph{p}\nobreakdash-values, but this does not mean that they should always be trusted.%
} There are different techniques for working around this problem, like the Kenward-Roger approximation or parametric bootstrap, which are implemented in the package \pkg{pbkrtest} \cite{Halekoh-pbkrtest}. An alternative approach is to assume that the covariances of the random part of the model are equal to their estimated values, and then use standard tests \cite{Faraway2006}. The function \code{Anova} from \pkg{car} produces tables with \emph{p}\nobreakdash-values based on Wald tests, following the latter assumption \cite{R-sig-ME:014095}, and by extension, the same happens with the test functions of the package \pkg{phia}, which also depend on the function \code{linearHypothesis} from \pkg{car}.

This being said, we can first look at what terms are significant according to the Wald tests:
<<>>=
Anova(mod.snijders.3)
@
The ANOVA table shows that most terms of our model are significant for the Wald test at $\alpha{}=0.05$. The exceptions are the terms related to \code{avgSES}, and the second-order interaction of \code{repetition} with \code{IQ} and \code{SES}. All the remaining main and interaction effects of factors and covariates could further be investigated, as explained in the previous sections. Just as an example, let us focus on the only interaction between factors: \code{repetition:sex}. We can see the table of average scores, and calculate the simple main effects and pairwise interactions:
<<>>=
# Cell means
interactionMeans(mod.snijders.3)
# Simple effect of sex at each repetition
testInteractions(mod.snijders.3, fixed="repetition", across="sex")
# Pairwise interactions (default)
testInteractions(mod.snijders.3)
@
These results show that female students had significantly lower scores than male students, and that in the second repetition that difference was twice the initial one. Note, however, that the size of the difference is small in spite of the statistical significance (gender differences were between 1 and 3 points, for scores that were normally over 30 points). That is something that often happens when models are fitted to very large data sets, and it reminds us that \emph{p}\nobreakdash-values must always be interpreted carefully and with critical thought. In this case, where we used thousands of observations, we may suspect that many of the effects with low \emph{p}\nobreakdash-values in the ANOVA table are in fact irrelevant. Indeed, if we look at the coefficients of the model (in the table above) and the ranges of the associated covariates in the data frame, we can see that most effects are negligible in size when compared with the differences associated with \code{repetition} and the linear effect of \code{IQ}. This means that they are not relevant from a practical point of view, even though they may contribute to a significantly better fit of the model.
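That comparison between coefficients and covariate ranges can be sketched numerically. As a crude heuristic (not a formal effect-size measure), we may multiply each fixed-effect coefficient by the range of the corresponding column of the model matrix, which gives a rough upper bound of the contribution of each term to the predicted score:
<<eval=FALSE>>=
b <- fixef(mod.snijders.3)                    # fixed-effect coefficients
X <- getME(mod.snijders.3, "X")               # fixed-effects model matrix
r <- apply(X, 2, function(x) diff(range(x)))  # range of each model-matrix column
round(b * r, 2)                               # rough maximum contribution of each term
@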
\section*{Acknowledgements}

I thank Professors John Fox and Michael Friendly for their encouragement and advice in the creation of the functions of \pkg{phia}.

\bibliographystyle{ieeetr}
\bibliography{bibliography,bibextra}

\end{document}

<<>>=
graphics.off()
@