\documentclass[nojss]{jss}
\usepackage{amsfonts,bm,amsmath,amssymb}

%%\usepackage{Sweave} %% already provided by jss.cls
%%%\VignetteIndexEntry{Finite Mixture Model Diagnostics Using Resampling Methods}
%%\VignetteDepends{flexmix}
%%\VignetteKeywords{R, finite mixture model, resampling, bootstrap}
%%\VignettePackage{flexmix}

\title{Finite Mixture Model Diagnostics Using Resampling Methods}
<<echo=false, hide=true>>=
options(useFancyQuotes = FALSE)
digits <- 3
Colors <- c("#E495A5", "#39BEB1")
critical_values <- function(n, p = "0.95") {
  data("qDiptab", package = "diptest")
  if (n %in% rownames(qDiptab)) {
    return(qDiptab[as.character(n), p])
  }else {
    n.approx <- as.numeric(rownames(qDiptab)[which.min(abs(n - as.numeric(rownames(qDiptab))))])
    return(sqrt(n.approx)/sqrt(n) * qDiptab[as.character(n.approx), p])
  }
}
library("graphics")
library("flexmix")
combine <- function(x, sep, width) {
    cs <- cumsum(nchar(x))
      remaining <- if (any(cs[-1] > width)) combine(x[c(FALSE, cs[-1] > width)], sep, width)
      c(paste(x[c(TRUE, cs[-1] <= width)], collapse= sep), remaining)
}
prettyPrint <- function(x, sep = " ", linebreak = "\n\t", width = getOption("width")) {
  x <- strsplit(x, sep)[[1]]
  paste(combine(x, sep, width), collapse = paste(sep, linebreak, collapse = ""))
}
@ 
\author{Bettina Gr{\"u}n\\
  Wirtschaftsuniversit{\"a}t Wien \And
  Friedrich Leisch\\
  Universit\"at f\"ur Bodenkultur Wien} 

\Plainauthor{Bettina Gr{\"u}n, Friedrich Leisch}

\Address{
  Bettina Gr\"un\\
  Institute for Statistics and Mathematics\\
  Wirtschaftsuniversit{\"a}t Wien\\
  Welthandelsplatz 1\\
  1020 Wien, Austria\\
  E-mail: \email{Bettina.Gruen@R-project.org}\\

  Friedrich Leisch\\
  Institut f\"ur Angewandte Statistik und EDV\\
  Universit\"at f\"ur Bodenkultur Wien\\
  Peter Jordan Stra\ss{}e 82\\
  1190 Wien, Austria\\
  E-mail: \email{Friedrich.Leisch@boku.ac.at}
}

\Abstract{ 
  
  This paper illustrates the implementation of resampling methods in
  \pkg{flexmix} as well as the application of resampling methods for
  model diagnostics of fitted finite mixture models. Convenience
  functions to perform these methods are available in package
  \pkg{flexmix}. The use of the methods is illustrated with an
  artificial example and the \code{seizure} data set.

}

\Keywords{\proglang{R}, finite mixture models, resampling, bootstrap}
\Plainkeywords{R, finite mixture models, resampling, bootstrap}
%%-------------------------------------------------------------------------
%%-------------------------------------------------------------------------
\begin{document}
\SweaveOpts{engine=R, echo=true, height=5, width=8, eps=FALSE, keep.source=TRUE}
\setkeys{Gin}{width=0.95\textwidth}

\section{Implementation of resampling methods}\label{sec:implementation}

The proposed framework for model diagnostics using resampling
\citep{mixtures:gruen+leisch:2004} equally allows to investigate model
fit for all kinds of mixture models. The procedure is applicable to
mixture models with different component specific models and does not
impose any limitation such as for example on the dimension of the
parameter space of the component specific model. In addition to the
fitting step different component specific models only require
different random number generators for the parametric bootstrap.

The \code{boot()} function in \pkg{flexmix} is a generic \proglang{S4}
function with a method for fitted finite mixtures of class
\code{"flexmix"} and is applicable to general finite mixture
models. The function with arguments and their defaults is given by:

<<results=verbatim, echo=false>>=
cat(prettyPrint(gsub("boot_flexmix", "boot", 
                     prompt(flexmix:::boot_flexmix, 
                            filename = NA)$usage[[2]]), sep = ", ", 
                linebreak = paste("\n", paste(rep(" ", 2), collapse = ""), sep= ""),
                width = 70))
@

The interface is similar to the \code{boot()} function in package
\pkg{boot} \citep{mixtures:Davison+Hinkley:1997,
  mixtures:Canty+Ripley:2010}. The \code{object} is a fitted finite
mixture of class \code{"flexmix"} and \code{R} denotes the number of
resamples.  The possible bootstrapping method are \code{"empirical"}
(also available as \code{"ordinary"}) and \code{"parametric"}. For the
parametric bootstrap sampling from the fitted mixture is performed
using \code{rflexmix()}. For mixture models with different component
specific models \code{rflexmix()} requires a sampling method for the
component specific model. Argument \code{initialize\_solution} allows
to select if the EM algorithm is started in the original finite
mixture solution or if random initialization is performed. The fitted
mixture model might contain weights and group indicators. The weights
are case weights and allow to reduce the amount of data if
observations are identical. This is useful for example for latent
class analysis of multivariate binary data. The argument
\code{keep\_weights} allows to indicate if they should be kept for the
bootstrapping. Group indicators allow to specify that the component
membership is identical over several observations, e.g., for repeated
measurements of the same individual. Argument \code{keep\_groups}
allows to indicate if the grouping information should also be used in
the bootstrapping. \code{verbose} indicates if information on the
progress should be printed. The \code{control} argument allows to
control the EM algorithm for fitting the model to each of the
bootstrap samples. By default the \code{control} argument is extracted
from the fitted model provided by \code{object}. \code{k} allows to
specify the number of components and by default this is also taken
from the fitted model provided. The \code{model} argument determines
if also the model and the weights slot for each sample are stored and
returned. The returned object is of class \code{"FLXboot"} and
otherwise only contains the fitted parameters, the fitted priors, the
log likelihoods, the number of components of the fitted mixtures and
the information if the EM algorithm has converged.

The likelihood ratio test is implemented based on \code{boot()} in
function \code{LR\_test()} and returns an object of class
\code{"htest"} containing the number of valid bootstrap replicates,
the p-value, the double negative log likelihood ratio test statistics
for the original data and the bootstrap replicates. The \code{plot}
method for \code{"FLXboot"} objects returns a parallel coordinate plot
with the fitted parameters separately for each of the components.

\section{Artificial data set}

In the following a finite mixture model is used as the underlying data
generating process which is theoretically not identifiable. We are
assuming a finite mixture of linear regression models with two
components of equal size where the coverage condition is not fulfilled
\citep{mixtures:Hennig:2000}. Hence, intra-component label switching
is possible, i.e., there exist two parameterizations implying the same
mixture distribution which differ how the components between the
covariate points are combined.

We assume that one measurement per object and a single categorical
regressor with two levels are given. The usual design matrix for a
model with intercept uses the two covariate points $\mathbf{x}_1 = (1,
0)'$ and $\mathbf{x}_2 = (1, 1)'$. The mixture distribution is given
by
\begin{eqnarray*}
  H(y|\mathbf{x}, \Theta) &=& \frac{1}{2} N(\mu_1, 0.1) + \frac{1}{2}
  N(\mu_2, 0.1),
\end{eqnarray*}
where $\mu_k(\mathbf{x}) = \mathbf{x}'\bm{\alpha}_k$ and $N(\mu, \sigma^2)$
is the normal distribution. 
  
Now let $\mu_1(\mathbf{x}_1) = 1$, $\mu_2(\mathbf{x}_1) = 2$,
$\mu_1(\mathbf{x}_2) = -1$ and $\mu_2(\mathbf{x}_2) = 4$.  As Gaussian
mixture distributions are generically identifiable the means,
variances and component weights are uniquely determined in each
covariate point given the mixture distribution.  However, as the
coverage condition is not fulfilled, the two possible solutions for
$\bm{\alpha}$ are:
\begin{description}
\item[Solution 1:]
  $\bm{\alpha}_1^{(1)} = (2,\phantom{-}2)'$, $\bm{\alpha}_2^{(1)} =
  (1,-2)'$,
\item[Solution 2:]
  $\bm{\alpha}_1^{(2)} = (2,-3)'$, $\bm{\alpha}_2^{(2)} =
  (1,\phantom{-}3)'$.
\end{description}

We specify this artificial mixture distribution using
\code{FLXdist()}. \code{FLXdist()} returns an unfitted finite mixture
of class \code{"FLXdist"}. The class of fitted finite mixture models
\code{"flexmix"} extends class \code{"FLXdist"}. Each component
follows a normal distribution. The parameters specified in a named
list therefore consist of the regression coefficients and the standard
deviation. Function \code{FLXdist()} has an argument \code{formula}
for specifying the regression in each of the components, an argument
\code{k} for the component weights and \code{components} for the
parameters of each of the components.

<<>>=
library("flexmix")
Component_1 <- list(Model_1 = list(coef = c(1, -2), sigma = sqrt(0.1)))
Component_2 <- list(Model_1 = list(coef = c(2,  2), sigma = sqrt(0.1)))
ArtEx.mix <- FLXdist(y ~ x, k = rep(0.5, 2),
  components = list(Component_1, Component_2))
@ 

We draw a balanced sample with 50 observations in each covariate point
from the mixture model using \code{rflexmix()} after defining the data
points for the covariates. \code{rflexmix()} can either have an
unfitted or a fitted finite mixture as input. For unfitted mixtures
data has to be provided using the \code{newdata} argument.  For
already fitted mixtures data can be optionally provided, otherwise the
data used for fitting the mixture is used.

<<>>=
ArtEx.data <- data.frame(x = rep(0:1, each = 100/2))
suppressWarnings(RNGversion("3.5.0"))
set.seed(123)
ArtEx.sim <- rflexmix(ArtEx.mix, newdata = ArtEx.data)
ArtEx.data$y <- ArtEx.sim$y[[1]]
ArtEx.data$class <- ArtEx.sim$class
@ 

In Figure~\ref{fig:art} the sample is plotted together with the two
solutions for combining $x_1$ and $x_2$, i.e., this illustrates
intra-component label switching.

\begin{figure}
  \centering
<<fig=true, echo=false, results=hide>>=
par(mar = c(5, 4, 2, 0) + 0.1)
plot(y ~ x, data = ArtEx.data, pch = with(ArtEx.data, 2*class + x))
pars <- list(matrix(c(1, -2, 2,  2), ncol = 2),
  matrix(c(1,  3, 2, -3), ncol = 2))
for (i in 1:2) apply(pars[[i]], 2, abline, col = Colors[i])
@ 
  \caption{Balanced sample from the artificial example with the two theoretical solutions.}
  \label{fig:art}
\end{figure}

We fit a finite mixture to the sample using \code{stepFlexmix()}. 

<<>>=
set.seed(123)
ArtEx.fit <- stepFlexmix(y ~ x, data = ArtEx.data, k = 2, nrep = 5, 
  control = list(iter = 1000, tol = 1e-8, verbose = 0))
@ 

The fitted mixture can be inspected using \code{summary()} and
\code{parameters()}. 
<<>>=
summary(ArtEx.fit)
parameters(ArtEx.fit)
@ 

Obviously the fitted mixture parameters correspond to the
parameterization we used to specify the mixture distribution. Using
standard asymptotic theory to analyze the fitted mixture model gives
the following estimates for the standard deviations.

<<>>=
ArtEx.refit <- refit(ArtEx.fit)
summary(ArtEx.refit)
@ 

The fitted mixture can also be analyzed using resampling
techniques. For analyzing the stability of the parameter estimates
where the possibility of identifiability problems is also taken into
account the parametric bootstrap is used with random
initialization. Function \code{boot()} can be used for empirical
or parametric bootstrap (specified by the argument \code{sim}). The
logical argument \code{initialize_solution} specifies if the
initialization is in the original solution or random. By default
random initialization is made. The number of bootstrap samples is set
by the argument \code{R}. Please note that the arguments are chosen to
correspond to those for function \code{boot} in package \pkg{boot}
\citep{mixtures:Davison+Hinkley:1997}.

<<eval=FALSE>>=
set.seed(123)
ArtEx.bs <- boot(ArtEx.fit, R = 200, sim = "parametric")
ArtEx.bs
@ 
<<echo=FALSE>>=
if (file.exists("ArtEx.bs.rda")) {
  load("ArtEx.bs.rda")
} else {
  set.seed(123)
  ArtEx.bs <- boot(ArtEx.fit, R = 200, sim = "parametric")
  save(ArtEx.bs, file = "ArtEx.bs.rda")
}
ArtEx.bs
@ 

Function \code{boot()} returns an object of class
\code{"\Sexpr{class(ArtEx.bs)}"}. The default plot compares the
bootstrap parameter estimates to the confidence intervals derived
using standard asymptotic theory in a parallel coordinate plot (see
Figure~\ref{fig:plot.FLXboot-art}). Clearly two groups of parameter
estimates can be distinguished which are about of equal size. One
subset of the parameter estimates stays within the confidence
intervals induced by standard asymptotic theory, while the second
group corresponds to the second solution and clusters around these
parameter values.

\begin{figure}[h!]
  \centering
<<fig=true>>=
print(plot(ArtEx.bs, ordering = "coef.x", col = Colors))
@
\caption{Diagnostic plot of the bootstrap results for the artificial
  example.}
\label{fig:plot.FLXboot-art}
\end{figure}

In the following the DIP-test is applied to check if the parameter
estimates follow a unimodal distribution. This is done for the
aggregated parameter esimates where unimodality implies that this
parameter is not suitable for imposing an ordering constraint which
induces a unique labelling.  For the separate component analysis which
is made after imposing an ordering constraint on the coefficient of
$x$ rejection the null hypothesis of unimodality implies that
identifiability problems are present, e.g.~due to intra-component
label switching.

<<>>=
require("diptest")
parameters <- parameters(ArtEx.bs)
Ordering <- factor(as.vector(apply(matrix(parameters[,"coef.x"], 
  nrow = 2), 2, order)))
Comp1 <- parameters[Ordering == 1,]
Comp2 <- parameters[Ordering == 2,]
dip.values.art <- matrix(nrow = ncol(parameters), ncol = 3, 
  dimnames=list(colnames(parameters),
  c("Aggregated", "Comp 1", "Comp 2")))
dip.values.art[,"Aggregated"] <- apply(parameters, 2, dip)
dip.values.art[,"Comp 1"] <- apply(Comp1, 2, dip)
dip.values.art[,"Comp 2"] <- apply(Comp2, 2, dip)
dip.values.art
@ 

The critical value for column \code{Aggregated} is
\Sexpr{round(critical_values(nrow(parameters)), digits = digits)} and for
the columns of the separate components
\Sexpr{round(critical_values(nrow(Comp1)), digits = digits)}. The
component sizes as well as the standard deviations follow a unimodal
distribution for the aggregated data as well as for each of the
components. The regression coefficients are multimodal for the
aggregate data as well as for each of the components. While from the
aggregated case it might be concluded that imposing an ordering
constraint on the intercept or the coefficient of $x$ is suitable, the
component-specific analyses reveal that a unique labelling was not
achieved. 

\section{Seizure}
In \cite{mixtures:Wang+Puterman+Cockburn:1996} a Poisson mixture
regression is fitted to data from a clinical trial where the effect of
intravenous gammaglobulin on suppression of epileptic seizures is
investigated. The data used were 140 observations from one treated
patient, where treatment started on the $28^\textrm{th}$ day. In
the regression model three independent variables were included:
treatment, trend and interaction treatment-trend. Treatment is a dummy
variable indicating if the treatment period has already started.
Furthermore, the number of parental observation hours per day were
available and it is assumed that the number of epileptic seizures per
observation hour follows a Poisson mixture distribution. The number of
epileptic seizures per parental observation hour for each day are
plotted in Figure~\ref{fig:seizure}. The fitted mixture distribution
consists of two components which can be interpreted as representing
'good' and 'bad' days of the patients.

The mixture model can be formulated by 
\begin{equation*}
  H(y|\mathbf{x}, \Theta) =  \pi_1 P(\lambda_1) + \pi_2
  P(\lambda_2),
\end{equation*}
where $\lambda_k = e^{\mathbf{x}'\bm{\alpha}_k}$ for $k = 1,2$ and
$P(\lambda)$ is the Poisson distribution.

The data is loaded and the mixture fitted with two components.
<<>>=
data("seizure", package = "flexmix")
model <- FLXMRglm(family = "poisson", offset = log(seizure$Hours))
control <- list(iter = 1000, tol = 1e-10, verbose = 0)
set.seed(123)
seizMix <- stepFlexmix(Seizures ~ Treatment * log(Day), 
  data = seizure, k = 2, nrep = 5, model = model, control = control)
@ 

The fitted regression lines for each of the two components are shown
in Figure~\ref{fig:seizure}. 

\begin{figure}[h!]
\begin{center}
<<fig=true, width=7, height=4>>=
par(mar = c(5, 4, 2, 0) + 0.1)
plot(Seizures/Hours~Day, data=seizure, pch = as.integer(seizure$Treatment))
abline(v = 27.5, lty = 2, col = "grey")
matplot(seizure$Day, fitted(seizMix)/seizure$Hours, type="l",
  add = TRUE, col = 1, lty = 1, lwd = 2)
@ 
\caption{Seizure data with the fitted values for the
  \citeauthor{mixtures:Wang+Puterman+Cockburn:1996} model. The
  plotting character for the observed values in the base period is a
  circle and for those in the treatment period a triangle.}
\label{fig:seizure}
\end{center}
\end{figure}

The parameteric bootstrap with random initialization is used to
investigate identifiability problems and parameter stability. The
diagnostic plot is given in Figure~\ref{fig:plot.FLXboot-seiz}. The
coloring is according to an ordering constraint on the
intercept. Clearly the parameter estimates corresponding to the
solution where the bad days from the base period are combined with the
good days from the treatement period and vice versa for the good days
of the base period can be distinguished and indicate the slight
identifiability problems of the fitted mixture.

<<eval=FALSE>>=
set.seed(123)
seizMix.bs <- boot(seizMix, R = 200, sim = "parametric")
seizMix.bs
@ 
<<echo=FALSE>>=
if (file.exists("seizMix.bs.rda")) {
  load("seizMix.bs.rda")
} else {
  set.seed(123)
  seizMix.bs <- boot(seizMix, R = 200, sim = "parametric")
  save(seizMix.bs, file = "seizMix.bs.rda")
}
seizMix.bs
@ 

\begin{figure}[h!]
  \centering
<<fig=true>>=
print(plot(seizMix.bs, ordering = "coef.(Intercept)", col = Colors))
@ 
\label{fig:plot.FLXboot-seiz}
\caption{Diagnostic plot of the bootstrap results for the
  \code{seizure} data.}
\end{figure}

<<>>=
parameters <- parameters(seizMix.bs)
Ordering <- factor(as.vector(apply(matrix(parameters[,"coef.(Intercept)"], 
  nrow = 2), 2, order)))
Comp1 <- parameters[Ordering == 1,]
Comp2 <- parameters[Ordering == 2,]
@ 

For applying the DIP test also an ordering constraint on the intercept
is used. The critical value for column \code{Aggregated} is
\Sexpr{round(critical_values(nrow(parameters)), digits = digits)} and for
the columns of the separate components
\Sexpr{round(critical_values(nrow(Comp1)), digits = digits)}. 

<<>>=
dip.values.art <- matrix(nrow = ncol(parameters), ncol = 3, 
  dimnames = list(colnames(parameters), 
  c("Aggregated", "Comp 1", "Comp 2")))
dip.values.art[,"Aggregated"] <- apply(parameters, 2, dip)
dip.values.art[,"Comp 1"] <- apply(Comp1, 2, dip)
dip.values.art[,"Comp 2"] <- apply(Comp2, 2, dip)
dip.values.art
@ 

For the aggregate results the hypothesis of unimodality cannot be
rejected for the trend. For the component-specific analyses
unimodality cannot be rejected only for the intercept (where the
ordering condition was imposed on) and again the trend. For all other
parameter estimates unimodality is rejected which indicates that the
ordering constraint was able to impose a unique labelling only for the
own parameter and not for the other parameters. This suggests
identifiability problems. 
%%-------------------------------------------------------------------------
%%-------------------------------------------------------------------------
\bibliography{mixture}
\end{document}