%\VignetteIndexEntry{Recovering a Basic Space in R}
%\VignetteKeywords{multivariate}
%\VignettePackage{basicspace}

%\documentclass[nojss]{jss}
\documentclass[nogin, article, nojss]{jss}
\graphicspath{{Figures/}}
%\setkeys{Gin}{width=.8\textwidth}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% declarations for jss.cls
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%% almost as usual
\author{Keith Poole \\ University of \\ Georgia \And
        Jeffrey Lewis \\ University of \\ California, Los Angeles \And
        Howard Rosenthal \\ New York \\ University \And
        James Lo \\ Princeton \\ University \And
        Royce Carroll \\ Rice \\ University}
\Plainauthor{Keith Poole, Jeffrey Lewis, Howard Rosenthal, James Lo, Royce Carroll}

\title{Recovering a Basic Space from Issue Scales in \proglang{R}}
%% for pretty printing and a nice hypersummary also set:
\Plaintitle{Recovering a Basic Space from Issue Scales in R} %% without formatting
%\Shorttitle{A Capitalized Title} %% a short title (if necessary)

%% an abstract and keywords
\Abstract{
\noindent \pkg{basicspace} is an \proglang{R} package that conducts Aldrich-McKelvey and Blackbox scaling to recover estimates of the underlying latent dimensions of issue scale data. We illustrate several applications of the package to survey data commonly used in the social sciences. Monte Carlo tests demonstrate that the procedure can recover latent dimensions and reproduce the matrix of responses at moderate levels of error and missing data.
}
\Keywords{multivariate, \proglang{R}, scaling}
\Plainkeywords{multivariate, R, scaling} %% at least one keyword must be supplied

%% publication information
%% NOTE: Typically, this can be left commented and will be filled out by the technical editor
%% \Volume{13}
%% \Issue{9}
%% \Month{September}
%% \Year{2004}
%% \Submitdate{2004-09-29}
%% \Acceptdate{2004-09-29}

%% The address of (at least) one author should be given
%% in the following format:
\Address{
  Keith T. Poole\\
  Department of Political Science\\
  Baldwin Hall\\
  University of Georgia\\
  Athens, GA 30602\\
  E-mail: \email{kpoole@uga.edu}\\
  URL: \url{http://www.voteview.com/}\\

  Jeffrey B. Lewis\\
  University of California - Los Angeles\\
  Political Science Department, Bunche Hall\\
  Los Angeles, CA 90095\\
  E-mail: \email{jblewis@ucla.edu}\\
  URL: \url{http://www.polisci.ucla.edu/faculty/lewis/}\\

  Howard Rosenthal\\
  NYU Department of Politics\\
  19 W.
  4th Street, New York, 10012\\
  E-mail: \email{howardrosenthal@nyu.edu}\\
  URL: \url{http://politics.as.nyu.edu/object/HowardRosenthal}\\

  James Lo\\
  Princeton University, Department of Politics\\
  130 Corwin Hall\\
  Princeton, NJ 08544\\
  E-mail: \email{jameslo@princeton.edu}\\

  Royce Carroll\\
  Department of Political Science, MS 24\\
  Rice University\\
  PO Box 1892\\
  Houston, Texas 77251-1892\\
  E-mail: \email{rcarroll@rice.edu}\\
  URL: \url{http://rcarroll.web.rice.edu/}
}
%% It is also possible to add a telephone and fax number
%% before the e-mail in the following format:
%% Telephone: +43/1/31336-5053
%% Fax: +43/1/31336-734

%% for those who use Sweave please include the following line (with % symbols):
%% need no \usepackage{Sweave.sty}

%% end of declarations %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{document}
%\SweaveOpts{concordance=TRUE}

\section{Introduction}

The \pkg{basicspace} package enables the spatial\footnote{Note that ``spatial'' in this context refers not to geography but to perceptual or ideological space, as popularized in the work of \citet{Downs1957} on the spatial model of voting.} analysis of self-placement and/or perceptual survey data in \proglang{R} \citep{R}. Issue scales, where respondents place themselves and/or stimuli on a numeric scale, are a common form of data gathered and analyzed by survey researchers and social scientists. For example, since 1968 the American National Election Studies have gathered seven-point scale data on a variety of issues. Respondents are shown scales with labeled endpoints such as ``liberal'' and ``conservative'' and are then asked to place themselves and political figures on the scales. As with other forms of response data, researchers are often interested in understanding the extent to which a set of issue scale placements is driven by an underlying latent dimension. This package contains software designed to recover the latent dimensions---i.e., a basic space---from issue scale data such as surveys.

The functions contained in \pkg{basicspace} recover spatial estimates of respondent positions and scale them, together with the stimuli, into a common space. The package implements the Blackbox \citep{poole1998recovering} and Aldrich-McKelvey \citep{aldrich1977method} scaling procedures, which have been used in a number of previous social science studies, including \citet{palfrey1987relationship}, \citet{poole1998recovering}, and \citet{saiegh2009recovering}. Scaling techniques are already widely used by political scientists in empirical models of voting (also known as ideal point estimation) that allow legislator locations in an abstract policy or ideological space to be inferred from their legislative votes.\footnote{For a more extensive review of applications of spatial modeling in the social sciences, see \citet{poole2005spatial}. The most prominent ideal point model in the political science literature, W-NOMINATE \citep{Congress}, estimates the policy preferences of legislators using observed roll call votes as the primary source of data. The \pkg{wnominate} package on CRAN contains software used to estimate NOMINATE scores.} Similarly, the motivation for recovering spatial information from issue scales, such as those in political surveys, is to detect the underlying dimensions behind the reported attitudes of survey respondents that explain the basic relationships among the respondents and stimuli.
One technique for analyzing such data, Aldrich and McKelvey's method \citep{aldrich1977method}, makes use of respondent information regarding the positions of stimuli (e.g., politicians or parties) to estimate the perceptual bias of each respondent\footnote{The method thus provides a means to account for differential item functioning.} and obtain estimated locations for both stimuli and respondents along a single issue scale dimension (e.g., liberal-conservative). \citet{poole1998recovering} developed the Blackbox scaling procedure as a generalization of the Aldrich-McKelvey method, which is implemented in this package as the \code{blackbox} and \code{blackbox\_transpose} functions. This method applies to a wide range of issue scale problems because it incorporates information from multiple issue scales, estimates in multiple dimensions, and allows for missing data (e.g., survey non-response). The \code{blackbox} function recovers $n$-dimensional ideal points (i.e., spatial coordinates) for respondents based on their own preference data across any number of issue scales. The \code{blackbox\_transpose} function, meanwhile, recovers the spatial locations of stimuli from the respondents' placements of those stimuli.

This paper proceeds in four steps. We begin with a description of the mathematics that underlie the Blackbox estimator, which performs a singular value decomposition of a rectangular matrix containing missing elements. We then provide three examples using the \pkg{basicspace} package to implement this method in \proglang{R}. First, we present a Monte Carlo analysis suggesting that the estimator produces an accurate decomposition of our simulated data matrices, even with 30 percent of the data missing. Second, we show how the procedure can be applied to self-placement survey data from the 1980 National Election Study. Next, we apply the model to perceptual data from the 1980 National Election Study, where various political candidates are placed along a seven-point liberal-conservative scale. Finally, we describe the earlier estimator developed by \citet{aldrich1977method}, which is also included as a function in this package.

\section{Model}

The exposition of the model presented here closely follows \citet{poole1998recovering}. Consider a matrix of survey data $X_0$ with $N$ respondents and $M$ issue scales, with individuals on the rows and issues on the columns. Some cells of the matrix $X_0$ are missing, and we let $X$ denote the version of $X_0$ that has no missing data. In each cell $x_{ij}$, respondent $i$ ($i = 1, \ldots, N$) reports their position on issue scale $j$ ($j = 1, \ldots, M$), with some responses missing.\footnote{The `0' subscript indicates that some elements are missing from the matrix.} Now let $\Psi_{ik}$ be the $i$th individual's position on the $k$th basic dimension ($k = 1, \ldots, s$), $W$ be an $M$ by $s$ matrix of weights that map individual positions from the basic space to the issue dimensions, $c$ be a vector of issue dimension intercept terms of length $M$, $J_N$ be an $N$ length vector of ones, and $E_0$ be error terms in the data matrix. The model that we seek to estimate is:

\begin{equation}
X_0 = [\Psi W' + J_N c']_0 + E_0
\label{one}
\end{equation}

Without loss of generality, we also assume that $E_0$ is drawn from a symmetric distribution with mean 0 and that the centroid of the basic space coordinates is at the origin (i.e., $J_N' \Psi=0$).
Substituting into the model equation, the restriction $J_N' \Psi = 0$ implies that $J_N' [X - J_N c'] = J_N' \Psi W' = 0_M'$, where $0_M$ is an $M$ length vector of zeroes. In the situation where $X_0$ has no missing data, the parameters of interest can all be recovered using singular value decomposition. To see why this is true, recall that for an $N$ by $M$ matrix of real elements with $N \ge M$, there exists an $N$ by $M$ orthogonal matrix $U$, an $M$ by $M$ orthogonal matrix $V$, and an $M$ by $M$ matrix $\Lambda$ such that:

\begin{equation}
X = U \Lambda V'
\label{two}
\end{equation}

\noindent where $\Lambda$ is a diagonal matrix of singular values.\footnote{A more general form of this equation can be written in which $\Lambda$ is instead an $N$ by $M$ matrix and $U$ is $N$ by $N$.}

To solve Equation~\ref{one}, set $c$ equal to the column means of $X$, or $c_j = \frac{\sum\limits_{i=1}^N x_{ij}}{N} = \bar{x}_j$. Then using Equation~\ref{two}, the singular value decomposition of $X - J_N c'$ can be expressed as:

$$X - J_N c'\ = \ U \Lambda V'\ = \ \Psi W'$$

This implies that in the absence of missing data, one solution for $\Psi$ and $W$ is:

$$\Psi = U \Lambda^{0.5}$$
$$W = V \Lambda^{0.5}$$

\noindent with $\Lambda^{0.5}$ being a diagonal matrix whose diagonal elements are the square roots of the corresponding elements of $\Lambda$. While other solutions to this problem exist, \citet{eckart1936approximation} have shown that the least squares approximation in $s$ dimensions of a matrix $A$ can be found by using only the first $s$ singular values of $A$ along the diagonal of $\Lambda$ and re-multiplying $U \Lambda V'$.

In the presence of missing data in the data matrix $X_0$, the use of singular value decomposition to solve for $W$ and $\Psi$ is no longer possible, and we instead estimate $\hat{W}$ and $\hat{\Psi}$ using an alternating least squares (ALS) technique that is similar to the procedures used in \citet{carroll1970analysis} and \citet{takane1977nonmetric}. The objective function to be minimized is the sum of the squared deviations across all non-missing cells of $X_0$ after the columns have been adjusted for column means, or:

$$\xi = \displaystyle\sum\limits_{i=1}^N \sum\limits_{j=1}^{m_i} \{[ \sum\limits_{k=1}^s \Psi_{ik} W_{jk}] + c_j - x_{ij} \}^2$$

\noindent where $m_i$ is the number of non-missing entries on row $i$. In minimizing this objective function, two constraints from the earlier analysis with no missing data are applied. First, we exploit the fact that $\Psi$ and $W$ are constructed from orthogonal matrices, which implies that $\Psi' \Psi = W' W$.\footnote{More specifically, $\Psi' \Psi = \Lambda^{0.5} U' U \Lambda^{0.5} = \Lambda^{0.5} I_M \Lambda^{0.5} = \Lambda = W' W.$} Secondly, following our earlier restriction that $J_N' [X - J_N c'] = 0_M'$, $J_N' U = J_N' \Psi = 0_M'$ as well. These restrictions produce the Lagrangian multiplier problem:

$$\mu = \xi + 2\gamma' [\Psi' J_N] + \mathrm{tr}[\Phi (\Psi' \Psi - W' W)]$$

\noindent where $\Phi$ is a symmetric $s$ by $s$ matrix of Lagrangian multipliers and $\gamma$ is an $s$ length vector of Lagrangian multipliers.
Since all Lagrangian multipliers are zero,\footnote{See Appendix A in \citet{poole1998recovering} for a full proof that all Lagrangian multipliers are zero.} the partial derivatives of $\mu$ reduce to those of $\xi$:

\begin{equation}
\frac{\partial \mu}{\partial \Psi_{ik}} = 2\displaystyle\sum\limits_{j=1}^{m_i} [(\sum\limits_{l=1}^s w_{jl} \psi_{il})+ c_j - x_{ij}] w_{jk}
\label{three}
\end{equation}

\begin{equation}
\frac{\partial \mu}{\partial w_{jk}} = 2\displaystyle\sum\limits_{i=1}^{n_j} [(\sum\limits_{l=1}^s w_{jl} \psi_{il})+ c_j - x_{ij}] \psi_{ik}
\label{four}
\end{equation}

\begin{equation}
\frac{\partial \mu}{\partial c_j} = 2\displaystyle\sum\limits_{i=1}^{n_j} [(\sum\limits_{l=1}^s w_{jl} \psi_{il})+ c_j - x_{ij}]
\label{five}
\end{equation}

\noindent where $n_j$ is the number of non-missing entries in column $j$. Let $W^*$ be an $m_i$ by $s$ matrix with the rows corresponding to missing entries in row $i$ of $X_0$ removed, $x_{0i}$ be the length $m_i$ $i$th row of $X_0$, and $c_0$ be the length $m_i$ vector of constants corresponding to the elements of $x_{0i}$. Then if $(W^*\mathrm{'}W^*)^{-1}$ exists, the $i$th row of $\Psi$ can be estimated by setting Equation~\ref{three} to zero, collecting the $s$ partial derivatives of the $i$th row of $\Psi$ into a vector, and solving for $\psi_i$ as:

\begin{equation}
\hat{\psi}_i = (W^*\mathrm{'}W^*)^{-1} W^*\mathrm{'}[x_{0i} - c_{0}]
\label{six}
\end{equation}

\noindent which can of course be estimated using ordinary least squares. Similarly, let $\Psi_j^* = [\Psi_0|J_0]$ be an $n_j$ by $s+1$ matrix with the rows corresponding to missing data removed and bordered by a column of ones, $w_j$ be the $s$ length vector given by row $j$ of $W$, $c_j$ be the $j$th element of $c$, and $x_{0j}$ be the $j$th column of $X_0$. Then if $(\Psi_j^*\mathrm{'}\Psi_j^*)^{-1}$ exists, $w_j$ and $c_j$ can be jointly estimated by combining Equations~\ref{four} and~\ref{five} as:

\begin{equation}
\left[\begin{array}{c} \hat{w}_j \\ \hat{c}_j \end{array}\right] = (\Psi_j^*\mathrm{'}\Psi_j^*)^{-1} \Psi_j^*\mathrm{'}x_{0j}
\label{seven}
\end{equation}

Equations~\ref{six} and~\ref{seven} represent the core set of equations that are used to solve for $\hat{W}$, $\hat{c}$, and $\hat{\Psi}$. Once a set of starting values has been generated, Equations~\ref{six} and~\ref{seven} are iterated until convergence. Generation of appropriate start values is conducted one dimension at a time, and a more detailed justification of the procedure can be found in \citet{poole1998recovering}. On the first dimension, start values are generated by using the following three equations:

\begin{equation}
\hat{c}_{j} = \frac{\displaystyle\sum\limits_{i=1}^{n_j} x_{ij}}{n_j} = \bar{x}_j
\label{eight}
\end{equation}

\begin{equation}
\hat{w}_{j1} = \Gamma_{jj}
\label{nine}
\end{equation}

\noindent where $\Gamma_{jj}$ is the $j$th diagonal element of $\Gamma$, an $M$ by $M$ diagonal matrix with diagonal elements set to either $1$ or $-1$ so as to maximize the number of positive elements in the $M$ by $M$ covariance matrix $\Gamma[X_0 - J_N c']\mathrm{'}[X_0 - J_N c']\Gamma$. $\Gamma$ is found by a simple iterative process similar to that used to speed eigenvector/eigenvalue decomposition \citep{poole1998recovering}. Given Equations~\ref{eight} and~\ref{nine}, starting values for $\psi$ are:

\begin{equation}
\hat{\psi}_{i1} = \frac{\displaystyle\sum\limits_{j=1}^{m_i} \hat{w}_{j1}(x_{ij}-\hat{c}_j)}{m_i}
\label{ten}
\end{equation}

If more than one dimension is to be estimated ($s > 1$), start values for the other dimensions can be generated simply by replacing the data matrix $X_0$ with a matrix of residuals in Equations~\ref{nine} and~\ref{ten}.
However, no further estimation of start values for $\hat{c}$ is required. The matrix of residuals used to generate start values on dimension $s$ is:

$$E_{0(s-1)} = X_0 - \displaystyle\sum\limits_{k=1}^{s-1} \hat{\psi}_k\hat{w}_k\mathrm{'} - J_N\hat{c}'$$

\noindent where $\hat{\psi}_k$ and $\hat{w}_k$ denote the $k$th columns of $\hat{\Psi}$ and $\hat{W}$, and $e_{(s-1)ij}$ denotes the $(i,j)$th element of $E_{0(s-1)}$. This residual matrix allows the generation of higher-dimension start values by iterating $\Gamma$ to maximize the number of positive elements in the covariance matrix of $E_{0(s-1)}$. The starting values are now:

%Equation 12 in paper
\begin{equation}
\hat{\psi}_{is} = \frac{\displaystyle\sum\limits_{j=1}^{m_i} \hat{w}_{js} e_{(s-1)ij}}{\sum\limits_{j=1}^{m_i} \hat{w}_{js}^2}
\label{eleven}
\end{equation}

\noindent where the initial $\hat{w}_{js}$ values of $+1$ and $-1$ are used to obtain $\hat{\psi}_{is}$ starting values. The starting values of $\hat{w}_{js}$ are now:

%Equation 11 in paper
\begin{equation}
\hat{w}_{js} = \frac{\displaystyle\sum\limits_{i=1}^{n_j} \hat{\psi}_{is} e_{(s-1)ij}}{\sum\limits_{i=1}^{n_j} \hat{\psi}_{is}^2}
\label{twelve}
\end{equation}

Summarizing the preceding discussion in full, the basic space technique decomposes an $N$ by $M$ matrix $X_0$ with $N \geq M$ following Equation~\ref{one}. Estimation of Equation~\ref{one} proceeds in three steps. In the first stage, starting values on the first dimension are generated for $\hat{c}_{j}$, $\hat{w}_{j1}$, and $\hat{\psi}_{i1}$ by iterating Equations~\ref{eight}--\ref{ten} until convergence. In the second stage, if the number of dimensions to be estimated $s > 1$, higher dimensional starting values for $\hat{\psi}_{is}$ and $\hat{w}_{js}$ are generated dimension by dimension using Equations~\ref{eleven}--\ref{twelve}. Finally, the starting values generated in the preceding two stages are improved by iterating Equations~\ref{six}--\ref{seven} until convergence.

\section{Monte Carlo Test}

In this section, we present the first of four motivating examples. We begin with a Monte Carlo example that tests the basic space technique against simulated data. Five key variables should be set in each simulation: the number of respondents $N$ (set here to $N = 1000$), the number of issue scales (also referred to as stimuli, and set here to $M = 20$), the number of explanatory dimensions (set here to $s = 2$), the fraction of observations that are missing (set here as 0.3), and the distribution of error terms (set here as random uniform draws from -0.5 to 0.5). These variables can be changed for other simulations, but the restriction that $N \geq M$ must hold. In cases where $M \geq N$, please refer to the later examples that use \code{blackbox\_transpose} and \code{aldmck}.

<<>>=
set.seed(1231)
library("basicspace")

N <- 1000
M <- 20
s <- 2
fraction.missing <- 0.3
E <- matrix(runif(N * M, min = -0.5, max = 0.5), nrow = N, ncol = M)
@

To generate the $X$ matrix (i.e., the matrix in Equation~\ref{one} before missing values are introduced), separately generate the matrices that produce the singular value decomposition of $X$ following Equation~\ref{two}. Also generate the $J_N$ and $c$ vectors from Equation~\ref{one}. While $X$ can be generated directly in one step, creating the components separately has two significant advantages. First, recovery of the true values of $\Psi$ and $W$ is simplified. Secondly, creating $\Lambda$ separately allows us to more easily tune the dimensionality of the matrix as desired.
<<>>=
U <- matrix(runif(N * s), nrow = N, ncol = s)
D <- diag(seq(from = 2.1, by = -0.2, length.out = s))
V.prime <- matrix(runif(s * M), nrow = s, ncol = M)
c <- rnorm(M)
Jn <- rep(1, N)
@

With the intermediate matrices just generated, we can produce our $X$ matrix using Equation~\ref{one}, and the true $\Psi$ and $W$ matrices using $\Psi = U\Lambda^{0.5}$ and $W = V\Lambda^{0.5}$:

<<>>=
X.true <- U %*% D %*% V.prime + Jn %o% c
X.0 <- X.true + E
Psi.true <- U %*% sqrt(D)
W.true <- t(V.prime) %*% sqrt(D)
@

$X_0$ is simply the $X$ matrix with values deleted completely at random, so we insert our missing data code (999 in the example) into the appropriate fraction of cells as follows:

<<>>=
missing <- sample(1:(N * M), round(fraction.missing * N * M))
X.0[missing] <- 999
@

The final step before estimation is to assign row and column names to the data set prior to input. In most applications these names are pulled from a survey, but they can also be generated manually:

<<>>=
rownames(X.0) <- paste("Legis", 1:N, sep = "")
colnames(X.0) <- paste("V", 1:M, sep = "")
@

Estimation of the Monte Carlo data after formatting is trivial. The function that applies the basic space decomposition described in this paper is \code{blackbox}. It takes five arguments: the matrix to be decomposed, a vector of missing data values, a Boolean flag indicating whether verbose output is desired, the number of dimensions to estimate, and the minimum number of issue scales to which an individual needs to have responded to be included in the estimation.

<<>>=
result <- blackbox(X.0, missing = c(999), verbose = TRUE, dims = 3,
  minscale = 8)
names(result)
@

The output object contains multiple data frames summarizing the results of the estimation. The key data frames are \code{stimuli}, which contains estimates of $\hat{W}$ and $\hat{c}$, and \code{individuals}, which contains estimates of $\hat{\Psi}$. The other quantities are fit statistics described in greater detail in the standard documentation for the function.

With the estimates complete, we are now able to test the recovery of our parameters of interest. In general, scaling problems are not fully identified. Stated differently, given $X = \Psi W'$, $\Psi$ and $W'$ are not unique solutions because $X = \Psi K K^{-1} W'$ for any conformable and invertible matrix $K$, so $X$ can always be decomposed instead as $X = \Psi^* W^{*'}$ where $\Psi^* = \Psi K$ and $W^{*'} = K^{-1} W'$. When evaluating parameter fit, we are therefore largely concerned with finding monotonic relationships between the true and estimated parameters of interest. Figure~\ref{fig:seven} compares the true vs. estimated values of $\Psi$ across two dimensions, and the results suggest a reasonable model fit. For this comparison, $\Psi$ and $\Psi^*$ are mean centered and rotated.
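To see this lack of identification concretely, the brief sketch below (our own illustration, using the simulated \code{Psi.true} and \code{W.true} from above and an arbitrarily chosen invertible matrix \code{K}) verifies that a transformed decomposition reproduces the product $\Psi W'$ exactly:

<<>>=
## Sketch of the identification problem: any invertible s x s matrix K
## produces an observationally equivalent decomposition of X.
K <- matrix(c(2, 0.5, 0, 1), nrow = 2)
Psi.star <- Psi.true %*% K
W.star <- W.true %*% t(solve(K))

## The two decompositions agree up to numerical precision:
max(abs(Psi.true %*% t(W.true) - Psi.star %*% t(W.star)))
@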
\begin{figure}
\begin{center}
<<fig=TRUE>>=
Psi.hat <- cbind(result$individuals[[2]]$c1, result$individuals[[2]]$c2)
c.hat <- result$stimuli[[2]]$c

## Keep only respondents with estimates on both dimensions
xrow <- sapply(1:N, function(i) sum(!is.na(Psi.hat[i, ])))
Psi.hat <- Psi.hat[!(xrow < 2), ]
Psi.true <- Psi.true[!(xrow < 2), ]

## Mean center the true and estimated configurations
Psi.hat[, 1] <- Psi.hat[, 1] - mean(Psi.hat[, 1])
Psi.hat[, 2] <- Psi.hat[, 2] - mean(Psi.hat[, 2])
Psi.true[, 1] <- Psi.true[, 1] - mean(Psi.true[, 1])
Psi.true[, 2] <- Psi.true[, 2] - mean(Psi.true[, 2])

## Procrustes rotation of the estimates toward the true configuration
C <- t(Psi.true) %*% Psi.hat
svddecomp <- svd(C)
U.rotate <- svddecomp$u
V.rotate <- svddecomp$v
T <- V.rotate %*% t(U.rotate)
Psi.hatrotate <- Psi.hat %*% T

oldpar <- par(no.readonly = TRUE)
on.exit(par(oldpar))
par(mfrow = c(1, 2))
plot(Psi.true[, 1], Psi.hatrotate[, 1], xlim = c(-0.7, 0.7),
  ylim = c(-0.5, 0.5), pch = 20, cex = 0.4, cex.lab = 1.6, bty = "n",
  xlab = "True Psi, first dimension",
  ylab = "Recovered Psi, first dimension")
plot(Psi.true[, 2], Psi.hatrotate[, 2], xlim = c(-0.7, 0.7),
  ylim = c(-0.5, 0.5), pch = 20, cex = 0.4, cex.lab = 1.6, bty = "n",
  xlab = "True Psi, second dimension",
  ylab = "Recovered Psi, second dimension")
@
\end{center}
\caption{Plots of True vs. Estimated $\Psi$ scores, first and second dimension.}
\label{fig:seven}
\end{figure}

Figure~\ref{fig:six} shows the results of the same procedure applied to $W$. In Figure~\ref{fig:eight} we repeat this analysis for $c$, which is a column mean that is only estimated in one dimension. In both cases the estimates for $\hat{W}$ and $\hat{c}$ are a monotonic transformation of the true parameters as expected.\footnote{In other estimations, the relationship may be reflected because $X = \Psi W'$ implies $X = (-\Psi)(-W')$ as well.}

\begin{figure}
\begin{center}
<<fig=TRUE>>=
W.hat <- cbind(result$stimuli[[2]]$w1, result$stimuli[[2]]$w2)
W.hatrotate <- W.hat %*% T

oldpar <- par(no.readonly = TRUE)
on.exit(par(oldpar))
par(mfrow = c(1, 2))
plot(W.true[, 1], W.hatrotate[, 1], xlim = c(0.00, 1.50),
  ylim = c(-0.75, 2.75), pch = 20, cex = 1.5, cex.lab = 1.6, bty = "n",
  xlab = "True W, first dimension", ylab = "Recovered W, first dimension")
plot(W.true[, 2], W.hatrotate[, 2], xlim = c(0.00, 1.50),
  ylim = c(-0.75, 2.75), pch = 20, cex = 1.5, cex.lab = 1.6, bty = "n",
  xlab = "True W, second dimension", ylab = "Recovered W, second dimension")
@
\end{center}
\caption{Plots of True vs. Estimated $W$ scores, first and second dimension.}
\label{fig:six}
\end{figure}

\begin{figure}
\begin{center}
<<fig=TRUE>>=
oldpar <- par(no.readonly = TRUE)
on.exit(par(oldpar))
par(mfrow = c(1, 1))
plot(c, c.hat, pch = 20, cex = 1.2, cex.lab = 1.1, bty = "n",
  xlab = "True C", ylab = "Recovered C")
@
\end{center}
\caption{Plot of True vs. Estimated $c$ scores.}
\label{fig:eight}
\end{figure}

Finally, we pool our estimates of $\hat{W}$, $\hat{\Psi}$, and $\hat{c}$ together to estimate the full matrix $\hat{X}$ following Equation~\ref{one}. While social scientists are principally concerned with estimation of $\hat{W}$ and $\hat{\Psi}$, others seeking to conduct singular value decomposition of matrices with missing data may find $\hat{X}$ to be of value.
One potential application of $\hat{X}$ is as an imputation tool for missing data.\footnote{The simulation presented here generates missing data under the Missing Completely at Random (MCAR) assumption; the procedure should not be expected to work when data are instead Missing at Random (MAR).} To test the viability of this idea, we plot the true values of $X$ against the estimated values of $\hat{X}$ separately for the cells retained in the estimation and for the cells that were discarded prior to estimation to simulate the missing data mechanism. Figure~\ref{fig:nine} presents our results for retained vs. imputed values of $X$. What is particularly notable about this result is the close similarity between the two plots---the imputed values not only appear reasonable (i.e., they line up with the true values along a $45^{\circ}$ line), but they also do not appear to have significantly higher mean squared error than the values that were retained (i.e., the variance around the $45^{\circ}$ line is similar in both plots). These results suggest that the techniques demonstrated here may have applicability beyond survey research. Further discussion of imputation can be found in the unpublished appendix to \citet{poole1998recovering}.

\begin{figure}
\begin{center}
<<fig=TRUE>>=
W.hat <- cbind(result$stimuli[[2]]$w1, result$stimuli[[2]]$w2)
Psi.hat <- cbind(result$individuals[[2]]$c1, result$individuals[[2]]$c2)

## Reconstruct the full data matrix from the estimates, following Equation 1
X.hat <- Psi.hat %*% t(W.hat) + Jn %o% result$stimuli[[2]]$c

oldpar <- par(no.readonly = TRUE)
on.exit(par(oldpar))
par(mfrow = c(1, 2))
plot(X.true[missing], X.hat[missing], pch = 20, cex = 0.4, cex.lab = 1.2,
  bty = "n", xlab = "True X, missing values",
  ylab = "Recovered X, missing values")
plot(X.true[!(1:(N * M) %in% missing)], X.hat[!(1:(N * M) %in% missing)],
  pch = 20, cex = 0.4, cex.lab = 1.2, bty = "n",
  xlab = "True X, nonmissing values",
  ylab = "Recovered X, nonmissing values")
@
\end{center}
\caption{Plots of True vs. Estimated $X$ scores for missing vs. non-missing values.}
\label{fig:nine}
\end{figure}

\section{Example 1: 1980 NES Issue Scales}

In this section we present an application of the basic space model to a set of issue scales from the 1980 National Election Study. This survey contains N = 1,614 respondents who were asked to place themselves on a variety of scales, including desired levels of defense spending, inflation, tax cuts, abortion, liberal-conservative placement, the role of women, the role of government in providing jobs, busing, and other similar issues. We assume that each respondent has a location in a common ideological space and attempt to recover estimates of those locations, represented as $\Psi$ in Equation~\ref{one}. The data is simply stored in a standard matrix or data frame with respondents on the rows and survey questions (i.e., stimuli) on the columns as follows:

<<>>=
data("Issues1980")
Issues1980[1:10, 1:4]
@

Virtually all surveys contain missing data, and for the two survey questions about abortion, `7' is used as a missing data code. However, 7 is a valid response on the other seven-point scales in this data set, so we need to recode the missing data for the abortion questions.
For all questions, 0, 8, and 9 are missing data codes.\footnote{Data used for this estimator are typically opinion surveys where significant amounts of missing data are commonplace---thus, recoding of this sort will be necessary in most applications.}

<<>>=
Issues1980[Issues1980[, "abortion1"] == 7, "abortion1"] <- 8
Issues1980[Issues1980[, "abortion2"] == 7, "abortion2"] <- 8
@

Estimation of the scores is now trivial using the \code{blackbox} function, which takes the same arguments already described in the Monte Carlo example:

<<>>=
## Commented to shorten runtimes
# Issues1980_bb <- blackbox(Issues1980, missing = c(0, 8, 9),
#   verbose = FALSE, dims = 3, minscale = 8)
data(Issues1980_bb)
@

Objects of class \code{blackbox} can also be summarized using the \code{summary} function, although these summaries largely describe the stimuli. For each dimension estimated, the summary provides the intercept ($c$) and stretch ($w_1 \ldots w_3$) parameters for each question, as well as the number of respondents and various fit statistics.

<<>>=
summary(Issues1980_bb)
@

When using \code{blackbox} for applied research, the researcher's principal goal is the recovery of the individual parameters stored in the \code{individuals} data frame. These typically represent our estimate of each individual's ideological location in the basic space. Due to the model identification issue discussed above, these measures are defined only up to an affine transformation of the true space. In particular, the orientation of the estimates is not specified, so if the ideological locations are to be substantively interpreted as liberalism/conservatism scores, their orientation should be validated so that the scores can be transformed if necessary. Here we conduct such a check by correlating our recovered scores with self-reported liberal-conservative scores, where higher scores indicate higher levels of conservatism. The correlation is negative, suggesting that as the recovered scores increase, the respondents become more liberal. Since the norm in political science research is to orient liberal-conservative scores to increase as conservatism increases, the researcher may wish to reflect the scores (i.e., multiply them by $-1$) before using them for auxiliary analyses.

<<>>=
cor(Issues1980_bb$individuals[[1]]$c1, Issues1980[, "libcon1"],
  use = "pairwise")
@

\section{Example 2: 1980 NES Liberal-Conservative Scale}

In our previous example applying the basic space model to respondent self-placements on issue scales, the bias and stretch parameters $c$ and $w$ were estimated for the columns (i.e., the issue scales). However, we may instead wish to estimate a version of the model where $c$ and $w$ are estimated for the rows (i.e., the survey respondents). This is simply a transposed version of the basic space model, where $M \geq N$ instead of $N \geq M$. In this example we analyze perceptual data from the 1980 National Election Study. A total of N = 888 respondents were asked to place six stimuli (Carter, Reagan, Kennedy, Anderson, the Republicans, and the Democrats) on a seven-point liberal-conservative scale. Our objective is to estimate the locations of the six stimuli in the basic space, each of which every respondent perceives with individual bias and stretch parameters. The data is input in a manner identical to before, with survey respondents on the rows and stimuli on the columns.
One very important difference between \code{blackbox} and \code{blackbox\_transpose} is that in most survey data sets, the number of respondents is very large relative to the number of stimuli. This typically means that \code{blackbox\_transpose} takes much longer to estimate, because it estimates both a bias parameter $c$ and a stretch parameter $w$ for each respondent. To estimate the 1980 liberal-conservative placements using \code{blackbox\_transpose}, we simply load the data and call the function as follows:

<<>>=
data("LC1980")
LCdat <- LC1980[, -1]
LCdat[1:10, ]

## Commented to shorten runtimes
# LC1980_bbt <- blackbox_transpose(LCdat, missing = c(0, 8, 9), dims = 3,
#   minscale = 5, verbose = TRUE)
data(LC1980_bbt)
@

In an effort to simplify interpretation of results from \code{blackbox\_transpose}, we include two plot functions. These plot the locations of the stimuli against a probability density and a cumulative distribution of the estimated respondent locations (see Figure~\ref{fig:bbt}).

\begin{figure}
\begin{center}
<<fig=TRUE>>=
oldpar <- par(no.readonly = TRUE)
on.exit(par(oldpar))
par(mfrow = c(1, 2))
plot(LC1980_bbt)
plotcdf.blackbt(LC1980_bbt)
@
\end{center}
\caption{Blackbox Transpose PDF and CDF plots.}
\label{fig:bbt}
\end{figure}

We can also produce summary reports of the stimuli as follows:

<<>>=
summary(LC1980_bbt)
@

The second dimension picks up John Anderson, a Representative from Illinois who ran as a third-party candidate in 1980. Respondents clearly had trouble placing Anderson on the liberal-conservative scale, and the second dimension captures this ambiguity of position.

\section{Example 3: Aldrich and McKelvey's Estimator}

The transposed basic space model is an alternative to a model developed by \citet{aldrich1977method}, which is also intended for scaling perceptual data from surveys. While the Aldrich-McKelvey model is restricted to analyzing matrices with no missing values and to scaling in only one dimension, it also incorporates parameters accounting for individual bias and stretch. The Aldrich-McKelvey model is:

$$Y_{ij} = Z_j + \epsilon_{ij}$$

\noindent where $Y_{ij}$ is respondent $i$'s perception of the location of stimulus $j$, $Z_j$ is the true location of stimulus $j$, and $\epsilon_{ij}$ is a random variable with mean 0 and positive variance that is independent of $i$ and $j$ (homoskedastic), with zero covariance across the $i$'s and $j$'s. Aldrich and McKelvey then introduce two distortion parameters, $c_i$ and $w_i$, that transform the perceived candidate position into a reported candidate position $R_{ij}$, according to:

$$R_{ij}= \frac{1}{w_i}(Y_{ij} - c_i)$$

A least-squares minimization procedure is then used to obtain estimates of $\{Z_j\}_{j=1}^{J}$ and $\{w_i,c_i\}_{i=1}^{I}$.

We begin by reestimating the earlier results from the 1980 Liberal-Conservative scale with the Aldrich-McKelvey estimator. While the \code{aldmck} function accepts nearly identical arguments to \code{blackbox\_transpose}, one notable difference appears by default: \code{aldmck} also accepts a column in the data matrix, specified by the \code{respondent} argument, that contains each respondent's self-placement on the issue scale. The reported self-placement is then transformed into an ideology score by applying the respondent's personal stretch and bias parameters to that score, with the results shown in Figure~\ref{fig:aldmck_plot}. Note that the results largely correspond to those shown earlier with \code{blackbox\_transpose}.
<<>>=
data("LC1980")
result <- aldmck(data = LC1980, polarity = 2, respondent = 1,
  missing = c(0, 8, 9), verbose = TRUE)
summary(result)
@

In addition, this summary shows that the Aldrich-McKelvey function also identifies a number of individuals with negative weights. These represent the set of individuals who see the space ``backwards'' (i.e., they see Reagan and the Republicans to the left of Carter and the Democrats).\footnote{See \citet{palfrey1987relationship} for a discussion of the weights derived from Aldrich-McKelvey scaling.}

\begin{figure}
\begin{center}
<<fig=TRUE>>=
plot.aldmck(result)
@
\end{center}
\caption{Aldrich-McKelvey plots.}
\label{fig:aldmck_plot}
\end{figure}

Uncertainty estimates for the Aldrich-McKelvey stimuli can be obtained via the non-parametric bootstrap \citep{efron1993introduction}. To draw 100 samples from the 1980 Liberal-Conservative scales and estimate the standard errors of the stimuli, we do the following:

<<>>=
result <- boot_aldmck(data = LC1980, polarity = 2, respondent = 1,
  missing = c(0, 8, 9), iter = 100)
apply(result, 2, sd)
@

The Aldrich-McKelvey function can be used to replicate previously published Monte Carlo results from \citet{palfrey1987relationship}. Palfrey and Poole find that the Aldrich-McKelvey algorithm is robust in the presence of heteroskedasticity, and test this by replacing the assumed homoskedastic error term $\epsilon_{ij}$ with an error term whose standard deviation $\sigma_i$ is respondent-specific. In this example we replicate their result in a single trial, and show that the recovered stimuli $\hat{Z}_j$ are almost perfectly correlated with the true $Z_j$ (note that the correlation can be negative because polarity is set randomly):

<<>>=
Nstimuli <- 6
Nresp <- 500
Z_j <- rnorm(Nstimuli)
Z_j <- (Z_j - mean(Z_j)) / sd(Z_j)

## Each respondent perceives the stimuli with his or her own error variance
respondent.sd <- runif(Nresp, min = 0.3, max = 0.9)
error_heteroskedastic <- matrix(NA, Nresp, Nstimuli)
for (i in 1:Nresp)
  error_heteroskedastic[i, ] <- rnorm(Nstimuli, sd = respondent.sd[i])

w_i <- runif(Nresp, min = 0, max = 1)
c_i <- rnorm(Nresp)
Y_ij <- rep(1, Nresp) %o% Z_j
Y_ij <- Y_ij + error_heteroskedastic
R_ij <- (1 / (w_i %o% rep(1, Nstimuli))) * (Y_ij - c_i %o% rep(1, Nstimuli))

result <- aldmck(R_ij, polarity = 6, missing = c(999))
cor(Z_j, result$stimuli)
@

Although we only show one trial in this paper, the result shown here is reproducible over multiple simulations.\footnote{Replication of the Monte Carlo test with separate groups of informed and uninformed individuals, from \citet[p.~515]{palfrey1987relationship}, can be conducted by simply changing the error deviations in the code above to have 250 respondents with $\sigma_i=0.3$ and 250 respondents with $\sigma_i=0.9$.}

\section{Conclusion}

The \pkg{basicspace} package includes a number of functions that enable the estimation of a basic space using self-placement and/or perceptual survey data in \proglang{R}. These include the following functions:

\textbf{Estimation functions}: \code{aldmck}, \code{blackbox}, \code{blackbox\_transpose}

\textbf{Convenience extraction functions}: \code{individuals}, \code{fit}, \code{stimuli}

\textbf{Generic functions}: \code{predict}, \code{plot}, \code{summary}

\textbf{Bootstrap functions}: \code{boot\_aldmck}, \code{boot\_blackbt}, and \code{plot} functions for these objects

In addition to the functions listed above, three example data sets have also been included. These include the 1980 NES Issue scales (\code{Issues1980}), the 1980 NES Liberal-Conservative Scales (\code{LC1980}), and the 2004 PELA Liberal-Conservative scales (\code{columbia}). A brief sketch of how these functions fit together appears below.
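The following chunk is a hypothetical usage sketch (not evaluated here) that assumes the \code{Issues1980\_bb} and \code{LC1980\_bbt} objects fitted in the examples above, and that the convenience extractors take a fitted object as their only required argument:

<<eval=FALSE>>=
## Hypothetical sketch: extracting results from fitted objects via the
## convenience and generic functions listed above (assumed signatures).
individuals(Issues1980_bb)  # individual coordinate estimates
stimuli(Issues1980_bb)      # stimuli (w and c) estimates
fit(Issues1980_bb)          # fit statistics
summary(Issues1980_bb)      # generic summary, as in Example 1
plot(LC1980_bbt)            # generic plot, as in Example 2
@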
Social scientists often wish to infer the locations of survey respondents---such as voters or legislators---in an abstract policy or ideological space. The basic space technique described here has broad applicability to such perceptual data. Given the abundance of perceptual data questions found in most social science surveys, there will continue to be numerous potential applications of the estimators included with this package. By making these methods available in a popular statistics environment, this \proglang{R} package should enable their broader use.

\section{Acknowledgments}

This research was supported by a grant from the National Science Foundation (NSF-SBS-0611974). James Lo also acknowledges support from SFB 884, ``Political Economy of Reforms''.

\newpage

\bibliography{basicspace}

\end{document}