%\VignetteIndexEntry{Bimodality Index} %\VignetteKeywords{bimodality} %\VignetteDepends{oompaBase,BimodalIndex} %\VignettePackage{BimodalIndex} \documentclass{article} \usepackage{graphicx} \usepackage{hyperref} \usepackage{cite} \pagestyle{myheadings} \markright{bimodalIndex} \setlength{\topmargin}{0in} \setlength{\textheight}{8in} \setlength{\textwidth}{6.5in} \setlength{\oddsidemargin}{0in} \setlength{\evensidemargin}{0in} \def\rcode#1{\texttt{#1}} \def\fref#1{\textbf{Figure~\ref{#1}}} \def\tref#1{\textbf{Table~\ref{#1}}} \def\sref#1{\textbf{Section~\ref{#1}}} \title{Bimodality Index} \author{Kevin R. Coombes} \begin{document} <>= options(width=88) options(SweaveHooks = list(fig = function() par(bg='white'))) #if (!file.exists("Figures")) dir.create("Figures") @ %\SweaveOpts{prefix.string=Figures/02-AML-27plex, eps=FALSE} \maketitle \tableofcontents \section{Simulated Data} We simulate a dataset. <>= set.seed(564684) nSamples <- 60 nGenes <- 3000 dataset <- matrix(rnorm(nSamples*nGenes), ncol=nSamples, nrow=nGenes) dimnames(dataset) <- list(paste("G", 1:nGenes, sep=''), paste("S", 1:nSamples, sep='')) @ At present, this dataset has no interesting structure; all genes have their expression patterns drawn from a common normal distribution. So, we shift the means by three standard deviations for half the samples for the first 100 genes. <>= dataset[1:100, 1:30] <- dataset[1:100, 1:30] + 3 @ \section{Computing the Bimodal Index} In order to compute the bimodal index from Wang et al. (2009) \url{https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2730180}, we must load the package. <>= library(BimodalIndex) @ Now we call the basic function: <>= bim <- bimodalIndex(dataset) summary(bim) @ Here we see a suggestion that at least some of the values are likely to be above a reasonable cutoff to be called significant. \begin{figure} <>= plot(bim$BI, col=rep(c("red", "black"), times=c(100, 2900)), xlab="Gene", ylab="Bimodal Index") @ \caption{Scatter plot of the bimodal indices of all genes.} \label{pltobi} \end{figure} Next, we plot the results, with the known bimodal genes colored red (Figure~\ref{plotbi}). As expected, most (but not all) of the large BI values arise from the known bimodal genes. We can then use the simulations from the null model to estimate reasonable significance cutoffs when using 60 samples. <>= summary(bim$BI[101:3000]) cutoffs <- quantile(bim$BI[101:3000], probs=c(0.90, 0.95, 0.99)) cutoffs @ Now we can assess the sensitivity of the test when using the derived cutoffs. <>= sapply(cutoffs, function(x) sum(bim$BI[1:100] > x)) @ With real data, of course, we would need to determine the significance by simulating a large number of genes from the null model, using the simulations to compute empirical p-values. Because these p-values would still be computed one gene at a time, it would be advisable to incorporate a multiple testing crierion by, for example, estimating the false discovery rate. \section{Appendix} This analysis was performed in the following directory: <>= getwd() @ This analysis was performed in the following software environment: <>= sessionInfo() @ \end{document}