---
title: "CellDEEP Quick Start"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{CellDEEP Quick Start}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

## What CellDEEP does

CellDEEP reduces scRNA-seq sparsity by pooling cells into pseudocells before DE testing.

## Load package and example data

```{r, load_data}
library(CellDEEP)
data("sim")
```

## Step 1: Run DE directly with FindMarker.CellDEEP

`FindMarker.CellDEEP` includes metadata preparation internally.

Key parameters to set:

- `group_id`, `sample_id`, `cluster_id`: metadata column names in your Seurat object
- `ident.1`, `ident.2`: two groups to compare
- `cell_selection`: how to select cells for pooling (`"kmean"` or `"random"`)
- `readcounts`: how to aggregate counts in pooled cells (`"sum"` or `"mean"`)
- `min_cells_per_subgroup`: minimum cells required in each sample-cluster subgroup for pooling

```{r}
de.test <- FindMarker.CellDEEP(
  sim,
  group_id = "Status",
  sample_id = "DonorID",
  cluster_id = "cluster_id",
  Pool = TRUE,
  test.use = "wilcox",
  n_cells = 3,
  min_cells_per_subgroup = 1,
  cell_selection = "random",
  readcounts = "sum",
  logfc.threshold = 0.25,
  ident.1 = "Case",
  ident.2 = "Control"
)
```

## Step 2: Pool cells only (optional)

Use these functions if you want pooled objects without running DE immediately.

`min_cells_per_subgroup` means the minimum number of cells required in each `sample_id x cluster_id` subgroup before pooling is performed.
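
To make the `min_cells_per_subgroup` threshold concrete, the sketch below counts cells per sample-by-cluster subgroup on a toy metadata table using base R only. The data frame `meta`, its values, and the threshold `min_cells` are hypothetical. The sketch illustrates the definition above and is not CellDEEP's internal code.

```{r}
# Toy cell-level metadata (hypothetical values, not taken from `sim`)
meta <- data.frame(
  sample_id  = c("D1", "D1", "D1", "D1", "D2", "D2", "D2"),
  cluster_id = c("A",  "A",  "A",  "B",  "A",  "A",  "B")
)

# Number of cells in each sample_id x cluster_id subgroup
subgroup_sizes <- table(meta$sample_id, meta$cluster_id)
subgroup_sizes

# Subgroups that reach a hypothetical threshold of 2 cells
min_cells <- 2
meets_threshold <- subgroup_sizes >= min_cells
meets_threshold
```

A similar tabulation could be run on the metadata of your own Seurat object, using its sample and cluster columns, to see how many subgroups a given threshold would exclude.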

Pooling functions use standardized metadata fields (`sample_id`, `group_id`, `cluster_id`), so prepare once before pooling:

```{r}
pool_input <- prepare_data(
  sim,
  sample_id = "DonorID",
  group_id = "Status",
  cluster_id = "cluster_id"
)
```

### K-means pooling

```{r}
pooled_kmean <- CellDEEP.Kmean(
  pool_input,
  readcounts = "sum",
  n_cells = 3,
  min_cells_per_subgroup = 1,
  assay_name = "RNA"
)
pooled_kmean
```

### Random pooling

```{r}
pooled_random <- CellDEEP.Random(
  pool_input,
  readcounts = "sum",
  n_cells = 5,
  min_cells_per_subgroup = 1,
  assay_name = "RNA"
)
pooled_random
```

If no genes pass the adjusted p-value filter in this small example dataset, try a larger dataset or set `full_list = TRUE`.
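
The effect of an adjusted p-value filter can be pictured with a small base R sketch. Everything here is hypothetical: the data frame `de_toy`, its column names, the p-values, the 0.05 cutoff, and the Bonferroni method, which is chosen only for illustration. It does not show how CellDEEP computes or filters its results.

```{r}
# Hypothetical DE results; gene names and p-values are made up
de_toy <- data.frame(
  gene  = paste0("gene", 1:5),
  p_val = c(0.001, 0.02, 0.04, 0.30, 0.75)
)

# Bonferroni adjustment is an assumption chosen for this sketch
de_toy$p_val_adj <- p.adjust(de_toy$p_val, method = "bonferroni")

# Genes passing a hypothetical adjusted p-value cutoff of 0.05
de_sig <- de_toy[de_toy$p_val_adj < 0.05, ]
de_sig

# The unfiltered table keeps every gene, ordered by raw p-value
de_toy[order(de_toy$p_val), ]
```

In this toy table, two genes with raw p-values below 0.05 no longer pass after adjustment, which is why a small dataset can return few or no genes and why inspecting the unfiltered list can still be informative.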