How to test the performance of batch effect correction algorithms? We apply three popular batch effect correction workflows to scRNA-Seq libraries from three different donors, with batch effects detailed in Figure 2 and the vignette titled “Sample Donor Effects”. We assess the performance of each method using CMS and iLISI scoring. The full results of this test are described in our published Figure 4 and accompanying text.
How are cells classified in a single sample? Single-sample cell classification is a key part of generating a Cell Misclassification Statistic. This vignette describes exactly how cells from a single sample/dataset are categorized as part of a multi-dataset workflow.
Do data normalization and merging strategies differentially impact the measured batch effect? We measure the CMS and iLISI scores resulting from two data normalization/scaling methods (log-normalization + scaling vs. SCTransform). We apply each method twice: once to individual samples (prior to merging) and once to a joint object of all samples (after merging). The results of this analysis are presented in greater detail in Figure 5 of our manuscript.
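To make the "before merging" versus "after merging" distinction concrete, the following is a minimal NumPy sketch, not the workflow used in the vignette. It assumes a cells x genes count matrix, a conventional scale factor of 10,000, and simulated Poisson counts; the function names `log_normalize` and `scale_genes` are hypothetical. It shows that per-cell log-normalization gives the same values under either strategy, while per-gene scaling depends on which cells are pooled when the mean and variance are computed.

```python
import numpy as np

def log_normalize(counts, scale_factor=1e4):
    """Per-cell library-size normalization followed by log1p (cells x genes)."""
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # guard against empty cells
    return np.log1p(counts / totals * scale_factor)

def scale_genes(x):
    """Center each gene to mean 0 and scale it to unit variance."""
    mean = x.mean(axis=0)
    std = x.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant genes unscaled
    return (x - mean) / std

# Simulated counts standing in for two samples (illustrative only).
rng = np.random.default_rng(0)
sample_a = rng.poisson(2.0, size=(50, 20)).astype(float)
sample_b = rng.poisson(4.0, size=(60, 20)).astype(float)

# Strategy 1: normalize and scale each sample on its own, then merge.
per_sample = np.vstack([scale_genes(log_normalize(s)) for s in (sample_a, sample_b)])

# Strategy 2: merge first, then normalize and scale the joint matrix.
merged = scale_genes(log_normalize(np.vstack([sample_a, sample_b])))
```

Under Strategy 1 every gene is centered within each sample, which can remove genuine between-sample differences along with batch differences; under Strategy 2 those differences are retained. This is one reason the two strategies can yield different batch-effect scores.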
How to determine if pooling samples to sequence simultaneously can improve sequencing? scRNA-Seq libraries from two different donors, sequenced to similar depth (samples 4-A, 4-B, and 5-A), were either sequenced at the same time (“sequenced together” samples 4-A & 5-A) or at different times (“sequenced separately” samples 4-B & 5-A). The full results of this test are described in our published Figure 3.
How to determine the batch effects introduced by sample donor? scRNA-Seq libraries from independently processed samples may be combined to produce a single analysis. This vignette describes how we determine the batch effects among three identically prepared and sequenced PBMC datasets, using CMS and iLISI scoring. The results of this workflow are depicted in Figure 2A.
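For readers unfamiliar with iLISI, the sketch below illustrates the idea behind the score: for each cell, compute the inverse Simpson's index of the batch labels among its nearest neighbors. This is a deliberately simplified stand-in, not the scoring code used in the vignette. It uses unweighted k-nearest neighbors, whereas the published LISI method weights neighbors with a perplexity-based Gaussian kernel. The function name `simple_ilisi`, the value of `k`, and the simulated embeddings are all assumptions for illustration.

```python
import numpy as np

def simple_ilisi(embedding, batches, k=15):
    """Inverse Simpson's index of batch labels among each cell's k nearest neighbors.

    Simplified: neighbors are unweighted, unlike the kernel-weighted LISI method.
    """
    embedding = np.asarray(embedding, dtype=float)
    batches = np.asarray(batches)
    # Pairwise squared Euclidean distances (fine for small examples).
    diffs = embedding[:, None, :] - embedding[None, :, :]
    d2 = (diffs ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a cell is not its own neighbor
    neighbors = np.argsort(d2, axis=1)[:, :k]
    labels = np.unique(batches)
    scores = np.empty(len(embedding))
    for i, idx in enumerate(neighbors):
        # Proportion of each batch in this cell's neighborhood.
        p = np.array([(batches[idx] == lab).mean() for lab in labels])
        scores[i] = 1.0 / np.sum(p ** 2)
    return scores

rng = np.random.default_rng(1)

# Two batches far apart in the embedding: a strong batch effect.
separated = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(100, 1, (20, 2))])
separated_batches = np.repeat(["donor1", "donor2"], 20)
separated_scores = simple_ilisi(separated, separated_batches, k=5)

# Two batches drawn from the same distribution: well mixed.
mixed = rng.normal(0, 1, (200, 2))
mixed_batches = rng.choice(["donor1", "donor2"], size=200)
mixed_scores = simple_ilisi(mixed, mixed_batches, k=15)
```

With two batches the score ranges from 1 (every neighbor comes from the same batch) to 2 (neighbors split evenly between batches), so higher values indicate better mixing across donors.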
How to determine the batch effects introduced by sequencing replicate and depth? scRNA-Seq libraries generated from two independently processed PBMC samples were sequenced in duplicate (for a total of four sequencing outputs from two PBMC samples). PBMC sample 4 was sequenced twice without altering any parameters, while PBMC sample 5 was sequenced twice at different sequencing depths. This vignette describes how we use CMS and iLISI scoring to determine the batch effects arising from identical sequencing at different times and from differences in sequencing depth. The sample duplication scheme is depicted in Figure 2B, and the results of this workflow are depicted in Figures 2C-D.