PCA

How CCA alignment and cell label transfer work in Seurat

To not miss a post like this, sign up for my newsletter to learn computational biology and bioinformatics. Understand CCA: Following my last blog post on PCA projection and cell label transfer, we are going to talk about CCA. In single-cell RNA-seq data integration using Canonical Correlation Analysis (CCA), we typically align two matrices representing different datasets, where both datasets have the same set of genes but different numbers of cells.
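To make the "same genes, different numbers of cells" setup concrete, here is a minimal sketch in base R. It is a simplified variant of CCA that takes the SVD of the cross-product of two standardized matrices, in effect treating each dataset's own covariance as the identity. The toy matrices, the genes-in-rows layout, and the names `X`, `Y`, `K`, `cc_X` and `cc_Y` are assumptions for illustration only, not the code from the post or Seurat's implementation.

```r
set.seed(42)
n_genes <- 50

# Toy expression matrices: genes in rows, cells in columns (assumed layout)
X <- matrix(rnorm(n_genes * 30), nrow = n_genes)  # dataset 1: 30 cells
Y <- matrix(rnorm(n_genes * 40), nrow = n_genes)  # dataset 2: 40 cells

# Standardize each cell (column) across the shared genes
X_std <- scale(X)
Y_std <- scale(Y)

# Cross-product over the shared gene dimension: cells1 x cells2
K <- crossprod(X_std, Y_std)

# The leading singular vectors give one embedding per dataset
k <- 5
s <- svd(K, nu = k, nv = k)
cc_X <- s$u  # 30 cells x 5 canonical dimensions
cc_Y <- s$v  # 40 cells x 5 canonical dimensions

dim(K)
dim(cc_X)
dim(cc_Y)
```

The shared gene dimension is what makes the cross-product possible: the two matrices can have any number of cells, but their rows must refer to the same genes in the same order.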

How PCA projection and cell label transfer work in Seurat

Understand the example datasets: We will use the PBMC3k and PBMC10k data. We will project the PBMC3k data onto the PBMC10k data and get the cell labels.

    library(Seurat)
    library(Matrix)
    library(irlba) # For PCA
    library(RcppAnnoy) # For fast nearest neighbor search
    library(dplyr)
    # Assuming the PBMC datasets (3k and 10k) are already normalized
    # and represented as sparse matrices
    # devtools::install_github('satijalab/seurat-data')
    library(SeuratData)
    #AvailableData()
    #InstallData("pbmc3k")
    pbmc3k<-UpdateSeuratObject(pbmc3k)
    pbmc3k@meta.
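The idea of projecting a query dataset onto a reference PCA and then borrowing labels can be sketched on toy data with base R alone. Everything below is an assumption for illustration: the simulated matrices (cells in rows, genes in columns), the names `ref`, `query` and `ref_labels`, and the exact 1-nearest-neighbor vote, which stands in for the approximate neighbor search and weighting that a real workflow would use.

```r
set.seed(7)
n_genes <- 20

# Reference: two groups of 50 cells that differ in mean expression
ref <- rbind(matrix(rnorm(50 * n_genes, mean = 0), ncol = n_genes),
             matrix(rnorm(50 * n_genes, mean = 3), ncol = n_genes))
ref_labels <- rep(c("A", "B"), each = 50)

# Query: 5 cells drawn from each group, labels treated as unknown
query <- rbind(matrix(rnorm(5 * n_genes, mean = 0), ncol = n_genes),
               matrix(rnorm(5 * n_genes, mean = 3), ncol = n_genes))

# PCA on the reference only
pca <- prcomp(ref, center = TRUE, scale. = FALSE)
k <- 5
ref_pcs <- pca$x[, 1:k]

# Project the query: center with the reference gene means,
# then multiply by the reference loadings
query_centered <- scale(query, center = pca$center, scale = FALSE)
query_pcs <- query_centered %*% pca$rotation[, 1:k]

# Transfer labels from the nearest reference cell in PC space
nearest <- apply(query_pcs, 1, function(q) {
  which.min(colSums((t(ref_pcs) - q)^2))
})
query_labels <- ref_labels[nearest]
table(query_labels)
```

The key design point is that the query is centered with the reference's gene means and rotated with the reference's loadings, so both datasets end up in the same coordinate system before any neighbors are compared.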

permutation test for PCA components

PCA is a critical method for dimension reduction of high-dimensional data. High-dimensional data are data with far more features (p) than observations (n). However, this is changing with single-cell RNA-seq data: we can now sequence millions of single cells (n), each with ~20,000 genes/features (p). I suggest you read my previous blog post on using SVD to calculate PCs. Single-cell expression data PCA: In single-cell RNA-seq analysis, feature selection is performed first.
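As a quick reminder of how SVD yields the principal components, here is a minimal sketch on a toy matrix with many more features than observations. The simulated data and the variable names are assumptions for illustration; the comparison against `prcomp()` is only up to the sign of each component, since the sign of a PC is arbitrary.

```r
set.seed(1)

# Toy data: 8 observations (rows) and 100 features (columns), so p >> n
X <- matrix(rnorm(8 * 100), nrow = 8)

# Center each feature, then take the SVD of the centered matrix
Xc <- scale(X, center = TRUE, scale = FALSE)
s <- svd(Xc)

# PC scores are U %*% D; squared singular values give the variance per PC
scores <- s$u %*% diag(s$d)
var_explained <- s$d^2 / sum(s$d^2)

# prcomp() should agree with the SVD result up to the sign of each PC
p <- prcomp(X, center = TRUE, scale. = FALSE)
max(abs(abs(scores) - abs(p$x)))
round(var_explained, 3)
```

With n = 8 centered observations there are at most 7 components with non-zero variance, which is why the last singular value is numerically zero.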

PCA in action

PCA in practice. Principal Component Analysis (PCA) is a very important skill for dimension reduction when analyzing high-dimensional data. High-dimensional data are data with far more features (p) than observations (n). This type of data is very commonly generated by high-throughput sequencing experiments. For example, an RNA-seq or microarray experiment measures the expression of tens of thousands of genes for only 8 samples (4 controls and 4 treatments). Let’s use a microarray dataset for demonstration.