
10 single-cell data benchmarking papers

I tweeted it at
I got asked to put all my posts in a central place and I think it is a good idea. And here it is!
Benchmarking integration of single-cell differential expression
Benchmarking atlas-level data integration in single-cell genomics
A review of computational strategies for denoising and imputation of single-cell transcriptomic data
Benchmarking spatial and single-cell transcriptomics integration methods for transcript distribution prediction and cell type deconvolution

Has AI changed the course of Drug Development?

What’s the drug development process?
Has AI changed the course of Drug Development? To answer this question, we need first to understand the drug development process. The whole process includes the following:
target identification
target pharmacology and biomarker development
lead identification, lead optimization
Clinical research & development
regulatory review of IND (investigational new drug) and later phase clinical trials
post-marketing knowledge
Biologics/antibodies drug development follows a similar path (you can find the map in the same link).

How to add boxplots or density plots side-by-side with a scatterplot: a single-cell case study

Introduce ggside using single-cell data
The ggside R package provides a new way to visualize data by combining the flexibility of ggplot2 with the power of side-by-side plots. We will use a single-cell dataset to demonstrate its usage. ggside allows users to create side-by-side plots of multiple variables, such as gene expression, cell type, and experimental conditions. This can be helpful for identifying patterns and trends in scRNA-seq data that would be difficult to see in individual plots.
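As a rough idea of what this looks like in practice, here is a minimal sketch that adds per-group density panels to a scatterplot with ggside. The data are simulated, and the column names (cell_type, gene_a, gene_b) are made up for illustration; they are not from the post's dataset.

```r
library(ggplot2)
library(ggside)

set.seed(1234)

# Simulated stand-in for single-cell data: two hypothetical cell types
# with made-up expression values for two hypothetical genes.
n <- 200
cells <- data.frame(
  cell_type = rep(c("T cell", "B cell"), each = n),
  gene_a = c(rnorm(n, mean = 2, sd = 0.5), rnorm(n, mean = 3.5, sd = 0.5)),
  gene_b = c(rnorm(n, mean = 1, sd = 0.4), rnorm(n, mean = 2.5, sd = 0.6))
)

p <- ggplot(cells, aes(x = gene_a, y = gene_b, color = cell_type)) +
  geom_point(alpha = 0.5) +
  # density of gene_a per cell type, drawn in a side panel along the x axis
  geom_xsidedensity(aes(y = after_stat(density))) +
  # density of gene_b per cell type, drawn in a side panel along the y axis
  geom_ysidedensity(aes(x = after_stat(density))) +
  theme_bw()

print(p)
```

With real data, the simulated data frame would be replaced by a table of per-cell values, for example expression values and cluster labels pulled out of a Seurat object.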

Unlock the Power of Genomics Data Analysis: Watershed's Seamless Cloud Computing Solution

Disclaimer: This post is sponsored by the Watershed Omics Bench platform. I have personally tested the platform. The opinions and views expressed in this post are solely those of the author and do not represent the views of my employer.
As an experienced bioinformatician who understands the needs of biotech startups, I know the challenges that arise when analyzing genomics data. The first solution that comes to mind is cloud computing. Unsurprisingly, AWS and Google Cloud Platform (GCP) are commonly used options.

How to do neighborhood/cellular niches analysis with spatial transcriptome data

Read in the data and pre-process
library(Seurat)
library(here)
library(ggplot2)
library(dplyr)
# the LoadVizgen function requires the raw segmentation files, which are too big.

How to construct a spatial object in Seurat

Single-cell spatial transcriptomics is a new technology that measures gene expression in individual cells together with their location in a tissue, revealing the complex cellular and molecular differences within it. This allows scientists to investigate how genes are expressed and how cells interact with each other in much greater detail than before.

Deep learning to predict cancer from healthy controls using TCRseq data

The T-cell receptor (TCR) is a special molecule found on the surface of a type of immune cell called a T-cell. Think of T-cells like soldiers in your body's defense system that help identify and attack foreign invaders like viruses and bacteria. The TCR is like a sensor or antenna that allows T-cells to recognize specific targets, kind of like how a key fits into a lock.

How to deal with overplotting without being fooled

The problem
Let me be clear, when you have gazillions of data points in a scatter plot, you want to deal with the overplotting to avoid drawing misleading conclusions. Let's start with a single-cell example.
Load the libraries:
library(dplyr)
library(Seurat)
library(patchwork)
library(ggplot2)
library(ComplexHeatmap)
library(SeuratData)
set.seed(1234)
Prepare the data
data("pbmc3k")
pbmc3k
#> An object of class Seurat
#> 13714 features across 2700 samples within 1 assay
#> Active assay: RNA (13714 features, 0 variable features)
## routine processing
pbmc3k <- pbmc3k %>% NormalizeData(normalization.

How to do gene correlation for single-cell RNAseq data (part 1)

Load libraries
library(dplyr)
library(Seurat)
library(patchwork)
library(ggplot2)
library(ComplexHeatmap)
library(SeuratData)
set.seed(1234)
Prepare the data
data("pbmc3k")
pbmc3k
#> An object of class Seurat
#> 13714 features across 2700 samples within 1 assay
#> Active assay: RNA (13714 features, 0 variable features)
## routine processing
pbmc3k <- pbmc3k %>%
  NormalizeData(normalization.method = "LogNormalize", scale.factor = 10000) %>%
  FindVariableFeatures(selection.method = "vst", nfeatures = 2000) %>%
  ScaleData() %>%
  RunPCA(verbose = FALSE) %>%
  FindNeighbors(dims = 1:10, verbose = FALSE) %>%
  FindClusters(resolution = 0.

transpose single-cell cell x gene dataframe to gene x cell

A single-cell matrix is often represented as gene x cell in R/Seurat, but as cell x gene in Python/scanpy. Let’s use a real example to show how to transpose between the two formats. The GEO accession page is at
Download the data
We can use the command line to download the count matrix at ftp:
wget -O ~/blog_data/GSE154763_ESCA_normalized_expression.csv.gz
# decompress the file
gunzip GSE154763_ESCA_normalized_expression.csv.gz
# this GEO matrix is cell x gene
# take a look by https://www.
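To make the idea concrete, here is a minimal base R sketch of the transposition on a tiny toy table. The cell IDs, gene columns, and values are made up for illustration and are not taken from the GEO file; the real matrix may need a different approach if it is too large to hold in memory.

```r
# Toy cell x gene data frame (rows are cells, columns are genes);
# all names and values are hypothetical.
cell_by_gene <- data.frame(
  cell_id = c("cell_1", "cell_2", "cell_3"),
  CD3E = c(1.2, 0.0, 2.3),
  CD79A = c(0.0, 3.1, 0.0),
  stringsAsFactors = FALSE
)

# Move the cell IDs into row names so that only numeric values are transposed;
# transposing with the character column included would coerce everything to character.
mat <- as.matrix(cell_by_gene[, -1])
rownames(mat) <- cell_by_gene$cell_id

# t() swaps rows and columns, so the result is gene x cell
gene_by_cell <- t(mat)

print(dim(mat))
print(gene_by_cell)
```

Keeping the identifiers as row and column names, instead of as a regular column, is what keeps the matrix numeric after the transpose.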