Rstats

Part 4 CITE-seq normalization using empty droplets with the dsb package

In this post, we are going to try a CITE-seq normalization method called dsb published in Normalizing and denoising protein expression data from droplet-based single cell profiling two major components of protein expression noise in droplet-based single cell experiments: (1) protein-specific noise originating from ambient, unbound antibody encapsulated in droplets that can be accurately estimated via the level of “ambient” ADT counts in empty droplets, and (2) droplet/cell-specific noise revealed via the shared variance component associated with isotype antibody controls and background protein counts in each cell.

Generative AI: Text generation using Long short-term memory (LSTM) model

In the world of deep learning, generating sequence data is a fundamental task. Typically, this involves training a network, often an RNN (Recurrent Neural Network) or a convnet (Convolutional Neural Network), to predict the next token or a sequence of tokens in a given sequence, using the preceding tokens as input. For example, when provided with the input “the cat is on the ma,” the network’s objective is to predict the next character, such as ‘t.

use random forest and boost trees to find marker genes in scRNAseq data

This is a blog post for a series of posts on marker gene identification using machine learning methods. Read the previous posts: logistic regression and partial least square regression. This blog post will explore the tree based method: random forest and boost trees (gradient boost tree/XGboost). I highly recommend going through https://app.learney.me/maps/StatQuest for related sections by Josh Starmer. Note, all the tree based methods can be used to do both classification and regression.

compare slopes in linear regression

I asked this question on twitter. load the package library(tidyverse) make some dummy data The dummy example: We have two groups of samples: disease and health. We treat those cells in vitro with different dosages (0, 1, 5) of a chemical X and count the cell number after 3 hours. x <- tibble( '0' = c(8.66, 11.50, 7.01, 13.40, 11.30, 8.13, 5.92, 7.54), '1' = c(22.10, 23.00, 22.00, 35.70, 32.

Monty Hall problem- a peek through simulation

I am taking this STATE-80 course from Harvard Extension School. This course teaches commonly used distributions and probability theory. The instructor Hatch is a really good teacher and he uses simulation for all the demonstrations along with the formulas. In week 6, we revisited the Monty Hall problem which we played on the first day of class. If you have not heard about it, I quoted from the wiki: Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats.

Align multiple ggplot2 plots by axis

I used to use cowplot to align multiple ggplot2 plots but when the x-axis are of different ranges, some extra work is needed to align the axis as well. The other day I was reading a blog post by GuangChuang Yu and he exactly tackled this problem. His packages such as ChIPseeker, ClusterProfiler, ggtree are quite popular among the users. Some dummy example from his post: library(dplyr) library(ggplot2) library(ggstance) library(cowplot) # devtools::install_github("YuLab-SMU/treeio") # devtools::install_github("YuLab-SMU/ggtree") library(tidytree) library(ggtree) no_legend=theme(legend.