Bioinformatics

How PCA projection and cell label transfer work in Seurat

To not miss a post like this, sign up for my newsletter to learn computational biology and bioinformatics.

Understand the example datasets

We will use the PBMC3k and PBMC10k data. We will project the PBMC3k data onto the PBMC10k data and get the labels.

library(Seurat)
library(Matrix)
library(irlba) # For PCA
library(RcppAnnoy) # For fast nearest neighbor search
library(dplyr)
# Assuming the PBMC datasets (3k and 10k) are already normalized
# and represented as sparse matrices
# devtools::install_github('satijalab/seurat-data')
library(SeuratData)
#AvailableData()
#InstallData("pbmc3k")
pbmc3k <- UpdateSeuratObject(pbmc3k)
pbmc3k@meta.
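To make the two steps concrete outside R, here is a minimal Python sketch under simplifying assumptions: query cells are scaled with the reference's per-gene means and SDs, projected onto the reference PCA loadings, and each query cell takes the majority label of its k nearest reference cells in PC space. The function names, the toy data, and the brute-force neighbor search are illustrative only (the R code above loads RcppAnnoy for fast approximate search), and this is not Seurat's anchor-based FindTransferAnchors()/TransferData() workflow.

```python
import math
from collections import Counter

def project_onto_reference(cells, ref_means, ref_sds, loadings):
    """Project cells (rows of per-gene values) onto reference PCs (hypothetical helper)."""
    embeddings = []
    for cell in cells:
        # Scale with the *reference* statistics, not the query's own
        z = [(x - m) / s for x, m, s in zip(cell, ref_means, ref_sds)]
        # PC score = dot product of the scaled cell with each loading vector
        embeddings.append([sum(zi * wi for zi, wi in zip(z, pc)) for pc in loadings])
    return embeddings

def transfer_labels(ref_emb, ref_labels, query_emb, k=3):
    """Majority vote over the k nearest reference cells (brute-force Euclidean distance)."""
    predicted = []
    for q in query_emb:
        nearest = sorted(range(len(ref_emb)), key=lambda i: math.dist(q, ref_emb[i]))[:k]
        votes = Counter(ref_labels[i] for i in nearest)
        predicted.append(votes.most_common(1)[0][0])
    return predicted

# Toy example: 2 genes, 1 PC, two reference cell types
ref_means, ref_sds = [1.0, 1.0], [1.0, 1.0]
loadings = [[1 / math.sqrt(2), 1 / math.sqrt(2)]]
ref_cells = [[0.0, 0.0], [0.2, 0.0], [2.0, 2.0], [2.2, 2.0]]
ref_labels = ["B", "B", "T", "T"]
query_cells = [[0.1, 0.1], [2.1, 2.0]]

ref_emb = project_onto_reference(ref_cells, ref_means, ref_sds, loadings)
query_emb = project_onto_reference(query_cells, ref_means, ref_sds, loadings)
print(transfer_labels(ref_emb, ref_labels, query_emb, k=3))  # ['B', 'T']
```

The key design point is that the query never gets its own PCA: it is placed in the reference's PC space, so distances to reference cells are meaningful.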

You need to master it if you deal with genomics data

Motivation

What’s the most common problem you need to solve when dealing with genomics data? For me, it is genomic intervals! Genomic data are usually represented linearly: chromosome name, start, and end. We use this to define a region in the genome (e.g., a peak from ChIP-seq data), the location of a gene, a DNA methylation site (a single point), a mutation call (a single point), a duplicated region in cancer, etc.
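A minimal Python sketch of that representation, assuming BED-style 0-based, half-open coordinates. The Interval class and helper names are hypothetical; for real data you would use dedicated tools such as bedtools or Bioconductor's GenomicRanges, which index intervals instead of scanning them.

```python
from typing import List, NamedTuple

class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based start, inclusive (BED style)
    end: int    # end, exclusive

def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open intervals overlap if they share a chromosome and each starts before the other ends."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

def find_overlaps(query: Interval, intervals: List[Interval]) -> List[Interval]:
    """Naive linear scan over all intervals."""
    return [iv for iv in intervals if overlaps(query, iv)]

peaks = [Interval("chr1", 100, 200), Interval("chr1", 200, 300), Interval("chr2", 150, 250)]
site = Interval("chr1", 149, 150)  # a single-base site, i.e. 1-based position 150
print(find_overlaps(site, peaks))  # [Interval(chrom='chr1', start=100, end=200)]
```

With half-open coordinates, adjacent intervals such as chr1:100-200 and chr1:200-300 do not overlap, and a single point is simply an interval of length 1.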

A docker image to keep this site alive

I have been writing blog posts for over 10 years. I was using Blogspot, and in 2018 I switched to blogdown, which I love. My blogdown website divingintogeneticsandgenomics.com was using Hugo v0.42 and blogdown v1.0. Many years later, I now have a MacBook Pro with an M3 chip, and I could not install the old versions of the R packages needed to serve the site.

The Most Common Mistake In Bioinformatics: the off-by-one error

In my last blog post, I talked about some common bioinformatics mistakes. Today, we are going to talk about THE MOST common bioinformatics mistake people make. I think it deserves its own post. Even some experienced programmers get it wrong, and the mistake shows up in many bioinformatics tools: the off-by-one error!
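One concrete place where the error bites is pulling a subsequence out of a Python string. A minimal sketch with a toy sequence and hypothetical helper names: Python strings are 0-based and slices exclude the end, which lines up with BED coordinates but not with 1-based formats such as GFF/GTF or VCF.

```python
def slice_1based(seq: str, start: int, end: int) -> str:
    """Extract bases start..end given 1-based, fully closed coordinates (GFF/GTF style)."""
    # Only the start shifts by 1; the inclusive 1-based end equals the exclusive 0-based end
    return seq[start - 1:end]

def slice_0based(seq: str, start: int, end: int) -> str:
    """Extract bases given 0-based, half-open coordinates (BED style): maps directly to a slice."""
    return seq[start:end]

seq = "ACGTACGTAC"
print(slice_1based(seq, 1, 3))  # ACG: the first three bases
print(slice_0based(seq, 0, 3))  # ACG: the same bases in BED coordinates
print(seq[1:3])                 # CG: the bug, a 1-based start used as a 0-based index
```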

The Most Common Stupid Mistakes In Bioinformatics

This post is inspired by this popular thread on https://www.biostars.org/.

Common mistakes in general

Off-by-One Errors: mistakes occur when switching between different indexing systems. For example, BED files use 0-based, half-open coordinates, while GFF/GTF files use 1-based, fully closed coordinates, which can lead to misinterpreted genomic coordinates. This is one of the most common mistakes!
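The BED and GFF/GTF conventions can be converted with a one-line shift of the start coordinate. A minimal sketch with hypothetical helper names, assuming plain integer coordinates on the same chromosome:

```python
from typing import Tuple

def bed_to_gff(start: int, end: int) -> Tuple[int, int]:
    """BED (0-based start, exclusive end) -> GFF/GTF (1-based start, inclusive end)."""
    return start + 1, end

def gff_to_bed(start: int, end: int) -> Tuple[int, int]:
    """GFF/GTF (1-based start, inclusive end) -> BED (0-based start, exclusive end)."""
    return start - 1, end

def bed_length(start: int, end: int) -> int:
    return end - start

def gff_length(start: int, end: int) -> int:
    return end - start + 1  # both ends are included

# The first 100 bases of a chromosome in each convention
print(bed_to_gff(0, 100))                      # (1, 100)
print(gff_to_bed(1, 100))                      # (0, 100)
print(bed_length(0, 100), gff_length(1, 100))  # 100 100
```

Note that the end value is numerically identical in both conventions; forgetting that only the start moves is exactly how off-by-one errors creep in.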

Six tips to build a strong Bioinformatics CV

If you apply for a bioinformatics position, your CV is one of hundreds sent to the hiring manager. How do you stand out? Below are 6 tips from my hiring experience:
Include a GitHub Link: make sure your CV has a GitHub link with relevant content, such as Python or R packages, data analysis projects, or figures replicated from published papers.

R or Python for Bioinformatics?

If you need to pick Python or R for bioinformatics, which one should you choose? This is a long-standing question from many beginners. This is my story. I started learning Unix commands 12 years ago (see an example of how powerful Unix commands can be).

How to level up real-life bioinformatics skills: from dealing with one sample to a lot of samples

The other day, I saw this tweet from Ramon Massoni Badosa (@rmassonix) on May 15, 2024: "Machine learning and bioinformatics tutorials these days" pic.twitter.com/0FhWWG09TB. Many bioinformatics tutorials are like that. I am not saying these tutorials are bad; beginners need something basic first to understand the concepts.

Common mistakes when analyzing single-cell RNAseq data

I was recently interviewed by the SEQanswers forum about single-cell RNAseq analysis.

In your opinion, what is the most challenging aspect of single-cell analysis?

Every single-cell dataset is unique in terms of data quality, so QC has to be carried out in a dataset-specific manner. Cell annotation is still one of the most challenging steps.
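One common way to make QC dataset-specific rather than relying on fixed cutoffs is to flag cells that sit several median absolute deviations (MADs) from the median of a QC metric, the same idea behind isOutlier() in Bioconductor's scuttle/scater. A minimal sketch with toy numbers and a hypothetical helper; it illustrates the idea and is not necessarily the procedure discussed in the interview. The 3-MAD cutoff is a convention, not a rule.

```python
from statistics import median
from typing import List

def mad_outliers(values: List[float], nmads: float = 3.0) -> List[bool]:
    """Flag values more than `nmads` MADs from the median (two-sided).

    Uses the raw MAD, without the 1.4826 normal-consistency factor.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread in the metric: this rule cannot call anything an outlier
        return [False] * len(values)
    return [abs(v - med) > nmads * mad for v in values]

# Toy per-cell mitochondrial read percentages from one dataset
pct_mito = [2.0, 3.0, 2.5, 3.5, 2.8, 25.0]
print(mad_outliers(pct_mito))  # [False, False, False, False, False, True]
```

Because the threshold is derived from each dataset's own distribution, a noisier dataset gets a wider tolerance automatically.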

How to separate a comma-delimited string into multiple lines in R and Python

The problem

df <- data.frame(id = c(1,2,3), value = c('x,y', 'z,w', 'a'))
df
#>   id value
#> 1  1   x,y
#> 2  2   z,w
#> 3  3     a

We want to split x,y in the first row into two rows (1, x and 1, y), and split z,w into two rows too.

Solution with R

There is a neat function, separate_rows(), in the tidyr package that does exactly this:
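For the Python side, here is a dependency-free sketch that mirrors what tidyr's separate_rows() does on the same toy data, assuming rows are stored as a list of dicts. The separate_rows name here is a hypothetical helper, not a library function. With pandas, the usual route is df.assign(value=df["value"].str.split(",")).explode("value").

```python
from typing import Dict, List

def separate_rows(rows: List[Dict], column: str, sep: str = ",") -> List[Dict]:
    """Split `column` on `sep` and emit one row per piece, copying the other columns."""
    out = []
    for row in rows:
        for piece in str(row[column]).split(sep):
            new_row = dict(row)  # copy so the input rows are not modified
            new_row[column] = piece.strip()
            out.append(new_row)
    return out

df = [{"id": 1, "value": "x,y"}, {"id": 2, "value": "z,w"}, {"id": 3, "value": "a"}]
for row in separate_rows(df, "value"):
    print(row["id"], row["value"])
# 1 x
# 1 y
# 2 z
# 2 w
# 3 a
```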