To not miss a post like this, sign up for my newsletter to learn computational biology and bioinformatics.
In my last blog post, I talked about some common bioinformatics mistakes.
Today, we are going to talk about THE MOST common bioinformatics mistake, one that I think deserves its own post. Even experienced programmers get it wrong, and the mistake persists in many bioinformatics tools:
The off-by-one mistake!
This post is inspired by a popular thread on Biostars (https://www.biostars.org/).
Common mistakes in general
Off-by-One Errors: Mistakes occur when switching between different indexing systems. For example, BED files are 0-based (with half-open intervals) while GFF/GTF files are 1-based (with closed intervals), which can lead to misinterpreted genomic coordinates. This is one of the most common mistakes!
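To make the BED vs GFF/GTF difference concrete, here is a minimal base-R sketch. It assumes the standard conventions: a BED interval is [start, end) with a 0-based start, and a GFF/GTF interval is [start, end] with a 1-based start, so only the start coordinate shifts when converting. The helper names (bed_to_gff, gff_to_bed, bed_length, gff_length) are hypothetical, not from any package. The last lines show the classic bug: R's substr() is 1-based, so passing a BED start straight into it grabs one base too many.

```r
# BED: 0-based start, half-open [start, end). GFF/GTF: 1-based start, closed [start, end].
bed_to_gff <- function(start, end) {
  # Only the start shifts; the end coordinate is numerically the same in both systems.
  list(start = start + 1L, end = end)
}

gff_to_bed <- function(start, end) {
  list(start = start - 1L, end = end)
}

# Interval length depends on the convention: half-open vs closed.
bed_length <- function(start, end) end - start
gff_length <- function(start, end) end - start + 1L

# The first 100 bases of a chromosome, written in each convention.
bed_to_gff(0L, 100L)   # start 1, end 100
gff_to_bed(1L, 100L)   # start 0, end 100
bed_length(0L, 100L)   # 100
gff_length(1L, 100L)   # 100

# Extracting a BED interval from a sequence with R's 1-based substr().
dna <- "ACGTACGTAC"
bed_start <- 2L
bed_end <- 5L
substr(dna, bed_start + 1L, bed_end)  # "GTA": correct, 3 bases
substr(dna, bed_start, bed_end)       # "CGTA": off by one, 4 bases
```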
When you apply for a bioinformatics position, your CV is one of hundreds sent to the hiring manager. How do you stand out? Below are 6 tips from my hiring experience:
Include a GitHub Link: Ensure your CV has a GitHub link with relevant content like Python or R packages, data analysis projects, or replicated figures from published papers.
R or Python for Bioinformatics?
If you have to pick between Python and R for bioinformatics, which one should you choose? This is a long-standing question among beginners.
This is my story.
I started learning Unix commands 12 years ago (see an example of how powerful Unix commands can be).
The other day, I saw this tweet:
"Machine learning and bioinformatics tutorials these days" pic.twitter.com/0FhWWG09TB (Ramon Massoni Badosa, @rmassonix, May 15, 2024)
Many bioinformatics tutorials are like that. I am not saying those tutorials are not good, but beginners need something more basic first to understand them.
I was recently interviewed by the SEQanswers forum about single-cell RNA-seq analysis.
In your opinion, what is the most challenging aspect of single-cell analysis?
Every single-cell dataset is unique in terms of data quality, so QC has to be carried out in a dataset-specific manner. Cell annotation is still one of the most challenging steps.
The problem
df <- data.frame(id = c(1, 2, 3), value = c('x,y', 'z,w', 'a'))
df
#>   id value
#> 1  1   x,y
#> 2  2   z,w
#> 3  3     a
We want to split x,y in the first row into two rows:
1, x
1, y
and put z,w into two rows too.
Solution with R
The tidyr package has a neat function, separate_rows(), that does exactly this:
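A minimal sketch of the reshaping, using the same df as above. The tidyr call is separate_rows(df, value, sep = ","); it runs only if tidyr is installed (newer tidyr releases also offer separate_longer_delim(df, value, delim = ",") for the same job). The base-R lines after it are an alternative of my own, not the post's solution, shown so the idea works without extra packages: split each string, then repeat each id once per piece.

```r
# stringsAsFactors = FALSE keeps value as character on R versions older than 4.0.
df <- data.frame(id = c(1, 2, 3), value = c("x,y", "z,w", "a"),
                 stringsAsFactors = FALSE)

# tidyr: one row per comma-separated value, id repeated as needed.
if (requireNamespace("tidyr", quietly = TRUE)) {
  long_tidyr <- tidyr::separate_rows(df, value, sep = ",")
  print(long_tidyr)
}

# Base-R alternative: split the strings, then repeat each id by the number of pieces.
pieces <- strsplit(df$value, ",", fixed = TRUE)
long_base <- data.frame(
  id = rep(df$id, lengths(pieces)),
  value = unlist(pieces),
  stringsAsFactors = FALSE
)
long_base
```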
I am asked this question very often: “Tommy, what p-value cutoff should I use to determine differentially expressed genes, and what log2 fold change cutoff should I use?”
For single-cell RNAseq quality control, what’s the cutoff for mitochondrial content?
My answer is always: it depends. I often joke that determining cutoffs is 90% of a bioinformatician's work.
Why is that?
Biology is more than just statistics.
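How much a cutoff matters is easy to see on a toy example. The sketch below uses a made-up results table with DESeq2-style column names (log2FoldChange, padj) and a made-up genes-by-cells count matrix; all values and the helper count_de() are hypothetical. The "^MT-" pattern assumes human gene symbols (mouse mitochondrial genes start with "mt-"); Seurat's PercentageFeatureSet(pattern = "^MT-") computes the same per-cell percentage.

```r
# Synthetic DE results (made-up values) using DESeq2-style column names.
res <- data.frame(
  gene = paste0("g", 1:6),
  log2FoldChange = c(2.5, -1.2, 0.8, 3.1, -0.4, 1.6),
  padj = c(0.001, 0.03, 0.04, 0.2, 0.01, 0.049)
)

# Hypothetical helper: genes passing both an adjusted p-value and an |log2FC| cutoff.
count_de <- function(res, padj_cut, lfc_cut) {
  sum(!is.na(res$padj) & res$padj < padj_cut & abs(res$log2FoldChange) > lfc_cut)
}

count_de(res, padj_cut = 0.05, lfc_cut = 1)    # 3 genes
count_de(res, padj_cut = 0.01, lfc_cut = 1)    # 1 gene
count_de(res, padj_cut = 0.05, lfc_cut = 0.5)  # 4 genes

# Synthetic genes x cells count matrix; each cell has 100 counts in total.
counts <- matrix(
  c(10, 5, 50, 35,   # cell1
    40, 20, 25, 15,  # cell2
    1, 1, 60, 38),   # cell3
  nrow = 4,
  dimnames = list(c("MT-CO1", "MT-ND1", "ACTB", "GAPDH"),
                  c("cell1", "cell2", "cell3"))
)

# Percent of counts from mitochondrial genes (assumes human "MT-" gene symbols).
is_mt <- grepl("^MT-", rownames(counts))
percent_mt <- 100 * colSums(counts[is_mt, , drop = FALSE]) / colSums(counts)
percent_mt  # cell1 15, cell2 60, cell3 2

# The set of cells kept depends on the cutoff you choose.
names(percent_mt)[percent_mt < 20]  # "cell1" "cell3"
names(percent_mt)[percent_mt < 10]  # "cell3"
```

Neither cutoff is "right" here: whether 15% mitochondrial reads signals a damaged cell depends on the tissue and protocol, which is exactly why the answer is "it depends".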
Context and Problem
In scRNA-seq, each cell is sequenced individually, allowing for the analysis of gene expression at the single-cell level. This provides a wealth of information about cellular identities and states. However, the high dimensionality of the data (thousands of genes) and technical noise can make it challenging to cluster cells accurately. Over-clustering is one such challenge: cells that are biologically similar are split into distinct clusters.
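One way to see over-clustering is to ask a clustering algorithm for more groups than truly exist. The sketch below simulates two well-separated "cell types" in a 2-D space (standing in for a couple of principal components) and clusters them with base R's kmeans() at k = 2 and k = 4. This is a simplified stand-in, not how scRNA-seq toolkits usually cluster: Seurat, for example, uses graph-based (Louvain/Leiden) clustering, where raising the resolution parameter has a similar splitting effect. The helper clusters_per_type() is hypothetical.

```r
set.seed(1)

# Synthetic data: two well-separated "cell types" in a 2-D embedding.
n <- 50
group <- rep(c("typeA", "typeB"), each = n)
emb <- rbind(
  matrix(rnorm(2 * n, mean = 0, sd = 1), ncol = 2),
  matrix(rnorm(2 * n, mean = 10, sd = 1), ncol = 2)
)

# k = 2 matches the true structure; k = 4 forces the types to be split further.
km2 <- kmeans(emb, centers = 2, nstart = 20)
km4 <- kmeans(emb, centers = 4, nstart = 20)

table(cluster = km2$cluster, truth = group)
table(cluster = km4$cluster, truth = group)

# Over-clustering signature: one underlying type is spread over several clusters.
clusters_per_type <- function(cluster, truth) {
  tapply(cluster, truth, function(x) length(unique(x)))
}
clusters_per_type(km2$cluster, group)
clusters_per_type(km4$cluster, group)
```

In the k = 4 cross-tabulation every cluster is still "pure", which is why over-clustering is easy to miss: the extra clusters look clean, but they do not correspond to new biology.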
In the last blog post, I showed you how to use salmon to get counts from fastq files downloaded from GEO. In this post, I am going to show you how to read the salmon quantification files (quant.sf) into R, how to get the tx2gene.txt file, and how to run DESeq2 for differential gene expression analysis. Let’s dive in!
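Before the full pipeline, it may help to see what the transcript-to-gene step does. The sketch below parses a tiny, made-up quant.sf (salmon writes the columns Name, Length, EffectiveLength, TPM and NumReads) and sums NumReads per gene using a hypothetical two-column tx2gene table (transcript ID, then gene ID). This roughly mirrors the counts tximport produces with its default countsFromAbundance = "no"; tximport also returns abundances and average transcript lengths, which DESeqDataSetFromTximport() uses as an offset. In a real analysis you would call tximport(files, type = "salmon", tx2gene = tx2gene) instead.

```r
# Synthetic quant.sf content (illustration only), in salmon's tab-separated layout.
quant_txt <- paste(
  "Name\tLength\tEffectiveLength\tTPM\tNumReads",
  "tx1\t1500\t1300.2\t120.5\t10.5",
  "tx2\t900\t700.8\t80.1\t4.5",
  "tx3\t2000\t1800.0\t300.0\t20",
  sep = "\n"
)
quant <- read.delim(text = quant_txt, stringsAsFactors = FALSE)

# Hypothetical tx2gene table: transcript ID first, gene ID second.
tx2gene <- data.frame(tx = c("tx1", "tx2", "tx3"),
                      gene = c("geneA", "geneA", "geneB"),
                      stringsAsFactors = FALSE)

# Sum estimated read counts over all transcripts of the same gene.
gene_of_tx <- tx2gene$gene[match(quant$Name, tx2gene$tx)]
gene_counts <- rowsum(quant$NumReads, group = gene_of_tx)
gene_counts  # geneA 15, geneB 20
```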
library(tximport)
library(dplyr)
library(ggplot2)
files<- list.