How I Would Learn Bioinformatics From Scratch 12 Years Later: A Roadmap

You Can Change Your Appetites

Linear algebra, statistics, and machine learning used to feel abstract to me. I had zero bioinformatics experience when I was doing my PhD in a wet lab.

I memorized formulas without truly understanding them. But over time, I found the right resources that made these concepts click, especially in the context of bioinformatics.

Six years ago, I wrote a blog post, "My opinionated selection of books/urls for bioinformatics/data science curriculum", but many of its links are now broken, so I decided to write a new one.

If I were starting my bioinformatics/computational journey from scratch today, here are the FREE resources I would recommend.


1. Master the Linux Command Line

Knowing how to work in a Unix environment is a must for any bioinformatician. Start with these:


2. Learn R for Genomics

R is an essential tool for bioinformatics, especially for data wrangling and visualization.


3. Build a Strong Statistical Foundation

Understanding statistics is critical. These books and videos will help:


4. Linear Algebra: Make It Click

I never understood eigenvectors and eigenvalues—until I found these:

Why is it important to learn linear algebra? Most genomics data are just matrices:

  • an RNA-seq expression matrix is a gene-by-sample matrix, where each entry is the read count for a gene in a sample
  • a single-cell expression matrix is a gene-by-cell matrix, where each entry is the read count for a gene in a cell
  • a ChIP-seq count matrix is a peak-by-sample matrix, where each entry is the number of reads in a peak for a sample
  • a drug response matrix is a drug-by-sample matrix, where each entry is a response measure such as the IC50

and many more. In other words,

Matrices are EVERYWHERE in bioinformatics (and in many other data science topics)!
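To make this concrete, here is a minimal pure-Python sketch of a tiny gene-by-sample RNA-seq count matrix stored as a nested list, with rows as genes and columns as samples. The gene names, sample names, and counts are made-up toy values for illustration only. Summing down each column gives a sample's library size (total reads). Summing across each row gives a gene's total count over all samples.

```python
# Toy gene-by-sample count matrix (hypothetical values, for illustration only).
genes = ["GeneA", "GeneB", "GeneC"]
samples = ["sample1", "sample2"]
counts = [
    [10, 20],   # GeneA read counts in sample1, sample2
    [0, 5],     # GeneB
    [90, 175],  # GeneC
]

# Column sums: total reads per sample (library size).
library_sizes = [sum(row[j] for row in counts) for j in range(len(samples))]

# Row sums: total reads per gene across all samples.
gene_totals = [sum(row) for row in counts]

print(dict(zip(samples, library_sizes)))  # {'sample1': 100, 'sample2': 200}
print(dict(zip(genes, gene_totals)))      # {'GeneA': 30, 'GeneB': 5, 'GeneC': 265}
```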

Many bioinformatics problems can be rephrased as matrix manipulations.
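Library-size normalization is one example. Counts-per-million (CPM) rescales each sample by its library size. That is the same as multiplying the count matrix on the right by a diagonal matrix whose j-th diagonal entry is 10^6 divided by sample j's library size. The minimal sketch below uses toy numbers. `matmul` and `diag` are small helpers written for this illustration, not functions from any library.

```python
def matmul(A, B):
    """Multiply matrix A (n x k) by matrix B (k x m), both as nested lists."""
    k = len(B)
    assert all(len(row) == k for row in A), "inner dimensions must match"
    m = len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(m)]
            for i in range(len(A))]

def diag(values):
    """Build a square diagonal matrix from a list of values."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

# Toy gene-by-sample counts (hypothetical values).
counts = [
    [10, 20],
    [0, 5],
    [90, 175],
]
library_sizes = [sum(row[j] for row in counts) for j in range(len(counts[0]))]

# CPM = counts x diag(1e6 / library size): column j is scaled by 1e6 / library_sizes[j].
scale = diag([1e6 / s for s in library_sizes])
cpm = matmul(counts, scale)

for row in cpm:
    print(row)  # after scaling, every column sums to 1,000,000
```

In real work you would use an existing tool, such as edgeR's `cpm()` in R or NumPy broadcasting in Python, instead of building the diagonal matrix by hand. The point here is only that the normalization *is* a matrix product.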

Understand deeply what matrix multiplication means, and why matrix factorization is useful for genomics (see my post).
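A low-rank expression model shows what matrix multiplication means and why factorization helps. It writes X ≈ W H. W is gene-by-k, and each of its columns is a hypothetical "gene program". H is k-by-sample and records how active each program is in each sample. Each sample's column of X is then a weighted sum of the k program columns of W. The sketch below assumes made-up W and H with k = 2 and only reconstructs X from them. Estimating W and H from real data, for example with NMF or PCA, is not shown.

```python
def matmul(A, B):
    """Multiply matrix A (n x k) by matrix B (k x m), both as nested lists."""
    k = len(B)
    assert all(len(row) == k for row in A), "inner dimensions must match"
    m = len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(m)]
            for i in range(len(A))]

# W: 4 genes x 2 hypothetical programs (toy loadings).
W = [
    [5, 0],  # gene1 belongs to program 1
    [3, 0],  # gene2 belongs to program 1
    [0, 4],  # gene3 belongs to program 2
    [1, 2],  # gene4 is shared by both programs
]

# H: 2 programs x 3 samples (toy activities).
H = [
    [1, 0, 2],  # program 1 activity in sample1..sample3
    [0, 1, 1],  # program 2 activity in sample1..sample3
]

X = matmul(W, H)  # 4 genes x 3 samples, built from only 2 programs

# Column j of X is a weighted sum of W's columns, weighted by H[:, j].
for j in range(3):
    combo = [sum(H[p][j] * W[i][p] for p in range(2)) for i in range(4)]
    assert combo == [row[j] for row in X]

for row in X:
    print(row)
```

Reading the product column by column this way is what lets factorization methods summarize thousands of genes with a handful of interpretable programs.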

Matrix calculation is also the foundation of deep learning!
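As a minimal illustration, assuming a plain fully connected layer: a dense layer multiplies an input vector x by a weight matrix W, adds a bias b, and applies a nonlinearity such as ReLU, giving y = ReLU(Wx + b). A deep network stacks such layers, so at its core it is repeated matrix multiplication. The weights and input below are arbitrary toy values, not a trained model.

```python
def matvec(W, x):
    """Multiply matrix W (n x d) by vector x (length d)."""
    assert all(len(row) == len(x) for row in W), "dimensions must match"
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    """Element-wise ReLU: max(0, value)."""
    return [max(0.0, vi) for vi in v]

def dense_layer(W, b, x):
    """One fully connected layer: ReLU(W x + b)."""
    return relu([z + bi for z, bi in zip(matvec(W, x), b)])

# Toy input: expression of 3 genes in one sample (hypothetical values).
x = [1.0, 2.0, 0.5]

# Toy layer with 2 hidden units (arbitrary weights and biases).
W = [
    [0.5, -1.0, 2.0],
    [1.0, 1.0, -4.0],
]
b = [0.0, 1.0]

h = dense_layer(W, b, x)
print(h)  # [0.0, 2.0]
```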


5. Get Comfortable with Machine Learning

Statistics and machine learning go hand in hand:


6. Python for Bioinformatics

I’m primarily an R user, but I use Python for workflow automation. If I had to start again:


Just Start!

Pick any resource that fits your learning stage and dive in. Waiting won’t change anything. Taking action will. Those who start and experiment win in bioinformatics.

And of course, subscribe to my YouTube channel, chatomics, to learn bioinformatics too! https://www.youtube.com/@chatomics

If you found this useful, share it with others who might benefit. Happy learning! 🚀
