Bioinformatics

How to create a GenomicRanges object in Bioconductor using canonical transcripts

To not miss a post like this, sign up for my newsletter to learn computational biology and bioinformatics. Introduction to Annotation Data Packages in Bioconductor. Accurate gene and transcript annotation is the foundation of many bioinformatics workflows, including RNA-seq analysis, functional genomics, and variant annotation. In the R/Bioconductor ecosystem, dedicated annotation data packages make it easy for researchers to access, query, and leverage gene models sourced from major biological databases.

Mastering Bioinformatics in the Age of AI: Foundational Skills for the Modern Scientist

AI is transforming every field, and bioinformatics is no exception. From designing drug molecules in minutes to writing entire pipelines, generative AI is making it faster than ever to process biological data. But here's the truth: AI doesn't understand biology. You do. That's why, in this new era, your value isn't replaced by AI; it's multiplied by your ability to judge, validate, and improve what AI produces.

How Cancer Drugs Really Work

Suna was 28. She had melanoma. Chemo left her wrecked: her hair gone, her strength gone, and her hope fading. Doctors gave her weeks. Then they tried something different. It didn't poison the tumor. It didn't cut or burn. It woke up her immune system. Her own T-cells found the cancer. Attacked it. Killed it.

How to calculate partial correlation controlling cancer types

What is partial correlation? Partial correlation measures the relationship between two variables while controlling for the effect of one or more other variables. Suppose you want to know how X and Y are related, independent of how both are influenced by Z. Partial correlation helps answer: if we remove the influence of Z, is there still a connection between X and Y?
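As a rough illustration of the idea, here is a minimal Python sketch of the first-order partial correlation between X and Y given a single continuous Z, computed from the three pairwise Pearson correlations. The function names `pearson` and `partial_correlation` and the toy numbers are made up for this example; they are not taken from the post, which may well use a different approach (for instance, regressing out a categorical variable such as cancer type).

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Toy data (hypothetical): x and y are both driven by z, plus unrelated noise.
z = [-3, -1, 1, 3]
x = [-2, -2, 0, 4]   # z + [1, -1, -1, 1]
y = [-4, 2, -2, 4]   # z + [-1, 3, -3, 1]

print(round(pearson(x, y), 2))       # raw correlation, clearly positive
print(partial_correlation(x, y, z))  # essentially zero once z is controlled for
```

In this toy example X and Y look correlated only because both follow Z; once Z is accounted for, the remaining association vanishes.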

Multi-Omics Integration Strategy and Deep Diving into MOFA2

Today's guest blog post on multi-omics integration is written by Aditi Qamra and edited by Tommy. If you would like to write a guest post for my blog, which gets 30k views per month, feel free to contact me on LinkedIn. Aditi is a senior data scientist working on biomarker discovery and early product development at Roche, using multimodal clinical and genomic data. She has a PhD and postdoc in epigenomics of solid tumors and enjoys upskilling herself in stats topics.

How I Would Learn Bioinformatics From Scratch 12 Years Later: A Roadmap

You Can Change Your Appetites. Linear algebra, statistics, machine learning: these used to feel abstract to me. I had zero bioinformatics experience when I was doing my PhD in a wet lab. I memorized formulas without truly understanding them. But over time, I found the right resources that made these concepts click, especially in the context of bioinformatics. Six years ago I wrote a blog post, "My opinionated selection of books/urls for bioinformatics/data science curriculum", and many of its links are now broken.

PCA analysis on TCGA bulk RNAseq data continued

In my last blog post, I showed you how to download TCGA RNAseq count data and do PCA and make a heatmap. It is interesting to see some of the LUSC samples mix with the LUAD samples and vice versa. In this post, we will continue to use PCA to do more Exploratory data analysis (EDA).

PCA analysis on scATACseq data

In my last post, I showed you how to use PCA for bulk RNAseq data. Today, let's see how we can use it for scATACseq data. Download the example dataset from 10x Genomics: https://support.10xgenomics.com/single-cell-atac/datasets/1.1.0/atac_pbmc_5k_v1. The dataset is 5k peripheral blood mononuclear cells (PBMCs) from a healthy donor (v1.0). Download the atac_pbmc_5k_v1_filtered_peak_bc_matrix.tar.gz file and extract it.

PCA analysis on TCGA bulk RNAseq data

What is PCA? Principal Component Analysis (PCA) is a mathematical technique used to reduce the dimensionality of large datasets while preserving the most important patterns in the data. It transforms the original high-dimensional data into a smaller set of new variables called principal components (PCs), which capture the most variation in the data.
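To make the definition concrete, here is a small Python sketch of PCA done via singular value decomposition of the centered data matrix. It assumes NumPy is installed; the function name `pca`, the random toy matrix, and the choice of two components are all assumptions for illustration and are not taken from the post, which works with real TCGA data and may use different tooling.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via SVD. Rows of X are samples, columns are features (e.g. genes)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # sample coordinates on the PCs
    explained_variance = S ** 2 / (X.shape[0] - 1)    # variance along each PC
    ratio = explained_variance / explained_variance.sum()
    return scores, Vt[:n_components], ratio[:n_components]

# Toy data (hypothetical): 20 samples x 5 features, one feature with inflated variance.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
X[:, 0] *= 10

scores, components, ratio = pca(X, n_components=2)
print(scores.shape)   # (20, 2): each sample reduced to two PC coordinates
print(ratio)          # fraction of total variance captured by PC1 and PC2
```

The rows of `components` are the principal axes, and `scores` are the projections of the centered samples onto those axes, which is what a typical PCA scatter plot displays.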

Biotech Data Strategy: Building a Scalable Foundation for Startups

In a biotech startup, an early data strategy is key to ensuring that public and private data remain useful and valuable. As AI hype reaches new heights, I want to emphasize that a data strategy must precede any AI strategy. Data is the oil of the AI engine. Unfortunately, real-world data are usually messy and not AI-ready. Without a robust data strategy, you are building an AI system on a shaky foundation.