RNAseq

Downstream of bulk RNAseq: read in salmon output using tximport and then DESeq2

Join my newsletter to not miss a post like this. In the last blog post, I showed you how to use salmon to get counts from fastq files downloaded from GEO. In this post, I am going to show you how to read the .sf salmon quantification files into R, how to get the tx2gene.txt file, and how to run DESeq2 for differential gene expression analysis. Let’s dive in! library(tximport) library(dplyr) library(ggplot2) files<- list.

How to preprocess GEO bulk RNAseq data with salmon

Install fastq-dl. To easily download fastq files from GEO or ENA, use fastq-dl. Assuming you already have conda installed, do the following:
conda config --add channels conda-forge
conda config --add channels bioconda
conda create -n fastq_download -c conda-forge -c bioconda fastq-dl
conda activate fastq_download
Tip: use mamba if conda is too slow for you. They are all big snakes!! We will use bulk RNAseq data from this GEO accession ID: https://www.

How to convert raw counts to TPM for TCGA data and make a heatmap across cancer types

Sign up for my newsletter to not miss a post like this: https://divingintogeneticsandgenomics.ck.page/newsletter The Cancer Genome Atlas (TCGA) project is probably one of the most well-known large-scale cancer sequencing projects. It sequenced ~10,000 treatment-naive tumors across 33 cancer types. Different data types, including whole-exome, whole-genome, copy-number (SNP array), bulk RNAseq, protein expression (Reverse-Phase Protein Array), and DNA methylation, are available. TCGA is a very successful large sequencing project. I highly recommend learning from how it is organized.
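The counts-to-TPM conversion named in the title can be sketched in a few lines of R. This is a minimal illustration, not necessarily the code used in the post: the function name counts_to_tpm, the toy matrix, and the gene lengths are all made up for the example. It also assumes you have one length in base pairs per gene, in the same order as the rows of the count matrix.

```r
# Convert a genes x samples matrix of raw counts to TPM.
# `gene_length_bp` (hypothetical name) must have one entry per row of `counts`,
# in the same order as the rows.
counts_to_tpm <- function(counts, gene_length_bp) {
  stopifnot(nrow(counts) == length(gene_length_bp))
  # Step 1: length-normalize each gene (reads per base).
  # R recycles the vector down the columns, so row i is divided by length i.
  rate <- counts / gene_length_bp
  # Step 2: rescale every sample (column) so that it sums to one million.
  sweep(rate, 2, colSums(rate), "/") * 1e6
}

# Toy input with made-up numbers (not TCGA data).
counts <- matrix(
  c(100, 200, 300,   # sample1
     50, 400, 150),  # sample2
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("sample1", "sample2"))
)
gene_length_bp <- c(geneA = 1000, geneB = 2000, geneC = 3000)

tpm <- counts_to_tpm(counts, gene_length_bp)
print(round(tpm, 1))
print(colSums(tpm))  # every column sums to 1e6 by construction
```

Because each sample is rescaled to the same total, TPM values are comparable across samples in a way that raw counts are not, which is what makes them convenient for a heatmap across cancer types.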