Genomics

Compute averages/sums on GRanges or equal-length bins

Googling is an essential skill for programmers. Once I have a programming problem in mind, the first thing I do is google it to see whether other people have run into the same problem and maybe already have a solution. Don't reinvent the wheel. In fact, reading other people's code and mimicking it is a great way to learn. Today, I am going to show you how to compute binned averages/sums along a genome or any genomic regions of interest.
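The post's title points to Bioconductor GRanges in R, which is not reproduced here. As a rough, language-agnostic illustration of the same idea (fixed-width bins, per-bin sum and mean), here is a minimal awk sketch. It swaps in a simpler setting and assumes a hypothetical tab-separated input of chromosome, 0-based position and one score per line; the helper name `bin_scores` and the default bin width are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: bin_scores BIN_SIZE < scores.tsv
# Input (stdin, assumed format): tab-separated chrom, 0-based position, value.
# Output: chrom, bin start, bin end, sum, mean for every bin with at least one
# score, sorted by chrom and then bin start. Empty bins are not reported.
bin_scores() {
    local bin_size=${1:-1000}   # bin width in bp (hypothetical default)
    awk -F '\t' -v bs="$bin_size" 'BEGIN { OFS = "\t" }
    {
        bin = int($2 / bs)      # index of the fixed-width bin holding this position
        key = $1 OFS bin
        sum[key] += $3          # running sum per (chrom, bin)
        n[key]++                # number of scores per (chrom, bin)
    }
    END {
        for (key in sum) {
            split(key, k, OFS)
            start = k[2] * bs
            print k[1], start, start + bs, sum[key], sum[key] / n[key]
        }
    }' | sort -t $'\t' -k1,1 -k2,2n
}

# Usage example with made-up toy scores:
printf 'chr1\t10\t2\nchr1\t999\t4\nchr1\t1000\t6\nchr2\t5\t1\n' | bin_scores 1000
```

For the toy input this prints `chr1 0 1000 6 3`, `chr1 1000 2000 6 6` and `chr2 0 1000 1 1` (tab-separated). Unlike a GRanges-based approach, this sketch does not clip the last bin to the chromosome length and treats each line as a single point rather than weighting interval scores by overlap.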

How to upload files to GEO

Reading links:
http://yeolab.github.io/onboarding/geo.html
http://www.hildeschjerven.net/Protocols/Submission_of_HighSeq_data_to_GEO.pdf
https://www.ncbi.nlm.nih.gov/geo/info/submissionftp.html

1. Create an account. Go to NCBI GEO (http://www.ncbi.nlm.nih.gov/geo/) and create a user ID and password. My username is research_guru; I used my Google account.
2. Fill in the xls sheet. Download the metadata xls sheet from https://www.ncbi.nlm.nih.gov/geo/info/seq.html

bgzip the fastqs:
cd 01seq
find *fastq | parallel bgzip
md5sum *fastq.gz > fastq_md5.txt # copy to Excel
cat fastq_md5.txt | awk '{print $2}' # copy to Excel
cat fastq_md5.
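Since the md5sum output is destined for the GEO spreadsheet, one option is to produce both columns in a single paste-ready table. Here is a minimal sketch, assuming GNU coreutils `md5sum`, file names without spaces, and that the sheet wants the file name before its checksum (check the column order in your copy of the sheet). The helper name `md5_table` is hypothetical, and the sketch assumes the fastqs were already compressed with bgzip as above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: md5_table DIR
# Prints "file name <TAB> MD5 checksum" for every *.fastq.gz in DIR,
# ready to paste into two adjacent spreadsheet columns.
md5_table() {
    local dir=${1:-.}
    (
        cd "$dir"
        # md5sum prints "<checksum>  <file>"; swap the columns and join with a tab.
        md5sum -- *.fastq.gz | awk 'BEGIN { OFS = "\t" } { print $2, $1 }'
    )
}

# Usage (directory name taken from the commands above):
# md5_table 01seq > fastq_md5_table.tsv
```

Running the command in a subshell keeps the caller's working directory unchanged.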