Understanding how naturally occurring genetic variation affects gene expression levels is a promising first step toward understanding the genetics of complex traits at the molecular level. Expression quantitative trait loci (eQTL) studies attempt to map all the genomic regions associated with gene expression levels by genotyping and measuring genome-wide expression levels on the same set of individuals from a population.
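To make the idea concrete, here is a minimal sketch of the simplest kind of eQTL test: regressing one gene's expression level on the genotype dosage (0, 1 or 2 copies of an allele) at one SNP across the same individuals. The function name `eqtl_slope` and the toy numbers are made up for illustration, and this is not the method of any particular study mentioned in this post; real analyses also normalize the expression data, adjust for covariates and correct for testing many SNP-gene pairs.

```python
from statistics import mean


def eqtl_slope(genotypes, expression):
    """Regress expression on genotype dosage for one SNP-gene pair.

    genotypes  -- allele dosages (0, 1 or 2), one per individual
    expression -- expression levels of one gene, same individuals, same order
    Returns (slope, r): the least-squares slope and the Pearson correlation.
    """
    if len(genotypes) != len(expression):
        raise ValueError("genotypes and expression must have the same length")

    g_mean = mean(genotypes)
    e_mean = mean(expression)

    # Sums of squares and cross-products around the means.
    sxx = sum((g - g_mean) ** 2 for g in genotypes)
    syy = sum((e - e_mean) ** 2 for e in expression)
    sxy = sum((g - g_mean) * (e - e_mean) for g, e in zip(genotypes, expression))

    if sxx == 0 or syy == 0:
        raise ValueError("no variation in genotype or expression; cannot test")

    slope = sxy / sxx              # expression change per extra allele copy
    r = sxy / (sxx * syy) ** 0.5   # strength of the linear association
    return slope, r


# Hypothetical toy data: six individuals, one SNP, one gene.
toy_genotypes = [0, 0, 1, 1, 2, 2]
toy_expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]

slope, r = eqtl_slope(toy_genotypes, toy_expression)
print(f"slope per allele: {slope:.3f}, correlation: {r:.3f}")
```

A genome-wide eQTL scan repeats a test like this for a very large number of SNP-gene pairs, which is part of why sample size matters so much in the studies discussed below.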
Although sequencing mRNA molecules with RNA-seq has almost replaced microarrays in small and medium-scale gene expression studies, large-scale studies of the genetics of gene expression have been slow to embrace next-generation sequencing. Until recently, almost all eQTL studies relied on microarrays to measure expression profiles.
Only two published studies have looked at the genetics of human gene expression using RNA-seq. Pritchard’s group at U. Chicago characterized the genetics of gene expression in 69 African samples from HapMap using RNA-seq, and a group from Europe analyzed RNA-seq from 60 European HapMap samples. The results from these studies were published back-to-back in the 2010 issue of Nature celebrating “The human genome at ten”.
Since those two publications, no further studies have used RNA-seq to understand the genetics of gene expression. In fact, apart from the ENCODE/modENCODE projects, no large-scale RNA-seq data set has been published yet. ENCODE published over 410 RNA-seq experiments, but they came mainly from a few cell lines and are not population-level data.
It is a bit surprising that the genetics of gene expression has been slow to embrace sequencing. There are certainly many valid reasons, including funding, the scale of such projects, the challenges of collecting genotypes, expression and possibly other phenotypes from the same population, and the challenges of analyzing RNA-seq data.
Genetics of Gene Expression Goes Next-Gen Sequencing
It is all going to change pretty soon. The recently concluded Biology of Genomes conference highlighted at least two major studies on the genetics of human gene expression using RNA-seq data.
One study, the Geuvadis project from Europe, uses a subset of the samples from the 1000 Genomes Project with the aim of setting up standards for the biological and medical interpretation of sequence data in relation to clinical phenotypes. The Geuvadis project has sequenced both mRNA and microRNA molecules from 465 lymphoblastoid cell line (LCL) samples from 5 populations of the 1000 Genomes Project: one African (Yoruba, YRI) and four European (CEPH (CEU), Finns (FIN), British (GBR), and Toscani (TSI)).
Of these samples, 423 were part of the 1000 Genomes Phase 1 dataset with genome/exome sequencing data. Although the project results are not published yet, the underlying RNA-seq and microRNA-seq data are available from Geuvadis under the Fort Lauderdale Agreement. The Geuvadis website also says that the publication describing the results of the project can be expected pretty soon. Until then, here are the tweets on the #BOG13 talk by the lead author of the Geuvadis project.
The second study, from a Stanford group led by Alexis Battle, has sequenced transcriptomes for 922 individuals from the same population and genotyped them for 737,187 common SNPs. With over 900 individuals, it is the largest transcriptome sequencing effort ever. The mRNA from whole blood was sequenced to a really high depth of over 60 million reads in each individual. This work does not have a project page yet, but one can learn a bit from the storified tweets from Alexis Battle's #BOG13 talk.
NIH’s Genotype-Tissue Expression (GTEx) Project
These two large-scale, population-level RNA-seq efforts are only a beginning. NIH’s Genotype-Tissue Expression (GTEx) project is underway to create the largest resource of genotype and RNA-seq gene expression data on 30 to 50 tissues in the human body, including the brain, lung, heart and muscle. On scale alone, the GTEx project could be called “popENCODE” :) (ENCODE has over 4,000 experiments, including 2,142 ChIP-seq, 418 RNA-seq and 318 DNase-seq.) The GTEx pilot project’s final data, covering 190 individuals with genotype information and RNA-seq on over 1,800 tissue samples, is available now. The GTEx project plans to scale up the sample size in the future.