Large-Scale Sequencing Efforts Go Beyond Genetic Variation Towards Function

Within a span of a few weeks, three large-scale sequencing projects sharing the theme of moving beyond genetic variation towards function published interesting papers. Two of the projects will not be entirely new to regulars at the big genomics conferences; we covered them earlier here.

This is a quick post introducing the three projects; we will write detailed summary posts as soon as we finish reading the papers.

The first paper, published in Nature, used large-scale RNA-seq and microRNA-seq on about 460 individuals from the 1000 Genomes Project and characterized how millions of genetic variants affect variation in gene expression. The 462 individuals, drawn from five populations (CEPH, Finns, British, Toscani and Yoruba), have both genome and RNA sequencing data. Until yesterday, this Nature paper was the largest population-level RNA-sequencing effort (it is still the largest sample with three types of sequencing data: genome, mRNA and microRNA). Here is the abstract of the paper (and a summary from its lead author).

Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project—the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences. We discover extremely widespread genetic variation affecting the regulation of most genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on the cellular mechanisms of regulatory and loss-of-function variation, and allows us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.
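For readers new to this kind of analysis, the basic building block of "how a genetic variant affects gene expression" is an expression quantitative trait locus (eQTL) test: regress one gene's expression across individuals on the genotype dosage of one variant (0, 1 or 2 copies of the alternative allele). The sketch below is a minimal, hypothetical illustration of that single test, not the pipeline used in the paper; real studies typically also normalize expression, adjust for covariates and correct for the very large number of variant-gene pairs tested. The function name `eqtl_slope` is made up for this example.

```python
from statistics import fmean
from typing import Sequence


def eqtl_slope(dosages: Sequence[int], expression: Sequence[float]) -> float:
    """Least-squares slope of expression on genotype dosage.

    dosages: number of alternative alleles per individual (0, 1 or 2).
    expression: expression level of one gene in the same individuals.
    A positive slope means each extra copy of the alternative allele
    is associated with higher expression (hypothetical helper).
    """
    if len(dosages) != len(expression):
        raise ValueError("need one expression value per individual")
    mean_x = fmean(dosages)
    mean_y = fmean(expression)
    # Sum of squared deviations of genotype dosage from its mean.
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    # Sum of cross-products of genotype and expression deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    return sxy / sxx


if __name__ == "__main__":
    genotypes = [0, 0, 1, 1, 2, 2]
    levels = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
    print(eqtl_slope(genotypes, levels))  # about 1.0 per allele
```

In practice this test is repeated for every variant near every gene, which is why population-scale samples such as these 462 individuals matter: more individuals give more power to detect small per-allele effects.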

Yesterday, Genome Research published another paper, which sequenced RNA from 922 individuals, all of European descent. With 922 individuals, it is the largest transcriptomic study using RNA-seq to date. The large sample size made it possible to examine how genetic variants influence gene expression both in cis and in trans. Here is the abstract:

Understanding the consequences of regulatory variation in the human genome remains a major challenge, with important implications for understanding gene regulation and interpreting the many disease-risk variants that fall outside of protein-coding regions. Here, we provide a direct window into the regulatory consequences of genetic variation by sequencing RNA from 922 genotyped individuals. We present a comprehensive description of the distribution of regulatory variation – by the specific expression phenotypes altered, the properties of affected genes, and the genomic characteristics of regulatory variants. We detect variants influencing expression of over ten thousand genes, and through the enhanced resolution offered by RNA-sequencing, for the first time we identify thousands of variants associated with specific phenotypes including splicing and allelic expression. Evaluating the effects of both long-range intra-chromosomal and trans (cross-chromosomal) regulation, we observe modularity in the regulatory network, with three-dimensional chromosomal configuration playing a particular role in regulatory modules within each chromosome. We also observe a significant depletion of regulatory variants affecting central and critical genes, along with a trend of reduced effect sizes as variant frequency increases, providing evidence that purifying selection and buffering have limited the deleterious impact of regulatory variation on the cell. Further, generalizing beyond observed variants, we have analyzed the genomic properties of variants associated with expression and splicing, and developed a Bayesian model to predict regulatory consequences of genetic variants, applicable to the interpretation of individual genomes and disease studies. Together, these results represent a critical step toward characterizing the complete landscape of human regulatory variation.
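The abstract distinguishes local (cis) effects from long-range intra-chromosomal and cross-chromosomal (trans) effects. A minimal sketch of how a variant-gene association might be labeled by position is below. It assumes a 1 Mb window around the gene's transcription start site (TSS) as the cis boundary; that window is a common convention in eQTL work and is an assumption here, not a threshold taken from the Genome Research paper. The names `Position`, `classify_association` and `CIS_WINDOW_BP` are hypothetical.

```python
from dataclasses import dataclass

# Assumed cis window (1 Mb around the gene TSS); not a value from the paper.
CIS_WINDOW_BP = 1_000_000


@dataclass(frozen=True)
class Position:
    chrom: str  # chromosome name, e.g. "chr7"
    pos: int    # coordinate in base pairs


def classify_association(variant: Position, gene_tss: Position,
                         window: int = CIS_WINDOW_BP) -> str:
    """Label a variant-gene association by the variant's location.

    - "cis": same chromosome, within `window` bp of the TSS
    - "long-range intra-chromosomal": same chromosome, farther away
    - "trans": different chromosome (cross-chromosomal)
    """
    if variant.chrom != gene_tss.chrom:
        return "trans"
    if abs(variant.pos - gene_tss.pos) <= window:
        return "cis"
    return "long-range intra-chromosomal"


if __name__ == "__main__":
    tss = Position("chr1", 500_000)
    print(classify_association(Position("chr1", 1_000), tss))      # cis
    print(classify_association(Position("chr1", 5_000_000), tss))  # long-range intra-chromosomal
    print(classify_association(Position("chr2", 1_000), tss))      # trans
```

Some studies fold long-range intra-chromosomal effects into "trans"; keeping them as a separate label mirrors the three-way distinction the abstract draws.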

The third paper, published in Science, takes on a different challenge under the same theme, “from genetic variation to function”. It used data from the 1000 Genomes Project and functional annotation from the ENCODE project to prioritize candidate driver mutations in about 90 cancer genomes. Here is the abstract:

Interpreting variants, especially noncoding ones, in the increasing number of personal genomes is challenging. We used patterns of polymorphisms in functionally annotated regions in 1092 humans to identify deleterious variants; then we experimentally validated candidates. We analyzed both coding and noncoding regions, with the former corroborating the latter. We found regions particularly sensitive to mutations (“ultrasensitive”) and variants that are disruptive because of mechanistic effects on transcription-factor binding (that is, “motif-breakers”). We also found variants in regions with higher network centrality tend to be deleterious. Insertions and deletions followed a similar pattern to single-nucleotide variants, with some notable exceptions (e.g., certain deletions and enhancers). On the basis of these patterns, we developed a computational tool (FunSeq), whose application to ~90 cancer genomes reveals nearly a hundred candidate noncoding drivers.

Stay tuned for reviews and summaries of these three papers soon.
