We are diploids: we have two copies of every autosomal gene, one inherited from mom and the other from dad. Thanks to next-generation sequencing, the resolution at which we can study gene expression has changed drastically. With RNA-seq, we can now ask whether mom’s or dad’s copy of a gene is expressed in bulk samples, provided the two copies differ by at least one genetic variant.
If one copy (or allele) of a gene is preferentially expressed over the other, the gene shows “allele-specific expression” (ASE). What we know so far is that allele-specific expression is prevalent across the genome when studied in bulk, i.e. in a mix of thousands to millions of cells from a tissue or cell line. See the following two recent large-scale, population-level RNA-seq studies:
- Transcriptome and genome sequencing uncovers functional variation in humans, by Lappalainen et al.
- Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals, by Battle et al.
Although allele-specific expression is common, a complete preference for one copy (allele) over the other is rare, reserved for a handful of imprinted autosomal genes (and genes on the X chromosome).
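To make the difference between a bias and a complete preference concrete: ASE is usually measured by counting reads that carry mom’s versus dad’s allele at heterozygous sites and asking whether the split departs from 50:50. Below is a minimal sketch, assuming per-gene maternal and paternal read counts are already in hand; the exact binomial test is one common choice, not necessarily what the studies above used.

```python
from math import comb


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than the observed count k.
    Fine for modest read counts; for deep coverage prefer a library routine
    such as scipy.stats.binomtest."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


def allelic_summary(maternal: int, paternal: int) -> dict:
    """Maternal allele fraction and ASE p-value against a 50:50 null."""
    total = maternal + paternal
    if total == 0:
        raise ValueError("gene has no allele-informative reads")
    return {
        "maternal_fraction": maternal / total,
        "p_value": binom_two_sided_p(maternal, total),
    }


print(allelic_summary(10, 0))  # {'maternal_fraction': 1.0, 'p_value': 0.001953125}
```

A gene with an 80:20 split over 100 reads shows strong ASE, yet both alleles are clearly expressed; only a split like 10:0 looks monoallelic.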
Gene Expression at Single Cell Resolution
An interesting question is what happens when you look inside a single cell. Will we see the same pattern as in bulk? A very intriguing idea is that only one allele is expressed at the single-cell level. Thanks to single-cell genomic technologies, we can now sequence the transcriptome of an individual cell and ask such questions.
A recent paper in Science did exactly that. The Science paper titled
The team studied allele-specific gene expression using single-cell RNA-seq of embryos from F1 hybrids between two inbred mouse strains, CAST and B6. The two strains differ by a lot of genetic variation, and over 80% of genes carry at least one variant that distinguishes the two alleles.
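Only genes with at least one B6/CAST variant are informative, because a read can be assigned to a parent only where the two sequences differ. A minimal sketch of rolling SNP-level read counts up to gene-level allele counts; the input records and gene names are hypothetical, and this naive summation double-counts reads that overlap more than one SNP, which a real pipeline would avoid by counting each read once.

```python
from collections import defaultdict


def gene_allele_counts(snp_records):
    """Sum B6 and CAST read counts over all informative SNPs in each gene.

    Genes with no B6/CAST SNP never appear in the output: their two
    alleles cannot be told apart."""
    totals = defaultdict(lambda: [0, 0])
    for gene, _snp_id, b6_reads, cast_reads in snp_records:
        totals[gene][0] += b6_reads
        totals[gene][1] += cast_reads
    return {gene: (b6, cast) for gene, (b6, cast) in totals.items()}


# Hypothetical per-SNP counts: (gene, snp_id, B6 reads, CAST reads)
snp_records = [
    ("GeneA", "snp1", 12, 10),
    ("GeneA", "snp2", 8, 9),
    ("GeneB", "snp3", 20, 0),
]
print(gene_allele_counts(snp_records))  # {'GeneA': (20, 19), 'GeneB': (20, 0)}
```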
If you are not new to RNA-seq, B6xCAST F1 mice might ring a bell. These are the same hybrids that misled us about the number of imprinted genes and exposed reference-genome alignment bias in RNA-seq analysis. Since then, both our knowledge of the genetic variation between these strains and the methods for allele-specific expression analysis have improved, and this work makes use of all of that.
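The reference-bias problem can be illustrated with a toy model: if reads are aligned only to the B6 reference with a fixed mismatch limit, CAST-derived reads start one mismatch down at every SNP and are dropped more often. The read length, error rate and mismatch limit below are hypothetical, and the model ignores indels and errors at the SNP position itself.

```python
from math import comb


def p_aligns(read_len: int, err: float, max_mm: int, snp_mm: int) -> float:
    """Probability that a read aligns to the reference under a simple
    'at most max_mm mismatches' rule.

    snp_mm: mismatches the read carries because of true SNPs against the
    reference (0 for a B6-derived read on the B6 genome). Sequencing errors
    are modelled as independent per-base events at the remaining positions."""
    budget = max_mm - snp_mm
    if budget < 0:
        return 0.0
    n = read_len - snp_mm
    return sum(comb(n, k) * err**k * (1 - err) ** (n - k) for k in range(budget + 1))


# Hypothetical settings: 50 bp reads, 1% error rate, up to 2 mismatches allowed,
# and a gene truly expressed 50:50 from each allele.
b6 = p_aligns(50, 0.01, 2, snp_mm=0)    # B6 read: no SNP mismatch vs B6 reference
cast = p_aligns(50, 0.01, 2, snp_mm=1)  # CAST read overlapping one SNP
print(f"apparent B6 fraction: {b6 / (b6 + cast):.3f}")
```

Even this simple setting pushes a truly 50:50 gene above 0.5 toward B6, and reads covering several SNPs fare worse, which is why SNP-aware alignment or alignment to both parental genomes matters.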
Given that single-cell RNA-seq is a relatively new technology and that the claim of monoallelic expression in single cells is a big one, the authors seem (at least on a first reading) to have done a lot of quality control to make sure their results are not artifacts.
The team used over 35 F1 hybrid embryos to obtain 269 single-cell samples across multiple developmental stages. The RNA-seq libraries for all samples were prepared with either the Smart-seq or the Smart-seq2 protocol.
Allele-specific expression estimates were computed from SNP-aware GSNAP alignments of the F1 hybrid reads to the genome, using only uniquely mapping reads. The authors also aligned the reads separately to the B6 and CAST genomes with the STAR aligner to make sure the alignments looked right. In addition, they used principal component analysis to make sure that samples from the same developmental stage clustered together.
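One common way to use alignments to two parental genomes is to assign each read to whichever genome it matches better. A sketch of that general idea follows; the mismatch inputs are hypothetical, and this is an illustration, not a reconstruction of the authors’ pipeline.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def assign_read(mm_b6: Optional[int], mm_cast: Optional[int]) -> str:
    """Assign a read to a parental genome from its best alignment to each.

    mm_b6 / mm_cast: mismatches of the read's alignment to the B6 or CAST
    genome, or None if it did not align uniquely there."""
    if mm_b6 is None and mm_cast is None:
        return "unmapped"
    if mm_cast is None or (mm_b6 is not None and mm_b6 < mm_cast):
        return "B6"
    if mm_b6 is None or mm_cast < mm_b6:
        return "CAST"
    # Same score on both genomes: the read covers no B6/CAST difference.
    return "ambiguous"


def tally(reads: Iterable[Tuple[Optional[int], Optional[int]]]) -> Counter:
    """Count allele assignments over (mm_b6, mm_cast) pairs."""
    return Counter(assign_read(b6, cast) for b6, cast in reads)


print(tally([(0, 1), (2, 0), (1, 1), (None, 0)]))
```

Only the "B6" and "CAST" reads carry allelic information; "ambiguous" reads still count toward total expression but not toward the allelic ratio.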
Huge Technical Bias in Calling Monoallelic Gene Expression
Even after ruling out alignment artifacts, the authors initially found that sample preparation played a major role in inflating the number of genes showing monoallelic expression: over 50% of all genes with genetic variation appeared monoallelic. Explaining why they tested for this, the authors write that since
single-cell transcriptome methods suffer from stochastic losses of RNA species, it was necessary to determine to what extent random sampling effects inflate observed monoallelic calls.
By splitting the lysate of single cells into two equal halves and sequencing each half separately (split pairs), they found that stochastic loss of RNA molecules produced a whopping 66% false-positive rate when calling whether a gene shows monoallelic expression. For the subsequent analysis of monoallelic expression, only highly expressed genes were considered, since they gave stable estimates in these control experiments.
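Why does stochastic RNA loss inflate monoallelic calls, and why does restricting to abundant genes help? A toy capture model makes the point: if each transcript is captured independently with some efficiency, a truly biallelic gene with few transcripts per allele often loses every copy of one allele. The 10% capture efficiency and equal transcript numbers per allele are assumptions for illustration only.

```python
def false_monoallelic_rate(molecules_per_allele: int, capture_eff: float) -> float:
    """Fraction of detected, truly biallelic genes that look monoallelic.

    Assumes each allele has the same number of transcripts and every molecule
    is captured independently with probability capture_eff."""
    if not 0 < capture_eff <= 1:
        raise ValueError("capture_eff must be in (0, 1]")
    miss = (1 - capture_eff) ** molecules_per_allele  # all copies of one allele lost
    one_allele_only = 2 * (1 - miss) * miss           # exactly one allele detected
    detected = 1 - miss**2                             # at least one allele detected
    return one_allele_only / detected


# Hypothetical 10% capture efficiency: low-abundance genes are the problem.
for n in (1, 5, 10, 50):
    print(n, round(false_monoallelic_rate(n, 0.1), 3))
```

Under this model the rate simplifies to 2m/(1 + m), where m is the chance of losing every copy of one allele, so it falls quickly as expression rises.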
In over 200 RNA-seq samples from multiple developmental stages (late 2-cell, 4-cell, 8-cell, 16-cell, and early, mid and late blastocyst), they found that 12-24% of genes showed monoallelic expression (Fig. 3D). Even within the same developmental stage, the percentage of genes showing monoallelic expression is highly variable.
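Percentages like these come from classifying each sufficiently covered gene in each cell by its allelic read split. A minimal sketch, assuming per-gene maternal/paternal counts for one cell; the 10-read minimum and 98% cutoff are hypothetical values, not the thresholds used in the paper.

```python
from typing import Dict, Tuple


def classify_gene(maternal: int, paternal: int, min_reads: int = 10,
                  mono_cutoff: float = 0.98) -> str:
    """Call a gene in one cell from its maternal/paternal read counts.
    min_reads and mono_cutoff are hypothetical thresholds."""
    total = maternal + paternal
    if total < min_reads:
        return "low_coverage"
    frac_maternal = maternal / total
    if frac_maternal >= mono_cutoff:
        return "maternal_mono"
    if frac_maternal <= 1 - mono_cutoff:
        return "paternal_mono"
    return "biallelic"


def percent_monoallelic(cell: Dict[str, Tuple[int, int]]) -> float:
    """Percent of sufficiently covered genes called monoallelic in one cell."""
    calls = [classify_gene(m, p) for m, p in cell.values()]
    informative = [c for c in calls if c != "low_coverage"]
    if not informative:
        return float("nan")
    mono = sum(c.endswith("_mono") for c in informative)
    return 100 * mono / len(informative)


# Hypothetical (maternal, paternal) counts for four genes in one cell
cell = {"g1": (30, 0), "g2": (0, 25), "g3": (12, 14), "g4": (3, 1)}
print({g: classify_gene(m, p) for g, (m, p) in cell.items()})
print(round(percent_monoallelic(cell), 1))  # 66.7
```

Keeping the maternal and paternal calls separate is what allows the maternal-versus-paternal comparison across stages.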
In the early developmental stages (2-cell, 4-cell, 8-cell), the percentage of maternally monoallelic genes was higher than that of paternally monoallelic genes. However, in the 16-cell and all blastocyst stages, there seems to be no preference for maternal or paternal monoallelic expression. It is also interesting that the monoallelic expression seen at the single-cell level almost vanishes at the whole-embryo level (see Fig. 3E).
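The disappearance at the embryo level follows from pooling: if each cell expresses a randomly chosen allele, summing reads across cells averages the two alleles back toward 50:50. A small sketch with hypothetical counts for one gene in four cells of one embryo:

```python
from typing import List, Tuple


def pooled_maternal_fraction(cells: List[Tuple[int, int]]) -> float:
    """Maternal fraction after summing one gene's reads across cells."""
    maternal = sum(m for m, _ in cells)
    paternal = sum(p for _, p in cells)
    return maternal / (maternal + paternal)


# Hypothetical (maternal, paternal) counts for one gene in four cells:
# each cell happens to express only one allele.
cells = [(20, 0), (0, 18), (22, 0), (0, 20)]
print([m / (m + p) for m, p in cells])  # [1.0, 0.0, 1.0, 0.0]
print(pooled_maternal_fraction(cells))  # 0.525
```

Every cell looks monoallelic, yet the pooled "embryo" looks biallelic, which is also why bulk data rarely reveal this pattern.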
Monoallelic Expression or Transcriptional Bursting
Check out the interesting discussion on Twitter about whether a snapshot observation of complete preference for one copy over the other should be called monoallelic expression. The term “monoallelic expression” seems to imply stable expression of one allele over the other. The term already in the literature for random, dynamic expression of one allele is “transcriptional bursting”.
It is also worth pointing out that, although the authors use the term monoallelic expression, they do say in their conclusion that it is not stable but rather consistent with transcriptional bursting:
In this study, we uncovered a stochastic pattern of monoallelic expression that differs from the stable allelic regulation of genomic imprinting and allelic exclusion. It also differs from the stably maintained monoallelic expression observed in clonal lymphoid cell populations. Instead, the rapid expression dynamics that we uncovered in individual cells are consistent with models of transcriptional bursting. In each cell, independent bursts of transcription occur from both alleles over time, but RNA from only one allele is often present at any given time.
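The bursting interpretation can be illustrated with a toy two-state (telegraph) model in which each allele switches on and off independently. All rates below are hypothetical and chosen only for illustration; this is not a model fitted in the paper. Both alleles are transcribed over time, yet many snapshots catch RNA from only one of them.

```python
import random
from typing import List, Tuple


def simulate_allele(steps: int, k_on: float, k_off: float, k_tx: float,
                    k_deg: float, rng: random.Random) -> List[int]:
    """Discrete-time telegraph model for one allele: the promoter flips
    between OFF and ON, transcribes only while ON, and each mRNA decays
    independently. Returns the mRNA count after every time step."""
    active = False
    mrna = 0
    trace = []
    for _ in range(steps):
        if active:
            if rng.random() < k_off:
                active = False
        elif rng.random() < k_on:
            active = True
        if active and rng.random() < k_tx:
            mrna += 1  # one new transcript this step
        mrna -= sum(rng.random() < k_deg for _ in range(mrna))  # decay
        trace.append(mrna)
    return trace


def snapshot_stats(steps: int = 20000, seed: int = 1) -> Tuple[List[int], List[int], float]:
    """Simulate two independent alleles and return both traces plus the
    fraction of expressing time points where only one allele has RNA."""
    rng = random.Random(seed)
    params = dict(k_on=0.01, k_off=0.1, k_tx=0.5, k_deg=0.05)  # hypothetical rates
    mat = simulate_allele(steps, rng=rng, **params)
    pat = simulate_allele(steps, rng=rng, **params)
    expressing = [(m > 0, p > 0) for m, p in zip(mat, pat) if m > 0 or p > 0]
    if not expressing:
        return mat, pat, float("nan")
    one_only = sum(m != p for m, p in expressing)
    return mat, pat, one_only / len(expressing)


mat, pat, frac = snapshot_stats()
print(f"snapshots with RNA from only one allele: {frac:.0%}")
```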
Here is the Storify link to the Twitter discussion.
Check out the nice summary by Melissa Wilson Sayres on the mathbionerd blog, which focuses mainly on X-inactivation.