- Domains of genome-wide gene expression dysregulation in Down’s syndrome by Letourneau et al., Nature, 508, 345–350 (17 April 2014).
Briefly, Down's syndrome is the result of an extra copy of chromosome 21 in humans. Instead of having two copies of Chr 21, affected individuals have three. This paper presents some remarkable results from a study of Down's syndrome transcriptomes.
RNA-seq on Identical Twins Discordant for Trisomy 21
Although a number of studies have looked at the effect of an extra Chr 21 on gene expression before, this paper is the first to study it using a controlled, balanced experimental design, one that arose by chance.
The paper uses RNA-seq to study the transcriptome of fetal fibroblasts from identical human twins, where one twin is normal and the other is affected by Down's syndrome. Monozygotic twins discordant for trisomy 21 are very rare, expected in about 1 in 385,000 cases. Some of the authors of the Nature paper had reported one such rare case in 2008, and this study uses samples from that case to look at expression differences.
The biggest advantage of studying gene expression in identical twins discordant for Down’s Syndrome is that it helps to eliminate the confounding effects from natural genetic variations in affected and normal genomes.
If we were to study unrelated individuals, one group with trisomy 21 and a control group of normal individuals, any gene expression difference could be due either to natural genetic variation between the two groups or to the extra Chr 21. Since the two are confounded, we can see the effect of Down's syndrome only if the effect of the extra Chr 21 is much greater than the effect of natural genetic variation. However, the effect of natural genetic variation is so strong that it has masked the effect of Down's syndrome.
It is worth repeating the advantage of the neat experimental design enabled by chance. By studying expression differences in identical twins discordant for trisomy 21, we can be confident that the detected gene expression differences are due to the extra chromosome 21 (assuming there are no other systematic artifacts).
Domains of gene expression dysregulation
The team sequenced the transcriptome of fetal skin primary fibroblasts derived from both the trisomic and the normal twin, with four replicates each. When they compared gene expression between the twins, the team found that only 182 genes were significantly differentially expressed at a 5% FDR threshold. Of these 182, 42 are long non-coding RNAs (lncRNAs).
However, when they looked at the overall pattern of gene expression along the chromosomes, the team found that differential expression is not randomly distributed across the genome. Instead, they saw a striking pattern: large chromosomal regions were alternately upregulated and downregulated across the genome. The paper calls these up/down regulated regions Gene Expression Dysregulation Domains (GEDDs). After defining the domain boundaries, they found a total of 337 GEDDs of varying sizes (9 kb to 114 Mb) in the trisomy 21 discordant twins.
The observation of Gene Expression Dysregulation Domains is striking, but are they real? To establish that the observed pattern is real and to understand what it means, the team addressed a variety of questions:
- Are the expression dysregulation domains reproducible?
- Are the expression dysregulation domains present in other cell types?
- Are the expression dysregulation domains present in a mouse model of Down’s syndrome?
- Can the expression dysregulation domains be identified in trisomy 21 and normal samples from unrelated individuals?
- Do the expression dysregulation domains agree with the previously known domain organization of mammalian chromosomes?
- Does the observed expression dysregulation pattern correlate with methylation patterns obtained from reduced representation bisulphite sequencing?
- Does the observed expression dysregulation pattern correlate with modification of H3K4me3 obtained using ChIP-seq?
- Does the observed expression dysregulation pattern correlate with chromatin accessibility obtained from DNase I hypersensitivity assay?
Methods and Data
If you are interested in the sequencing data and analysis used in the paper, here are a few cursory details.
Paired-end RNA-seq reads were mapped separately with the GEM aligner and with TopHat. Differential expression analysis was performed using edgeR. The log2 expression fold change (log2[FC]) between the trisomic and the euploid samples (after quantile normalization) was used for finding dysregulation domains. The lowess function in R was used to smooth the log2[FC] data (smoother span 3 to 30%) and identify upregulated and downregulated domains.
An upregulated domain was defined as a set of at least two consecutive genes with positive smoothed log2[FC] values. Conversely, a downregulated domain was defined as a set of at least two consecutive genes with negative smoothed log2[FC] values.
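To make the domain definition concrete, here is a minimal Python sketch of the idea, not the paper's actual pipeline. It assumes the genes are already sorted by chromosomal position, uses made-up log2[FC] values, and swaps in a plain centered moving average for R's lowess to stay dependency-free. The function names `smooth` and `call_domains` are my own.

```python
def sign(x):
    """Return 1, -1, or 0 for positive, negative, or zero values."""
    return (x > 0) - (x < 0)

def smooth(values, window=3):
    """Centered moving average (a simple stand-in for lowess, not lowess itself)."""
    half = window // 2
    n = len(values)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def call_domains(smoothed, min_genes=2):
    """Find runs of at least min_genes consecutive same-sign smoothed values.

    Returns (first_index, last_index, direction) tuples with inclusive indices.
    """
    domains = []
    start = 0
    for i in range(1, len(smoothed) + 1):
        # A run ends at the end of the list or where the sign changes.
        run_ends = i == len(smoothed) or sign(smoothed[i]) != sign(smoothed[start])
        if run_ends:
            s = sign(smoothed[start])
            if s != 0 and i - start >= min_genes:
                domains.append((start, i - 1, "up" if s > 0 else "down"))
            start = i
    return domains

# Hypothetical log2[FC] values for nine genes in chromosomal order.
log2fc = [0.4, 0.6, 0.3, -0.2, -0.5, -0.4, 0.1, -0.3, -0.6]
domains = call_domains(smooth(log2fc))
print(domains)  # [(0, 2, 'up'), (3, 8, 'down')]
```

Note that the lone positive gene at index 6 is absorbed into the surrounding downregulated domain once the values are smoothed. That is exactly the behaviour that makes the choice of smoother span matter.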
A nagging question that refuses to go away: can the observed up/down regulated domains be the result of outliers or an artifact of smoothing the fold changes? I guess that is easy to test by using only a random set of genes and calling the domains again, and maybe it is done in the paper. I haven’t had time to check.
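One way to read that suggestion is as a permutation check: shuffle the gene order, smooth and call domains again, and see whether the shuffled data produce a domain pattern as coarse as the real one. Below is a rough Python sketch under several assumptions of mine. The log2[FC] values are synthetic, the smoother is a moving average rather than lowess, and the statistic (number of domains, where fewer and larger domains means more structure) is just one plausible choice. I am not claiming this is what the paper does.

```python
import random

def sign(x):
    """Return 1, -1, or 0 for positive, negative, or zero values."""
    return (x > 0) - (x < 0)

def smooth(values, window=5):
    """Centered moving average (stand-in for lowess)."""
    half = window // 2
    n = len(values)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def count_domains(smoothed, min_genes=2):
    """Count runs of at least min_genes consecutive same-sign values."""
    count, start = 0, 0
    for i in range(1, len(smoothed) + 1):
        if i == len(smoothed) or sign(smoothed[i]) != sign(smoothed[start]):
            if sign(smoothed[start]) != 0 and i - start >= min_genes:
                count += 1
            start = i
    return count

def permutation_p_value(log2fc, n_perm=200, seed=0):
    """Share of gene-order shuffles with as few domains as the observed order."""
    rng = random.Random(seed)
    observed = count_domains(smooth(log2fc))
    as_extreme = 0
    shuffled = list(log2fc)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_domains(smooth(shuffled)) <= observed:
            as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (1 + as_extreme) / (1 + n_perm)

# Synthetic example: four blocks of 20 genes, alternately up and down.
log2fc = ([0.5] * 20 + [-0.5] * 20) * 2
observed, p_value = permutation_p_value(log2fc)
print(observed, p_value)
```

If the domains were purely a by-product of smoothing, shuffled gene orders should give similarly few, similarly large domains. If position along the chromosome matters, the shuffled data should break up into many more, smaller runs.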
All sequencing data associated with the paper are available at the Gene Expression Omnibus (GEO) data repository under accession number GSE55426.