Yet another genomics/genetics paper that first appeared on the open preprint server arXiv has been published in Nature. This week, members of Leonid Kruglyak’s lab at Princeton published their work, “Finding the sources of missing heritability in a yeast cross” (Bloom et al.), in Nature. An earlier version of the manuscript was made available on arXiv months before publication. In the paper, Bloom et al. use a large yeast cross to investigate whether phenotypic variation can be explained by genotype alone.
Here is our write-up of this nice paper, which uses a variety of high-throughput genomic assays and sophisticated statistical methods to address some interesting questions.
Much of biology is about understanding the relationship between phenotype (trait) and genotype. For some traits, a single locus is responsible for the variation in the trait; those are simple Mendelian traits. However, for a lot of other traits, like human height or a complex human disease, numerous genomic regions play a role in the variation that we see.
Thanks to advances in genotyping technologies, scientists have for more than a decade been identifying genetic variants associated with complex phenotypes using genome-wide association studies (GWAS). A common theme that emerged pretty soon was that many variants are associated with a trait and that the contribution of each variant is very small.
So the question that naturally arises is: what about the large remaining unexplained variation in a complex disease? Missing heritability is basically this huge hole in our ability to explain the variation that we see in a complex trait.
A number of possible factors could help us explain the unexplained variation in a complex trait. These include the sample size used in the study, indels, structural variations, rare variants, gene-gene interactions, and gene-environment interactions. Although we know there are many possibilities, we don’t understand what really contributes to the missing heritability.
A Different Kind of 1000 Genomes Project
Bloom et al., possibly for the first time, set out to identify the factors responsible for the missing heritability, and they used the model organism yeast to find the answers. They crossed two yeast strains, BY (the standard lab strain) and RM (a wild strain from a vineyard), to get 1008 haploid segregants. Yeast can live in the haploid state, with just one copy of each chromosome. Each haploid segregant is a meiotic product of the diploid BY/RM hybrid, so its genome is a mosaic of the two parental genomes. Yeast offers a number of great advantages; using yeast crosses, Bloom et al. could
- create genetic variation by crossing two strains (since each genetic locus comes from one of the two parent strains, there are no rare variants)
- create a large sample size for the experiment and thus increase the power to find the sources of missing heritability
- grow them all in one condition and thus remove the gene-environment interactions
- measure phenotypes using high-throughput assays
- not worry about dominance effects, since the segregants are haploid
and possibly many more. At the end of it, you have a beautiful system in which one can ask how much additive and interactive effects of genetic variation contribute to a set of phenotypes. In such a system, phenotypic variation can be partitioned into two major components: one due entirely to genetic factors and the other due to random/experimental error. The overall contribution of heritable genetic factors is called broad-sense heritability. One can further partition broad-sense heritability into additive genetic factors and gene-gene interaction factors.
The contribution of additive genetic factors is called narrow-sense heritability. In other words, narrow-sense heritability is the proportion of variation explained by additive genetic factors, while broad-sense heritability is the proportion of variation explained by all genetic effects. The difference between the two heritability measures gives an estimate of the effect of interactions on a phenotype. If the difference is zero, there is no interaction and all heritable phenotypic variation can be explained by additive components. A non-zero difference between the two measures indicates that interactions play a role, and its size indicates how large that role is.
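To make the partition concrete, here is a minimal sketch of the variance bookkeeping for a toy haploid cross. It is our own illustration, not the estimation procedure used in the paper. It assumes unlinked loci with genotypes coded -1/+1 at equal frequency, so a locus with additive effect `a` contributes variance `a**2`, a pairwise interaction with effect `i` contributes `i**2`, and all terms are uncorrelated. The function name and the effect sizes are hypothetical.

```python
def heritabilities(additive_effects, interaction_effects, error_variance):
    """Return (broad_sense, narrow_sense) heritability for a toy haploid cross.

    Assumes unlinked loci coded -1/+1 with equal frequency, so every
    additive or pairwise-interaction term contributes effect**2 to the
    phenotypic variance and the terms are mutually uncorrelated.
    """
    var_additive = sum(a * a for a in additive_effects)
    var_interaction = sum(i * i for i in interaction_effects)
    var_genetic = var_additive + var_interaction
    var_total = var_genetic + error_variance
    return var_genetic / var_total, var_additive / var_total


# Hypothetical trait: two additive loci, one interacting pair, plus noise.
broad, narrow = heritabilities([0.5, 0.3], [0.4], error_variance=0.5)
print(f"broad-sense heritability:  {broad:.2f}")
print(f"narrow-sense heritability: {narrow:.2f}")
print(f"gap attributed to interactions: {broad - narrow:.2f}")
```

With these made-up numbers the gap between the two measures is exactly the share of variance coming from the interaction term, which is the quantity the comparison of the two heritabilities is meant to expose.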
Bloom et al. sequenced the parental strains to great depth in order to identify the SNPs that differ between the strains. They then sequenced DNA from the 1008 segregants derived from the parent strains. Since the parents were also sequenced, Bloom et al. could identify over 30,000 SNPs in the 1008 segregants. The yeast genome is about 12 Mb in size, so that works out to roughly 1 SNP every 400 bases.
Basically, for each of those ~30,000 loci they could mark which parent (BY or RM) the allele in each segregant came from. For example, if there is a SNP at genomic location “i”, they could tell whether the allele at locus “i” came from BY or RM for all 1008 segregants. For these 1008 segregants, Bloom et al. also measured 46 phenotypes by growing them under different growth conditions.
Bloom et al. first estimated broad-sense and narrow-sense heritability using the phenotype data in conjunction with the genetic relatedness of these yeast segregants. They found that the difference between the two heritability estimates varied from 0.02 to 0.54, suggesting that variation in some phenotypes is mainly due to additive effects, while for others gene-gene interactions make a huge contribution.
Another way to look at the heritability is to use the genotype and the phenotype data together. With these genotypes and phenotypes, they performed quantitative trait locus (QTL) analysis to find the genomic regions that are associated with each phenotype.
In the simplest case, QTL analysis is just a way to look at the correlation (or association) between genotype and phenotype. For example, for a given phenotype and the genotype at a given marker across the 1008 segregants, one can ask whether the two are correlated/associated by simply plotting genotype vs phenotype, as shown in the figure.
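As a rough sketch of that idea, the snippet below runs a single-marker scan on simulated data. Everything in it is an assumption for illustration: the markers are simulated as independent (real markers on a chromosome are linked), the effect size is made up, and the LOD score computed from the correlation, LOD = -(n/2) log10(1 - r^2), is a common summary statistic in QTL mapping rather than something this post confirms the authors used.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def lod_score(marker, phenotype):
    """LOD score for one marker from the genotype-phenotype correlation."""
    n = len(phenotype)
    r = pearson_r(marker, phenotype)
    return -(n / 2) * math.log10(1 - r * r)


random.seed(0)
n_segregants, n_markers, causal = 1008, 5, 2

# genotypes[j][s] is 0 if segregant s carries the BY allele at marker j, 1 if RM.
genotypes = [[random.randint(0, 1) for _ in range(n_segregants)]
             for _ in range(n_markers)]

# Hypothetical trait: the RM allele at the causal marker adds 1.0, plus unit noise.
phenotype = [1.0 * genotypes[causal][s] + random.gauss(0, 1)
             for s in range(n_segregants)]

lods = [lod_score(marker, phenotype) for marker in genotypes]
best = max(range(n_markers), key=lambda j: lods[j])
for j, lod in enumerate(lods):
    print(f"marker {j}: LOD = {lod:.1f}")
print("strongest association at marker", best)
```

The marker with the simulated effect stands out with a much larger LOD than the others, which is the numerical counterpart of seeing two clearly separated groups in a genotype vs phenotype plot.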
In addition to the simple QTL scans for all phenotype and genotype combinations, Bloom et al. also used a somewhat more complex approach to identify all loci contributing to a phenotype. This resulted in a multi-locus model for each phenotype, with between 5 and 29 genetic loci contributing to each phenotype.
Bloom et al. also estimated the proportion of phenotypic variation explained by the additive components. The large sample size enabled them to detect most of the additive heritability. The proportion of variance explained by additive components was about 20% when the sample size was 100 and about 90% when the sample size was 1000. This suggests that for some phenotypes the missing heritability is not really missing; rather, a small sample does not have the power to detect the many loci with small effect sizes.
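A back-of-the-envelope calculation shows why sample size matters so much. It assumes the single-marker relation LOD = -(n/2) log10(1 - r^2), where r^2 is the fraction of phenotypic variance explained by a locus, and a significance threshold of LOD 3. Both the statistic and the threshold are our assumptions for illustration, not values taken from the paper.

```python
def min_detectable_r2(n_segregants, lod_threshold=3.0):
    """Smallest variance fraction whose expected LOD reaches the threshold.

    Inverts LOD = -(n / 2) * log10(1 - r2) for r2. The default threshold
    of 3.0 is an illustrative assumption.
    """
    return 1 - 10 ** (-2 * lod_threshold / n_segregants)


for n in (100, 1000):
    print(f"n = {n:4d}: locus must explain at least "
          f"{100 * min_detectable_r2(n):.1f}% of the variance")
```

Under these assumptions a locus has to explain roughly 13% of the variance to be picked up with 100 segregants, but only about 1.4% with 1000, so a larger cross can recover many more small-effect loci.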
Finally, to understand gene-gene interactions (epistasis), Bloom et al. looked at the difference in heritability estimates between two QTL models: one with additive genetic terms only and the other with both additive and interaction terms. To make the gene-gene interaction analysis computationally feasible and to improve the power to detect interactions, Bloom et al. focused on a subset of the 30,000 markers (about 4400 SNPs) and looked only for pairwise interactions. They found that 17 of the 46 traits showed gene-gene interactions, with a total of 23 interacting locus pairs. Interestingly, they also looked for interaction effects without the additive genetic components and found more gene-gene interactions. Although the prevalence of interactions is clear from these analyses, our ability to understand them fully is still limited for a number of reasons, such as sample size and computational/statistical feasibility.
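To get a feel for why the marker set was thinned, here is a small calculation of how many pairwise tests an exhaustive two-locus scan needs per trait. The marker counts come from the text above; the helper name is ours.

```python
import math


def pairwise_tests(n_markers):
    """Number of unordered marker pairs in an exhaustive two-locus scan."""
    return math.comb(n_markers, 2)


full = pairwise_tests(30_000)
subset = pairwise_tests(4_400)
print(f"all ~30,000 markers: {full:,} pairs per trait")
print(f"~4400-marker subset: {subset:,} pairs per trait")
print(f"reduction: about {full / subset:.0f}-fold")
```

Fewer tests mean less computation and also a less severe multiple-testing correction, which is where the gain in power to detect interactions comes from.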
Next: The 1000 Transcriptomes Project?
So what is next for finding the missing heritability in yeast? The authors looked at only 46 phenotypes for these 1008 segregants (can’t believe I just said that). Since this is a leading group in using gene expression profiles to understand complex traits, one can possibly expect ~6000 more phenotypes for these segregants, along with molecular network details on the so-called missing heritability.
You may also want to read the summary/thoughts on “Finding the sources of missing heritability” from
- Haldane’s Sieve preprint blog
- Where’s the heritability? Right where you’d expect—if you look close enough