This week’s Nature has a rather interesting paper showing a link between a father’s age and an increase in de novo mutations, and a possible disease association with autism and schizophrenia. Kong et al.’s study shows that, on average, each additional year of a father’s age adds about two de novo mutations in his child. An increased mutation rate is troublesome, as it raises the chance of acquiring harmful, disease-causing mutations.
Here is NextgenSeek’s summary and review of the paper: what the authors did, what the important results are, and which questions are worth pondering. (Rate of de novo mutations and the importance of father’s age to disease risk, by Kong et al.)
De novo mutations, new mutations that are present in an individual but not in his or her parents, have interested geneticists for a long time. To understand the role of de novo mutations in disease, a team of researchers from deCODE Genetics in Iceland sequenced the whole genomes of 78 trios (father, mother, and child) and studied the de novo mutations in these genomes. With a total of 219 distinct individual genome sequences, this becomes the largest study to look at de novo mutations on a genome scale. After sequencing whole genomes at 30X coverage, Kong and colleagues identified mutations in the child that were not present in either parent.
Over 63 de novo Mutations per trio
Given the noise in current sequencing technology, correctly identifying de novo mutations from sequence data is a tricky task. Kong et al. filtered out many putative mutations to avoid false positives. They started with over 6200 putative mutations and ended up with about 4900 de novo mutations after a series of stringent filters. Since their sample includes people of different ages and with different disease status for autism and schizophrenia, they further analyzed the relationship between these de novo mutations and age and disease status.
Number of de novo Mutations Increases With Parent’s Age (Father’s?)
A quick look at the father’s age at the time of conception and the number of de novo mutations showed a strong linear relationship, with the number of mutations increasing as the father’s age increased. Their data showed that, on average, the child of a 20-year-old father had about 40 de novo mutations and the child of a 40-year-old father had about 80. Although this result shows a strong relationship with the father’s age, one cannot rule out a role for the mother in the increase in de novo mutations: in their data, the mother’s age is highly correlated with the father’s age at the time of conception.
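The "about two mutations per year" figure follows directly from the two rounded averages above. Here is a minimal sketch of that arithmetic, assuming a straight line through the two points. The function names are ours, purely for illustration, and the inputs are this post's rounded values, not the paper's fitted model.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the straight line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


# Rounded averages quoted in this post: (father's age, de novo mutations).
slope, intercept = line_through((20, 40), (40, 80))


def expected_mutations(father_age):
    """Read a value off the line; only meaningful near the observed age range."""
    return intercept + slope * father_age


print(f"extra mutations per year of father's age: {slope:.1f}")
print(f"expected count for a 30-year-old father: {expected_mutations(30):.0f}")
```

The zero intercept that falls out of these numbers is an artifact of the rounded inputs and should not be read as a biological claim.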
More Paternal Mutations Than Maternal Mutations?
Luckily (or by design), five of the trios used in the study belong to three-generation families (grandparent, father, child). Using the sequence data from three generations, Kong et al. could determine the parent of origin of each de novo mutation from the parental haplotypes. In each of the five trios, there were significantly more paternal de novo mutations than maternal ones. On average, there were 55 paternal mutations and 14 maternal mutations, suggesting that de novo mutations are mainly of paternal origin. To validate the findings, Kong et al. chose 111 mutations at random and re-sequenced them using Sanger sequencing. Eleven failed at primer design, and among the remaining 100 mutations, 93 were confirmed.
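To put the validation and parent-of-origin numbers above in perspective, here is a small sketch that computes the confirmation rate with an approximate 95% interval, and the paternal share of mutations. The Wilson score interval is our choice for illustration, not something the paper reports, and the paternal and maternal counts are the rounded averages quoted in this post.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Sanger validation counts quoted in this post.
chosen, primer_failures, confirmed = 111, 11, 93
testable = chosen - primer_failures
validation_rate = confirmed / testable
low, high = wilson_interval(confirmed, testable)

# Rounded average counts from the five three-generation trios.
paternal, maternal = 55, 14
paternal_share = paternal / (paternal + maternal)

print(f"confirmed: {validation_rate:.0%} (approx. 95% interval {low:.1%} to {high:.1%})")
print(f"paternal share of de novo mutations: {paternal_share:.1%}")
```

With only 100 testable mutations the interval is fairly wide, so the confirmation rate is best read as a rough estimate rather than a precise one.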
A Few Things to Ponder
These are interesting results, but given their importance, a lot of questions spring to mind. Here is a discussion of two of the concerns. A big concern is how reliable the observed de novo mutations are. Given the problems with next-gen sequencing technologies and the artifacts they can create (remember RNA editing?), the lurking fear is that these mutations are not real.
What Is the Effect of Sequencing Artifacts?
One noticeable concern is that the study used two different Illumina sequencers, the GAIIx and the HiSeq, and three different kits to sequence the genomes. It is well known that each sequencer and kit brings its own artifacts to the table. In addition, the sequencers used here gave two different read lengths, 100 and 120. It is not clear what the authors did to make sure these variations do not affect their findings. More details on the experimental design and QC would have been nice. Being a paper in Nature, the descriptions are short and not that easy to follow. In addition, Kong et al. have not given many details on the sequence data and coverage (other than mentioning that all genomes are at 30X coverage), on how they dealt with repetitive regions, or on what they encountered while identifying these de novo mutations.
Paternal Mutations: Statistically Speaking
Another concern is the sample size behind the conclusion that the father’s age increases the number of de novo mutations. Although Kong et al. started with 78 trios, the strong “father’s age – more mutations” conclusion came mainly from the five trios with three-generation data. Why did they not do the haplotype inference on all 78 trios and say more about the paternal origin of mutations in all of the samples?
As the authors explained, the linear trend between the father’s age and the number of mutations could also be due to the mother’s age, as the two are correlated. A little more worrying is the following observation in the paper:
Mother’s age is substantially correlated with father’s age (r = 0.83) and, not surprisingly, is also associated with the number of de novo mutations (P = 1.9×10^-12). However, when father’s age and mother’s age were entered jointly in a multiple regression, father’s age remained highly significant (P = 3.3×10^-8), whereas mother’s age did not (P = 0.49).
Here is our issue with this (comments from the more statistically inclined are welcome here). When two variables are highly correlated (like the correlation of 0.83 between mother’s age and father’s age here), multiple regression runs into the problem of “multicollinearity”, a common problem in statistics. If one uses correlated variables in a regression, it can behave weirdly and give unreliable results about any individual variable. Basically, one variable can drive the regression by explaining most of the variation. We may be missing something here, but we thought a golden rule is that if two variables are highly correlated, it does not make sense to use both, and it is better to drop one of them from the model (R’s lm does this automatically only when the predictors are exactly collinear, in which case it reports NA for the dropped coefficient). However, the authors did not give more details on this, but went on to conclude that these mutations are probably of paternal origin.
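One way to put a number on the multicollinearity worry is the variance inflation factor (VIF), which for a model with exactly two predictors is 1 / (1 - r^2). The sketch below computes it for r = 0.83 (about 3.2) and then fits a two-predictor regression on simulated data with the same correlation. Everything here is illustrative: the data are simulated, the effect sizes (2.0 for father, 0.0 for mother) are hypothetical values we picked, and none of it reproduces the paper's data or analysis.

```python
import math
import random


def vif_two_predictors(r):
    """VIF for a model with exactly two predictors of correlation r."""
    return 1.0 / (1.0 - r * r)


def simulate_ages(n, r, beta_father, beta_mother, noise_sd, seed=0):
    """Simulate standardized ages with correlation about r, plus an outcome
    that depends on them through the given (hypothetical) effects."""
    rng = random.Random(seed)
    father, mother, y = [], [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        f = z1
        m = r * z1 + math.sqrt(1 - r * r) * z2  # correlated with f
        father.append(f)
        mother.append(m)
        y.append(beta_father * f + beta_mother * m + rng.gauss(0, noise_sd))
    return father, mother, y


def fit_two_predictors(x1, x2, y):
    """OLS slopes for y ~ x1 + x2 (with intercept), solved from the
    centered 2x2 normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


vif = vif_two_predictors(0.83)
print(f"VIF at r = 0.83: {vif:.2f} (standard errors inflated about {math.sqrt(vif):.2f}x)")

# 78 simulated "trios" in which only the father's age has a real effect.
father, mother, y = simulate_ages(78, 0.83, beta_father=2.0, beta_mother=0.0,
                                  noise_sd=1.0, seed=1)
b_father, b_mother = fit_two_predictors(father, mother, y)
print(f"estimated slopes: father = {b_father:.2f}, mother = {b_mother:.2f}")
```

For comparison, common rules of thumb flag VIF values above 5 or 10, though such cutoffs are conventions rather than guarantees. Rerunning the simulation with different seeds gives a feel for how much the two slope estimates wobble at this level of correlation.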