NIH and Amazon have announced that the largest set of genomic data from the 1000 Genomes Project is now publicly available on Amazon Web Services (AWS). The 1000 Genomes Project, announced in January 2008, is an international collaborative effort to create a new map of genetic variation in the human genome across multiple human populations around the world. The project will ultimately include the genomes of more than 2,600 people from 26 populations.
The large collaborative effort has produced 200 terabytes of data. To get a feel for how massive the data is, 200 terabytes is equivalent to 16 million file cabinets filled with text, or more than 30,000 standard DVDs. Currently, DNA sequence data for 1,700 people is available in the cloud. Sequencing of the remaining 900 people is expected to be finished by the end of 2012, and that data will be made available once sequencing is complete. Learn more about the 1000 Genomes Project data on Amazon's cloud at http://aws.amazon.com/1000genomes/.
In the press release announcing the availability of the 1000 Genomes Project data on Amazon's cloud, NIH Director Francis S. Collins said:
The explosion of biomedical data has already significantly advanced our understanding of health and disease. Now we want to find new and better ways to make the most of these data to speed discovery, innovation and improvements in the nation’s health and economy.
The effort to make the largest set of human genomic data easy to access is part of the Big Data initiative, whose goal is to develop technologies and resources to manage and analyze large-scale data sets.
NHGRI Director Eric D. Green said:
Improving access to data from this important project will accelerate the ability of researchers to understand human genetic variation and its contribution to health and disease.
Richard Durbin, Ph.D., co-director of the 1000 Genomes Project and joint head of human genetics at the Wellcome Trust Sanger Institute, Hinxton, England, said:
Putting the data in the cloud provides a tremendous opportunity for researchers around the world who want to study large-scale human genetic variation but lack the computer capability to do so.
The 1000 Genomes Project data is also available from