George Church, the Harvard-based personal genomics and synthetic biology pioneer, and his team have devised a really cool scheme for storing anything digital in DNA molecules. They demonstrated the technology by first encoding a digital book of over 53,000 words into a 5.27 megabit bitstream and then writing that bitstream onto a really small amount of DNA.
In theory, encoding digital data in DNA offers the highest data density for storage. The authors calculated that DNA can store 1 bit per 2.75 x 10^-22 g, or 4.5 x 10^20 bytes per gram (455 exabytes per gram of ssDNA). In comparison, a single-layer Blu-ray disc can hold 1.47 x 10^7 bits per mm^3, while DNA can store 5.49 x 10^15 bits per mm^3.
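As a rough sanity check, the density figures can be reproduced with a few lines of arithmetic. The sketch below uses only the numbers quoted from the paper; the variable names are made up for illustration, and it assumes a decimal exabyte (10^18 bytes).

```python
# Figures quoted from the paper (see the paragraph above).
GRAMS_PER_BIT = 2.75e-22        # mass of ssDNA needed to store one bit
DNA_BITS_PER_MM3 = 5.49e15      # volumetric density of DNA
BLURAY_BITS_PER_MM3 = 1.47e7    # volumetric density of a single-layer Blu-ray disc

# Assumption: 1 exabyte = 10^18 bytes (decimal definition).
BYTES_PER_EXABYTE = 1e18

bits_per_gram = 1 / GRAMS_PER_BIT
bytes_per_gram = bits_per_gram / 8
exabytes_per_gram = bytes_per_gram / BYTES_PER_EXABYTE

# How many times denser DNA is than Blu-ray, by volume.
density_ratio = DNA_BITS_PER_MM3 / BLURAY_BITS_PER_MM3

print(f"{bytes_per_gram:.2e} bytes per gram")
print(f"{exabytes_per_gram:.1f} exabytes per gram")
print(f"DNA is roughly {density_ratio:.1e} times denser than Blu-ray by volume")
```

The result lands within rounding of the 455 exabytes per gram quoted above, and the volume comparison works out to a gap of more than eight orders of magnitude.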
This is not the first time researchers have tried to store data in living things or DNA, but Church’s team demonstrated a better way to store digital data in DNA for the really long term and to decode it easily using next-gen sequencing technologies. Fittingly, the digital book that they chose to store is not just any old book, but Church’s own yet-to-be-published book, “Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves” by Church GM and Regis E.
The results were published in this week’s issue of Science, in a paper titled “Next-Generation Digital Information Storage in DNA.” Learn more about how the book was encoded, its potential future applications, and how George Church went back to the bench and learned new techniques to do the experiments himself, in the interview with George Church and Sriram Kosuri below.
The paper also gives a nice summary of what has been done before. Here is a list of a few things that were encoded in previous efforts.
- 35-bit image, Microvenus, was encoded using plasmid/E. coli for storage in 1988
- 718-bit text from the Bible (Genesis) (http://www.ekac.org/geninfo.html) was encoded using plasmid/E. coli for storage in 1998
- 138-bit text “JUNE 6 INVASION:NORMANDY” was encoded in a DNA microdot in 1999
- 561 bits of lines from Dickens were encoded in plasmid/E. coli in 2001
- 120-bit “E=mc^2 1905!” was encoded in the E. coli genome in 2007
- 1688 bits of text/music/image were encoded in plasmid/E. coli in 2009
- 7920-bit watermarking of a synthetic genome was encoded in a Mycoplasma genome in 2010