DNA Could Become the Next Big Data Warehouse
Jan 25, 2013 5:00 AM PT
Researchers at the European Bioinformatics Institute (EMBL-EBI) on Wednesday announced their success at storing data by encoding it in DNA. The system could stand the test of time -- tens of thousands of years, perhaps.
This method for archiving data could make it possible to store 100 million hours of high-definition video in about a cup of DNA, according to the scientists, and given the trend toward Big Data, that could be a big breakthrough. One gram of DNA could hold as much information as more than a million CDs.
Unlike existing methods of data storage -- all of which have relatively limited life spans -- DNA has proven it can endure, literally, for ages. Like any physical carbon-based object, DNA can be destroyed, but it happens to be far more sturdy than paper or tape, and it can't easily be damaged by electromagnetic fields.
"We already know that DNA is a robust way to store information, because we can extract it from wooly mammoth bones -- which date back tens of thousands of years -- and make sense of it," said Nick Goldman of EMBL-EBI. "It's also incredibly small, dense, and does not need any power for storage, so shipping and keeping it is easy."
DNA could have an advantage over many current methods of storage.
Although tape is the cheapest storage medium, its performance is lacking, explained Fang Zhang, storage analyst at IHS iSuppli. Analyzing Big Data on tape would take much longer than on SSDs or HDDs. Depending on how frequently it's used, tape could also wear out.
Tomorrow, Tomorrow and Tomorrow
While it's highly unlikely that the words of William Shakespeare would ever be lost, 154 of the Bard's sonnets have been spelled out using DNA. An audio file containing part of Martin Luther King, Jr.'s 1963 "I Have a Dream" speech has also been encoded.
Being stored in DNA could allow those famous words to live on for eons.
"[This is] incredibly durable tagging for living things -- tagging that could transcend generations," said Rob Enderle, principal analyst at the Enderle Group. "The most obvious use would be to record rights into genetically created plants and animals to preserve rights and prevent illegal cloning/copies."
That isn't to say that there are no hurdles to clear. For one, scientists had to develop a code that used the four molecular letters -- also known as "bases" -- of genetic material. These consist of G, T, C and A -- a fairly limited alphabet. Then again, binary code consists of just 0 and 1, and it serves as the basis for most computer languages.
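To make the idea of a four-letter code concrete, here is a minimal sketch in Python that maps each pair of bits to one base and back again. The two-bits-per-base table and the function names `encode` and `decode` are assumptions made purely for illustration; this is not the EMBL-EBI researchers' scheme, and it includes none of the error tolerance a real molecular code would need.

```python
# Illustrative only: a naive two-bits-per-base mapping (an assumption,
# not the code developed by the EMBL-EBI researchers).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # the two bytes become eight bases
    print(decode(strand))  # and decode back to the original bytes
```

Under this toy mapping, every byte becomes four bases, so the four-letter alphabet carries exactly the same information as the underlying 0s and 1s.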
"At some future point, you might actually embed notes on how the plant or animal was created in the DNA," Enderle told TechNewsWorld. "In the case of a weaponized biological agent, you could also use this to better identify the source, should it be released, and you might be able to brand a benign virus and use it to model how a similar hostile virus would spread using a combination of DNA labeling and then population sampling to track the spread."
Knowing the Code
The key to ensuring that this data can be archived and later accessed is preserving knowledge of the code. There are numerous undeciphered writing systems that could hold long-lost information. However, the EMBL-EBI researchers don't think this will be a problem.
They've worked to create a code that is error tolerant in molecular form. As long as someone knows the code, the data can be read back.
Despite concerns, DNA could be the storage method of the future, especially as Big Data begins to take us from the world of gigabytes to xeobytes.
"DNA as storage represents a new paradigm," said James Canton, Ph.D., of the Institute for Global Futures. "In a world generating xeobytes of data, we are facing a huge data tsunami. We're looking at ways to store, encrypt and secure all this data."
"DNA represents one paradigm for the future," Canton told TechNewsWorld. "DNA as a paradigm for storage represents an emerging platform towards quantum mechanics. That's when things will really change."