APR 07, 2016 9:00 PM PDT

600 smartphones of data can fit in this much DNA

All the baby pictures, financial transactions, funny cat videos, and email messages that we hoard often require technology companies to build sprawling data centers.

A new technique could shrink the space needed to store digital data that today would fill a Walmart supercenter down to the size of a sugar cube.

A team of computer scientists and electrical engineers from the University of Washington and Microsoft has detailed one of the first complete systems to encode, store, and retrieve digital data using DNA molecules, which can store information millions of times more compactly than current archival technologies.
 
All the movies, images, emails, and other digital data from more than 600 basic smartphones (10,000 gigabytes) can be stored in the faint pink smear of DNA at the end of this test tube.

In one experiment outlined in a paper presented in April at the ACM International Conference on Architectural Support for Programming Languages and Operating Systems, the team successfully encoded digital data from four image files into the nucleotide sequences of synthetic DNA snippets.

More significantly, they were also able to reverse that process—retrieving the correct sequences from a larger pool of DNA and reconstructing the images without losing a single byte of information.

The team has also encoded and retrieved data that authenticates archival video files from the University of Washington’s Voices from the Rwanda Tribunal project, which contain interviews with judges, lawyers, and other personnel from the Rwandan war crime tribunal.

“Life has produced this fantastic molecule called DNA that efficiently stores all kinds of information about your genes and how a living system works—it’s very, very compact and very durable,” says coauthor Luis Ceze, associate professor of computer science and engineering at the University of Washington.

“We’re essentially repurposing it to store digital data—pictures, videos, documents—in a manageable way for hundreds or thousands of years.”

The digital universe—all the data contained in our computer files, historic archives, movies, photo collections, and the exploding volume of digital information collected by businesses and devices worldwide—is expected to hit 44 trillion gigabytes by 2020.

That’s a tenfold increase compared to 2013, and will represent enough data to fill more than six stacks of computer tablets stretching to the moon. While not all of that information needs to be saved, the world is producing data faster than the capacity to store it.
 

DNA archives


DNA molecules can store information many millions of times more densely than existing technologies for digital storage—flash drives, hard drives, magnetic and optical media. Those systems also degrade after a few years or decades, while DNA can reliably preserve information for centuries. DNA is best suited for archival applications, rather than instances where files need to be accessed immediately.

The team from the university’s Molecular Information Systems Lab, in close collaboration with Microsoft Research, is developing a DNA-based storage system that it expects could address the world’s needs for archival storage.

First, the researchers developed a novel approach to convert the long strings of ones and zeroes in digital data into the four basic building blocks of DNA sequences—adenine, guanine, cytosine, and thymine.

“How you go from ones and zeroes to As, Gs, Cs, and Ts really matters because if you use a smart approach, you can make it very dense and you don’t get a lot of errors,” says coauthor Georg Seelig, an associate professor of electrical engineering and of computer science and engineering. “If you do it wrong, you get a lot of mistakes.”

The digital data is chopped into pieces and stored by synthesizing a massive number of tiny DNA molecules, which can be dehydrated or otherwise preserved for long-term storage.
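The article does not spell out the team's encoding, so the following is only a minimal sketch of the general idea of turning ones and zeroes into As, Cs, Gs, and Ts. It assumes the simplest possible mapping, two bits per nucleotide, which is not the researchers' scheme; the function names and the bit-to-base table are illustrative choices.

```python
# Assumed mapping, for illustration only: 00 -> A, 01 -> C, 10 -> G, 11 -> T.
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four nucleotides, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(strand: str) -> bytes:
    """Reverse bytes_to_dna: read four nucleotides back into one byte."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


print(bytes_to_dna(b"A"))  # CAAC
```

A naive mapping like this can produce long runs of the same base, which is one reason a smarter encoding, as Seelig notes, matters for keeping errors down.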
 

Random access


The University of Washington and Microsoft researchers make up one of two teams nationwide that have also demonstrated the ability to perform “random access”: identifying and retrieving the correct sequences from this large pool of random DNA molecules, a task similar to reassembling one chapter of a story from a library of torn books.

To access the stored data later, the researchers also encode the equivalent of zip codes and street addresses into the DNA sequences. Using Polymerase Chain Reaction (PCR) techniques—commonly used in molecular biology—helps them more easily identify the zip codes they are looking for. Using DNA sequencing techniques, the researchers can then “read” the data and convert them back to a video, image, or document file by using the street addresses to reorder the data.
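The zip code and street address analogy can be sketched in ordinary code. In this hypothetical model, each stored piece carries a file key (the "zip code") and a position index (the "street address"). Filtering by key stands in for PCR selection, and sorting by index stands in for reordering after sequencing. The names `store` and `retrieve`, the chunk size, and the tuple layout are all assumptions made for illustration, not the team's design.

```python
import random


def store(pool, file_key, data, chunk_size=4):
    """Split data into chunks and add (key, index, chunk) records to the pool."""
    for index, start in enumerate(range(0, len(data), chunk_size)):
        pool.append((file_key, index, data[start:start + chunk_size]))


def retrieve(pool, file_key):
    """Pick out one file's records by key, then put them back in order."""
    # Filtering by key stands in for PCR selecting one "zip code".
    selected = [record for record in pool if record[0] == file_key]
    # Sorting by index stands in for reordering by "street address".
    selected.sort(key=lambda record: record[1])
    return b"".join(chunk for _, _, chunk in selected)


pool = []
store(pool, "cat", b"funny cat video")
store(pool, "tax", b"financial records")
random.shuffle(pool)  # the pool has no inherent order, like DNA in a tube

assert retrieve(pool, "cat") == b"funny cat video"
```

Because every piece carries its own address, the pool can be thoroughly mixed and a single file can still be recovered without reading everything else.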

Currently, the largest barrier to viable DNA storage is the cost and efficiency with which DNA can be synthesized (or manufactured) and sequenced (or read) on a large scale. But researchers say there’s no technical barrier to achieving those gains if the right incentives are in place.

Advances in DNA storage rely on techniques pioneered by the biotechnology industry, but also incorporate new expertise. The team’s encoding approach, for instance, borrows from error correction schemes commonly used in computer memory—which hadn’t been applied to DNA.

“This is an example where we’re borrowing something from nature—DNA—to store information. But we’re using something we know from computers—how to correct memory errors—and applying that back to nature,” says Ceze.
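The article does not name the specific error correction scheme, so the sketch below uses simple XOR parity as a stand-in to show how redundancy lets lost data be rebuilt. Given two equal-length pieces and their XOR, any one of the three can be recovered from the other two. The function name and sample values are illustrative assumptions.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Return the byte-by-byte XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


a = b"DNA!"
b = b"data"
parity = xor_bytes(a, b)  # stored alongside a and b as a redundant piece

# Suppose the piece holding `a` is lost or unreadable:
recovered_a = xor_bytes(parity, b)
assert recovered_a == a
```

The cost is extra material to synthesize and store, which is the usual trade-off between density and reliability.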

Microsoft Research, the National Science Foundation, and the David Notkin Endowed Graduate Fellowship funded the work.

Source: University of Washington

This article was originally published on futurity.org.