Need Extra Storage? Try DNA.
Engineers have been pushing more storage into smaller spaces for decades, but that can't go on forever. The next big jump in data storage could take the form of the DNA found in all living things: Scientists in labs across the country are experimenting with synthetic DNA as a storage medium.
"If you look at where electronics is going, silicon technology, a lot of the basic technology that we use to build computers today, we're approaching the limit in almost all of them," says Luis Henrique Ceze, associate professor of computer science and engineering at the University of Washington. "DNA is very dense, it's very durable and it takes very little power to maintain, so there's a lot of advantage of using DNA for data storage."
Ceze has been working with Karin Strauss, a computer architecture researcher with Microsoft Research, on a collaboration between the two institutions -- a project that bridges computer science and biology. For a team of roughly 20 people, the university provides the molecular biologists, and Microsoft provides the computer scientists.
To understand how DNA could be used for storage, consider that all computer data is binary, or base-2. DNA can be treated as base-4, because it is built from four nucleotide bases: adenine, cytosine, guanine and thymine (abbreviated as A, C, G and T). The first step is converting base-2 information to base-4, so that each pair of bits maps to one base: A corresponds to 00, C to 01, G to 10 and T to 11 (that simplifies things a bit but gets the idea across).
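The two-bit mapping described above can be sketched in a few lines of Python. This is a minimal illustration that assumes exactly the simplified table in the text (00 to A, 01 to C, 10 to G, 11 to T); the function names `encode` and `decode` are hypothetical, and real DNA storage pipelines add steps this sketch leaves out, such as splitting data into short, addressed strands and avoiding long runs of the same base.

```python
# Simplified two-bit mapping from the article (real encoders are more elaborate).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a DNA string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    # Walk the bit string two bits at a time; each pair becomes one base.
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Reverse encode(): map bases back to bit pairs, then regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # CAGACGGC
    print(decode(strand))  # b'Hi'
```

Each byte becomes four bases, which is the two-bits-per-base density this simplified mapping implies.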
Scientists then use a machine called a DNA synthesizer to chain the four chemical building blocks together in the specified order. The result, which holds many copies of the information, is a salt-like cluster smaller than the tip of a pencil. Reading that information back requires a DNA sequencer.
A speck that small may sound fragile, like something that might blow away when a door opens suddenly, but DNA is the most durable data storage medium we've seen. Scientists have successfully read DNA that's hundreds of thousands of years old.
Sequencing DNA consumes a tiny portion of the stored material, so a DNA archive can be read only a finite number of times. That's not a practical problem, though: The stored material contains so many redundant copies of the data that it can be sampled again and again. Today's storage media also survive only a limited number of write and read cycles before they fail, so this limitation is nothing new.
As Ceze points out, DNA will never be obsolete. While many of us have floppy disks in the back of a drawer that we can no longer read, that won't be the fate of DNA. "We're always going to care about DNA for life sciences and health reasons, so you're always going to have a way of reading information stored in DNA," Ceze says.
In January 2017, Microsoft and the University of Washington successfully encoded 200MB of data into DNA form, besting the previous record of 22MB. Using DNA, Strauss says, it will be possible to store 1 exabyte of data -- that's 1 billion GB -- in a 1-inch cube.
"We did an estimation of how much data you could put in a particular volume," Strauss says. "We tried to estimate what would be the volume if we today decided to archive the entire accessible internet, meaning everything that's not behind a password or any kind of electronic wall, and we came up with the size of a large shoebox."
That sounds like a far-off proposition, but Ceze believes we'll see commercial DNA storage systems on the market within a decade. They won't work exactly like today's electronic storage, since creating DNA requires a wet chemical environment, but they'll provide massive capacity and random access at speeds comparable to those of today's enterprise tape systems.
A quickly advancing field
DNA has been around for billions of years, but demonstrations of DNA as a usable storage technology began in 1986 when MIT researcher Joe Davis encoded a simple binary image into 28 base pairs of DNA.
Another pioneer in this field is George Church, a genetics professor who has worked at Harvard Medical School since 1977 and has run his own lab since 1986. Church has been interested in bringing down the cost of reading and writing DNA since the 1970s, believing that the two would someday come together to create practical data storage. He began working toward that goal around 2000 and performed critical sequencing and synthesis tests in 2003 and 2004. By 2012, he was able to combine both areas into a system for encoding data, which he described in an influential 2012 article in Science.
"Prior to 2003 and '04, sequencing and synthesis were done essentially in capillaries -- or small tubes -- where you'd have one tube per sequence," Church explains. "It was pretty manual and not scalable. The lesson that we had learned from the microfabrication semiconductor industry was you needed to come up with a way to put them essentially in a two-dimensional plane and then scale down the feature size. Neither of those column-based methods were compatible with that, and so in 2003, we showed how you could distribute sequences on a two-dimensional plane and then image them with fluorescent imaging which is now the dominant way of sequencing. Then in 2004, we showed that you could manufacture DNA on a plane and then slip it off, and then it could be even more compact; so the plane was just a temporary place to synthesize them. Then you could compact them into a three-dimensional object that was millions of times more compact than normal data storage.
"Those were proof of concept exercises in 2003 and 2004. In 2012, we and others had refined both the reading and writing methods for DNA, and I put them together into one experiment where I encoded a book that I had just written into DNA, including images, showing that basically anything that's digital could be encoded with DNA."
Though cost is a significant hurdle for DNA storage, Church notes that prices have dropped steeply in the short time researchers have been working on it. The cost of reading DNA has fallen about 3 million-fold, while the cost of writing it has fallen a billion-fold, and he believes both could improve by another million-fold in even less time. He also points out that copying DNA is almost free, as is storing it long term. For archival storage, the cost of reading data isn't a big obstacle, since much archived material is never read and some items are read only selectively. Look at the costs of the whole system, he advises. Traditional storage methods improve at the pace of Moore's Law and will plateau soon, but DNA storage technology is advancing faster than Moore's Law and shows no signs of plateauing.
Archival and cloud storage are where Church sees DNA data storage being adopted first. Companies including IBM, Microsoft and Technicolor have their own research and development teams studying the area, he notes. He collaborated with Technicolor in 2015 to store A Trip to the Moon, a classic 1902 film once believed lost, in DNA. Technicolor now holds many DNA copies of the film that, combined, are no bigger than a speck of dust.
Church's lab of 93 people working on DNA storage is currently focused on two goals. The first is to radically speed up each synthesis cycle. Information is stored in hundreds of layers, each as thick as a molecule, and adding each layer currently takes three minutes. Church believes that can be brought down to less than a millisecond. That's about 200,000 times faster, he notes, and would require a shift from organic chemistry to biochemistry. The second goal is to change how the instruments used for reading and writing are manufactured so they can be made much smaller; currently, they're the size of large refrigerators.
Built-in redundancy and the need for error correction
One researcher who was influenced by Church's 2012 Science article is Olgica Milenkovic, a professor at the University of Illinois at Urbana-Champaign. The article mentioned the need for coding, which immediately triggered her interest. Coding in storage research is a technique for adding redundancy to data, redundancy that can later be used to correct errors that occur during reading and writing. For an example of why this is important, see the two Citizen Kane pictures here. Both were encoded in DNA by Milenkovic's team and then read. Guess which one used redundancy.
You're correct: The left-hand image was encoded with redundancy, and the right-hand image was not.
A simple way of adding redundancy is to repeat each symbol a set number of times: Rather than writing a 0, write it four times. That's the brute-force approach: simple but terribly inefficient. Milenkovic's work is about achieving the same error correction in a more sophisticated way, using techniques such as parity checks and linear congruence checks to verify the data.
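To make the efficiency argument concrete, here is a minimal Python sketch comparing the four-fold repetition described above with a Hamming(7,4) code, a textbook construction built from three parity checks. It illustrates the general idea only and is not Milenkovic's method; the function names are hypothetical, and the sketch handles only flipped bits, whereas DNA reading and writing also insert and delete symbols, errors that call for other constructions.

```python
def repeat_encode(bits, copies=4):
    """Brute-force redundancy: write every bit `copies` times."""
    return [b for b in bits for _ in range(copies)]


def repeat_decode(coded, copies=4):
    """Majority vote over each group of copies."""
    groups = [coded[i:i + copies] for i in range(0, len(coded), copies)]
    # A strict majority of 1s decodes to 1; with an even copy count,
    # a tie (two 0s and two 1s) falls through to 0 here.
    return [1 if 2 * sum(g) > copies else 0 for g in groups]


def hamming74_encode(d):
    """Protect 4 data bits with 3 parity bits (at positions 1, 2 and 4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c):
    """Re-run the three parity checks and fix a single flipped bit."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 0 means every check passed
    if syndrome:
        c[syndrome - 1] ^= 1  # the failed checks spell out the bad position
    return [c[2], c[4], c[5], c[6]]


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    print(len(repeat_encode(data)), len(hamming74_encode(data)))  # 16 7

    noisy = repeat_encode(data)
    noisy[5] ^= 1  # simulate one read/write error
    print(repeat_decode(noisy))  # [1, 0, 1, 1]

    noisy = hamming74_encode(data)
    noisy[4] ^= 1  # simulate one read/write error
    print(hamming74_decode(noisy))  # [1, 0, 1, 1]
```

Both schemes recover from a single flipped bit in this example, but repetition stores 16 bits to protect 4, while the parity-check code stores 7.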
"The whole field is basically about helping you correct errors if they appear or, even better, avoid errors that you know are very likely to appear," Milenkovic says. "We introduce controlled redundancy to get rid of errors, and that controlled redundancy is not in the form of simple repetition, because that's very ineffective."
That's what brought Milenkovic into the field, but her research now focuses on bringing down the massive cost of DNA synthesis.
"My student, H. Tabatabae Yazdi, who was very active on this topic, and I have been trying really hard to come up with a smart way to avoid synthesizing DNA. Synthesizing DNA is absolutely a bottleneck for this technology because of the high cost," Milenkovic says.
Though Milenkovic is leery of revealing too much about unpublished research, her solution involves "cunning mathematical approaches" and is all about timing, wherein the size of the interval between bits of information is meaningful.
"If you dispense with the formality that you want to use ATGCs to really encode binary symbols at a certain location, you can come up with much smarter and more efficient means of storing information, because you don't need to synthesize strands over and over again," Milenkovic explains. "You can synthesize them once in a certain way and then reuse that synthesized DNA in a smart combinatorial fashion."
Through her work, Milenkovic hopes to drive the cost of synthesizing DNA down by at least three orders of magnitude, a factor of 1,000. That's still not enough, she notes, but it's progress. It's also contributing to a line of research she finds fascinating.
"It's very exciting, to be honest, to play God and encode your own information in DNA," Milenkovic says. "It gives a person a feeling of excitement to know that you're playing with a chosen molecule of nature and making it do what you want to store and encode and convey information to the future."
Cashing in -- any day now
DNA storage isn't all dry, dusty academic research. Helixworks, a company based in Ireland, is already trying to make money from it. It has a product on Amazon -- sort of.
"We launched on Amazon so you could get 512KB of digital data encoded into DNA," explains Nimesh Pinnamaneni, the company's co-founder. "It's something very small. Maybe a picture or perhaps a poem, something like that."
It's an unusual purchase, but it could be the perfect love token for the person who has everything, especially if that person is a scientist:
"I remember one customer calling us. He wanted to gift his wife -- they both are biotechnologists -- he wanted to gift his wife on their wedding anniversary. He wanted to put a message in DNA and gift her a DNA," Pinnamaneni remembers. "She would have to sequence the DNA to read the message. It's a fairly complicated way to send a love message, but maybe it's cute for biotechnologists, you know?"
But Helixworks got a bit ahead of itself by posting its product on Amazon in August 2016, before it was ready to fulfill orders. Two people purchased the company's $199 DNADrive, a 14-karat gold capsule with a cluster of DNA inside, before Helixworks was forced to pull the product from sale. The DNADrive listing is still on Amazon, but the product can't be purchased.
That doesn't mean Helixworks is over, just over-eager. It's come too far to stop now. The company started at the University of Borås in Sweden, where Pinnamaneni (pictured above, left) and Sachin Chalapati (right), the company's other co-founder, were getting master's degrees in biotechnology. They raised funds for DNA storage research, continued their work once back home in Bangalore, India, and developed a proof of concept.
Casting about for additional funds brought them to the IndieBio accelerator program run by SOSV, a startup venture capital firm in San Francisco, Calif. Helixworks was selected by the program and won $50,000 in cash and the use of a lab in County Cork, Ireland, where it has been for the past six months. The program includes mentoring on pitching a product, which Helixworks will put to use at this year's South by Southwest festival, where it will compete in a pitch event.
While churning out golden DNA capsules might eventually be a lucrative sideline, Pinnamaneni says his company's future is in the compact home and office DNA printers it's developing now. He wants to make DNA storage easy and affordable enough for anyone to use.
"We figured out you need to have something that works like a cartridge in a printer," Pinnamaneni explains. "You just have four colors, and these four colors can combine to form any color possible, right? That's how your ink printer works. We figured out we need to have something like that in our system. We designed a cartridge of 32 reagents that can be combined to form any DNA sequence possible."
While other labs pay around $30,000 each time they have DNA synthesized, a process that takes weeks, Pinnamaneni says his invention can bring both the cost and the time down dramatically. Helixworks is working with Opentrons, a company that makes automated lab equipment, to create the printer. That's what it will pitch at SXSW.
"What we will be demonstrating on the expo floor is DNA writing right before your eyes," Pinnamaneni says.
The company won't be taking any orders yet. And that's good, because that romantic biotechnologist is still waiting for his anniversary gift.