Using DNA to store data? One gram can hold 215 million gigabytes, according to a study

According to a study conducted by researchers at Columbia University in 2017, a single gram of DNA could in principle hold up to 215 petabytes of information, or 215 million gigabytes: an impressive capacity when compared to modern hard disks! Not only that: DNA, with its extraordinary density and durability, could offer a longer-lasting alternative to current storage media, which have a rather short “life”. It should be noted, however, that DNA data storage is still an experimental technology, held back by high costs and slow write and read speeds.
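The unit conversion behind the headline figure is easy to check. The short Python sketch below assumes decimal (SI) units, and the 20 TB drive is a hypothetical round number chosen only as a yardstick; it is not a figure from the study.

```python
# Assumption: decimal (SI) prefixes, so 1 PB = 10**15 bytes and 1 GB = 10**9 bytes.
PETABYTE = 10**15
TERABYTE = 10**12
GIGABYTE = 10**9

# Density quoted in the article: 215 petabytes per gram of DNA.
dna_bytes_per_gram = 215 * PETABYTE
gigabytes_per_gram = dna_bytes_per_gram // GIGABYTE

# Hypothetical 20 TB hard disk, used only for comparison (not from the study).
assumed_drive_bytes = 20 * TERABYTE
drives_per_gram = dna_bytes_per_gram / assumed_drive_bytes

print(f"{gigabytes_per_gram:,} GB per gram")
print(f"about {drives_per_gram:,.0f} hypothetical 20 TB drives per gram")
```

Under these assumptions, 215 petabytes is indeed 215 million gigabytes.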

Research into how much data DNA can contain

The storage of digital data in DNA relies on the molecular structure of this nucleic acid, which is composed of four nitrogenous bases: adenine (A), thymine (T), cytosine (C) and guanine (G). These act as coding elements, similar to the 0 and 1 of binary computer language. Each nucleotide can theoretically represent up to 1.8 bits of information. In 2017 Yaniv Erlich and Dina Zielinski, two researchers at Columbia University, managed to reach about 85% of this limit by encoding 1.6 bits per nucleotide. A result never achieved before!
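To make the idea of bases as coding elements concrete, here is a minimal Python sketch of the most naive mapping, in which every pair of bits becomes one base. It is only an illustration, not the researchers' scheme: the function names and the particular bit-to-base assignment are assumptions. This mapping packs exactly 2 bits into each nucleotide, which is more than the 1.8-bit limit quoted above; that limit presumably reflects practical constraints that this toy mapping ignores.

```python
# Assumed (arbitrary) assignment: 00 -> A, 01 -> C, 10 -> G, 11 -> T.
BASES = "ACGT"

def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    strand = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            strand.append(BASES[(byte >> shift) & 0b11])
    return "".join(strand)

def dna_to_bytes(strand: str) -> bytes:
    """Invert bytes_to_dna; the strand length must be a multiple of four."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            # BASES.index raises ValueError for anything other than A, C, G, T.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)
```

For example, the single byte `0x1B` (binary `00011011`) becomes the strand `ACGT`, and decoding that strand returns the original byte.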

Put simply, this means that the total amount of data produced globally (estimated at 3.52 × 10^22 bits in 2018 and expected to increase a hundredfold by 2040) could theoretically be stored without resorting to mechanical hard disks, solid-state drives (SSDs) and other “classic” storage media. In the words of the magazine Science, the system could, at least in theory, “store every bit of data ever recorded by humans in a container the size and weight of a pair of pickup trucks”.
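As a rough check on the scale involved, the following Python sketch combines the two figures quoted in this article: the 2018 estimate of global data and the density of 215 petabytes per gram. It assumes decimal units and counts only the DNA itself at the theoretical density, ignoring redundancy, extra copies and packaging, so it should be read as a back-of-the-envelope estimate rather than a result from the study.

```python
BITS_PER_BYTE = 8
PETABYTE = 10**15  # assumption: decimal (SI) petabyte

# Figures quoted in the article.
global_data_bits_2018 = 3.52e22
density_bits_per_gram = 215 * PETABYTE * BITS_PER_BYTE  # 1.72e18 bits per gram

grams_needed_2018 = global_data_bits_2018 / density_bits_per_gram
kilograms_needed_2018 = grams_needed_2018 / 1000

# The article expects a hundredfold increase in data by 2040.
kilograms_needed_2040 = kilograms_needed_2018 * 100

print(f"2018 data: about {kilograms_needed_2018:.1f} kg of DNA")
print(f"2040 data: about {kilograms_needed_2040:.0f} kg of DNA")
```

With these inputs the 2018 total comes to roughly 20 kg of DNA, and the projected 2040 total to roughly two tonnes.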

To achieve this result, Erlich and Zielinski studied the algorithms already used by other researchers to encode and decode data. On this point, the magazine Science reports:

They started with six files, including a complete operating system, a computer virus, an 1895 French film titled Arrival of a Train at La Ciotat and a 1948 study by information theorist Claude Shannon. They first converted the files into binary strings of 1s and 0s, compressed them into one master file, and then split the data into short strings of binary code. They devised an algorithm called “DNA fountain,” which randomly packaged the strings into so-called droplets, to which they added extra tags to help reassemble them in the correct order later. In all, the researchers generated a digital list of 72,000 DNA strands, each 200 bases long.

Afterwards, the researchers sent the data files to the startup Twist Bioscience, which synthesized the DNA strands. A couple of weeks later, Erlich and Zielinski received in the mail a vial containing the speck of DNA encoding their files and, using modern DNA sequencing technology, proceeded to decode the information. The sequences were then fed into a computer, which translated the genetic code back into binary and used the tags to reassemble the six original files. As Science reports, the approach used by the researchers “worked so well that the new files contained no errors”.

DNA: will we use it as a hard disk one day?

At this point it is natural to ask: given the results of the experiment, will we one day use DNA in place of hard drives? It’s still too early to tell. Although it is an effective way to store large amounts of data in a small space, it is extremely expensive, which could discourage large-scale adoption of such technologies. Suffice it to say that for 2 megabytes of data the researchers paid $7,000 to synthesize the DNA and another $2,000 to read it, for an overall expense of $9,000. Not to mention that, compared to other storage methods, writing and reading DNA takes comparatively long. This suggests scenarios in which DNA will be used as a “hard disk” not by the common user but by large companies that process enormous quantities of data, primarily big tech.
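The cost figures quoted above translate into a per-unit price as follows. This Python sketch uses only the numbers in this article and assumes 1 gigabyte = 1,000 megabytes; the linear extrapolation to a gigabyte is purely illustrative, since real prices would not necessarily scale this way.

```python
# Figures quoted in the article for the 2 MB experiment.
synthesis_cost_usd = 7_000   # writing (DNA synthesis)
sequencing_cost_usd = 2_000  # reading (DNA sequencing)
data_megabytes = 2

total_cost_usd = synthesis_cost_usd + sequencing_cost_usd
cost_per_megabyte_usd = total_cost_usd / data_megabytes

# Assumption: 1 GB = 1,000 MB, and cost scales linearly with data size.
cost_per_gigabyte_usd = cost_per_megabyte_usd * 1_000

print(f"total: ${total_cost_usd:,}")
print(f"per megabyte: ${cost_per_megabyte_usd:,.0f}")
print(f"per gigabyte (linear extrapolation): ${cost_per_gigabyte_usd:,.0f}")
```

At these prices a megabyte costs $4,500 to write and read once, which helps explain why the technology is out of reach for everyday use.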