Session Details: SMPTE 2019

Name

Data in DNA: The Arrival of DNA-Based Data Storage and Computing

Date & Time

Wednesday, October 23, 2019, 2:30 PM - 3:00 PM

Location Name

San Francisco Room

Speakers

Nick Gold - CATALOG

Description

DNA has been nature's storage medium for biological data for several billion years. The notion of storing digital information in synthetically created DNA molecules has been around for decades. Over the past decade, a small number of research labs have worked to develop this technology and have made some progress in capturing and retrieving greater quantities of information using this medium. However, stored data sets have been small, and chemically producing synthetic DNA molecules has remained slow and expensive. While the read side of the process, achieved through DNA sequencing techniques, has improved tremendously over the past few decades, fast and affordable writing has remained elusive. DNA as a commercial data storage technology has appeared to be far off. The promise, however, is huge, as DNA from a volumetric perspective is a million times more information-dense than today’s magnetic or flash media.

The Boston-based startup CATALOG has achieved a significant leap in this technology by taking a combinatorial, information theory-centric approach to creating synthetic DNA molecules that capture digital information. Instead of using slow and expensive techniques to create custom DNA molecules a base pair at a time, our approach uses prefabricated, easily and inexpensively replicable shorter DNA sequences that can be combined enzymatically to represent binary data objects. This has allowed us to leapfrog other efforts in write speed by many orders of magnitude, bringing us significantly closer to commercializing data storage systems using this medium. As a medium, DNA is extremely attractive not only because of its potential density (an exabyte of data in the volume of a sugar cube) but also because data sets can be reproduced quickly and inexpensively, and the resulting samples can potentially remain viable for thousands of years. We are also developing approaches to process data stored in DNA directly, again using enzymatic and other biocomputing techniques. Because DNA data samples can contain many thousands of copies of a data set, certain types of parallelizable processing tasks may realize a major leap in performance using these techniques.

This presentation will offer a primer on DNA data storage technology; describe our own approaches to encoding scheme, processes, and systems engineering that are fast-tracking commercial applications of this technology; and discuss where the roadmap is headed and what problems in digital media these developments will help to solve.
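
The combinatorial idea described above, in which prefabricated components are combined to represent binary data, can be illustrated with a toy sketch. This is not CATALOG's actual encoding scheme, which the abstract does not specify. The layered component library, the made-up four-base sequences, and the rule that one assembled identifier stands for one bit position (present means 1, absent means 0) are all assumptions chosen only to show how a small set of reusable parts can address many bits.

```python
from itertools import product

# Hypothetical component library (assumption): 3 layers with 4 prefabricated
# components each. The sequences are invented for illustration only.
LAYERS = [
    ["ACGT", "AGTC", "ATCG", "CAGT"],
    ["CGTA", "CTAG", "GACT", "GCAT"],
    ["GTAC", "TACG", "TCGA", "TGCA"],
]

# Choosing one component per layer yields one identifier: 4**3 = 64 in total,
# built from only 12 distinct components.
IDENTIFIERS = ["".join(parts) for parts in product(*LAYERS)]
INDEX = {ident: pos for pos, ident in enumerate(IDENTIFIERS)}


def encode(data: bytes) -> set:
    """Return the set of identifiers to assemble: one per bit that is 1."""
    nbits = len(data) * 8
    if nbits > len(IDENTIFIERS):
        raise ValueError("data does not fit in this toy identifier space")
    value = int.from_bytes(data, "big")
    # Bit position 0 is the most significant bit of the first byte.
    return {
        IDENTIFIERS[pos]
        for pos in range(nbits)
        if (value >> (nbits - 1 - pos)) & 1
    }


def decode(pool: set, nbytes: int) -> bytes:
    """Rebuild the bytes from the identifiers observed in the pool."""
    nbits = nbytes * 8
    value = 0
    for ident in pool:
        value |= 1 << (nbits - 1 - INDEX[ident])
    return value.to_bytes(nbytes, "big")


if __name__ == "__main__":
    message = b"SMPTE"
    pool = encode(message)
    print(len(IDENTIFIERS), "possible identifiers;", len(pool), "assembled")
    print(decode(pool, len(message)))
```

In this sketch the 12 components address 64 bit positions, and the number of addressable positions grows multiplicatively with each added layer while the number of parts to manufacture grows only additively. A real system would also need error correction and a way to tolerate missing or spurious reads, which the sketch leaves out.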

This presentation aims to offer the audience some of the technical details of our technology portfolio. However, because this is such a new and innovative space, we will not assume that the audience has a background in molecular biology, synthetic biology, or biocomputing, or even deep technical knowledge of traditional data storage and processing approaches (although some background in the traditional side of these technology areas will be useful).

The whole of the SMPTE 2019 Annual Technical Conference audience will likely have an interest in this presentation, as it is an inherently exciting subject with many useful applications in media technology and in information technology in general. It should appeal to executives as well as engineers who want to keep tabs on disruptive technologies that will start to emerge as commercial options in the short to medium term. CATALOG is already in discussions with studios and broadcasters, and interest in our technologies within these communities has been very high.

The audience of this presentation will receive a primer on DNA-based data storage and biocomputing technologies and on why this class of technology is so attractive. They will learn how these technologies are rapidly progressing toward commercial applications, which over time will go well beyond data archiving. They will learn about the approach CATALOG has taken to encoding digital information in synthetic DNA molecules, which is combinatorial and information theory-centric. They will be briefed on the state of the art of CATALOG’s technology portfolio, including our terabit-per-day DNA data writer. They will learn about some of our pilot projects using this technology, and how an overall solution covering data storage, retrieval, and validation of retrieved data has been engineered. Finally, the audience will learn about our ongoing technology development efforts, including in situ computation on data stored in DNA, and the types of computing problems this approach may help to solve, namely those that are poorly suited to traditional silicon-based processors.