The history of deoxyribonucleic acid (DNA) sequencing and DNA synthesis has been intertwined, with advances in one often leading to advances in, or applications of, the other.
The double helix structure of DNA was discovered by Watson and Crick in 1953.
In the decades that followed, chemists worked to develop methods to synthesize DNA strands (oligonucleotides) of predefined sequence. Caruthers et al. (U.S. Pat. No. 4,458,066 “Process for preparing oligonucleotides”, filed Mar. 24, 1981) introduced the phosphoramidite chemistry that is now widely used. It was implemented on substrates similar to chromatography columns, yielding one oligonucleotide per synthesis. At the end of this process, the synthesized molecules are cleaved from the substrates on which they were synthesized so that they can be used in further reactions in solution.
Instrument manufacturers subsequently introduced equipment implementing this process on multiple columns in parallel. For example, on Apr. 24, 2000, PE Applied Biosystems issued a press release introducing its “ABI 3900 High Throughput DNA Synthesizer” with 48 columns operating concurrently. In a system of this type, each oligonucleotide was synthesized on a separate substrate and delivered in a separate tube (or other container). Relatively large amounts of each DNA sequence can be synthesized on these machines (the ABI 3900 specification was 40 nanomoles to 1 micromole per sequence).
Methods for the synthesis of DNA sequences led to the polymerase chain reaction (PCR), which uses synthesized DNA primer sequences. Kary Mullis, who invented PCR and was later awarded the Nobel Prize for it, was working in a DNA synthesis lab at Cetus at the time. PCR was originally devised as a method to enable sequencing of the sickle cell anemia locus via Sanger sequencing. U.S. Pat. No. 4,683,202 “Process for amplifying nucleic acid sequences”, the original PCR patent, was filed in 1985.
PCR was further refined in methods that integrated DNA amplification with the Sanger chain-terminating reaction, e.g., Murray, V., “Improved double-stranded DNA sequencing using the linear polymerase chain reaction” Nucleic Acids Research, Vol 17, No 21 Pg 8889, Nov. 11, 1989. Still further refinement along these lines was termed “Cycle Sequencing” (e.g., U.S. Pat. No. 5,432,065 filed Mar. 30, 1993). All of these used individually synthesized DNA sequences as primers for further DNA synthesis by polymerase enzymes.
During this time, other groups developed methods for synthesizing DNA on a highly parallel, microscopic scale on a single substrate. This increased the parallelism of DNA synthesis by over a thousand-fold: whereas the ABI 3900 instrument mentioned above can synthesize up to 48 sequences in parallel, some array-based methods can synthesize over 50,000 sequences in parallel without large manufacturing set-up costs.
One method of array-based synthesis was described in Pirrung, et al (U.S. Pat. No. 5,143,854 “Large scale photolithographic solid phase synthesis of polypeptides and receptor binding screening thereof”, priority date Jun. 7, 1989). It was developed by scientists at Affymax Corporation, later spun out as Affymetrix, Inc. This early work used fixed photolithographic masks, similar to those of the semiconductor industry. This enabled production of many “DNA arrays” with the same set of DNA sequences on them.
A group at the University of Wisconsin at Madison later devised a more flexible version of this approach using micro-mirror arrays (rather than fixed photolithographic masks) to dynamically define the spatial pattern of light in the system. This was spun out into the company Nimblegen in 1999, which was acquired by Roche in 2007.
Another method for synthesis of DNA on a highly parallel microscopic scale, on a single substrate, was developed using technology from ink-jet printing. Brennan (U.S. Pat. No. 5,472,672 “Apparatus and method for polymer synthesis using arrays” filed Oct. 22, 1993) described such a system including the dispensing of microscopic droplets of synthesis reagents through an array of nozzles on a moveable print head. This technology was commercialized by Agilent, Inc.
Early applications of these DNA arrays used the oligonucleotides in place on the array substrates where they were synthesized. This typically involved hybridizing DNA (or complementary DNA (cDNA)) from a test sample to the oligonucleotides on the array. If the DNA (or cDNA) of the test sample was fluorescently labeled in advance, imaging the array after hybridization and washing could quantify the amount of each sequence in the test sample. This approach was first used to measure gene expression (mRNA levels) and was later used for genotyping.
Application of DNA array technology to DNA sequencing largely waited until DNA sequencing itself advanced. The original methods of DNA sequencing (developed by Sanger and by Maxam and Gilbert; Sanger and Gilbert shared in the 1980 Nobel Prize in Chemistry) used electrophoresis for separation and subsequent readout. Each such electrophoretic separation and detection was spatially separate, though companies developed instruments that ran several in parallel (e.g., the Applied Biosystems Model 370, introduced about 1987, supported up to 24 in parallel; the Applied Biosystems Model 3700, introduced in 1999, supported up to 96 in parallel; and Amersham's Molecular Dynamics unit introduced a version of its MegaBace system about 2002 with 384 in parallel).
Several groups did attempt to leverage DNA arrays for DNA sequencing (e.g., Lysov, et al, 1996, “Efficiency of sequencing by hybridization on oligonucleotide matrix supplemented by measurement of the distance between DNA segments.”). Affymetrix commercialized this approach for small-scale applications (variants in CYP drug-metabolizing genes, genotyping of HIV). These methods conduct the DNA sequencing reactions and fluorescent readout on the array and thus have been limited to one base per array spot and to fairly small, non-repetitive portions of genomes. Heidi Rehm, et al at the Harvard Medical School published a set of protocols for this in April 2011, “Targeted Sequencing Using Affymetrix CustomSeq Arrays” in Current Protocols in Human Genetics. In it, the technology was described as suitable for resequencing portions of the human genome up to 300,000 bases in total length.
The field moved forward with the commercialization of “Next Generation DNA Sequencing” methods, which enabled measurement of hundreds of thousands of sequences at a time. One of the first such systems was commercialized by 454, Inc (previously a division of Curagen, Inc and later acquired by Roche) in 2005 (Margulies, M. et al. “Genome sequencing in microfabricated high-density picoliter reactors” Nature 437, 376-380 (2005)). This initial system could measure up to 200,000 sequences in parallel, each on average 100 bases long.
Two years later, in 2007, a group at the Baylor College of Medicine used a 454 DNA sequencing instrument to sequence an exome (Albert, et al “Direct selection of human genomic loci by microarray hybridization” Nature Methods, November 2007, 4(11):903-5). The key to this work was that a DNA array was used not as a substrate for sequencing itself, but to enrich a genomic DNA sample for just the parts of the genome intended for sequencing. The fragmented genomic DNA sample was hybridized to the array, and portions of the genome that did not hybridize were washed off. The portions that did hybridize were then eluted off the array and sequenced separately from it, using the 454 system. The DNA arrays used were from Nimblegen. Although that DNA synthesis technology had been available since 1999, it was its 2007 combination with the massive parallelism of next generation DNA sequencing that made this application practical.
In the work described above, DNA sequences synthesized on an array were used in place on the array substrate. During the early 2000s, however, groups began to explore technologies by which DNA molecules could be synthesized on an array but attached to the array substrate by a cleavable linker. After array synthesis, the linkers could then be cleaved (e.g., chemically), releasing the oligonucleotides into solution, where they could be used as a pool. One example of this work is U.S. Pat. No. 7,211,654 (Xiaolian, et al, “Linkers and co-coupling agents for optimization of oligonucleotide synthesis and purification on solid supports” May 1, 2007).
In 2007, a group at the Broad Institute began to explore the use of this approach to create pools of oligonucleotides in solution to capture selected portions of the genome of a test sample. (See U.S. provisional application 61/063,489, Gnirke, et al, filed Feb. 4, 2008: “Selection of nucleic acids by solution hybridization to oligonucleotide baits”.) Dr. Carsten Russ of the Broad Institute described this approach at the February 2008 AGBT conference (reported by GenomeWeb). During 2008, Agilent licensed this technology. It was published online Feb. 1, 2009 as “Solution hybrid selection with ultra-long oligonucleotides for massively parallel sequencing” Nature Biotechnology 27, 182-189 (2009). In February 2009, Agilent launched this as a product line (trade name “SureSelect”) with its first human exome kit (“SureSelect All Exon”).
Dr. Gnirke and colleagues at the Broad Institute subsequently applied targeted capture, using array-synthesized DNA, to RNA transcriptomes: “Targeted next-generation sequencing of a cancer transcriptome enhances detection of sequence variants and novel fusion transcripts” Joshua Levin, et al (including Andreas Gnirke). Genome Biology 2009, 10:R115.
In parallel with this, Next Generation DNA Sequencing technologies continued to advance. In June 2006, Solexa, Inc first shipped its Genome Analyzer system. This system measured 40 million DNA sequences in parallel, each initially 25 bases long. In early 2007, Illumina, Inc acquired Solexa, and subsequent versions of the technology have continued to improve. At the time of writing, the most advanced instrument (Illumina HiSeq-4000) can produce about 6 billion sequences in parallel, each 2×125 bases, for a total of 1.5 trillion bases in a single run.
Exome sequencing has been broadly adopted as a research tool. As an example, the Exome Aggregation Consortium based at the Broad Institute has released a dataset based on human exome sequences from over 60,000 individuals (release v0.3 Jan. 2015).
Exome sequencing has also been adopted clinically. The first commercial clinical exome tests were announced by GeneDx and Ambry Genetics at the ASHG conference in October 2011. Others, including the Baylor College of Medicine, have also offered commercial clinical human exome-based tests, and more than 8,000 such tests have been performed.
DNA synthesis technologies have continued to advance, with particular focus on gene synthesis applications that require very long DNA sequences. Many of these advances construct long DNA molecules by combining shorter synthetic DNA molecules. These strategies are reviewed in: “Large-scale de novo DNA synthesis: technologies and applications” Sriram Kosuri and George Church, Nature Methods, Volume 11, No 5, May 2014; 499.