The complexity and power of biological reactions have increased dramatically over the last thirty years. The initial observations of the “hybridization” process, i.e., the ability of two polymers of nucleic acid containing complementary sequences to find each other and anneal through base-pairing interactions, by Marmur and Lane, Proc. Nat. Acad. Sci., U.S.A. 46, 453 (1960) and Doty et al., Proc. Nat. Acad. Sci., U.S.A. 46, 461 (1960), have been followed by the refinement of this process into an essential tool of modern biology.
Initial hybridization studies, such as those performed by Hayashi et al., Proc. Nat. Acad. Sci., USA. 50, 664 (1963), were performed in solution. Further development led to the immobilization of the target DNA or RNA on solid supports. With the discovery of specific restriction endonucleases by Smith and Wilcox, J. Mol. Biol. 51, 379 (1970), it became possible to isolate discrete fragments of DNA. Utilization of immobilization techniques, such as those described by Southern, J. Mol. Biol. 98, 503 (1975), in combination with restriction enzymes, has allowed for the identification by hybridization of single copy genes among a mass of fractionated, genomic DNA.
In 1977, two methods for DNA sequencing were reported. These were the chemical degradation method of Maxam and Gilbert, Proc. Nat. Acad. Sci. USA. 74:560 (1977) and the enzymatic method of Sanger et al., Proc. Nat. Acad. Sci. USA. 74:5463 (1977). Both methods generate populations of radiolabeled oligonucleotides which begin at a fixed point and terminate randomly at a fixed residue or type of residue. These populations are resolved on polyacrylamide gels which allow the discrimination between oligonucleotides that differ in length by as little as one nucleotide.
The Maxam and Gilbert method utilizes a fragment of DNA radiolabeled at one end which is partially cleaved in five separate chemical reactions, each of which is specific for a particular base or type of base. The products of these chemical reactions are five populations of labeled molecules that extend from the labeled end to the site of chemical cleavage. This method has remained relatively unchanged since its initial development and works best for DNA sequences that lie less than 250 nucleotides from the labeled end.
In contrast, the Sanger method is capable of sequencing greater than 500 nucleotides in a single set of reactions. The Sanger method is an enzymatic reaction that utilizes chain-terminating dideoxynucleotides (ddNTPs). ddNTPs are chain-terminating because they lack a 3′-hydroxyl residue which prevents formation of a phosphodiester bond with the succeeding deoxyribonucleotide (dNTP). A small amount of one ddNTP is included with the four conventional dNTPs in a polymerization reaction. Polymerization or DNA synthesis is catalyzed by a DNA polymerase. There is competition between extension of the chain by incorporation of the conventional dNTPs and termination of the chain by incorporation of a ddNTP. A short oligonucleotide or primer is annealed to a template containing the DNA to be sequenced. The original protocols required single-stranded DNA templates. The use of double-stranded templates was reported later. The primer provides a 3′ hydroxyl group which allows the polymerization of a chain of DNA when a polymerase enzyme and dNTPs are provided.
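The chain-termination logic described above can be illustrated with a minimal Python sketch. It is an idealized model, not a description of any laboratory protocol: the function names (`sanger_lanes`, `read_gel`, `termination_fraction`), the 20-nucleotide primer length, and the assumption that every matching site terminates independently with the same probability are all hypothetical choices made for illustration.

```python
from typing import Dict, List

BASES = "ACGT"


def sanger_lanes(new_strand: str, primer_len: int = 20) -> Dict[str, List[int]]:
    """Return, for each ddNTP reaction, the lengths of all terminated products.

    new_strand is the sequence synthesized from the 3' end of the primer,
    written 5'->3'. In the reaction containing ddXTP, a chain can terminate
    at every position where the new strand contains base X.
    """
    lanes: Dict[str, List[int]] = {base: [] for base in BASES}
    for position, base in enumerate(new_strand.upper(), start=1):
        if base not in lanes:
            raise ValueError(f"unexpected base {base!r}")
        # Product length = primer plus the nucleotides added up to the ddNTP.
        lanes[base].append(primer_len + position)
    return lanes


def read_gel(lanes: Dict[str, List[int]]) -> str:
    """Read the sequence from shortest to longest band across the four lanes."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


def termination_fraction(p: float, k: int) -> float:
    """Fraction of chains ending at the k-th matching site (geometric model).

    Assumes each matching site incorporates a ddNTP with probability p,
    independently of the others; a chain stops at site k only if it
    extended through the previous k - 1 matching sites.
    """
    return p * (1.0 - p) ** (k - 1)


lanes = sanger_lanes("GATTACA", primer_len=20)
print(lanes)            # band lengths per ddNTP reaction
print(read_gel(lanes))  # sequence recovered from the band order
```

Under this model a small value of `p` (a small amount of ddNTP relative to dNTP) keeps `termination_fraction` from falling off too quickly, so that products terminating far from the primer remain detectable.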
The original version of the Sanger method utilized the Klenow fragment of E. coli DNA polymerase. This enzyme has the polymerization and 3′ to 5′ exonuclease activity of the unmodified polymerase but lacks 5′ to 3′ exonuclease activity. The Klenow fragment has several limitations when used for enzymatic sequencing. One limitation is the low processivity of the enzyme, which generates a high background of fragments that terminate by the random dissociation of the enzyme from the template rather than by the desired termination due to incorporation of a ddNTP. The low processivity also means that the enzyme cannot be used to sequence nucleotides that appear more than ~250 nucleotides from the 5′ end of the primer. A second limitation is that Klenow cannot efficiently utilize templates which have homopolymer tracts or regions of high secondary structure. The problems caused by secondary structure in the template can be minimized by running the polymerization reaction at 55° C.
Improvements to the original Sanger method include the use of polymerases other than the Klenow fragment. Reverse transcriptase has been used to sequence templates that have homopolymeric tracts. Reverse transcriptase is somewhat better than the Klenow enzyme at utilizing templates containing homopolymer tracts.
The use of a modified T7 DNA polymerase (Sequenase™) was the most significant improvement to the Sanger method. See Sambrook, J. et al. Molecular Cloning, A Laboratory Manual, 2d Ed. Cold Spring Harbor Laboratory Press, New York, 13.7-13.9 and Hunkapiller, M. W. (1991) Curr. Op. Gen. Devl. 1:88-92. Sequenase™ is a chemically-modified T7 DNA polymerase that has reduced 3′ to 5′ exonuclease activity. Tabor et al., Proc. Natl. Acad. Sci. U.S.A. 84:4767 (1987). Sequenase™ version 2.0 is a genetically engineered form of the T7 polymerase which completely lacks 3′ to 5′ exonuclease activity. Sequenase™ has a very high processivity and high rate of polymerization. It can efficiently incorporate nucleotide analogs such as dITP and 7-deaza-dGTP which are used to resolve regions of compression in sequencing gels. In regions of DNA containing a high G+C content, Hoogsteen bond formation can occur which leads to compressions in the DNA. These compressions result in aberrant migration patterns of oligonucleotide strands on sequencing gels. Because these base analogs pair weakly with conventional nucleotides, intrastrand secondary structures are alleviated. In contrast, Klenow does not incorporate these analogs as efficiently. The main limitation to the amount of DNA sequence that can be obtained from a single set of chain-termination reactions using Sequenase™ is the resolving power of polyacrylamide gels, not the properties of the enzyme.
The use of Taq DNA polymerase is a more recent addition to the improvements of the Sanger method. Innis et al., Proc. Natl. Acad. Sci. U.S.A. 85:9436 (1988). Taq polymerase is a thermostable enzyme which works efficiently at 70-75° C. The ability to catalyze DNA synthesis at elevated temperature makes Taq polymerase useful for sequencing templates which have extensive secondary structures at 37° C. (the standard temperature used for Klenow and Sequenase™ reactions). Taq polymerase, like Sequenase™, has a high degree of processivity and, like Sequenase™ version 2.0, lacks 3′ to 5′ nuclease activity.
Methods were also developed for examining single base changes without direct sequencing. These methods allow for the “scanning” of DNA fragments for the presence of mutations or other sequence variation. For example, if a mutation of interest happens to fall within a restriction recognition sequence, a change in the pattern of digestion can be used as a diagnostic tool (e.g., restriction fragment length polymorphism [RFLP] analysis).
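The RFLP idea can be sketched in a few lines of Python. The example below is illustrative only: the `digest` helper and both toy sequences are hypothetical, and the only outside fact relied upon is that EcoRI recognizes GAATTC and cuts after the first G. A single base change inside the recognition sequence abolishes the cut, so the digest yields one fragment instead of two.

```python
from typing import List


def digest(seq: str, site: str, cut_offset: int) -> List[int]:
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases from the start of the recognition
    sequence to the cut position on the strand shown.
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    # Fragment lengths are the gaps between consecutive cut positions.
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


ECORI_SITE = "GAATTC"  # EcoRI cuts between G and AATTC
ECORI_OFFSET = 1

# Toy 20-nucleotide sequences (hypothetical, for illustration only).
wild_type = "AAAAA" + "GAATTC" + "CCCCCCCCC"
mutant = "AAAAA" + "GAGTTC" + "CCCCCCCCC"  # A->G change inside the site

print(digest(wild_type, ECORI_SITE, ECORI_OFFSET))  # two fragments
print(digest(mutant, ECORI_SITE, ECORI_OFFSET))     # one uncut fragment
```

On a size-separating gel, the two digestion patterns would appear as different band sets, which is what makes the change diagnostic without direct sequencing.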
With the development of these complex and powerful biological techniques, an ambitious project has been undertaken. This project, called the Human Genome Project (HGP), involves the complete characterization of the archetypal human genome sequence which comprises 3×10⁹ DNA nucleotide base pairs. An implicit goal of the project is the recognition that all humans are greater than 99% identical at the DNA sequence level. The differences between people, however, provide the information most relevant to individual health care, including potential estimates of the risk of disease or the response to a specific medical treatment. Upon completion of the HGP, a continuing effort of the human genetics research community will be the examination of differences within populations and of individual variants from the defined archetype. While the 15-year effort of the HGP represents a defined quantity of DNA data acquisition, the future demand for DNA information is tied to individual genetic variation and is, therefore, unlimited.
Current DNA genotyping technologies are adequate for the detailed analysis of samples that range in number from hundreds to thousands per year. Genotyping projects on the order of millions of assays, however, are beyond the capabilities of today's laboratories because of the current inefficiencies in (i) liquid handling of reagent and DNA template solutions, (ii) measurement of solution volumes, (iii) mixing of reagent and template, (iv) controlled thermal reaction of the mixed solutions, (v) sample loading onto an electrophoresis gel, and (vi) DNA product detection on size-separating gels. What is needed is a methodology that allows for a high volume of biological reactions without these existing inefficiencies.