The present invention relates to the determination of the sequences of polymers immobilized on a substrate. In particular, one embodiment of the invention provides a method and apparatus for sequencing many nucleic acids immobilized at distinct locations on a matrix surface. The principles and apparatus of the present invention may also be used, for example, in the determination of sequences of peptides, polypeptides, oligonucleotides, nucleic acids, oligosaccharides, phospholipids and other biological polymers. The invention is especially useful for determining the sequences of nucleic acids and proteins.
The structure and function of biological molecules are closely interrelated. The structure of a biological polymer, typically a macromolecule, is generally determined by its monomer sequence. For this reason, biochemists have historically been interested in characterizing the sequences of biological macromolecules. With the advent of molecular biology, the relationship between a protein sequence and the gene sequence that encodes it has become well understood. Thus, characterization of the sequence of a nucleic acid encoding a protein has become very important.
Partly for this reason, the development of technologies capable of sequencing enormous amounts of DNA has received great interest. Such technologies are necessary, for example, for the successful completion of the human genome sequencing project. Structural characterization of biopolymers is very important for further progress in many areas of molecular and cell biology.
While sequencing of macromolecules has become extremely important, many aspects of these technologies have not advanced significantly over the past decade. For example, the protein sequencing technologies applied today still rely on Edman degradation methods. See, e.g., Knight (1989) “Microsequencers for Proteins and Oligosaccharides,” Bio/Technol. 7:1075–1076. Although advanced instrumentation for protein sequencing has been developed, see, e.g., Frank et al. (1988) “Automation of DNA Sequencing Reactions and Related Techniques: A Work Station for Micromanipulation of Liquids,” Bio/Technol. 6:1211–1213, this technology requires a homogeneous, isolated protein sample, from which the removed residues are determined.
Likewise, in nucleic acid sequencing technology, three major methods for sequencing have been developed, of which two are commonly used today. See, e.g., Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d Ed.) Vols. 1–3, Cold Spring Harbor Press, New York, which is hereby incorporated herein by reference. The first method was developed by Maxam and Gilbert. See, e.g., Maxam and Gilbert (1980) “Sequencing End-Labeled DNA with Base-Specific Chemical Cleavages,” Methods in Enzymol. 65:499–560, which is hereby incorporated herein by reference. The polymer is chemically cleaved with a series of base-specific cleavage reagents, thereby generating a series of fragments of various lengths. The various fragments, each resulting from a cleavage at a specific base, are run in parallel on a slab gel that resolves nucleic acids differing in length by single nucleotides. A specific label at one end of the nucleic acid allows detection of cleavages at all nucleotides relative to the position of the label.
This separation requires high resolution electrophoresis or some other system for separating nucleic acids of very similar size. Thus, the target nucleic acid to be sequenced must usually be initially purified to near homogeneity.
Sanger and Coulson devised two alternative methods for nucleic acid sequencing. The first method, known as the plus and minus method, is described in Sanger and Coulson (1975) J. Mol. Biol. 94:441–448, and has been replaced by the second method. The second, improved method is known as the dideoxy chain termination method. See, e.g., Sanger et al. (1977) “DNA Sequencing with Chain-Termination Inhibitors,” Proc. Natl. Acad. Sci. USA 74:5463–5467, which is hereby incorporated herein by reference. This method is based on the inability of a polymerase to elongate 2′,3′-dideoxynucleotides, which lack a 3′ hydroxyl group on the sugar ring, thus resulting in chain termination. Each of the separate chain terminating nucleotides is incorporated by a DNA polymerase, and the resulting terminated fragment is known to end with the corresponding dideoxynucleotide. However, both of the Sanger and Coulson sequencing techniques usually require isolation and purification of the nucleic acid to be sequenced and separation of nucleic acid molecules differing in length by single nucleotides.
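By way of illustration only, the logic of reading a sequence from chain-terminated fragments may be sketched as follows. This is an idealized model and not a description of any laboratory protocol: it assumes that termination occurs at every position of the newly synthesized strand and that fragments differing by a single nucleotide are perfectly resolved. The function names and the example sequence are hypothetical.

```python
def dideoxy_fragment_lengths(synthesized_strand: str) -> dict[str, list[int]]:
    """Return, for each base, the lengths of the fragments that end in the
    corresponding dideoxynucleotide (idealized: every position terminates)."""
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized_strand, start=1):
        # A fragment of this length ends with the chain terminator for `base`.
        lanes[base].append(length)
    return lanes


def read_ladder(lanes: dict[str, list[int]]) -> str:
    """Reconstruct the sequence by ordering all fragments by length and
    noting which terminator reaction each fragment came from."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[n] for n in sorted(base_at_length))


# Hypothetical example strand, chosen only for illustration.
lanes = dideoxy_fragment_lengths("GATTACA")
print(lanes)
print(read_ladder(lanes))
```

The reconstruction in `read_ladder` depends on every fragment length mapping to exactly one base. If the starting material were a mixture of different nucleic acids, several bases could share one fragment length, which is why the technique calls for purified template and single-nucleotide resolution.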
Both the polypeptide sequencing technology and the oligonucleotide sequencing technologies described above suffer from the requirement to isolate and work with distinct homogeneous molecules in each determination.
In the polypeptide technology, the terminal amino acid is sequentially removed and analyzed. However, the analysis depends on a single amino acid being removed in each cycle, thus requiring the polypeptide to be homogeneous.
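The dependence on homogeneity may be illustrated with a minimal sketch, assuming an idealized degradation in which each cycle removes exactly one terminal residue from every molecule in the sample. The function names and the one-letter peptide sequences are hypothetical and are used only for illustration.

```python
def sequential_degradation(polypeptides: list[str], cycles: int) -> list[set[str]]:
    """Return the set of residues released from the sample at each cycle."""
    released = []
    for cycle in range(cycles):
        # Every polypeptide long enough contributes its residue at this position.
        released.append({p[cycle] for p in polypeptides if cycle < len(p)})
    return released


def is_unambiguous(released: list[set[str]]) -> bool:
    """A sequence can be read only if each cycle released exactly one residue type."""
    return all(len(residues) == 1 for residues in released)


# Hypothetical samples: one homogeneous, one a mixture of two polypeptides.
homogeneous = sequential_degradation(["MKVL"], cycles=4)
mixture = sequential_degradation(["MKVL", "MRVA"], cycles=4)

print(is_unambiguous(homogeneous))
print(is_unambiguous(mixture))
```

In the mixed sample, the second and fourth cycles each release two different residues, so the released residues cannot be assigned to a single sequence.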
In the case of nucleic acid sequencing, the present techniques typically utilize very high resolution polyacrylamide gel electrophoresis. This high resolution separation uses highly toxic acrylamide to separate the resulting molecules and usually requires very high voltages to run the electrophoresis. Both the purification and isolation techniques are highly tedious, time-consuming and expensive processes.
Thus, a need exists for the capability to simultaneously sequence many biological polymers without individual isolation and purification. Moreover, dispensing with the need to individually perform the high resolution separation of related molecules leads to greater safety, speed, and reliability. The present invention solves these and many other problems.