The present invention relates to the field of nucleic acid analysis, detection, and sequencing. More specifically, in one embodiment the invention provides improved techniques for synthesizing arrays of nucleic acids, hybridizing nucleic acids, detecting mismatches in a double-stranded nucleic acid composed of a single-stranded probe and a target nucleic acid, and determining the sequence of DNA or RNA or other polymers.
It is important in many fields to determine the sequence of nucleic acids because, for example, nucleic acids encode the enzymes, structural proteins, and other effectors of biological functions. In addition to segments of nucleic acids that encode polypeptides, there are many nucleic acid sequences involved in control and regulation of gene expression.
The human genome project is one example of a project using nucleic acid sequencing techniques. This project is directed toward determining the complete sequence of the genome of the human organism. Although such a sequence would not necessarily correspond to the sequence of any specific individual, it will provide significant information as to the general organization and specific sequences contained within genomic segments from particular individuals. The human genome project will also provide mapping information useful for further detailed studies.
The need for highly rapid, accurate, and inexpensive sequencing technology is nowhere more apparent than in a demanding sequencing project such as the human genome project. To complete the sequencing of a human genome will require the determination of approximately 3 × 10^9, or 3 billion, base pairs.
The procedures typically used today for sequencing include the methods described in Sanger et al., Proc. Natl. Acad. Sci. USA (1977) 74:5463-5467, and Maxam et al., Methods in Enzymology (1980) 65:499-559. The Sanger method utilizes enzymatic elongation with chain-terminating dideoxy nucleotides. The Maxam and Gilbert method uses base-specific chemical cleavage reactions. Both methods require a large number of complex manipulations, such as isolation of homogeneous DNA fragments, elaborate and tedious preparation of samples, preparation of a separating gel, application of samples to the gel, electrophoresing the samples on the gel, working up of the finished gel, and analysis of the results of the procedure.
Alternative techniques have been proposed for sequencing a nucleic acid. PCT patent Publication No. 92/10588, incorporated herein by reference for all purposes, describes one improved technique in which the sequence of a labeled, target nucleic acid is determined by hybridization to an array of nucleic acid probes on a substrate. Each probe is located at a positionally distinguishable location on the substrate. When the labeled target is exposed to the substrate, it binds at locations that contain complementary nucleotide sequences. Through knowledge of the sequence of the probes at the binding locations, one can determine the nucleotide sequence of the target nucleic acid. The technique is particularly efficient when very large arrays of nucleic acid probes are utilized. Such arrays can be formed according to the techniques described in U.S. Pat. No. 5,143,854 issued to Pirrung et al. See also U.S. application Ser. No. 07/805,727, both incorporated herein by reference for all purposes.
When the nucleic acid probes are shorter than the target, one can employ a reconstruction technique to determine the sequence of the larger target based on affinity data from the shorter probes. See U.S. Pat. No. 5,202,231 to Drmanac et al., and PCT patent Publication No. 89/10977 to Southern. One such technique has been termed sequencing by hybridization, or SBH. For example, assume that a 12-mer target DNA 5'-AGCCTAGCTGAA is mixed with an array of all octanucleotide probes. If the target binds only to those probes having an exactly complementary nucleotide sequence, only five of the 65,536 octamer probes (3'-TCGGATCG, CGGATCGA, GGATCGAC, GATCGACT, and ATCGACTT) will hybridize to the target. Alignment of the overlapping sequences from the hybridizing probes reconstructs the complement of the original 12-mer target:
TCGGATCG
 CGGATCGA
  GGATCGAC
   GATCGACT
    ATCGACTT
TCGGATCGACTT
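The reconstruction in the example above can be sketched in a few lines of Python. This is only an illustrative sketch under idealized assumptions: hybridization is treated as exact string matching (perfect complements only, no partial mismatches), and the target is assumed to contain no repeated 7-base stretch, so the overlapping probes form a single unambiguous chain. The function names (`complement`, `all_probes`, `hybridizing_probes`, `reconstruct`) are hypothetical and do not come from the cited references.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement(seq):
    """Base-by-base complement, written in the same left-to-right order as seq."""
    return "".join(COMPLEMENT[base] for base in seq)


def all_probes(k=8):
    """Yield every possible k-mer probe (4**k of them; 65,536 for octamers)."""
    for bases in product("ACGT", repeat=k):
        yield "".join(bases)


def hybridizing_probes(target, k=8):
    """Idealized hybridization: keep only probes exactly complementary
    to some k-base window of the target (no mismatches tolerated)."""
    comp = complement(target)
    return [probe for probe in all_probes(k) if probe in comp]


def reconstruct(probes):
    """Chain probes that overlap by k-1 bases into one sequence.
    Assumes exactly one valid chain exists; raises ValueError otherwise."""
    remaining = set(probes)
    suffixes = {p[1:] for p in remaining}
    # The first probe is the one whose leading k-1 bases are not the
    # trailing k-1 bases of any probe in the set.
    starts = [p for p in remaining if p[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("ambiguous or missing starting probe")
    current = starts[0]
    sequence = current
    remaining.remove(current)
    while remaining:
        # The next probe must begin with the last k-1 bases of the current one.
        candidates = [p for p in remaining if p[:-1] == current[1:]]
        if len(candidates) != 1:
            raise ValueError("ambiguous or broken overlap chain")
        current = candidates[0]
        sequence += current[-1]  # each step extends the sequence by one base
        remaining.remove(current)
    return sequence


if __name__ == "__main__":
    target = "AGCCTAGCTGAA"
    hits = hybridizing_probes(target)
    rebuilt = reconstruct(hits)
    print(len(hits), "probes hybridize:", sorted(hits))
    print("complement of target:", rebuilt)
    print("recovered target:", complement(rebuilt))
```

Real hybridization data is not this clean: partially mismatched probes also bind, which is why a simple exact-overlap chain like this one would not suffice in practice.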
While meeting with much optimism, prior techniques have also met with certain limitations. For example, practitioners have encountered substantial difficulty in analyzing probe arrays hybridized to a target nucleic acid due to the hybridization of partially mismatched sequences, among other difficulties. The present invention provides significant advances in sequencing with such arrays.