Comparison of nucleic acids at the molecular level is currently based on three general types of sequence comparison: (i) "wet" comparisons, in which nucleic acid probe fragments are hybridized to target nucleic acids under varying degrees of stringency; (ii) "dry" comparisons, in which a fragment of known sequence is compared by computer against a database of known sequences; and (iii) "restriction fingerprint" comparisons, in which a DNA fragment is cut by restriction endonucleases into a specific set of fragments whose lengths are compared against other sets of cleaved fragments, obtained either experimentally or computed from known sequences.
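The computed side of a restriction fingerprint comparison can be sketched as follows. This is a simplified illustration, not a standard tool: it assumes a linear sequence, a single enzyme with a palindromic recognition site, and cut positions counted on the top strand only, and the function names are hypothetical. EcoRI, for example, recognizes GAATTC and cuts between the G and the first A, so its cut offset within the site is 1.

```python
def restriction_fragment_lengths(seq: str, site: str, cut_offset: int) -> list[int]:
    """Hypothetical helper: lengths of the fragments produced by cutting a
    linear sequence at every occurrence of `site`.

    `cut_offset` is the top-strand cut position within the site
    (e.g. 1 for EcoRI, which cuts G^AATTC). Sticky ends are ignored.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)  # continue scanning past this site
    # Fragment boundaries: sequence start, each cut position, sequence end.
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


def fingerprints_match(a: list[int], b: list[int]) -> bool:
    """Hypothetical helper: compare two fingerprints as unordered multisets
    of fragment lengths, since a gel resolves fragments by length only."""
    return sorted(a) == sorted(b)


# Usage: two EcoRI sites in a 20-base sequence give three fragments.
lengths = restriction_fragment_lengths("AAGAATTCTTTTGAATTCAA", "GAATTC", 1)
print(lengths)  # [3, 10, 7]
```

A practical comparison would also need a tolerance for fragment-sizing error and would handle multiple enzymes; this sketch requires exact length matches.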
A fourth kind of comparison, generally referred to as "hybridization fingerprinting," is performed by hybridizing a set of short oligomer probes to a target DNA fragment, thereby identifying the oligomers that occur within the fragment (those complementary to the probes that hybridize). Two versions of this approach have been proposed. The first method counts the oligomers that occur both in the fragment and in a candidate matching sequence (Lennon and Lehrach, 1991; Drmanac et al., 1991). The second method reconstructs the fragment sequence from oligomer overlaps and compares the reconstructed sequence against a candidate matching sequence (Drmanac et al., 1991). The first method, by design, ignores shared subwords (overlaps) between the oligomers, while the second method uses shared subwords for sequence reconstruction, an intermediate step in the recognition of similarity.
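The first (oligomer-counting) method can be sketched as a set intersection. The sketch assumes a complete, error-free probe set of fixed length k, treats a positive hybridization score as the presence of the corresponding k-mer on the scored strand, and ignores multiplicity; `oligomer_set`, `shared_oligomer_count` and `rank_candidates` are hypothetical names, not functions from the cited work.

```python
def oligomer_set(seq: str, k: int) -> set[str]:
    """All length-k oligomers (k-mers) in `seq`; stands in for the set of
    probes that would score positive against this sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_oligomer_count(fragment_probes: set[str], candidate: str, k: int) -> int:
    """Number of oligomers present both in the fragment (as scored by
    hybridization) and in the candidate sequence."""
    return len(fragment_probes & oligomer_set(candidate, k))


def rank_candidates(fragment_probes: set[str], candidates: list[str], k: int) -> list[str]:
    """Order candidate sequences from most to fewest shared oligomers."""
    return sorted(candidates,
                  key=lambda c: shared_oligomer_count(fragment_probes, c, k),
                  reverse=True)


# Usage: the fragment ACGTAC shares CGT, GTA and TAC with CGTACC.
probes = oligomer_set("ACGTAC", 3)
print(shared_oligomer_count(probes, "CGTACC", 3))  # 3
```

Because the score is a plain count of shared k-mers, it takes no account of how those k-mers overlap or where they lie in either sequence, which is the property of the first method described above.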
In contrast to gel-based sequencing and restriction analysis, which are essentially one-dimensional separation experiments, hybridization experiments require no one-dimensional separation and can therefore be conducted economically on a much larger scale, using high-density two-dimensional arrays of immobilized DNA fragments (Format 1 hybridization experiments) or of oligomer probes (Format 2 experiments). This makes the process easier to automate and thus increases the amount of information generated within a given time period. Automation can also improve cost efficiency. For example, a small laboratory using current hybridization technology can collect several million probe/target hybridization scores per day (Drmanac et al., 1994a; Drmanac et al., 1993).
An example of hybridization-based technology is provided in U.S. Pat. No. 5,202,231, which describes a sequencing method based on hybridization of sets of oligonucleotide probes and compilation of overlapping, completely complementary probes to generate a sequence. The shortcoming of this approach is that its initial reliance on overlap may introduce significant error into the sequence. Although the procedure is simple and potentially automatable, it is unsatisfactory in terms of accuracy and confidence. Thus, there remains a need for more sophisticated hybridization-based techniques for the comparison of nucleic acids.
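One way in which overlap-based compilation can go wrong is illustrated below. The sketch enumerates, by brute-force backtracking, every sequence that can be assembled by chaining k-mers that overlap by k-1 bases, using each k-mer of an error-free spectrum exactly once. It is a hypothetical illustration of the general overlap principle, not the procedure of U.S. Pat. No. 5,202,231, and its running time can grow exponentially, so it suits only toy inputs; `spectrum` and `reconstructions` are hypothetical names.

```python
def spectrum(seq: str, k: int) -> set[str]:
    """Set of all k-mers in `seq` (the ideal hybridization result)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reconstructions(kmer_spectrum: set[str]) -> list[str]:
    """Every sequence obtainable by chaining all k-mers of the spectrum,
    each used exactly once, with consecutive k-mers overlapping by k-1 bases."""
    kmers = sorted(kmer_spectrum)
    results = []

    def extend(path: list[str], used: set[str]) -> None:
        if len(path) == len(kmers):
            # Join the chain: first k-mer, then the last base of each next one.
            results.append(path[0] + "".join(p[-1] for p in path[1:]))
            return
        suffix = path[-1][1:]  # (k-1)-base overlap required of the next k-mer
        for kmer in kmers:
            if kmer not in used and kmer.startswith(suffix):
                used.add(kmer)
                path.append(kmer)
                extend(path, used)
                path.pop()
                used.remove(kmer)

    for start in kmers:
        extend([start], {start})
    return sorted(results)


# Usage: two different sequences share the same 3-mer spectrum.
print(spectrum("ACGACTAC", 3) == spectrum("ACTACGAC", 3))  # True
candidates = reconstructions(spectrum("ACGACTAC", 3))
```

For the spectrum of ACGACTAC with k = 3, the repeated 2-mer AC allows the segments between repeats to be assembled in more than one order, so the function returns several candidates, including both ACGACTAC and ACTACGAC. Even perfect hybridization data therefore need not determine a unique sequence, and scoring errors would compound the problem.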