1. Field of the Invention
The present invention relates to a method of identifying the base sequence of a nucleic acid by using a DNA chip for DNA diagnosis and medical treatment.
2. Related Background Art
One technique for sequencing a nucleic acid or for detecting a sequence is to use a DNA array. U.S. Pat. No. 5,445,934 discloses a DNA array in which 100,000 or more oligonucleotide probes are bonded in a 1-inch square. Such a DNA array has the advantage that many characteristics can be examined at the same time with a very small amount of sample. When a fluorescence-labeled sample is poured onto such a DNA chip, DNA fragments in the sample bind to probes having a complementary sequence fixed on the DNA chip, and only those sites can be discriminated by fluorescence, which elucidates the sequence of the DNA fragment in the DNA sample.
Sequencing By Hybridization (SBH) is a method for examining a base sequence by using such a DNA array, and the details are described in U.S. Pat. No. 5,202,231. In the SBH method, all possible sequences of an oligonucleotide of a certain length are arranged on the substrate, and fully matched hybrids formed by a hybridization reaction between the probes and the sample DNA are then detected. If a set of fully matched hybrids is obtained, the set gives an assembly of overlapping sequences, each shifted by one base, that are parts of a single sequence; analysis of this assembly elucidates that sequence.
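The overlap analysis described above can be illustrated by the following minimal sketch. It assumes an idealized case in which every fully matched probe is detected without error and every overlap is unambiguous. The function names, the probe length of 4 and the example sequence are hypothetical, and for simplicity the sketch works with the sample-strand sequences rather than with the complementary probe sequences.

```python
def kmer_spectrum(seq, k):
    """Return the set of all length-k subsequences of seq (the ideal full-match set)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reconstruct(spectrum):
    """Rebuild a sequence from k-mers (k >= 2) that overlap with a one-base shift.

    Assumes an error-free spectrum in which each (k-1)-base overlap is unique;
    raises ValueError when the start or an extension step is ambiguous.
    """
    k = len(next(iter(spectrum)))
    suffixes = {p[1:] for p in spectrum}
    # The starting k-mer is the one whose leading k-1 bases follow no other k-mer.
    starts = [p for p in spectrum if p[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("no unique starting k-mer")
    seq = starts[0]
    remaining = set(spectrum) - {seq}
    while remaining:
        tail = seq[-(k - 1):]
        candidates = [p for p in remaining if p.startswith(tail)]
        if len(candidates) != 1:
            raise ValueError("ambiguous or missing overlap")
        seq += candidates[0][-1]  # each step extends the sequence by one base
        remaining.remove(candidates[0])
    return seq


target = "ATGCAGGTCC"  # hypothetical sample sequence
print(reconstruct(kmer_spectrum(target, 4)))
```

In practice the spectrum obtained from an array is noisy and may contain repeated overlaps, so this sketch shows only the principle of the assembly and not a practical calling procedure.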
In principle, in order to examine whether or not a certain sequence is present in a DNA specimen, a hybridization reaction is carried out with a probe having a complementary sequence, and the presence or absence of hybridization is detected. In practice, however, it is very difficult to judge the presence or absence of one sequence by using one complementary probe and hybridization, because even when fully matched hybrids are compared, the fluorescence intensities of the hybrids differ from each other according to their sequence. In particular, the GC content of the sequence greatly affects the stability of the hybrid. Further, sequences that are not exactly complementary but contain a one-base mismatch also form a hybrid and emit fluorescence. Such a hybrid has lower stability and weaker fluorescence than a fully matched hybrid of the same sequence, but it is often observed that such a mismatched hybrid has stronger fluorescence than a fully matched hybrid of a different sequence. In addition, the stability of a one-base mismatched hybrid varies greatly according to the location of the mismatch in the hybrid. When the mismatch is located at the terminus, a relatively stable hybrid is obtained. When the mismatch is located at the center of the hybrid, the hybrid becomes unstable because the continuity of the complementary strand is broken. Thus, at present, various factors affect hybrid stability, and no absolute value (standard value) of the fluorescence intensity has been obtained for judging whether or not a hybrid is fully matched. Also, conditions for detecting the fluorescence solely from the fully matched hybrid, while eliminating fluorescence from one-base mismatched hybrids, have not been determined.
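The dependence of hybrid stability on GC content can be illustrated roughly as follows. The sketch uses the widely quoted Wallace rule of thumb for short oligonucleotides, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C). This rule is not taken from the references cited here; it is introduced only as an assumption for illustration, and it is a crude approximation that ignores mismatches, salt concentration and the surface effects of an array. The function names and example sequences are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq):
    """Approximate melting temperature in deg C by the Wallace rule of thumb.

    Assumption for illustration only: 2 deg C per A/T base and 4 deg C per G/C base.
    """
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc


for probe in ("ATATATATAT", "ATGCATGCAT", "GCGCGCGCGC"):  # hypothetical 10-mers
    print(probe, gc_fraction(probe), wallace_tm(probe))
```

Even under this simplified rule, fully matched hybrids of the same length differ considerably in estimated stability, which is consistent with the difficulty of setting a single intensity threshold.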
In order to eliminate the difference in hybrid stability due to the sequence, a method using tetramethylammonium chloride is described in Proc. Natl. Acad. Sci. USA Vol. 82, pp. 1585-1588 (1985). However, the above-described problems have not been completely solved.
A method for judging whether a hybrid is a perfect match is described in Science Vol. 274, pp. 610-614 (1996), in which a 15-mer oligonucleotide probe and 15-mer oligonucleotides having the same sequence except for one mismatching base at the center of the sequence are prepared. The fluorescence intensity of the hybrid with the probe (perfect match) is compared with those of the hybrids with the other, one-base mismatching oligonucleotides. The call is judged positive only when the intensity of the perfect match is the strongest.
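The comparison described above can be sketched as follows. This is a minimal illustration and not the procedure of the cited reference: the function names, the 15-mer sequence and the intensity values are all hypothetical, and the decision rule is simply that the perfect match must be strictly brighter than each of the three center mismatches.

```python
BASES = "ACGT"


def center_mismatch_set(probe):
    """Return the three oligos that differ from probe only at its center base."""
    mid = len(probe) // 2  # index 7 for a 15-mer
    return [probe[:mid] + b + probe[mid + 1:] for b in BASES if b != probe[mid]]


def is_positive_call(pm_intensity, mm_intensities):
    """Positive only when the perfect match is strictly brighter than every mismatch."""
    return all(pm_intensity > mm for mm in mm_intensities)


probe = "ACGTACGTACGTACG"  # arbitrary 15-mer chosen for illustration
print(center_mismatch_set(probe))

# Made-up intensities (arbitrary units), not measured data.
print(is_positive_call(950.0, [310.0, 420.0, 280.0]))
print(is_positive_call(400.0, [310.0, 420.0, 280.0]))
```

Because every comparison is made between oligonucleotides that differ at the same position of the same sequence, the rule does not depend on an absolute intensity threshold.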
Based on the method above, U.S. Pat. No. 5,733,729 discloses a method for more accurate calling, in which the fluorescence intensities of the hybrids are compared by using a computer to determine the base sequence of a sample.
In these methods, it is necessary to locate the subject nucleotide to be examined at the center of a probe and to prepare a set of four probes, each having one of the four bases at that position. It is also necessary to prepare such a probe set for each of the overlapping sequences shifted by one base. As described above, these methods use 15-mer oligonucleotides and determine the perfect match by comparison with the other three types of probes, which have a one-base mismatch at the center. It is said that greater accuracy can be obtained by evaluating the stability of the hybrids theoretically or empirically. In addition, if the base length of the region to be examined is L, the number of probes is 4×L (e.g., 20 probes for 5 bases).
Although the above-described methods using mismatches are excellent in that calling is relatively easy, because each probe is compared with one-base mismatches at the same position of the same sequence, and in that the number of probes may be small (in SBH, 1024 types of probes are required for a similar analysis), they have significant defects in that accurate information cannot be obtained when there are two base mismatches in the same region or when there is a base deletion or insertion.
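The probe counts quoted above can be checked with a short calculation. The sketch assumes that the figure of 1024 for SBH corresponds to all possible oligonucleotides of length 5, that is, 4^5; the function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def tiling_probe_count(region_length):
    """Four probes (one per center base) for each position of the examined region."""
    return 4 * region_length


def all_oligos(length):
    """Every possible oligonucleotide of the given length, as an SBH array would carry."""
    return ["".join(p) for p in product(BASES, repeat=length)]


print(tiling_probe_count(5))  # 4 x L for L = 5
print(len(all_oligos(5)))     # 4 ** 5
```

The first count grows linearly with the length of the region, whereas the second grows exponentially with the probe length.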
On the other hand, the SBH method may solve the above-described problems and, in principle, can cope with any variation. Calling, however, is rather difficult, because the intensity of a one-base mismatch in one sequence can be stronger than that of a full match in another sequence, and because the stability of the hybrid differs considerably according to the position of the mismatch in the sequence even for a one-base mismatch. As a result, full matches, one-base mismatches and two-base mismatches (continuous or discontinuous) cannot simply be called from the fluorescence intensities. Accordingly, complex analyses, including theoretical predictions, comparison between individual sequences and accumulation of empirical parameters, are required.
Furthermore, in order to determine the sequence of a gene by reading the fluorescence intensities of the hybrids for each probe and then analyzing the data, a large-scale computer system and a detector for reading arrays are required. This is a major obstacle to simple gene diagnosis using the DNA array.