1. Field of the Invention
The present invention relates to a method for DNA analysis based on a complementary strand extension reaction using a DNA polymerase, a method for preparation of a DNA sample which can be efficiently used for the DNA sequencing method, and a reagent kit for use therein.
2. Description of the Related Art
With the progress of the human genome projects, a high-throughput and highly efficient DNA sequencing technology is required. Conventional DNA sequencing methods involve labeling DNA fragments with a radioisotope and manually determining the size of the DNA fragments by gel electrophoresis. In place of such manual methods, automated devices (fluorescent DNA sequencers) have come into wide use; these label the DNA fluorescently and detect the DNA fragments optically by exposing them to light during gel electrophoresis. These devices are used for a DNA sequencing method in which oligonucleotides called primers are first hybridized with a target DNA to be sequenced. Then, DNA fragments of various lengths are prepared by a complementary strand extension reaction using a DNA polymerase. Finally, the size of the DNA fragments is determined by gel electrophoresis. This method is called the Sanger sequencing method or dideoxy sequencing method. In this method, the length of DNA that can be sequenced in a single operation depends on the separation capability of the gel and is in the range of 400 to 700 bases. Sequencing of DNAs over several Kbp is therefore labor-intensive and time-consuming work.
For sequencing of long DNAs over several Kbp, a shotgun strategy has been employed heretofore. According to the shotgun strategy, a sample DNA is randomly digested by means of ultrasonic vibration, and the resulting DNA fragments are cloned into E. coli, which is cultivated to form colonies. Then, the E. coli in each colony is cultivated to increase the copy number of the DNA. Thereafter, the sample DNA is extracted and provided for analysis. In the shotgun method, the DNA in each colony contains a fragment of the sample DNA, but the portion of the sample DNA to which that fragment corresponds is unidentified until sequencing is completed. Therefore, DNA fragments amounting to a total length 10 to 20 times the target length must be analyzed. The considerable time and labor this requires are a serious obstacle.
DNA sequencing starts with the preparation of a DNA library which covers all DNAs, by making clones having a length of 10 Kbp to 100 Kbp from the DNAs present in the genome. In the actual sequencing, each clone is further digested to make subclones having a size sufficient to permit analysis with a DNA sequencer. The subclones are then sequenced. Finally, the sequenced DNA fragments are reassembled to obtain the intact overall DNA sequence. The method described above is now widespread because of its simplicity.
As is currently observed in the human genome projects, however, the shotgun strategy is not necessarily the best approach for large scale DNA sequencing in view of throughput and automation (SURI KAGAKU, No. 359, May, (1993) pp. 74-81). This is because it is complicated and troublesome to prepare subclones prior to measurements with DNA sequencers. Heretofore, subclones have been prepared by randomly digesting huge DNAs by sonication (Molecular Cloning, second edition, Cold Spring Harbor Laboratory Press (1989), pp. 13.21-13.23). The subclones are cloned into E. coli, which is cultivated, and the colonies having the desired DNA fragments are selected. Then, using plasmids carrying the selected DNA fragments, DNA sequencing is performed for every colony. The base length of DNA which can be determined by a single DNA sequencing operation ranges generally from 300 to 500 bases, so a large number of subclones must be analyzed.
Even when colonies are selected, many colonies contain the same portion of the DNA, so that overlapping portions of the DNA sequence are sequenced repeatedly. For this reason, it is necessary to analyze a base length 10 to 20 times longer than the length of DNA to be actually sequenced. For example, more than 400 plasmids (subclones) should be analyzed to sequence a DNA having a length of 10 Kbp; it is impossible, however, to select subclones in such a manner that their sequencing information does not overlap. In addition, subclones are prepared utilizing the E. coli host-vector system, and hence the operations are complicated and not suitable for automation.
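The redundancy figures quoted above can be checked with simple arithmetic. The following is a minimal sketch, not part of the cited methods: the function name `reads_required` is hypothetical, and the sketch assumes that every subclone yields one read of a fixed length and that the total length analyzed equals the target length multiplied by the redundancy factor.

```python
def reads_required(target_bp: int, read_len: int, redundancy: int) -> int:
    """Number of single reads needed to analyze target_bp * redundancy bases.

    Assumes one read of read_len bases per subclone (an illustrative
    simplification; real read lengths vary from subclone to subclone).
    """
    total_bases = target_bp * redundancy
    # Integer ceiling division: a partial read still costs one subclone.
    return (total_bases + read_len - 1) // read_len


# 10 Kbp target, reads of 500 bases (upper end of the 300 to 500 range)
low = reads_required(10_000, 500, 10)    # 10-fold redundancy
high = reads_required(10_000, 500, 20)   # 20-fold redundancy
# Shorter reads of 300 bases at 20-fold redundancy
worst = reads_required(10_000, 300, 20)

print(low, high, worst)
```

Under these assumptions, 20-fold redundancy with 500-base reads gives 400 reads for a 10 Kbp target, and shorter reads push the count higher, which is consistent with the figure of more than 400 plasmids.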
The primer walking method (Science, 258 (1992), pp. 1787-1791; Proc. Natl. Acad. Sci. U.S.A., 86, 6917-6921 (1989)) avoids sequencing the same base sequence repeatedly. According to the primer walking method, a huge DNA is employed as a sample DNA in its intact form. First, the base sequence of a part of the sample DNA is determined. Next, based on the sequence thus determined, a primer capable of specifically hybridizing with the sequenced portion is synthesized and used to determine the DNA sequence of a contiguous portion. That is, in the primer walking method, the base sequence of the DNA to be sequenced is determined sequentially, one portion at a time, starting from its terminus. However, although overlapping portions are minimized, the primer walking method requires that a new primer be synthesized for every sequencing step. In addition, the sequencing operations of the primer walking method are sequential, so that the method is not always suitable for large scale sequencing.
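The sequential character of primer walking can be illustrated with a small simulation. This is a sketch under stated assumptions only: the function `primer_walk` and its parameters are hypothetical, the template is a randomly generated string rather than real data, each round is assumed to read exactly `read_len` new bases, and the next primer is assumed to be taken from the last `primer_len` bases already determined.

```python
import random


def primer_walk(template: str, read_len: int, primer_len: int = 20):
    """Simulate primer walking along a template (illustration only).

    Returns the sequence determined so far and the list of primers that
    had to be synthesized, one per round after the first.
    """
    if primer_len > read_len:
        raise ValueError("primer_len must not exceed read_len")
    # Round 1: read from a known priming site at the terminus.
    determined = template[:read_len]
    primers = []
    while len(determined) < len(template):
        # A new primer can only be designed after the previous round ends.
        primer = determined[-primer_len:]
        primers.append(primer)
        site = len(determined) - primer_len      # position where it anneals
        start = site + primer_len                # extension begins here
        determined += template[start:start + read_len]
    return determined, primers


rng = random.Random(0)                            # fixed seed, reproducible
template = "".join(rng.choices("ACGT", k=10_000))  # hypothetical 10 Kbp DNA
sequence, primers = primer_walk(template, read_len=500)
print(len(primers), sequence == template)
```

With 500 bases per round, a 10 Kbp template takes 20 rounds and 19 custom primers, and no round can start before the previous one has finished. That dependency is the reason the method scales poorly.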
In order to solve the problems of complicated operations associated with cloning or primer walking, various attempts have been made. In particular, direct sequencing of DNA fragments in mixture form obtained from a sample DNA digested with a restriction enzyme (DNA Research, 1 (1994) pp 231-237) is a promising method, which is briefly explained below. An oligonucleotide of known sequence is ligated to the terminus of each DNA fragment produced by digestion with a restriction enzyme, to provide a priming site. Then sequencing is conducted using a set of primers which can discriminate the recognition sequence of the restriction enzyme and the sequence adjacent to the cutting site (1 to 4 bases). For example, when the primers carry a variable two-base sequence at the 3' terminus to match the unknown base sequence, the primer set consists of the 16 possible combinations of those two bases. When the mixture contains approximately three types of double stranded DNA fragments (6 types in terms of DNA termini), the base sequence of each DNA fragment can be determined directly from the mixture, using the set of primers described above. After the base sequence of each DNA fragment has been determined, the base sequences of the respective DNA fragments are assembled to obtain the overall base sequence. In order to obtain the overall base sequence, a primer having the same base sequence as that of each DNA fragment around the 3' terminus is synthesized, and intact DNA is used as a template for sequencing, so that the base sequence between one DNA fragment and another DNA fragment is determined. This determines how the respective DNA fragments are connected to each other. Alternatively, the base sequence of a DNA fragment digested with another restriction enzyme is determined. With overlapping base sequences as guides, the relationship of one fragment to another is determined.
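The count of 16 primers follows from taking every combination of the four bases at two variable positions. The sketch below enumerates such a set; it is illustrative only. The name `selective_primers`, the adapter string, and the six two-base termini are hypothetical placeholders, and AAGCTT (the HindIII recognition sequence) is used merely as an example of a recognition sequence, not as the enzyme of the cited method.

```python
from itertools import product

BASES = "ACGT"


def selective_primers(common: str, n_variable: int = 2) -> list[str]:
    """Return one primer per possible 3' tail of n_variable bases.

    `common` stands for the part shared by all primers (ligated
    oligonucleotide plus recognition sequence); it is a placeholder here.
    """
    return [common + "".join(tail) for tail in product(BASES, repeat=n_variable)]


# Hypothetical adapter followed by an example recognition sequence.
COMMON = "CAGTCAGTCAGT" + "AAGCTT"
primers = selective_primers(COMMON)
print(len(primers))   # 4 ** 2 combinations

# Index the set by its variable 3' tail so a terminus can select its primer.
by_tail = {p[-2:]: p for p in primers}

# Assumed two-base sequences next to the cutting site for 6 termini
# (3 double stranded fragments); invented for illustration.
termini = ["AC", "GT", "TA", "CC", "GA", "TG"]
selected = [by_tail[t] for t in termini]
print(len(set(selected)))
```

As long as the termini in the mixture differ in the two bases next to the cutting site, each one pairs with a different primer, which is what allows the fragments to be sequenced without first being separated. With one to four variable bases the set size ranges from 4 to 256 primers.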