This application claims priority under 35 U.S.C. §§ 119 and/or 365 to 316102/1998 filed in Japan on Nov. 6, 1998, the entire content of which is hereby incorporated by reference.
1. Field of the Invention
The present invention relates to a method for producing double-stranded DNA whose terminal homopolymer part has been partly or fully eliminated, and a method for determining nucleotide sequence using double-stranded DNA obtained by the foregoing method.
2. Related Art
In addition to preparation of full-length cDNA libraries, nucleotide sequence determination is also an essential technique for analyzing a gene transcript as a functional unit in the medical and biological fields. For identification of genes and functional analyses thereof, it is extremely important to obtain cDNA clones, which are said to number about one hundred thousand in humans and mice, and to identify their locations on genomes.
A number of methods have been proposed for the preparation of cDNA libraries. The inventors of the present invention proposed a method affording high synthesis efficiency of full-length cDNA, which will be explained below, and are continuing further studies for practical use of the method.
In that method, a first cDNA strand is synthesized by using poly(A) RNA as a template and, as a primer, oligo dT having a specific restriction enzyme site in the 5′ end sequence and an anchor site in the 3′ end sequence. An oligo dG is then added to the 3′ end of the obtained first cDNA strand using an enzyme such as terminal deoxynucleotidyl transferase. Then, a second cDNA strand is synthesized by using oligo dC having a restriction site in the 5′ end sequence as a primer. Those strands are introduced into a vector using the restriction sites present at both ends of the cDNA. The process is designed so that only the restriction sites at both ends are cleaved during the introduction into the vector, by using methylated dCTP instead of dCTP as a substrate for the synthesis of the first cDNA strand and using a restriction enzyme sensitive to hemimethylation. The details of the protocol for preparing full-length cDNA libraries by such a method are described in WO 98/20122; Carninci, P. et al., Genomics, 37, 327-336 (1996); Carninci, P. et al., DNA Res., 4, 61-66 (1997) and the like, and an enormous number of full-length cDNA clones have actually been obtained. Moreover, the analyses of nucleotide sequences of the obtained full-length cDNA clones are well under way. The locations of the cDNAs on genomes are being identified through this work.
However, cDNA in such a cDNA library as obtained by the aforementioned method has homopolymers of oligo dA:dT and oligo dG:dC at both ends. That is, one strand of the double-stranded cDNA excised from the vector with a restriction enzyme has oligo dA:dT, and the other has oligo dG:dC (usually 11 to 16 base pairs). The oligo dA:dT originates from the poly A chain, which is originally possessed by mRNA, and the oligo dG:dC originates from a dG tailing, which is introduced during the production of double-stranded DNA as a site to which the primer is attached.
However, it was found that, when the sequencing was performed by using the double-stranded cDNA as a template to determine its nucleotide sequence, the aforementioned homopolymer part inhibited the sequencing. As the sequencing method using double-stranded cDNA as a template, there are the dideoxy method using a DNA polymerase based on the Sanger method, and the transcript sequencing method using an RNA polymerase, which was developed by the inventors of the present invention. Both of these methods determine a nucleotide sequence based on terminal nucleotides and chain lengths of reaction products obtained in the presence of terminators interrupting the chain elongation by polymerase.
However, the presence of a homopolymer part at the end of double-stranded cDNA causes a problem in that the sequence becomes difficult to read. The cause of this problem is not necessarily certain. Furthermore, when the double-stranded cDNA is amplified by PCR before the sequencing, the polymerase may fail to correctly reflect the length (number of repeated nucleotides) of the homopolymer part in the synthesis. Thus, fragments terminating in a given nucleotide, which ought to have the same length, may be obtained with different lengths. This means that fragments of a given length include ones having different terminal nucleotides, which makes the sequencing impossible. This tendency is observed more strongly, in particular, when double-stranded cDNA having oligo dG:dC at one or both ends is used as the template.
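The length-heterogeneity problem described above can be illustrated with a small sketch. It is a simplified model, not a description of any actual sequencing chemistry: each template is treated as the read strand itself, every position is assumed to yield one terminated fragment, and the sequence `downstream` and the homopolymer lengths 11, 12, and 13 are arbitrary values chosen for illustration.

```python
from collections import defaultdict

def ladder(templates):
    """Map fragment length -> set of terminal bases seen across all templates.

    Simplified model: a fragment of length n ends in the n-th base of the read.
    """
    calls = defaultdict(set)
    for t in templates:
        for length in range(1, len(t) + 1):
            calls[length].add(t[length - 1])
    return dict(calls)

def ambiguous_lengths(templates):
    """Fragment lengths at which more than one terminal base is observed."""
    return sorted(n for n, bases in ladder(templates).items() if len(bases) > 1)

# Hypothetical sequence following the oligo dG stretch (illustrative only).
downstream = "ATGCCTGAAC"

# All template copies carry the same 12-base homopolymer.
uniform = ["G" * 12 + downstream]

# After amplification the copies carry 11, 12, or 13 G's (assumed spread).
mixed = ["G" * n + downstream for n in (11, 12, 13)]

print(ambiguous_lengths(uniform))  # no ambiguous lengths in the uniform case
print(ambiguous_lengths(mixed))    # ambiguity begins right after the shortest run
```

In this model the uniform population gives one terminal base per fragment length, whereas the mixed population gives conflicting terminal bases from the first position past the shortest homopolymer onward, which corresponds to the unreadable region described in the text.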
It has turned out that the aforementioned problem is very serious in the nucleotide sequence determination of DNA. Moreover, the aforementioned problem makes it difficult to determine a nucleotide sequence efficiently, rapidly, and accurately for an enormous number of clones in a cDNA library, and it also makes it difficult to identify the location of each gene in a genome.
The above problem may be alleviated to some extent by using oligo dA:dT and oligo dG:dC of a shorter chain length. However, it is not easy to use oligo dA:dT and oligo dG:dC of a shorter chain length in the actual reaction. As mentioned above, the oligo dA:dT originates from the poly A chain originally possessed by mRNA, and therefore it is theoretically possible to make the oligo dA:dT shorter by using a shorter oligo dT in the primer used for the transcription reaction. However, a shorter oligo dT in the primer may disadvantageously reduce the yield of transcription products, and thus it is not practical. On the other hand, the oligo dG:dC originates from a dG tailing, which is introduced during the production of double-stranded DNA as a site to which a primer is attached, and therefore it is also theoretically possible to make the oligo dG:dC shorter by using a shorter oligo dG tailing. However, it is also actually difficult to control the chain length of the dG tailing within a desired range.
Therefore, the object of the present invention is to provide a method for preparing double-stranded cDNA whose terminal homopolymer part, which may inhibit the nucleotide sequence determination, has been partly or fully eliminated.
A further object of the present invention is to provide a method for determining a nucleotide sequence using, as a template, double-stranded DNA obtained by the foregoing method, i.e., double-stranded cDNA whose terminal homopolymer part or parts have been partly or fully eliminated.
The present invention relates to a method for producing double-stranded DNA comprising treating double-stranded DNA having a homopolymer part or parts at one or both ends with a restriction enzyme to partly or fully eliminate at least one of the homopolymer part or parts, wherein the restriction enzyme is capable of cleaving double-stranded DNA at a cleavage site separate from a recognition site therefor.
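The principle of cleaving at a site separate from the recognition site can be sketched as follows. This is only an illustrative model under stated assumptions: the DNA is represented by a single top-strand string, the cut is treated as blunt, and the recognition sequence `SITE`, the cut distance `OFFSET`, the linker, and the insert sequence are all hypothetical values that do not correspond to any particular enzyme or construct in this specification.

```python
def cut_downstream(seq: str, site: str, offset: int) -> str:
    """Return the part of seq remaining downstream of the cut.

    The cut is placed `offset` bases past the end of the first occurrence
    of `site`. If the site is absent, the DNA is returned uncut.
    """
    start = seq.find(site)
    if start < 0:
        return seq
    cut = min(start + len(site) + offset, len(seq))
    return seq[cut:]

def leading_run(seq: str, base: str) -> int:
    """Length of the run of `base` at the start of seq."""
    n = 0
    while n < len(seq) and seq[n] == base:
        n += 1
    return n

SITE = "GCAGC"             # hypothetical recognition sequence
OFFSET = 8                 # hypothetical distance from site to cut
linker = "AT" + SITE       # hypothetical linker placing the site next to the tail
insert = "ATGCCTGAAC"      # hypothetical cDNA sequence
dna = linker + "G" * 12 + insert

partly = cut_downstream(dna, SITE, OFFSET)   # cut falls inside the G run
fully = cut_downstream(dna, SITE, 12)        # cut falls at the end of the G run

print(leading_run(dna[len(linker):], "G"))   # 12
print(leading_run(partly, "G"))              # 4
print(leading_run(fully, "G"))               # 0
```

Because the recognition site lies outside the homopolymer while the cut lands inside or beyond it, the same kind of enzyme removes the homopolymer partly or fully depending only on the distance between the two sites, which is the property the method relies on.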
The present invention further relates to a method for determining a nucleotide sequence of double-stranded DNA utilizing one or both strands of the double-stranded DNA as a template, wherein the double-stranded DNA used as the template is double-stranded DNA prepared by the aforementioned method of the present invention.
According to the present invention, the nucleotide sequence of a cDNA clone can be accurately determined by eliminating part or all of a homopolymer that exists at a DNA terminus.