Genome projects have determined nearly the entire genomic DNA (chromosomal DNA) sequences, which carry all the genetic information, of various organisms including human, mouse, rice, nematode, and yeast. These complete genome sequences are expected to provide information on the primary structure of the proteins encoded by genes and on the expression regulatory regions (promoters, enhancers, suppressors, etc.) that control the expression of those genes. To extract these two kinds of information from a genome sequence, the sequence of the mRNA transcribed from each gene locus on the chromosomal DNA is crucial. mRNA sequences have usually been analyzed by way of DNA complementary to the mRNA (complementary DNA: cDNA). In particular, obtaining both kinds of information requires full-length cDNA, that is, cDNA synthesized from mRNA that has been correctly transcribed from the transcribed region of the gene and that contains the entire protein-coding region.
Full-length cDNA must usually meet two requirements. The first is that its sequence starts at the transcription start site on the genomic DNA. A “cap structure” is added to the 5′ end of mRNA that is properly transcribed from the transcription start site. This cap structure is 7-methylguanosine (m7G) linked to the transcription-start-site nucleotide via a 5′-5′ triphosphate linkage. cDNA complementary to mRNA possessing this cap structure meets the first requirement. The second requirement concerns the “poly(A) tail” of the mRNA. The poly(A) tail is a run of several tens to 200 consecutive adenine (A) residues that is added to the 3′ end of mRNA in the nucleus after transcription of the genomic DNA. Therefore, cDNA correctly synthesized from an mRNA template possessing both the cap structure at the 5′ end and the poly(A) tail at the 3′ end meets the two requirements for full-length cDNA (starting at the transcription start site and encompassing the entire protein-coding region).
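The poly(A) tail criterion can be illustrated with a minimal Python sketch. The function names and the `min_len` threshold of 20 residues are assumptions chosen for illustration and are not taken from this document. The cap structure is a chemical modification and cannot be read from a plain nucleotide string, so the sketch checks only the 3′-end criterion.

```python
def poly_a_tail_length(mrna: str) -> int:
    """Count the consecutive A residues at the 3' end of an mRNA sequence.

    The sequence is assumed to be written 5'->3' as a plain string.
    """
    seq = mrna.upper()
    # Stripping trailing A residues and comparing lengths gives the tail length.
    return len(seq) - len(seq.rstrip("A"))


def has_poly_a_tail(mrna: str, min_len: int = 20) -> bool:
    """Return True if the 3' end carries at least min_len consecutive A residues.

    min_len is a hypothetical threshold; the text only states that tails
    range from several tens to 200 residues.
    """
    return poly_a_tail_length(mrna) >= min_len


if __name__ == "__main__":
    example = "AUGGCC" + "A" * 30  # toy sequence, not a real transcript
    print(poly_a_tail_length(example), has_poly_a_tail(example))
```

A short internal run of A residues does not count here, because only the residues at the very 3′ end are inspected.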
cDNA can be synthesized by a reverse transcriptase reaction using mRNA as a template, but full-length cDNA is difficult to synthesize because mRNA transcribed from chromosomal DNA is exposed to various degradation reactions in cells, during extraction from cells, and during synthesis of the DNA strand. The reverse transcription reaction synthesizes a DNA strand (the first-strand cDNA) toward the 5′ end of the mRNA from a primer oligonucleotide annealed to the 3′ end of the mRNA. Thus, when the primer (oligo dT) is annealed to the poly(A) tail, it is easy to obtain cDNA covering the poly(A) tail. However, this method does not guarantee synthesis of full-length cDNA whose sequence extends from the primer to the cap structure, because degradation of the mRNA and/or interruption of DNA strand synthesis frequently occurs. In fact, most of the vast number of expressed sequence tags (ESTs) reported so far were derived from incomplete cDNAs generated either from degraded mRNA or by interruption of DNA strand synthesis.
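The direction of first-strand synthesis, and the effect of an interrupted reaction, can be sketched as a toy string model in Python. The function name `first_strand_cdna` and the `extension` parameter are hypothetical, and the model ignores primer length, enzyme kinetics, and mRNA degradation. It only shows that a truncated product retains the poly(A)-proximal end and loses the cap-proximal end.

```python
# Base-pairing rules for copying an RNA template into DNA.
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str, extension=None) -> str:
    """Return the first-strand cDNA (written 5'->3') for an mRNA written 5'->3'.

    Synthesis starts at the 3' end of the mRNA and proceeds toward its 5' end.
    `extension` is the number of template nucleotides copied before the
    reaction stops; None means the whole template is copied.
    """
    template = mrna.upper()
    if extension is None:
        extension = len(template)
    # Clamp to the valid range so over-long or negative values are harmless.
    extension = max(0, min(extension, len(template)))
    copied = template[len(template) - extension:]
    # Reading the copied region backwards and complementing each base gives
    # the cDNA in 5'->3' orientation.
    return "".join(COMPLEMENT[base] for base in reversed(copied))


if __name__ == "__main__":
    mrna = "GAUGCCAAAA"  # toy template, not a real transcript
    print(first_strand_cdna(mrna))      # complete copy
    print(first_strand_cdna(mrna, 6))   # interrupted copy
```

In this model a product is full length only when its length equals that of the template. Any shorter product is a prefix of the complete cDNA and therefore still begins with the sequence complementary to the poly(A) tail.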
Therefore, many methods have been proposed for synthesizing full-length cDNA whose sequence extends to the cap structure at the 5′ end of the mRNA. These methods fall into the following four main categories according to the principle used.
(1) Tailing Method
This method uses terminal transferase to add a homo-oligomer tail to first-strand cDNA that has been extended to the cap structure. The Okayama-Berg method (Non-patent Document 1) and the Pruitt method (Non-patent Document 2) belong to this category. Because the number of nucleotides added to the tail is difficult to control strictly, an excessively long tail can make nucleotide sequence analysis difficult.
The template-switching method (Patent Document 1), which uses a dC tail added to the 3′ end of the first-strand cDNA by the terminal transferase activity of reverse transcriptase, also belongs to this category. The number of dC residues added is reported to be 3 to 5 (Non-patent Document 3).
(2) Linker-ligation Method
This method comprises synthesizing the first-strand cDNA, removing the mRNA by alkali or RNase H treatment, and ligating a single-stranded oligonucleotide linker of known sequence to the 3′ end of the single-stranded cDNA using T4 RNA ligase (Non-patent Document 4). It is unsuitable for preparing a high-quality cDNA library because the single-stranded cDNA forms secondary structure.
(3) Oligo-capping Method
This method replaces the cap structure with an oligomer. Methods using an RNA oligomer (Non-patent Document 5) or a DNA-RNA chimeric oligomer (for example, Patent Document 1 by the inventors of this application, and Non-patent Document 6) have been reported. In principle this method should produce only full-length cDNAs, but in practice it also yields some truncated cDNAs synthesized from mRNAs that were degraded during the many mRNA treatment steps. In addition, a large amount of poly(A)+ RNA, about 5 to 10 μg, is required. Using total RNA as the starting material to suppress mRNA degradation has been reported to raise the full-length rate above 90%, but the number of reaction steps remains unchanged (Patent Document 3).
This category also includes the method (Patent Document 3) in which a synthetic oligomer is added to the cap structure after its carbohydrate ring has been opened by periodate oxidation.
(4) Cap-trapping Method
This method selects mRNAs possessing a cap structure and uses them as templates. It includes a method that uses mRNA selected with an anti-cap antibody as a template (Non-patent Document 7), and a method in which the carbohydrate ring of the cap structure is opened by periodate oxidation, biotin is added to the opened ring, and the biotinylated mRNA is selected on an avidin-immobilized carrier (Non-patent Document 8).
Patent Document 1: U.S. Pat. No. 5,962,272
Patent Document 2: 3337748
Patent Document 3: WO 01/04286
Patent Document 4: U.S. Pat. No. 6,022,715
Non-patent Document 1: Okayama, H. and Berg, P. Mol. Cell. Biol. 2:161-170, 1982.
Non-patent Document 2: Pruitt, S. C. Gene 66:121-134, 1988.
Non-patent Document 3: CLONTECHniques, July 1997, p. 26.
Non-patent Document 4: Edwards, J., Delort, J., and Mallet, J. Nucleic Acids Res. 19:5227-5232, 1991.
Non-patent Document 5: Maruyama, K. and Sugano, S. Gene 138:171-174, 1994.
Non-patent Document 6: Kato et al., Gene 150:243-250, 1994.
Non-patent Document 7: Edery, I., Chu, L. L., Sonenberg, N., and Pelletier, J. Mol. Cell. Biol. 15:3363-3371, 1995.
Non-patent Document 8: Carninci et al., Genomics 37:327-336, 1996.