1. Field of the Invention
The present invention relates to a method for synthesizing cDNA from an mRNA sample and to tobacco acid pyrophosphatase for use in the method; more specifically, the invention relates to a method for rapidly synthesizing cDNA that includes the 5′-terminal region of mRNA in a sample, for the analysis of a nucleotide sequence derived from the 5′-terminus of the mRNA.
2. Background of the Invention
Numerous types of proteins compose a cell, such as proteins involved in cell morphology, proteins involved in development and proteins involved in metabolism. The pattern of their presence determines the properties of the cell. The essential information on the presence and functions of these proteins is encoded in the genes of the cell and is realized through mRNA as a copy of the gene. mRNA functions as a template for protein translation and also as a carrier of the information flow from DNA to protein. Ultimately, mRNA reflects the “phenotypes” of all biological organisms. These proteins are industrially valuable and potentially applicable as pharmaceutical drugs, diagnostic agents, bio-sensors and bio-reactors, provided that they are biologically active substances. Hence, it is very important to recover full-length mRNA and obtain gene information from the mRNA. Recent progress in gene recombination technology and the more recent advancement of genome analysis projects have made cDNA cloning and analysis technology readily usable.
Although rapid analysis of the complete 5′-terminal sequence of mRNA has increasingly been demanded in recent years, no technology has yet been established that enables such analysis in a simple and rapid fashion. Because the 5′-terminal sequence of full-length mRNA contains the transcription start site used for gene expression analysis on the genome, rapid analysis of the 5′-terminal sequence, together with the enormous quantities of sequenced genome, opens up a way for mapping transcribed genes. Furthermore, accurate information on the 5′-terminal sequence of mRNA can identify the sequences of the gene expression regulatory promoters present upstream. These promoters are cis-factors regulating when, where and how much a gene should be expressed. The detection of the 5′-terminal sequence of mRNA verifies that an upstream promoter sequence is functional, which suggests a new possibility for the etiological analysis, diagnosis or therapeutic treatment of diseases.
Practically, the information as to when, where and how much a gene is expressed is very valuable for the etiological analysis, diagnosis or therapeutic treatment of diseases. The Human Genome Project, currently promoted internationally, names the collection of such information as one of its goals. The ultimate purpose of the Project lies in sequencing the genomes of biological organisms. The nucleotide sequences of the genomes of several bacterial species and of the whole genome of budding yeast have already been determined and reported. Most of the many genes identified on the sequenced genomes have not yet been functionally characterized, which remains a major issue for the future. In that sense, the analysis of cDNA, which reflects the gene expression dynamics in the cell, is increasingly drawing attention.
Herein, the term cDNA (complementary DNA) means DNA synthesized by reverse transcriptase using mRNA as a template. In other words, the information of mRNA, which encodes the amino acid sequence of a protein, is reconstructed synthetically as cDNA. Analysis of the cDNA can readily determine the primary structure of the protein and can readily promote the development of a large-scale expression system. Thus, such cDNA preparation is now very important industrially.
Ideally, the ultimate goal of cDNA cloning technology lies in the replacement of all expressed mRNAs with complete cDNAs. The information recovered from such full-length cDNA is highly valuable: it serves as a starting point for the analysis of the information on the genome, because it includes the transcription start and the entire information of the expressed protein. The primary protein sequence recovered from a complete coding sequence markedly shortens the time required for functional analysis.
However, the technology for recovering cDNA corresponding to full-length mRNA is not yet mature and is “still under development” among the rapidly progressing DNA technologies. For example, the Gubler-Hoffman method (Gene, Vol. 25, pp. 236-269, 1983) is known as one of the cDNA synthesis methods in common conventional use. Nevertheless, many of the cDNAs synthesized by the method are incomplete, with terminal deficiency. Alternatively, the Okayama-Berg method (Mol. Cell. Biol., Vol. 2, pp. 161-170, 1982) is a synthesis method characterized in that full-length cDNA is readily prepared. Even by this method, however, reverse transcription sometimes stops in the course of cDNA synthesis, so there is no guarantee that the resulting cDNA is of full length.
The RACE method (Rapid amplification of cDNA ends: Proc. Natl. Acad. Sci. USA, Vol. 85, pp. 8998-9002, 1988) has been suggested as a method to supplement a portion lacking in cDNA, based on the partial cDNA sequences recovered by the existing methods, so as to acquire the complete information of mRNA. The method comprises reverse transcription based on a target cDNA sequence; addition of a homopolymer to both ends of the cDNA by terminal transferase, or ligation of an adapter comprising a synthesized DNA to both ends of the cDNA by T4 DNA ligase; and polymerase chain reaction (PCR) based on these added sequences and a primer specific to the target cDNA, thereby analyzing only the terminal regions of the mRNA sequence.
The analysis of the target 5′-terminus of mRNA in particular by the method (referred to as 5′-RACE) can be done in a very simple fashion, because the method utilizes PCR. Accordingly, the method is frequently used. In principle, however, the method analyzes the 5′-terminus of the cDNA and cannot by itself analyze the 5′-terminal sequence of the mRNA used for the preparation of the cDNA. Hence, the recovery of the complete 5′-terminus of mRNA is very difficult compared with the recovery of the complete 3′-terminus by 3′-RACE, in which the poly-A sequence protects against terminal deficiency. As described above, even currently, the method is regarded as a “not-yet established technology”.
It is known that the 5′-terminus of complete mRNA has a characteristic structure called the cap structure (Nature, Vol. 253, pp. 374-375, 1975). An attempt has been suggested to analyze cDNA, targeting the vicinity of the cap structure (Japanese Patent Laid-open No. 6-153953 (1994); Gene, Vol. 138, pp. 171-174, 1994).
According to these methods, tobacco acid pyrophosphatase (referred to as “TAP” hereinafter), which specifically cleaves the cap structure, is used. These methods comprise treating mRNA with alkaline phosphatase to remove the phosphate group from the 5′-terminus of mRNA without any cap, subsequently treating the resulting mRNA with TAP to cleave the cap, adding an oligoribonucleotide and then effecting reverse transcription to synthesize cDNA. Although these methods are complicated because the enzymatic reactions continue over plural steps, they are among the few effective methods capable in principle of specifically analyzing full-length mRNA. Nevertheless, these methods include problems to be improved in the steps. Currently, therefore, these methods are not in widespread use, although they are greatly needed due to the significance of 5′-terminal sequencing as described above.
It is an object of the present invention to provide a method for synthesizing cDNA from mRNA, so as to recover the complete 5′-terminal sequence of cDNA on a large scale in a rapid manner by selectively synthesizing cDNA including the 5′-terminal sequence of full-length mRNA with the cap structure. It is another object of the present invention to provide tobacco acid pyrophosphatase preferable for use in the synthesis method.
The present invention proposes to attain the above-mentioned objective by suggesting a DNA synthesis method for synthesizing cDNA including the 5′-terminal sequence of full-length mRNA with a cap structure from an mRNA sample containing the full-length mRNA with the cap structure and non-full-length mRNA without any cap structure in mixture, said method comprising:
a first step of removing the phosphate group at the 5′-terminus of the non-full-length mRNA in the mRNA sample;
a second step of removing the cap structure at the 5′-terminus of the full-length mRNA in the mRNA sample;
a third step of ligating an oligonucleotide of a predetermined sequence to the phosphate group at the 5′-terminus of the mRNA generated through the first and second steps in the sample; and
a fourth step of subjecting the mRNA ligated with the oligonucleotide at the phosphate group at the 5′-terminus to a reverse transcriptase process using as primer a short-chain oligonucleotide capable of being annealed to an intermediate sequence within the mRNA, to synthesize a first-strand cDNA;
characterized in that said oligoribonucleotide for use at the third step has a sequence recovered by preparing a number of oligoribonucleotide sequences including various combinations of bases in a predetermined number, carrying out a homology search with a predetermined nucleotide sequence data base to determine the occurrence number of a sequence completely matching or differing by one base, and preparing a combination of plural sequences in a low-frequency occurrence group including a sequence at the lowest occurrence number.
According to a preferred embodiment of the present invention, the third step comprises ligating an oligoribonucleotide of a predetermined sequence to the phosphate group.
According to another embodiment of the present invention, the third step comprises ligating an oligoribonucleotide composed of a sequence never contained in the sequence of the mRNA in the mRNA sample to the phosphate group.
Specifically, as the oligonucleotide, use is made of an oligoribonucleotide comprising a 10-base or longer sequence never contained in the sequence of the mRNA. More specifically, a great number of oligonucleotide sequences are prepared, the oligonucleotide sequences comprising various combinations of oligonucleotides of bases in a predetermined number; a homology search of each of the oligonucleotide sequences with a predetermined nucleotide sequence data base is then carried out; the occurrence number of a sequence completely matching or differing by one base is determined; by using combinations of plural sequences in the low occurrence frequency group including a sequence of the lowest occurrence frequency, a sequence is determined and an oligoribonucleotide of this sequence is used. In one embodiment of the invention, any one of the following oligoribonucleotides is used as such oligoribonucleotide.
5′-GUUGCGUUAC-ACAGCGUAUG-AUGCGUAAGG-3′
5′-GUUGCGUUAC-ACAGCGUAUG-AUGCGUAA-3′
5′-GUUGCGUUAC-ACAGCGUAUG-AUGCGU-3′
5′-AAGGUACGCC-GUUGCGUUAC-ACAGCGUAUG-AUGCGU-3′
5′-AAGGUACGCC-GUUGCGUUAC-ACAGCGUAUG-AUGCGUAA-3′
5′-GUUGCGUUAC-AAGGUACGCC-ACAGCGUAUG-AUGCGU-3′
5′-GUUGCGUUAC-AAGGUACGCC-ACAGCGUAUG-AUGCGUAA-3′
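The selection procedure described above (prepare candidate sequences of a fixed length, count the database occurrences that match completely or differ by one base, then combine sequences from the low-frequency group) can be sketched as follows. This is a minimal illustration only, not the procedure by which the sequences listed above were obtained. The function names, the toy database and the short block length are all assumptions chosen so that the example runs quickly; a realistic search over 10-base blocks and a full sequence database would need an indexed search tool rather than this brute-force scan.

```python
from itertools import product


def differs_by_at_most_one(a: str, b: str) -> bool:
    """True if two equal-length sequences match completely or differ by one base."""
    return sum(x != y for x, y in zip(a, b)) <= 1


def count_occurrences(candidate: str, database: list[str]) -> int:
    """Count database windows that match the candidate exactly or with one mismatch."""
    k = len(candidate)
    total = 0
    for seq in database:
        for i in range(len(seq) - k + 1):
            if differs_by_at_most_one(candidate, seq[i:i + k]):
                total += 1
    return total


def rank_candidates(k: int, database: list[str]) -> list[tuple[str, int]]:
    """Enumerate every k-base sequence and sort by ascending occurrence count."""
    scored = []
    for bases in product("ACGT", repeat=k):
        candidate = "".join(bases)
        scored.append((candidate, count_occurrences(candidate, database)))
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(scored, key=lambda item: (item[1], item[0]))


def design_oligo(k: int, database: list[str], n_blocks: int) -> str:
    """Join the n_blocks lowest-frequency blocks and write them as RNA (T -> U)."""
    blocks = [candidate for candidate, _ in rank_candidates(k, database)[:n_blocks]]
    return "-".join(block.replace("T", "U") for block in blocks)


if __name__ == "__main__":
    # Hypothetical toy database; a real search would use a nucleotide sequence database.
    toy_database = ["ACGTACGTACGGATCC", "TTGACCATGGCATGCA"]
    print(design_oligo(k=4, database=toy_database, n_blocks=3))
```

The sketch simply takes the lowest-ranked blocks in order; how blocks from the low-frequency group are actually combined is a design decision that the code does not attempt to reproduce.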
According to still another embodiment of the present invention, the primer to be used at the fourth step is a short-chain oligonucleotide of a length of 6 bases or longer.
According to still another embodiment of the present invention, the cap structure at the 5′-terminus of the full-length mRNA in the mRNA sample is removed by using tobacco acid pyrophosphatase purified to a high purity, free from contamination by trace amounts of nuclease cleaving the phosphodiester bonds composing RNA and of phosphatase removing the 5′-phosphate group freshly generated after cap cleavage.
The present invention also proposes a method for synthesizing cDNA including the 5′-terminal sequence of full-length mRNA with a cap structure from an mRNA sample containing the full-length mRNA with the cap structure and non-full-length mRNA without any cap structure in mixture, the method comprising:
a first step of removing the phosphate group at the 5′-terminus of the non-full-length mRNA in the mRNA sample;
a second step of removing the cap structure at the 5′-terminus of the full-length mRNA in the mRNA sample by using tobacco acid pyrophosphatase highly purified by using alkaline phosphatase;
a third step of ligating an oligoribonucleotide of a predetermined sequence to the phosphate group at the 5′-terminus of mRNA generated through the first and second steps in the sample, said oligoribonucleotide comprising a sequence never contained in the sequence of mRNA in the mRNA sample;
a fourth step of subjecting the mRNA ligated with the oligoribonucleotide at the phosphate group at the 5′-terminus to a reverse transcriptase process using as primer a short-chain oligonucleotide of 6 bases or more in length and capable of being annealed to an intermediate sequence within the mRNA, to synthesize a first-strand cDNA; and
a fifth step of synthesizing a second-strand cDNA based on the resulting first-strand cDNA.
As the tobacco acid pyrophosphatase for use in the cDNA synthesis method of the present invention, it is preferable to use tobacco acid pyrophosphatase which can remove the cap structure at the 5′-terminus and which has been purified to such an extent that it contains substantially no other enzymes cleaving the remaining sites within mRNA.
In accordance with the method of the present invention, only cDNA containing the 5′-terminal sequence of full-length mRNA with the cap structure is synthesized from an mRNA sample containing the full-length mRNA and non-full-length mRNA without the cap structure in mixture. Accordingly, it is preferable to preliminarily remove the phosphate group at the 5′-terminus of the non-full-length mRNA in the sample, thereby preventing the oligonucleotide (preferably oligoribonucleotide) from being added to the non-full-length mRNA at the third step.
As to the full-length mRNA in the sample, it is preferable to remove the cap structure at the 5′-terminus of the full-length mRNA at the second step and then, at the third step, to ligate an oligonucleotide to the phosphate group thus freshly generated at the 5′-terminus of the mRNA. Prior to this reaction, the phosphate group at the 5′-terminus of the non-full-length mRNA has already been removed. Thus, the addition reaction never progresses on the non-full-length mRNA.
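The selectivity of the first three steps can be illustrated with a toy state model that tracks only the chemical group at the 5′-terminus of each RNA molecule. This is a sketch for following the logic, not a simulation of the enzymatic reactions; the class, the function names and the three end-state labels are assumptions introduced here, and each step is assumed to go to completion.

```python
from dataclasses import dataclass


@dataclass
class Rna:
    name: str
    five_prime: str        # "cap", "phosphate", "hydroxyl" or "oligo"
    ligated: bool = False


def dephosphorylate(sample: list[Rna]) -> None:
    """First step: remove the 5'-phosphate from uncapped (non-full-length) RNA."""
    for rna in sample:
        if rna.five_prime == "phosphate":
            rna.five_prime = "hydroxyl"


def decap(sample: list[Rna]) -> None:
    """Second step: cap cleavage leaves a fresh 5'-phosphate on full-length RNA."""
    for rna in sample:
        if rna.five_prime == "cap":
            rna.five_prime = "phosphate"


def ligate(sample: list[Rna]) -> None:
    """Third step: the oligonucleotide is joined only where a 5'-phosphate exists."""
    for rna in sample:
        if rna.five_prime == "phosphate":
            rna.five_prime = "oligo"
            rna.ligated = True


def tag_full_length(sample: list[Rna], skip_dephosphorylation: bool = False) -> list[str]:
    """Run the steps in order and return the names of the molecules that were tagged."""
    if not skip_dephosphorylation:
        dephosphorylate(sample)
    decap(sample)
    ligate(sample)
    return [rna.name for rna in sample if rna.ligated]


if __name__ == "__main__":
    sample = [Rna("full_length", "cap"), Rna("fragment", "phosphate")]
    print(tag_full_length(sample))
```

Calling `tag_full_length` with `skip_dephosphorylation=True` on a fresh sample tags the fragment as well, which is the outcome the first step is meant to prevent.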
As the oligonucleotide, use is preferably made of an oligoribonucleotide, because the reaction efficiency of the enzyme T4 RNA ligase, when used, differs by about two orders of magnitude between RNA and DNA substrates.
At the fourth step, subsequently, mRNA with the oligonucleotide ligated at the phosphate group at the 5′-terminus thereof is subjected to a reverse transcriptase process using as primer a short-chain oligonucleotide to be annealed to an intermediate sequence within the mRNA. Thus, a complementary first-strand cDNA is synthesized. In such manner, cDNA can be synthesized readily, starting from the 5′-terminus of mRNA.
Preferably, a fifth step is added to synthesize a double-stranded cDNA from the single-stranded cDNA. The fifth step comprises additionally synthesizing a second-strand cDNA from the resulting first-strand cDNA.
One characteristic aspect of the present invention lies in the use as primer of a short-chain oligonucleotide (in particular, a “random hexamer” of 6 bases in accordance with the present invention) capable of being annealed to an intermediate sequence within mRNA, preferably a sequence in the vicinity of the 5′-terminus.
To describe this characteristic aspect in more detail, the present invention relates to a method for converting the information of mRNA to cDNA. General methods comprise synthesizing a complementary DNA using reverse transcriptase with RNA as template. A primer is needed to initiate the reaction with the reverse transcriptase. The term primer means a DNA chain or RNA chain supplying the nucleotide 3′-OH required by a template-dependent DNA polymerase for the synthesis of a new chain. Current progress in DNA synthesis technology enables ready synthesis of short-chain oligonucleotides of 15 to 40 bases in length with a primer function.
For the purpose of cDNA synthesis, use is generally made of an oligo dT12-18 primer complementary to the run of plural adenines, called the poly-A chain, present at the 3′-terminus of mRNA. Although the synthesis starts efficiently when this primer is used, the synthesis rarely progresses up to the 5′-terminus of mRNA with the cap structure because of the instability and long chain of RNA and the secondary structure thereof, as described above. It is readily deduced that the tendency is more prominent the longer the mRNA. These factors make full-length cDNA synthesis difficult. It cannot be said that any of the existing technologies attempting full-length cDNA synthesis overcomes the problem.
In contrast, in accordance with the present invention, the 5′-terminal sequence of mRNA in particular, whose analysis has been considered difficult, can be analyzed rapidly. One of the characteristic features of the present invention lies in the use as primer of a short-chain oligonucleotide capable of being annealed to an intermediate sequence within mRNA, preferably a sequence in the vicinity of the 5′-terminus. Particularly preferably, a short-chain oligonucleotide comprising a random sequence of 6 bases or more is used.
Theoretically, herein, the base length of the short-chain oligonucleotide used as the primer is an appropriate length shorter than the sequence of mRNA, in which reverse transcription can start from various sites of mRNA. It is currently reported that the shortest length required for sequence-specific primer activity is a length of 6 bases. Thus, the single-stranded oligonucleotide is of a length of 6 bases or longer in accordance with the invention.
When the shortest primer, a random hexamer comprising 6 bases, is used, single-stranded oligonucleotides of 4096 (= 4^6) nucleotide sequences are in principle candidates. Among such numerous sequences, accordingly, a single-stranded oligonucleotide of a nucleotide sequence capable of initiating reverse transcription at a desired site of mRNA can be selected. When such a random hexamer is selected, the possibility of synthesizing cDNA including the 5′-terminus of mRNA can be raised.
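The count of 4096 follows from four possible bases at each of six positions. The short sketch below enumerates the candidates and, for a chosen hexamer, lists where it could anneal on an mRNA. It assumes that annealing means an exact reverse-complement match, which is a deliberate simplification; the function names and the example mRNA string are hypothetical.

```python
from itertools import product

# DNA primer base -> RNA base it pairs with on the template.
_PAIRS_WITH = {"A": "U", "C": "G", "G": "C", "T": "A"}


def all_primers(length: int = 6) -> list[str]:
    """Every DNA sequence of the given length (4**length candidates)."""
    return ["".join(bases) for bases in product("ACGT", repeat=length)]


def target_site(primer: str) -> str:
    """RNA sequence (5'->3') to which the DNA primer (5'->3') is complementary."""
    return "".join(_PAIRS_WITH[base] for base in reversed(primer))


def annealing_positions(primer: str, mrna: str) -> list[int]:
    """0-based start positions on the mRNA where the primer matches exactly."""
    site = target_site(primer)
    k = len(site)
    return [i for i in range(len(mrna) - k + 1) if mrna[i:i + k] == site]


if __name__ == "__main__":
    print(len(all_primers()))
    print(annealing_positions("ACGTAC", "AAGUACGUAA"))
```

A primer whose annealing positions lie near the 5′-terminus yields a short first-strand cDNA that still reaches the ligated oligoribonucleotide.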
A reverse transcription method using a short-chain oligonucleotide of such an appropriate nucleotide sequence is frequently utilized as a method of searching for clones in the 5′-direction, for the cloning of cDNA derived from large mRNA. The method is described in, for example, J. Virol., Vol. 28, p. 743 (1978).
However, the procedure described in J. Virol. and the procedure according to the present invention are identical in that reverse transcription is primed by a short-chain oligonucleotide, but are totally different in that only 5′ cDNA is selectively amplified by the procedure according to the present invention. Because the method described in J. Virol. comprises reverse transcription followed by second-strand synthesis and integration into a vector, clones in which reverse transcription has not progressed up to the 5′-terminus are generated. Furthermore, a linker DNA is generally attached for insertion into a vector. In the course of this attachment, the linker DNA is linked to the termini of the double-stranded cDNA; therefore, the termini are blunt-ended using T4 DNA polymerase. In that course, 10 to 50 nucleotides are removed, so that full-length cDNA cannot be generated; in other words, no 5′-terminal sequence is recovered. In contrast, the method according to the present invention is characterized in that an oligoribonucleotide is ligated specifically to full-length mRNA only and that, immediately after reverse transcription, PCR is carried out using a primer specific to the sequence of the oligoribonucleotide, thereby selectively amplifying only cDNA comprising the complete 5′-terminal sequence.
An additional aspect of the present invention relates to the sequence of the oligoribonucleotide that replaces the 5′-terminal cap structure removed by the TAP process. The oligoribonucleotide is ligated to the 5′-terminus of mRNA and is then copied into cDNA through reverse transcription. The oligoribonucleotide sequence then serves as an attachment site for a primer specific to it. Thus, the sequence serves as a very important marker for the analysis of the complete 5′-terminus. Reverse transcription using the random hexamer, in particular, yields plural cDNA fragments derived from mRNA; hence, the sequence specificity of the oligonucleotide that replaces the cap in these fragments determines whether or not only the sequence derived from the 5′-terminus of the mRNA can be analyzed specifically. In accordance with the present invention, therefore, the design of the oligoribonucleotide and its sequence are very significant.
In more detail, the nucleotide sequence of the human or mouse genome is said to comprise about 3×10^9 base pairs (bp). Gene-encoding regions, gene expression regulatory regions, repetitive sequences, introns and the like are arranged on the genome, and these structures function under the control of extremely sophisticated programs in the course of development. It is also considered that the sequences are never random.
For example, there is a structure designated the “CpG island”. The structure is known to be present in the 5′-region, promoter region and first exon of genes (Tanpakushitsu.Kakusan.Kouso (Protein, Nucleic acid and Enzyme), 41 (15), p. 2288, (1996)). It is known that promoters involved in gene expression and regions involved in transcription termination are enriched with AT. The following explanation is readily understandable but is only a deduction: the CpG island serves as a landmark by which a protein acting as a trans-factor controlling gene expression speedily discriminates a thermodynamically unstable AT-rich region from a thermodynamically stable GC-rich region when the protein searches for its attachment site among the sequences on the genome.
Furthermore, it is known based on the analyses so far that the occurrence frequency of dinucleotides is biased in all living organisms. Particularly, a rule of excess CT and TG and deficiency of CG and TA is also known (Proc. Natl. Acad. Sci., Vol. 85, pp. 9630-9634 (1988)). It is thus considered that the genome sequences are never random in living organisms but include information evolved under a certain rule.
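One common way to quantify such dinucleotide bias, offered here only as an illustration and not as the measure used in the cited reference, is the ratio of the observed frequency of a dinucleotide to the frequency expected if the two bases occurred independently. A ratio above 1 indicates excess and a ratio below 1 indicates deficiency. The function name and the example sequence are assumptions.

```python
from collections import Counter


def dinucleotide_odds(seq: str) -> dict[str, float]:
    """Observed/expected ratio f(XY) / (f(X) * f(Y)) for each dinucleotide present."""
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for pair, count in di.items():
        observed = count / (n - 1)                            # dinucleotide frequency
        expected = (mono[pair[0]] / n) * (mono[pair[1]] / n)  # independence model
        ratios[pair] = observed / expected
    return ratios


if __name__ == "__main__":
    # Hypothetical short sequence; meaningful ratios need long genomic sequences.
    example = "ATGCTGCTTGACTGCCTTGA"
    for pair, ratio in sorted(dinucleotide_odds(example).items()):
        print(pair, round(ratio, 2))
```

Applied to a long genomic sequence, a candidate block that is rich in dinucleotides with low ratios would be expected to occur rarely.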
Regarding the oligoribonucleotide for use in accordance with the present invention, therefore, applying a sequence designed in consideration of the bias in the sequences on the genome to the analysis of the 5′-terminal sequence of mRNA can raise the precision of the analysis of the 5′-terminus of a specific gene among an assembly of very complicated 5′-cDNA sequences.
In association with the present invention, additionally, it has been deduced that TAP quality is a very significant element. More specifically, it has been known that the cap structure can be cleaved by using TAP (FEBS Lett., Vol. 65, pp. 254-257 (1976)). This is the only known method capable in principle of verifying the cap. Cleavage by TAP thus certifies that the RNA has the cap structure (7mGppp) at its 5′-terminus, namely that the RNA has an intact 5′-terminus.
Meanwhile, it is known that RNA is handled with much difficulty, because RNase, one species of nuclease, constantly exposes RNA to a risk of degradation. Thus, RNA experiments essentially demand that RNA be handled with this activity suppressed to the lowest possible level. A reference (Blumberg, D. D., Methods in Enzymol., 152: pp. 20-24 (1987), Academic Press), for example, describes in detail experimental precautions relating to the handling. Even under such precautions, it is difficult to suppress RNA degradation thoroughly. Additionally, mRNA differs from genomic DNA in that mRNA is not double-stranded but single-stranded. Extracted RNA must be handled in aqueous solvents, in which the most thermodynamically stable stem structures form as RNA secondary structure in various regions within the molecule. The stem structure is a serious cause of inhibition of the reverse transcriptase reaction. These two points, degradation and secondary structure, are serious causes of incomplete conversion of the mRNA sequence to cDNA.
In an additional characteristic aspect of the present invention, tobacco acid pyrophosphatase is used at the second step to remove the cap structure at the 5′-terminus of the full-length mRNA in the mRNA sample; the tobacco acid pyrophosphatase in particular can remove the cap structure at the 5′-terminus and has been purified to an extent with no contamination by other enzymes cleaving the remaining sites of mRNA.
Specifically, it has been confirmed that the TAP currently commercially available is not suitable for efficient sequencing of the 5′-terminus of mRNA. Because that TAP contains trace amounts of enzymes such as nuclease cleaving the phosphodiester bonds composing RNA and phosphatase removing the 5′-phosphate group freshly generated after cap cleavage, it cannot be used for selective cleavage of the cap structure in mRNA and hence for efficient sequencing of the 5′-terminus of mRNA. In accordance with the present invention, therefore, use is made of TAP capable of removing the cap structure at the 5′-terminus and purified to an extent with no contamination by other enzymes cleaving the remaining sites of mRNA.
As has been described above, the present invention advantageously provides a method for rapidly synthesizing cDNA starting from the 5′-terminus of mRNA, for the purpose of rapidly analyzing the full-length 5′-terminal sequences of numerous cDNA species, by selectively synthesizing cDNA including the 5′-terminal sequence of full-length mRNA with the cap structure. Additionally, tobacco acid pyrophosphatase preferable for use in the synthesis method can be provided.