1. Field of the Invention
This invention relates to a DNA fragment analysis method including comparative DNA analysis as well as gene expression profiling.
2. Description of the Related Art
With the progress of the human genome project, the whole genome structure and the genomic DNA base sequences are being clarified. The next step is to analyze the gene functions coded in genomes. Comparative analysis of genomes or DNA fragments and gene expression profiling both play important roles in this functional analysis.
All inherited information is written in the genome. This information is transcribed into mRNA, and proteins are produced according to the information coded in the genome. The produced proteins perform functions in living cells. To understand the activities of genes in a cell, the produced (expressed) proteins or the mRNAs are analyzed. In particular, analysis of the species and quantities of mRNAs present in a cell is important for obtaining an overall picture of the activities and functions of genes.
The analysis of the species and quantities of mRNAs in a cell or tissue is called gene expression profiling. mRNA is easily digested by ribonucleases (RNases) present in a cell. Therefore, mRNA is often analyzed by way of its complementary strand, called cDNA (complementary DNA), which is produced from the mRNA by reverse transcription using a reverse transcriptase.
Various DNA analysis methods are used in comparative studies of genomes or DNA fragments as well as in gene expression profiling. Gene expression profiling means the analysis of the genes working in a cell. In practice, therefore, it means the analysis of the species and quantities of the various mRNAs or of their cDNAs.
For the comparative analysis of DNAs, gel electrophoretic analysis of fragments or DNA sequencing is often used. However, for a long DNA or a DNA mixture, DNA sequencing is not easy, and a simple DNA fragment size analysis is used instead. For gene function analysis, gene expression profiling gives important information. The amount of mRNA in a cell under various environments is analyzed to investigate the correlation between an environment and a gene.
However, conventional analysis of this kind has been very time consuming and labor intensive, and it has been difficult to obtain expression profiles of many genes at one time. Recently, various new technologies and instruments have been developed which enable the expression of many genes to be detected. As the present invention is focused more on gene expression profiling, the following explanation is given in terms of gene expression profiling.
These new technologies and instruments include the DNA chip and DNA fragment scanning. A review of methods for gene expression profiling is given in Nature Biotechnology 14, 1675-1680 (1996).
A DNA chip is a DNA probe array on a solid substrate. It has many cells, each holding a different DNA probe. A DNA probe has a specific sequence and can hybridize with the corresponding complementary sequence appearing in a target DNA. In an analysis of mRNAs (or cDNAs) with a DNA chip, cDNAs are prepared from the target sample containing various mRNAs. They are digested with enzymes and then labeled with fluorophore tags. The labeled fragments are hybridized with the DNA probes on the DNA chip.
If the DNA probes include strands complementary to the fragments, the fragments hybridize with those probes. Even after a washing process that removes unhybridized fragments from the DNA chip surface, the fluorophore-labeled fragments complementary to DNA probes are held on the DNA chip. The positions of the trapped DNA fragments, and therefore the probe species, as well as the amounts of the trapped fragments are detected with a fluorescence microscope.
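The probe and fragment matching described above can be illustrated with a minimal sketch. It assumes a toy model in which a labeled fragment is retained at a probe cell whenever it contains the exact reverse complement of the probe; the cell names, probe sequences, and fragment sequences are hypothetical and are not taken from the cited reports. Real hybridization also depends on temperature, salt concentration, and partial mismatches, none of which is modeled here.

```python
# Toy model of DNA chip hybridization (illustrative assumption, not a
# description of any particular chip): a fragment is held at a probe cell
# if it contains the exact reverse complement of that cell's probe.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def hybridized_cells(probes, fragments):
    """Count, for each probe cell, the labeled fragments that would be held."""
    counts = {}
    for cell, probe in probes.items():
        target = reverse_complement(probe)
        counts[cell] = sum(1 for fragment in fragments if target in fragment)
    return counts


# Hypothetical probes and fluorophore-labeled fragments.
probes = {"cell_1": "ACGGTCAT", "cell_2": "GGGCCCAA"}
fragments = ["TTATGACCGTAA", "CCATGACCGTGG", "AAAAAAAAAAAA"]

# Per-cell counts stand in for the fluorescence intensity read at each cell.
signal = hybridized_cells(probes, fragments)
```

In this sketch the count per cell plays the role of the fluorescence intensity at that position, so the cell identifies the probe species and the count indicates the amount.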
This method has been applied to gene expression detection as reported in Proc. Natl. Acad. Sci. USA, 93, 10614-10619 (1996). The DNA chip method is very useful for detecting known genes. However, it is not suitable for detecting unknown genes, because probes for unknown genes cannot be produced without their sequence information.
Another powerful method for detecting gene expression is the scanning method. "Scanning" means that the method can detect any fragments by size, independently of their sequences. It uses gel electrophoresis, with detection by autoradiography with radioisotope labeling or by fluorescence. The key point of the method is to produce, by PCR (Polymerase Chain Reaction), a short DNA fragment which is a part of each DNA in a sample and can serve as its signature.
There are several ways to do this. The Fluorescent Differential Display method (FDD) (FEBS Letters 351, 231-236 (1994)) compares cDNAs prepared under different conditions by using electropherograms of PCR products obtained with a long fluorescently labeled primer and several short arbitrary primers, which may hybridize with several parts of the target DNAs.
The molecular index method and AFLP (Amplified Fragment Length Polymorphism) (Nucleic Acids Research, 24, 2616-2617 (1996) and Nucleic Acids Symposium Series, 35, 257-258 (1996)) use electropherograms of PCR products obtained with sets of long primers which can hybridize to the terminal bases of the target DNA fragments.
In the latter methods, the DNAs in a sample are the targets of the analysis and are prepared so as to have a special base sequence at the 3' (or 5') terminus (called site A), which later serves as one of the priming sites in PCR. As their sizes vary and are not suitable for electrophoresis as they are, the DNAs are digested by a restriction endonuclease into fragments, and the fragments including site A of the original long DNAs are used as the signature fragments of the DNAs.
The termini of the digested fragments are ligated with a second oligomer, which provides another hybridization site (site B) for PCR primers. Both sites A and B are used as priming sites for PCR amplification of the DNA fragments. The products are analyzed to give information on the DNA species in a sample and their abundance (population).
The advantage of the scanning method is that no DNA probes specific for individual DNAs are necessary; it can therefore be applied to detecting unknown DNAs, for which DNA probes cannot be prepared in advance. It is thus very powerful and useful for finding new genes. For scanning all the DNAs included in a sample, the latter method has an advantage. It is described below because the present invention has been made to overcome its drawbacks.
The target is a sample consisting of various cDNAs which represent gene expression. The aim is to clarify the DNA species and their relative abundance (population). The best way would be to analyze them directly by gel electrophoresis; however, they may have been cleaved at some position, and they are too long for a precise size analysis. Precise size analysis of DNA fragments by gel electrophoresis can be carried out only when the fragments are shorter than about 1 kb (1000 bases). The cDNAs should therefore be digested into small fragments, and one fragment of each DNA should be chosen to represent that DNA in the analysis.
This can be done as follows. All the cDNAs in a sample are prepared as double stranded DNAs, one strand of which has a polyA chain (polyA tail) at the 3' terminus. As there is only one polyA tail region in each cDNA, the sequence adjacent to the polyA tail region is used as the signature sequence of the cDNA. The cDNA species and their populations can be determined by analyzing the fragments containing the signature sequences.
The double stranded cDNAs are digested with an endonuclease, and the products are ligated with an oligomer at their termini. The fragments including the polyA tail (site A) and the ligated oligomer (site B) can be amplified by PCR. The PCR amplifications are carried out with an oligo dT primer and a primer which has a sequence complementary to the ligated oligomer and a one-base or two-base sequence (called a selective sequence) to distinguish the fragments. When the total number of cDNA species is too large, it is difficult to distinguish all the fragments in one electropherogram because too many peaks appear.
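The digestion and ligation steps above can be sketched as follows. This is a simplified illustration under stated assumptions: each cDNA is modeled as a single string written 5' to 3' and ending in the polyA tail, the recognition sequence GATC and the oligomer sequence are hypothetical placeholders, and the cut is placed at the start of the recognition sequence. The source does not specify a particular endonuclease or oligomer.

```python
# Sketch of deriving one signature fragment per cDNA (illustrative only).
# SITE and ADAPTER are hypothetical; the source names no specific sequences.

SITE = "GATC"             # assumed endonuclease recognition sequence
ADAPTER = "CTCGTAGACTGC"  # assumed ligated oligomer providing site B


def signature_fragment(cdna, site=SITE, min_tail=8):
    """Return the fragment between the last cut site and the polyA tail.

    Returns None when the cDNA lacks a polyA tail (site A) or is never cut.
    """
    if not cdna.endswith("A" * min_tail):
        return None
    cut = cdna.rfind(site)  # recognition site nearest the 3' polyA tail
    if cut == -1:
        return None
    return cdna[cut:]


def ligate(fragment, adapter=ADAPTER):
    """Attach the oligomer (site B) to the cut terminus of the fragment."""
    return adapter + fragment


# Hypothetical cDNA with two recognition sites and a 10-base polyA tail.
cdna = "ATGCCGATCTTAGGCGATCCTGAACGT" + "A" * 10
fragment = signature_fragment(cdna)
template = ligate(fragment)
```

Only the fragment nearest the polyA tail carries both site A and site B, so each cDNA contributes a single fragment whose length serves as its signature in the electropherogram.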
The selective sequences are very effective for grouping DNA fragments according to their terminal sequences. Complementary strand extension reactions occur when the primers hybridize perfectly to the templates (target DNAs). In particular, a complete match at the 3' terminus of the primer is necessary for successful complementary strand extension in PCR.
Therefore, the selective sequences are used to selectively amplify a subset of the fragments, which reduces the number of peaks appearing in one electropherogram. Instead of one electropherogram, many electropherograms are obtained with various combinations of two selective primers, giving more precise information on the cDNAs (gene expression profiles). When one of the primers is labeled with a fluorophore tag, the electropherograms can be obtained with a fluorescence detection type DNA sequencer.
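The grouping effect of the selective sequences can be illustrated with a short sketch. It assumes a toy extension rule in which a primer is extended only when it matches the fragment perfectly, including its 3'-terminal selective bases. Each fragment is written as the strand having the same sequence as the primer, so that hybridization to the complementary strand reduces to a prefix match. The oligomer sequence and the fragment sequences are hypothetical.

```python
from collections import defaultdict
from itertools import product

ADAPTER = "CTCGTAGACTGC"  # hypothetical ligated-oligomer sequence (site B)


def selective_primers(adapter=ADAPTER, n_bases=2):
    """All primers made of the adapter sequence plus n selective 3' bases."""
    return [adapter + "".join(bases) for bases in product("ACGT", repeat=n_bases)]


def is_extended(primer, strand):
    """Toy rule: extension requires a perfect match up to the primer's 3' end."""
    return strand.startswith(primer)


def group_by_selective_sequence(fragments, adapter=ADAPTER, n_bases=2):
    """Assign each fragment to the selective sequence that amplifies it."""
    groups = defaultdict(list)
    for primer in selective_primers(adapter, n_bases):
        selective = primer[len(adapter):]
        for fragment in fragments:
            if is_extended(primer, fragment):
                groups[selective].append(fragment)
    return dict(groups)


# Hypothetical adapter-ligated fragments differing in the bases next to site B.
fragments = [ADAPTER + "AGTTC", ADAPTER + "AGGCA", ADAPTER + "CTTGA"]
groups = group_by_selective_sequence(fragments)
```

Each fragment falls into exactly one group, so a reaction run with a given selective primer shows only the peaks belonging to that group.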
As the number of genes (and therefore cDNA species) acting in a cell is supposed to be over 10,000, and the number of peaks that can be distinguished easily in one electropherogram is about 100, the number of groups into which the fragments are classified by the selective primers should be over 100. This can be done with sets of selective primers having two-base selective sequences at their 3' termini. The number of combinations produced from two sets of these primers is 256 (16 × 16) or 192 (16 × 12). Every fragment should appear in one of the 256 or 192 electropherograms.
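The counts above can be checked with a short calculation. The figures of 10,000 species and 100 peaks are the approximate values given in the text. The 12-member primer set is modeled here by excluding two-base selective sequences that begin with T, which is one plausible reading for a primer adjoining the oligo dT site; this is an assumption, since the text gives only the count.

```python
import math
from itertools import product

BASES = "ACGT"

# All two-base selective sequences: 4 x 4 = 16.
two_base = ["".join(pair) for pair in product(BASES, repeat=2)]

# Assumed reading of the 12-member set: first selective base is not T.
two_base_anchored = [seq for seq in two_base if seq[0] != "T"]

cdna_species = 10_000  # "over 10,000" in the text, used as a lower bound
peaks_per_run = 100    # peaks easily distinguished in one electropherogram

# Minimum number of groups needed to keep each run near 100 peaks.
min_groups = math.ceil(cdna_species / peaks_per_run)

combos_16x16 = len(two_base) * len(two_base)
combos_16x12 = len(two_base) * len(two_base_anchored)

# Average number of peaks per electropherogram if species spread evenly.
avg_peaks_256 = cdna_species / combos_16x16
avg_peaks_192 = cdna_species / combos_16x12
```

Under an even distribution, both 256 and 192 groups exceed the 100 groups required, giving roughly 39 and 52 peaks per electropherogram respectively. Actual distributions are uneven, so some electropherograms would contain more peaks than these averages.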