The field of genomics has advanced rapidly in recent years. It started with efforts to determine the entire nucleotide sequence of simpler organisms such as viruses and bacteria. As a result, genomic sequences of Haemophilus influenzae (Fleischmann et al., Science 269: 496-512 [1995]) and a number of other bacteria (Escherichia coli, Mycobacterium tuberculosis, Helicobacter pylori, Campylobacter jejuni, Mycobacterium leprae) are now available. This was followed by the determination of the complete nucleotide sequences of a number of eukaryotic organisms, including budding yeast (Saccharomyces cerevisiae) (Goffeau et al., Science 274: 563-567 [1996]), a nematode (Caenorhabditis elegans) (C. elegans Sequencing Consortium, Science 282: 2012-2018 [1998]) and the fruit fly (Drosophila melanogaster) (Adams et al., Science 287: 2185-2195 [2000]). Genome sequencing is advancing rapidly, and several further genomes are now complete or partially complete, including the human, mouse, and rice genomes.
The availability of complete genomic sequences of various organisms promises to significantly advance our understanding of fundamental aspects of biology. It also promises to provide unparalleled applied benefits, such as understanding the genetic basis of certain diseases, providing new targets for therapeutic intervention, and developing a new generation of diagnostic tests. However, new and improved tools will be needed to harvest and fully realize the potential of genomics research.
The ability to establish differences between DNA samples from two different sources, or from the same source under different developmental or environmental conditions, is very important. Subtle differences in the genetic material can often yield valuable information, which can help in understanding physiological processes and can also provide powerful techniques with wide applications. The approach has broad applications in areas such as forensic science, determination of the predisposition of individuals to certain diseases, tissue typing, and molecular taxonomy. DNA fingerprinting is already being used for a variety of purposes. Single nucleotide polymorphism (SNP) screening promises to be yet another powerful tool for use in some of these applications.
Like DNA profiling, discussed above, RNA profiling can yield valuable information with potential uses in similar and overlapping applications. Even though the DNA complement or gene complement is identical in the various cells of the body of a multicellular organism, there are qualitative and quantitative differences in gene expression among those cells. The human genome is estimated to contain about 40,000 genes; however, only about 15,000-20,000 genes are expressed in a given cell (Liang et al., Science 257: 967-971 [1992]). Moreover, there are quantitative differences among the expressed genes in various cell types. Although all cells express certain housekeeping genes, each distinct cell type additionally expresses a unique set of genes. Phenotypic differences between cell types are largely determined by the complement of proteins that are uniquely expressed. It is the expression of this unique set of genes and the encoded proteins that constitutes the functional identity of a cell type and distinguishes it from other cell types. Moreover, the complement of genes that are expressed and their level of expression vary considerably depending on the developmental stage of a given cell type. Certain genes are specifically activated or repressed during differentiation of a cell. The level of expression also changes during development and differentiation. Qualitative and quantitative changes in gene expression also take place during cell division, e.g., in various phases of the cell cycle. Signal transduction by biologically active molecules such as hormones, growth factors and cytokines often involves modulation of gene expression. The process of aging is also characterized by changes in gene expression.
In addition to the endogenous or internal factors mentioned above, certain external factors or stimuli, such as environmental factors, also bring about changes in the gene expression profile. Infectious organisms such as bacteria, viruses, fungi and parasites interact with cells and influence the qualitative and quantitative aspects of gene expression. Thus, the precise complement of genes expressed by a given cell type is influenced by a number of endogenous and exogenous factors. The outcome of these changes is critical for normal cell survival, growth, development and response to the environment. Therefore, it is very important to identify, characterize and measure changes in gene expression. Not only will the knowledge gained from such analysis further our understanding of basic biology, but it will also allow us to exploit it for various purposes, such as diagnosis of infectious and non-infectious diseases and screening to identify and develop new drugs.
Besides the conventional, one-by-one gene expression analysis methods such as Northern analysis, RNase protection assays, and RT-PCR, several methods are currently available to examine gene expression on a genome-wide scale. These approaches are variously referred to as RNA profiling, differential display, etc. These methods can be broadly divided into three categories: (1) hybridization-based methods, such as subtractive hybridization and microarrays; (2) cDNA tag methods, such as EST and serial analysis of gene expression (SAGE); and (3) fragment-size-based methods, often referred to as gel-based methods, in which a differential display is generated upon electrophoretic separation of DNA fragments on a gel such as polyacrylamide.
Although libraries made by subtractive hybridization have been used extensively for the identification and cloning of differentially expressed genes (Wecher et al., Nucleic Acids Res. 14: 10027-10044 [1986]; Hedrick et al., Nature 308: 149-153 [1984]; Koyama et al., Proc. Natl. Acad. Sci. USA 84: 1609-1613 [1987]; Zipfel et al., Mol. Cell. Biol. 9: 1041-1048 [1989]), the approach is very labor intensive, requires large amounts of RNA, and is not amenable to quantitative measurement of gene expression. Moreover, it is not ideally suited for monitoring the expression of a large number of genes in order to generate a genome-wide profile of gene expression. SAGE (see, e.g., U.S. Pat. Nos. 5,695,937 and 5,866,330) provides an alternative method that does not suffer from some of the limitations of subtractive library screening. For example, it allows for quantitative monitoring of global gene expression. However, it too has certain limitations, such as higher cost and labor intensiveness, and it is not suitable for cloning of the identified genes. Moreover, the tag sequences obtained from a SAGE library are too short to be used as gene-specific primers or probes.
Gel-based methods (described in U.S. Pat. Nos. 5,871,697, 5,459,037, 5,712,126 and PCT publication WO 98/51789) address some of the shortcomings of the non-gel-based methods. However, most of them suffer from compromised specificity. Most of the existing gel-based gene expression analysis methods are based on the following principles: cDNAs are first digested with a restriction enzyme, ligated to a suitable adapter, and then amplified by PCR with selective primers, and the resulting fragments are resolved on electrophoretic gels. The selection of a cDNA population relies upon the annealing of the selective primers to the cDNA fragments and their extension by a polymerase during PCR amplification. The method relies on sequence variation in the regions neighboring the restriction sites of different cDNA fragments. However, PCR is less than ideal in terms of specificity. Depending on the stringency of the annealing conditions, one to a few base mismatches are tolerated, and primers are extended by the DNA polymerase in spite of less than perfect complementarity between the primer and the template. Because the selective primers vary in sequence, stringent conditions cannot be applied to all of the PCR reactions. The resultant non-specific priming and amplification distorts the profile of amplified fragments, which often does not correlate well with the mRNA profile of the sample.
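The effect of mismatch tolerance on selective amplification can be illustrated with a minimal sketch. It is a toy model and not the procedure of any of the cited patents: the adapter sequence, the fragment sequences, and the function names are hypothetical, each fragment is written as the strand whose 5′ end has the same sequence as the primer, and annealing is reduced to counting mismatched positions.

```python
def mismatches(primer, template):
    """Count positions at which the primer differs from the template region."""
    return sum(1 for p, t in zip(primer, template) if p != t)


def amplified(fragments, primer, max_mismatches):
    """Return fragments whose 5' end matches the primer within the tolerance."""
    n = len(primer)
    return [
        f for f in fragments
        if len(f) >= n and mismatches(primer, f[:n]) <= max_mismatches
    ]


# Hypothetical adapter; real adapters depend on the restriction enzyme used.
ADAPTER = "GACTGCGTACC"

# Adapter-ligated fragments that differ in the two bases next to the adapter.
fragments = [
    ADAPTER + "AC" + "GGTTCA",    # perfect match to the selective primer
    ADAPTER + "AG" + "TTGACCA",   # one mismatch in the selective bases
    ADAPTER + "TC" + "CATG",      # one mismatch in the selective bases
    ADAPTER + "GT" + "AAACCCGG",  # two mismatches in the selective bases
]

# Selective primer: adapter sequence plus two selective bases at the 3' end.
primer = ADAPTER + "AC"

strict = amplified(fragments, primer, max_mismatches=0)
relaxed = amplified(fragments, primer, max_mismatches=1)

print(len(strict), "fragment(s) amplified under strict conditions")
print(len(relaxed), "fragment(s) amplified when one mismatch is tolerated")
```

In this toy model only the intended fragment is selected under strict conditions, whereas tolerating a single mismatch also selects two unintended fragments, which is the kind of distortion of the fragment profile described above.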
The individual methods using a gel-based approach suffer from some additional specific disadvantages. For example, a method developed by Curagen (U.S. Pat. No. 5,871,697) requires the use of many different restriction enzymes, the enzyme selection is not flexible, and the reaction setup is rather complicated. Each cDNA sample in this method is separated into 96 pools and digested by 96 pairs of different 6-base cutter enzymes. It would be difficult to increase the fractionation in this method. A method developed by Digital Gene Technology (U.S. Pat. No. 5,459,037) is based on capturing the 3′-end fragments of cDNAs such that each gene has only one representative. However, a major disadvantage of this method is its long and complicated procedure, which is not only labor intensive but, more importantly, also decreases the sensitivity and representation of the differential display. The technology involves multiple steps, such as cDNA synthesis, library construction and cloning, in vitro RNA transcription, a second round of cDNA synthesis, and finally PCR. At each step in this procedure, some bias is introduced that ultimately skews the original representation of transcripts. PCT publication WO 98/51789 describes a method developed by Display System Technology that utilizes a PCR-based profiling approach. The use of only 4-base cutters in this method generates a large number of bands for a specific cDNA species, and thus introduces redundancy.
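The redundancy associated with 4-base cutters can be made concrete with a rough estimate. The sketch below assumes a random sequence with equal base frequencies, in which a recognition site of n bases is expected about once every 4^n bases; the 2,048-base cDNA length and the function name are hypothetical and chosen only for illustration, and real cDNAs deviate from this idealization.

```python
def expected_sites(length_bp, site_len):
    """Expected number of recognition sites in a random sequence,
    assuming equal base frequencies (one site per 4**site_len bases)."""
    return length_bp / 4 ** site_len


cdna_length = 2048  # hypothetical cDNA length in bases

four_cutter = expected_sites(cdna_length, 4)  # one site per 256 bases
six_cutter = expected_sites(cdna_length, 6)   # one site per 4,096 bases

print("4-base cutter:", four_cutter, "expected sites")
print("6-base cutter:", six_cutter, "expected sites")
```

Under these assumptions a 4-base cutter is expected to cut such a cDNA about eight times, compared with about 0.5 times for a 6-base cutter, so a single cDNA species can give rise to several fragments and therefore several bands.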
Methods for the selection of DNA markers using adaptor molecules and the selective amplification of DNA having a plurality of sites for a specific endonuclease are described in UK Patent Application Nos. GB 2,295,011, and GB 2,295,228.
Because of the various shortcomings of the currently available technologies, there is a need for improved methods of identification, separation and quantitative measurement of nucleic acid fragments. It is the objective of the present invention to provide such a method.