1. Field of the Invention
This invention concerns novel methods for cloning EST-specific full-length cDNA. In particular, the invention concerns a method or methods for full-length cDNA cloning and/or full-length cDNA library construction. The method involves cDNA synthesis from highly enriched, homogeneously purified mRNAs and includes hybrid selection, in a single or multiplex approach, for purification of specific mRNAs from total RNA using antisense oligonucleotide primers derived from expressed sequence tag (EST) sequences.
2. Background and Related Disclosures
Recently, identification and treatment of many human diseases have come to depend on identification of the gene or genes responsible for the particular disease.
The human genome comprises approximately 100,000 genes. Fewer than 5% of these genes have been sequenced and assigned biological functions. Large-scale single-pass sequencing of randomly picked cDNA clones has generated over 500,000 human expressed sequence tags (ESTs), as described in Genome Res., 6:807 (1996). These ESTs provide a large reservoir of information on human genes and are potentially powerful tools for discovery of disease genes and for regional mapping of genes to individual chromosomes. However, ESTs are limited by their redundancy and by their partial-sequence nature, which restricts the ability to define a given gene's function.
In contrast to ESTs, full-length cDNA clones and their full-length coding sequences overcome these limitations and provide more accurate information for database comparisons to determine gene structure, and for analyses predicting gene function. Generation of a comprehensive gene map of expressed sequences also requires cDNA libraries representing a high percentage of full-length mRNA molecules. For large-scale cDNA sequencing, development of full-length cDNA libraries is essential. Some attempts to achieve this are described in Nature Genet., 2:173 (1992) and in Genome Res., 7:353 (1997).
A good example of the utility of full-length cDNAs is the identification of genes responsible for breast cancer. Breast cancer is one of the most common causes of morbidity and mortality in women. Only a small proportion of breast cancer cases are due to a familial predisposition resulting from mutations in specific genes. The majority of cases are sporadic and appear to involve multiple genetic changes. Defining the role of those genetic changes in initiation and progression is, therefore, essential for understanding the molecular mechanisms that underlie development of breast cancer and for its eventual treatment.
Through the Cancer Genome Anatomy Project (CGAP), a number of expressed sequence tags (ESTs) from genes expressed in normal, precancerous, and cancerous tissues have been generated. The ability to generate high-quality, full-length cDNA coding sequences for these ESTs is integral to defining the biological function of these expressed genes.
In an attempt to provide more accurate genetic information, numerous methods have been developed for constructing cDNA libraries from different tissue samples.
Unfortunately, current technologies cannot create representative libraries of full-length cDNAs and are therefore limited in their utility. cDNA library screening using currently available methods can be very laborious and time-consuming, and as a result, these technologies are not readily amenable to large-scale full-length cDNA cloning and sequencing.
Therefore, a method that streamlines this process would greatly facilitate the generation of full-length cDNAs and full-length coding sequences, leading to a better understanding of the genetic and metabolic mechanisms that underlie breast cancer and other genetic or genetically controlled diseases.
Successful cloning of a specific, full-length cDNA largely depends on the frequency of full-length cDNA molecules within a cDNA library. Of the numerous methods for construction of cDNA libraries, the approach based on methods described in Gene, 25:263 (1983) is the most widely used.
The mRNA complexity of a typical cell amounts to 15,000-20,000 distinct mRNAs. The most prevalent mRNAs are present at approximately 5000 copies per cell, and low-abundance mRNAs are generally present at 1-15 copies per cell. Because of heterogeneous expression among the cells that comprise a given organ, an mRNA can be even more under-represented in certain tissues, as described, for example, in Brain Res. Rev., 17:263 (1992) for brain cells.
This heterogeneity makes it difficult to isolate specific, low-abundance mRNAs. To some extent, this limitation can be circumvented by isolating mRNAs from clonal cell lines. However, generating cDNA libraries from cell lines is often not possible, or does not address the case where the expression level of a given mRNA-derived sequence is undefined.
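The effect of abundance on screening effort can be illustrated with a rough calculation. The sketch below uses the standard sampling estimate N = ln(1 - P) / ln(1 - f), where f is the fraction of the mRNA pool contributed by the target and P is the desired probability of finding at least one clone. It assumes that a library samples mRNAs in proportion to their abundance, and the figure of 500,000 total mRNA molecules per cell is a hypothetical value chosen only for illustration; neither the function name nor that total comes from the specification.

```python
import math


def clones_to_screen(copies_per_cell, total_mrna_per_cell, probability=0.99):
    """Estimate how many independent clones must be screened to find at
    least one copy of a target mRNA with the given probability.

    Assumes each clone is an independent draw from the mRNA pool and that
    the library represents mRNAs in proportion to their abundance.
    """
    if not 0 < copies_per_cell < total_mrna_per_cell:
        raise ValueError("copies_per_cell must be between 0 and the total")
    if not 0 < probability < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    # Fraction of the mRNA pool made up by the target species.
    fraction = copies_per_cell / total_mrna_per_cell
    # N = ln(1 - P) / ln(1 - f), rounded up to a whole number of clones.
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))


if __name__ == "__main__":
    TOTAL_MRNA_PER_CELL = 500_000  # hypothetical, for illustration only
    # Copy numbers taken from the abundance ranges quoted above.
    for copies in (5000, 15, 1):
        n = clones_to_screen(copies, TOTAL_MRNA_PER_CELL)
        print(f"{copies:>5} copies per cell: about {n:,} clones to screen")
```

Under these assumptions, the number of clones to be screened grows roughly in inverse proportion to the copy number, so an mRNA at 1 copy per cell requires several thousand times more screening than one at 5000 copies per cell.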
It is, therefore, important to develop other approaches that are able to isolate full-length cDNA clones from cells in which the expression level of the mRNA sequence is undefined.
Several different techniques have been developed to facilitate cloning of full-length cDNA and to enrich rare mRNA represented in conventional cDNA libraries. The rapid amplification of cDNA ends (RACE) technique has been used for cloning of missing ends from a known incomplete cDNA sequence, as described in PNAS (USA), 85:8998 (1988). However, this approach requires a substantial effort to generate the full-length cDNA clones by attaching RACE-generated fragments onto a cDNA clone that contains a partial sequence.
Approaches to achieve enrichment of specific sequences include subtractive cDNA libraries (Trends Genet., 9:70 (1993)) and normalized cDNA libraries (Genome Res., 6:791 (1996)). While subtractive and normalized libraries generally do not consist of full-length cDNAs, the frequency of full-length cDNAs has been increased by RecA-mediated triple-strand formation in a subtractive cDNA library (Nucleic Acids Res., 24:3478 (1996)).
Though these different approaches have been employed, current technologies have not yet been able to efficiently produce representative libraries of full-length cDNAs.
Therefore, a primary objective of this invention is to provide a novel approach for generation and cloning of full-length cDNA which either allows easier full-length cDNA cloning or circumvents altogether the need for cDNA libraries representing all cellular mRNA species, by relying on enrichment of specific mRNAs from any tissue or cell source.
All patents, patent applications and publications described or referred to in the specification are hereby incorporated by reference.