Throughout this application, various publications are referenced by author and date within the text. Full citations for these publications may be found listed alphabetically at the end of the specification immediately preceding the claims. All patents, patent applications and publications cited herein, whether supra or infra, are hereby incorporated by reference in their entirety. The disclosures of these publications in their entireties are hereby incorporated by reference into this application in order to more fully describe the state of the art as known to those skilled therein as of the date of the invention described and claimed herein.
The human genome is estimated to contain 100,000 genes, the expression of which defines the functionality of a cell (1). Current technological advances, including large-scale DNA sequencing, efficient library construction and manipulation, and PCR-based gene expression monitoring, have resulted in the identification of more than 87,000 unique expressed sequence tags (ESTs) in diverse cell types and under various physiological conditions (1). Approximately 12% of the ESTs have significant homology with previously identified genes, and the remainder require further investigation to define their identity and biological relevance (1). However, ESTs, which are short stretches of expressed genes, can provide only limited information as to the identity and biological role of specific genes. A more thorough analysis of the ESTs requires a determination of the full protein coding sequences for these expressed genes.
Several approaches are routinely used to obtain cDNAs containing protein-encoding sequences from ESTs. These include library screening (2) and the PCR-based rapid amplification of cDNA ends (RACE) strategy (3). A less frequently employed scheme, exon trapping, is also amenable to cDNA cloning from genomic fragments (4,5).
A number of cDNA libraries from diverse sources are commercially available. In specific instances, this availability can reduce the burden of producing the cDNA libraries required for screening. However, even with well-constructed cDNA libraries, several rounds of screening and verification are often required to obtain even a single complete cDNA (2). This process is laborious and can require months of intensive effort. Exacerbating the situation, cDNA library screening occasionally results in incomplete cDNAs lacking full protein coding information. This occurs primarily because of premature termination of reverse transcription and the self-priming procedure during second strand cDNA synthesis (2, 6, 7). Additionally, obtaining cDNA of a low-abundance mRNA is rarely achievable unless the cDNA library is of high titer and minimally amplified (2). In these contexts, the current approach of cDNA library screening to obtain a full protein coding sequence is often costly, laborious and inefficient.
The present invention provides a method for isolating a double-stranded cDNA having a nucleotide sequence of a complete open reading frame which comprises: (A) admixing (i) an isolated single-stranded cDNA, (ii) a first primer capable of forming a stem-loop structure, comprising (a) at the 3′ end of the primer, a first random sequence, linked to (b) a second sequence, linked to (c) a third sequence which forms a loop structure, linked to (d) a fourth sequence, at the 5′ end of the first primer, which is complementary to the second sequence, under hybridization conditions sufficient for annealing the first sequence of the first primer to the sequence at the 3′ end of the single-stranded cDNA, and (iii) a polymerase; (B) incubating the mixture from step (A) under suitable conditions for DNA synthesis; and (C) performing a polymerase chain reaction by admixing (i) an aliquot of the mixture from (B), (ii) a second primer which specifically binds to the single-stranded cDNA, (iii) a third primer which comprises (a) a fifth sequence identical to the third sequence of the first primer, linked to (b) a sixth sequence identical to a portion of the second sequence of the first primer, and (iv) a polymerase, under conditions suitable for a polymerase chain reaction so as to produce a double-stranded cDNA reaction product, thereby isolating the cDNA having the sequence of the complete open reading frame.
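By way of illustration only, the arrangement of the first and third primers described above may be sketched in Python. This is a minimal sketch under stated assumptions, not the disclosed implementation: the function names, the example sequences, the random-sequence length of six, and the choice of the 5′-most bases of the second sequence as the "portion" used in the third primer are all hypothetical. Sequences are written 5′ to 3′, and "complementary" is taken to mean reverse-complementary so that the fourth sequence can pair with the second to form the stem.

```python
import random

# Watson-Crick complement table for DNA (upper-case bases only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def build_stem_loop_primer(second: str, third: str,
                           random_len: int = 6, rng=None) -> str:
    """Assemble a first primer, 5'->3': fourth + third + second + first.

    first  : random sequence at the 3' end (length is an assumption)
    second : one arm of the stem
    third  : loop sequence
    fourth : reverse complement of second, at the 5' end
    """
    rng = rng or random.Random()
    first = "".join(rng.choice("ACGT") for _ in range(random_len))
    fourth = reverse_complement(second)
    return fourth + third + second + first


def build_third_primer(second: str, third: str, portion_len: int) -> str:
    """Assemble a third primer, 5'->3': fifth + sixth.

    fifth : identical to the third (loop) sequence
    sixth : identical to a portion of the second sequence; taking the
            first portion_len bases is an assumption of this sketch
    """
    if not 1 <= portion_len <= len(second):
        raise ValueError("portion_len must be between 1 and len(second)")
    return third + second[:portion_len]


# Hypothetical example sequences, chosen only to show the layout.
example_second = "GGATCCGC"
example_third = "TTTTAAAA"
print(build_stem_loop_primer(example_second, example_third))
print(build_third_primer(example_second, example_third, portion_len=5))
```

Under these assumptions the third primer is a contiguous stretch of the first primer (the loop followed by the start of the stem arm), which is one way to read the requirement that its fifth and sixth sequences be identical to the third sequence and to a portion of the second sequence, respectively.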