The invention relates to characterizing, or "fingerprinting," nucleic acid sequences. It has application in the identification, cloning and analysis of genes, as well as in the diagnosis of disease.
Human genes were initially identified by isolating their translation products (proteins), microsequencing the amino acids, back-translating to nucleotide sequences and hybridizing oligonucleotide probes designed from those nucleotide sequences. Protein isolation and purification were a major limitation of this approach.
Positional cloning, or reverse genetics, emerged as a more powerful technique. Taking advantage of the physical proximity of genes and polymorphic sequences in the genome, linkage analysis led to the discovery of genes whose protein products were unknown. The positional markers initially used were restriction fragment length polymorphisms (RFLPs); subsequently, sequence-tagged sites (STSs) were preferred. Owing to the success of positional cloning as a tool for finding disease genes, the scientific community has embarked on the ambitious project of creating a physical map (based on STSs) of the entire human genome.
As the number of genes identified and sequenced has increased, scientists have developed a new strategy of gene isolation. By aligning the amino acid sequences encoded by related genes and analyzing their homology, new proteins that are related to previously identified sequences in the same or other species have been identified either by screening libraries or by polymerase chain reaction (PCR). Although this approach is technically simple and efficient, its usefulness is generally limited to proteins that are homologous to previously identified sequences.
A recent non-specific cloning method disclosed by Pardee et al., U.S. Pat. No. 5,262,311, isolates messenger ribonucleic acids (mRNAs), reverse transcribes them to produce complementary deoxyribonucleic acids (cDNAs), and amplifies their 3' untranslated regions by PCR with sets of oligonucleotide primers. A first primer in each set hybridizes with the polyA tail and the two nucleotides immediately upstream of it. The second primer, which hybridizes to a sequence still further upstream, is said to comprise an arbitrary sequence of at least 9 and, preferably, of at least 13 nucleotides. It is understood by the inventors hereof that, in actual practice, Pardee et al.'s second primer hybridizes with the specificity of a 6- to 7-mer.
Although the Pardee et al. technique, commonly known as differential display PCR (DD-PCR), constitutes a significant advance, it is not without theoretical and practical limitations. To begin, assuming that the second primers "act" as 6- or 7-mers, a set of 20 such primers will hybridize with a combined average frequency of about one site in 500 nucleotides. Presuming a normal distribution, roughly half of the sequences targeted by these primers will be more than 500 nucleotides from the polyA tail and, hence, will not be effectively resolved in a conventional electrophoresis gel. Because some mRNAs lack a polyA tail and others are shorter than 500 nucleotides, the likelihood of obtaining a PCR product is further reduced. Still further, sequences immediately adjacent to the polyA tail will most likely lie in untranslated regions that are not necessarily well conserved and that are usually underrepresented in genomic databases. To obtain protein-coding sequence, a significant amount of sequencing and DNA library screening is usually required. These steps are laborious and time-consuming. Furthermore, because DD-PCR targets the polyA tail, it can only be applied to mRNA samples.
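The hybridization-frequency estimate above can be checked with a short calculation. The following is a minimal sketch, not part of the cited method: it assumes that the four bases are equiprobable and independent, that each primer matches exactly one sequence of its effective length, and that overlap between primers can be ignored. The function name `mean_spacing` is hypothetical and is introduced only for illustration.

```python
def mean_spacing(effective_length: int, num_primers: int) -> float:
    """Expected number of nucleotides between hybridization sites for a primer set.

    Assumption (not from the source): bases are equiprobable and independent,
    so a single primer of effective length k matches once per 4**k positions.
    """
    positions_per_hit_single_primer = 4 ** effective_length
    # With n distinct primers the combined hit rate is roughly n times higher,
    # so the mean spacing between hits is n times shorter.
    return positions_per_hit_single_primer / num_primers


if __name__ == "__main__":
    for k in (6, 7):
        print(f"{k}-mer, 20 primers: one site per {mean_spacing(k, 20):.1f} nucleotides")
```

Under these assumptions a set of 20 primers gives one site per roughly 205 nucleotides for 6-mers and one per roughly 819 nucleotides for 7-mers; the figure of one in 500 falls between the two.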
In view of the foregoing, an object of the invention is to provide improved processes, apparatus and compositions for characterizing, or "fingerprinting," nucleotide sequences. A further object is to provide such processes, apparatus and compositions for use, inter alia, in the mapping of genomic DNA, the selective identification and isolation of nucleotide sequences, the understanding of functional relationships between such sequences, and the diagnosis of disease. These and other objects are evident in the discussion that follows.