Within this application several publications are referenced by Arabic numerals. Full citations for these references may be found at the end of the specification immediately preceding the claims. The disclosures of these publications in their entirety are hereby incorporated by reference into this application in order to more fully describe the state of the art to which this invention pertains.
It is often desirable to isolate unknown biomolecules, particularly biomolecules which are related to a biological process or condition of interest. Unfortunately, this is seldom accomplished without great difficulty, for several reasons, which often include a lack of suitable probes and a lack of any information concerning the structure or identity of the biomolecule of interest.
It is theoretically possible to isolate a biomolecule of interest by obtaining a pool of hybrids which contains at least one member which incorporates or expresses the biomolecule of interest. One may then proceed to use various techniques to eliminate the members of the pool which are not related to the biomolecule of interest. Unfortunately, such a process would require so many elimination steps that the time needed to complete it would render it impractical.
The most comprehensive technique used in genetic analysis is DNA sequencing. In addition to DNA sequencing, there are other less comprehensive techniques used to identify DNA sequence variability which may be used for purposes such as genetic diagnosis and fingerprinting. These methods make use of DNA sequence recognition by enzymes or nucleic acids.
The sequence-specific cleavage of DNA by restriction endonucleases and the separation of the resulting fragments by gel electrophoresis have been used as core technologies in the physical analysis of DNA. The ability of nucleic acids to form hydrogen-bonded hybrids with complementary nucleic acid strands that are either in solution or immobilized on solid phase substrates has been exploited by techniques which utilize nucleic acid sequences as labelled hybridization probes (1-4). The combined use of these physical and chemical methods in the study of restriction fragment length polymorphisms (RFLPs) has permitted the identification of genetic variants in human populations, and is being used as a method for genetic fingerprinting and mapping (5-7).
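The principle behind a restriction fragment length polymorphism may be illustrated by a minimal sketch. It assumes a linear, single-stranded representation of the target and the enzyme EcoRI, which recognizes GAATTC and cuts after the first G. The function names and both example sequences are hypothetical and are not taken from the cited publications.

```python
def cut_positions(seq, site="GAATTC", offset=1):
    """Return cut coordinates for every occurrence of the recognition site.

    offset=1 models EcoRI (G^AATTC); other enzymes would need other values.
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)  # step by 1 so overlapping sites are found
    return cuts


def fragment_lengths(seq, site="GAATTC", offset=1):
    """Return the fragment lengths from a complete digest of a linear molecule."""
    bounds = [0] + cut_positions(seq, site, offset) + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical alleles: the variant carries one base change (A to G)
# that abolishes the second recognition site.
reference = "AAAA" + "GAATTC" + "CCCCCC" + "GAATTC" + "TT"
variant = "AAAA" + "GAATTC" + "CCCCCC" + "GAGTTC" + "TT"

print(fragment_lengths(reference))  # [5, 12, 7]
print(fragment_lengths(variant))    # [5, 19]
```

The two alleles differ by a single nucleotide, yet yield different sets of fragment lengths; it is this difference that gel electrophoresis resolves.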
With the advent of recombinant DNA technology, the ability of nucleic acids to form hydrogen-bonded hybrids with complementary strands in solution or immobilized on solid-phase substrates has been widely exploited as a means for the characterization and analysis of gene structure and function. The isolation of a gene or mRNA as a recombinant molecule allows for the selective amplification and purification of that sequence. It is then possible to label that nucleotide sequence by a variety of techniques in order to specifically probe populations of nucleic acids for the presence of that particular sequence.
Initially, virtually all hybridization probes were produced in bacteria via the amplification of recombinant plasmids in that host. Improvements in solid phase synthesis of DNA have presented alternative ways to generate hybridization probes, as well as many other new ways to manipulate nucleic acids in vitro (8). For example, one may generate oligonucleotide probes using information gleaned from the literature or peptide sequence data.
Knowledge of the genetic code is widely applied in combination with a complementary technology, protein chemistry, to further understand a system of interest. For any given sequence of amino acids, however, the genetic code does not always indicate an unambiguous nucleotide choice, because most amino acids are encoded by more than one codon. For this reason, a nucleotide probe corresponding to a given amino acid sequence is typically a mixture of oligonucleotides which may be synthesized simultaneously and which together correspond to all possible nucleotide sequences capable of coding for the peptide sequence.
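The ambiguity described above may be made concrete by a short sketch that enumerates every nucleotide sequence capable of coding for a peptide under the standard genetic code. The peptide used, the function names, and the choice to list coding-strand sequences (rather than their complements) are assumptions made for illustration only.

```python
from itertools import product
from math import prod

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each one-letter amino acid code ('*' = stop) to the codons encoding it.
CODONS = {}
for (b1, b2, b3), aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS.setdefault(aa, []).append(b1 + b2 + b3)


def degeneracy(peptide):
    """Number of distinct nucleotide sequences that can encode the peptide."""
    return prod(len(CODONS[aa]) for aa in peptide)


def probe_mixture(peptide):
    """List every coding-strand sequence that can encode the peptide."""
    return ["".join(codons) for codons in product(*(CODONS[aa] for aa in peptide))]


# Hypothetical tripeptide Met-Trp-Lys: only lysine has more than one codon.
print(degeneracy("MWK"))     # 2
print(probe_mixture("MWK"))  # ['ATGTGGAAA', 'ATGTGGAAG']
```

Peptides rich in leucine, serine, or arginine, each of which has six codons, give much larger mixtures, which is why stretches containing methionine and tryptophan are usually preferred when designing such probes.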
Using current methods for solid phase synthesis of deoxyoligonucleotides, it is possible to synthesize mixed oligonucleotide DNA sequences of similar or identical lengths by coupling nucleotide mixtures at various condensation cycles. The number of different oligonucleotides generated in such a synthesis is multiplied at each condensation cycle by the number of nucleotides coupled in that cycle. In this way both the number and the specific sequences of the oligonucleotide species in a given synthesis may be programmed. Oligonucleotide mixtures may also be generated by mixing independently synthesized mixtures of oligonucleotides. Pseudorandom and random mixtures of oligonucleotides may be generated by mixing independently synthesized oligonucleotides in an arbitrary fashion (9).
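The multiplicative growth of a mixed synthesis may be sketched as follows. Each entry of the hypothetical list `cycles` names the nucleotides coupled at one position; for simplicity the positions are listed in the order in which the sequence is written, which is an assumption and not necessarily the order of chemical coupling.

```python
from itertools import product
from math import prod


def mixture_size(cycles):
    """Number of distinct oligonucleotides produced by a mixed synthesis."""
    return prod(len(set(bases)) for bases in cycles)


def mixture_sequences(cycles):
    """Enumerate every oligonucleotide species in the mixture."""
    return ["".join(p) for p in product(*(sorted(set(bases)) for bases in cycles))]


# Hypothetical four-position synthesis: one nucleotide, then a two-way
# mixture, then one nucleotide, then all four nucleotides.
cycles = ["A", "CT", "G", "ACGT"]

print(mixture_size(cycles))       # 8  (1 x 2 x 1 x 4)
print(mixture_sequences(cycles))
# ['ACGA', 'ACGC', 'ACGG', 'ACGT', 'ATGA', 'ATGC', 'ATGG', 'ATGT']
```

Coupling all four nucleotides at each of n positions gives 4 to the power n species, so the size of the mixture is set entirely by the choice of nucleotides at each cycle.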
This synthetic chemistry may be used to design mixtures of oligonucleotides with random and pseudorandom sequences in order to develop a battery of probes whose hybridization patterns to target DNAs may be used for genetic analysis. The reagents and methods of the subject invention are based upon the statistical features of essentially random occurrences of nucleotides in target DNAs and the representation of these occurrences as revealed by DNA hybridization.
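The kind of statistical expectation referred to above may be illustrated by a minimal sketch. It assumes, purely for illustration, that the four nucleotides occur independently and with equal frequency in a single-stranded target; real target DNAs depart from this model, and the function name and parameters are hypothetical rather than part of the subject invention.

```python
def expected_occurrences(probe_length, target_length, base_freq=0.25):
    """Expected number of exact matches of one probe sequence in a random target.

    Assumes independent positions with each base at frequency base_freq.
    """
    if probe_length > target_length:
        return 0.0
    positions = target_length - probe_length + 1  # possible start sites
    return positions * base_freq ** probe_length


# A given hexamer is expected about once in roughly 4**6 = 4096 start sites.
print(expected_occurrences(6, 4101))   # 1.0
# A given octamer is expected about once in roughly 4**8 = 65536 start sites.
print(expected_occurrences(8, 65543))  # 1.0
```

Under this model, shorter probes hybridize at many sites in a target while longer probes hybridize at few, so the length and composition of a probe mixture govern how informative its hybridization pattern is.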