The analysis of unknown biopolymer targets often involves their specific binding to known biopolymer probes. The most common technique employing immobilized biopolymers is Southern blot hybridization, in which a set of DNA targets is immobilized on a membrane and the membrane is bathed in a solution containing labeled DNA probe molecules under conditions where complementary molecules will hybridize. In an analogous technique called Northern blot hybridization, RNA targets are immobilized on membranes and hybridized to complementary RNA probes. Reverse blot hybridization employs the opposite approach: instead of immobilizing DNA targets, a set of DNA probes is immobilized on a solid surface and the unknown labeled DNA target is present in the liquid phase.
Arrays, constructed by attaching a plurality of the same or different biopolymers to discrete, isolated areas on the surface of a substrate, are becoming increasingly important tools in the analysis of unknown biopolymers, in applications such as gene expression analysis, DNA sequencing, mutation detection, polymorphism screening, linkage analysis, genotyping, and screening for alternative splice variants in gene transcripts.
Arrays of nucleic acid probes can be used to extract sequence information from, for example, nucleic acid samples. The samples are exposed to the probes under conditions that allow hybridization. The arrays are then scanned to determine to which probes the sample molecules have hybridized. One can obtain sequence information by selecting the probes carefully and using algorithms to compare the patterns of hybridization and non-hybridization. This method is useful for sequencing nucleic acids, as well as for sequence checking.
Gene expression analysis is a method of critical importance to basic molecular biological research. Since, in higher organisms, the choice of genes being expressed in any given cell has a profound effect on the nature of the cell, gene expression analysis can provide a key to the diagnosis, prognosis, and treatment of a variety of diseases in animals (including humans) and in plants. Additionally, gene expression analysis can be used to identify differentially expressed novel genes, to correlate gene expression with a particular phenotype, to screen for a disease predisposition, and to conduct toxicity testing.
Typically, in gene expression analysis, an array of probe nucleic acids is formed by attaching a set of individual gene-specific probes to a solid substrate in a regular pattern, so that the location of each probe is known. The array is contacted with a sample containing target nucleic acids under hybridization conditions. The resulting hybrids can be detected by a wide variety of methods, most commonly by employing radioactive or fluorescent labels.
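By way of illustration only, the role of the known probe locations can be sketched in a few lines of Python. The layout, the intensity values, the threshold, and the function name `call_hybridized` are all hypothetical and chosen for the example; real detection systems apply background correction and normalization that are not shown here.

```python
def call_hybridized(layout, intensities, threshold):
    """Return the names of probes whose measured signal meets the threshold.

    layout:      dict mapping (row, col) -> probe name (hypothetical array map)
    intensities: dict mapping (row, col) -> label signal read by the scanner
    threshold:   minimum signal treated as a hybrid (an assumed, fixed cutoff)
    """
    return sorted(
        layout[pos]
        for pos, signal in intensities.items()
        if pos in layout and signal >= threshold
    )


# Hypothetical 2 x 2 array: each location holds one gene-specific probe.
layout = {(0, 0): "geneA", (0, 1): "geneB", (1, 0): "geneC", (1, 1): "geneD"}
# Hypothetical fluorescence readings after hybridization and washing.
intensities = {(0, 0): 950.0, (0, 1): 40.0, (1, 0): 610.0, (1, 1): 12.0}

expressed = call_hybridized(layout, intensities, threshold=500.0)
print(expressed)
```

Because each probe's location is fixed in advance, the scanner only needs to report a signal per location; the identity of the hybridized target follows from the layout.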
Using the current reverse hybridization formats and stringency control methods, however, it remains difficult to detect low copy number (i.e., 1-100,000) nucleic acid targets, even with the most sensitive reporter groups (enzymes, fluorophores, radioisotopes, etc.) and associated detection systems (fluorometers, luminometers, photon counters, scintillation counters, etc.).
This difficulty is caused by several underlying problems associated with direct probe hybridization. One problem relates to the stringency control of hybridization reactions. Hybridization reactions are usually carried out under stringent conditions in order to achieve hybridization specificity. Methods of stringency control primarily involve optimizing the temperature, ionic strength, and denaturant concentration during hybridization and the subsequent washing procedures. Unfortunately, the application of these stringency conditions significantly decreases the number of hybridized probe/target complexes available for detection.
Another problem relates to the high complexity of DNA in most samples, particularly in human genomic DNA samples. When a sample is composed of an enormous number of sequences that are closely related to the specific target sequence, even the most unique probe sequence has a large number of partial hybridizations with non-target sequences.
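The effect of closely related sequences can be illustrated with a simplified Python sketch. It treats hybridization as plain base-by-base complementarity and counts sites in a single sample strand that pair with the probe perfectly or with a small number of mismatches. The sequences, the mismatch limit, and the function names are assumptions made for the example; actual hybridization depends on thermodynamics, both strands, and the reaction conditions, none of which is modeled here.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def count_binding_sites(probe, sample, max_mismatches):
    """Count perfect and partial binding sites for probe on one sample strand.

    A site is 'perfect' if it is exactly complementary to the probe and
    'partial' if it differs from a perfect site at 1..max_mismatches positions.
    """
    site = reverse_complement(probe)  # sequence of a perfectly matching site
    counts = {"perfect": 0, "partial": 0}
    for i in range(len(sample) - len(site) + 1):
        window = sample[i:i + len(site)]
        mismatches = sum(a != b for a, b in zip(site, window))
        if mismatches == 0:
            counts["perfect"] += 1
        elif mismatches <= max_mismatches:
            counts["partial"] += 1
    return counts


# Hypothetical probe and sample strand, for illustration only.
counts = count_binding_sites("GATTC", "TTGAATCAAGAGTCTT", max_mismatches=1)
print(counts)
```

In a sample as complex as genomic DNA, the number of partial sites can greatly exceed the number of perfect sites, which is the difficulty described above.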
A distinctive exception to the general difficulty in detecting low copy number target nucleic acid with a direct probe is the in-situ hybridization technique. This technique allows low copy number unique nucleic acid sequences to be detected in individual cells. In the in-situ format, target nucleic acid is naturally confined to the area of a cell (about 20-50 μm²) at a relatively high local concentration. Furthermore, the probe/target hybridization signal is confined to a microscopic and morphologically distinct area. This makes it easier to distinguish a positive signal from artificial or non-specific signals than in hybridization on a solid support.
Mimicking in-situ hybridization in some respects, new techniques are being developed for carrying out multiple sample nucleic acid hybridization analysis on micro-formatted multiplex or matrix devices, e.g., DNA chips. These chips, which are smaller than a thumbnail, contain hundreds of thousands or more different molecular probes. The probes on these biological chips are arranged in arrays, with each probe assigned a specific location, such as a micro-well of the chip. Biological chips have been produced in which each location has a scale of, for example, ten microns. The chips can be used to determine whether target molecules interact with any of the probes on the chip. After the array is exposed to target molecules under selected test conditions, scanning devices can examine each location in the array and determine whether a target molecule has interacted with the probe at that location. These hybridization formats are micro-scale versions of the conventional “reverse dot blot” and “sandwich” hybridization systems.
Biological chips, or arrays, are useful in a variety of screening techniques for obtaining information about either the probes or the target molecules. For example, a library of peptides can be used as probes to screen for drugs. The peptides can be exposed to a receptor, and those probes that bind to the receptor can be identified.
The micro-formatted hybridization can also be used to carry out “sequencing by hybridization” (SBH) (see M. Barinaga, Science, 253:1489 (1991); W. Bains, Bio/Technology, 10:757-758 (1992)). SBH makes use of all possible n-nucleotide oligomers (n-mers) to identify n-mers in an unknown DNA sample, which are subsequently aligned by algorithm analysis to produce the DNA sequence.
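The alignment step of SBH can be sketched as follows, purely for illustration. The sketch assumes an idealized experiment in which every n-mer of the target is detected without error and occurs only once in the target; it then reassembles the sequence by following overlaps between n-mers, implemented here as an Eulerian path through a graph of (n-1)-mer overlaps. The function names and the example sequence are assumptions for the example and are not taken from the cited references.

```python
from collections import defaultdict


def spectrum(seq, n):
    """Return the set of all n-mers in seq (the idealized hybridization result)."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def reconstruct(nmers):
    """Reassemble a sequence from its n-mers by following (n-1)-base overlaps.

    Assumes an error-free spectrum in which each n-mer occurs exactly once.
    """
    graph = defaultdict(list)   # (n-1)-mer prefix -> list of (n-1)-mer suffixes
    indegree = defaultdict(int)
    for nmer in sorted(nmers):
        prefix, suffix = nmer[:-1], nmer[1:]
        graph[prefix].append(suffix)
        indegree[suffix] += 1

    # Start at the node with one more outgoing than incoming edge, if any.
    nodes = list(graph)
    start = next((v for v in nodes if len(graph[v]) - indegree[v] == 1), nodes[0])

    # Hierholzer's algorithm: walk unused edges, backtracking when stuck.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path[0] + "".join(node[-1] for node in path[1:])


target = "ATGCGTAC"              # hypothetical unknown sequence
detected = spectrum(target, 3)   # n-mers that would hybridize, assuming no errors
rebuilt = reconstruct(detected)
print(rebuilt)
```

When an (n-1)-base overlap occurs more than once in the target, several sequences can share the same set of n-mers, so the reconstruction is not always unique; this holds even before hybridization errors are considered.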
There are two formats for carrying out SBH. One format involves creating an array of all possible n-mers on a support, which is then hybridized with the target sequence. This is a version of the reverse dot blot. Another format involves attaching the target sequence to a support, which is sequentially probed with all possible n-mers. Both formats, however, have the fundamental problems of direct probe hybridizations and additional difficulties related to multiplex hybridizations. This inability to achieve “sequencing by hybridization” by a direct hybridization method has led to a so-called “format 3,” which incorporates a ligase reaction step. While format 3 provides some degree of improvement, it actually represents a different mechanism, one that relies on an enzyme reaction step to identify base differences.
Regardless of the format, none of the current micro-scale DNA hybridization and SBH approaches overcomes the underlying problems associated with nucleic acid hybridization reactions. There remains, therefore, a need for improved microarray techniques.