Arrays of oligonucleotide probes have been used in a variety of methods for analyzing target nucleic acids of interest. One such application involves de novo sequencing of a target nucleic acid. This can, at least in theory, be achieved by hybridizing a target nucleic acid to a complete array of all probe sequences of a given length and identifying the subset of probes that hybridize to the target. Another application is the detection and quantification of mRNA levels in a mixed population. Other applications involve comparing a known reference sequence with a target sequence that may differ from the reference sequence by the presence of mutations, polymorphisms and other variations.
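The probe-identification step of sequencing by hybridization can be illustrated with a small simulation. The sketch below rests on an idealized assumption that the text does not state: a probe binds only when its reverse complement occurs exactly in the target. The function names (`all_probes`, `hybridizing_probes`) are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def all_probes(k: int) -> list[str]:
    """Every DNA sequence of length k, i.e. a 'complete array' of 4**k probes."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def hybridizing_probes(target: str, k: int) -> set[str]:
    """Probes that would bind the target under an idealized perfect-match rule.

    Assumption (illustrative only): a probe binds if and only if its reverse
    complement occurs somewhere in the target sequence.
    """
    target_kmers = {target[i:i + k] for i in range(len(target) - k + 1)}
    return {p for p in all_probes(k) if reverse_complement(p) in target_kmers}
```

For example, `hybridizing_probes("GATTACA", 3)` selects 5 of the 64 possible 3-mer probes: `{"ATC", "AAT", "TAA", "GTA", "TGT"}`. Reassembling the target sequence from the identified subset is a separate step not shown here.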
A simple strategy for identifying variations in a target sequence is the reverse dot blot, as discussed by Dattagupta, EP 235,726, and Saiki, WO 89/11548. Other strategies for comparative analysis of target nucleic acids with reference nucleic acids are described in WO 95/11995 (incorporated by reference in its entirety for all purposes). Some such arrays include four probe sets. A first probe set includes overlapping probes spanning a region of interest in a reference sequence. Each probe in the first probe set has an interrogation position that corresponds to a nucleotide in the reference sequence. That is, the interrogation position is aligned with the corresponding nucleotide in the reference sequence when the probe and reference sequence are aligned to maximize complementarity between the two. For each probe in the first set, there are three corresponding probes from three additional probe sets, so there are four probes corresponding to each nucleotide in the reference sequence. The probes from the three additional probe sets are identical to the corresponding probe from the first probe set except at the interrogation position. The interrogation position occurs at the same position in each of the four corresponding probes and is occupied by a different nucleotide in each of the four probe sets.
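The four-probe-set layout can be sketched as follows. Several details are illustrative assumptions, not parameters taken from WO 95/11995: an odd probe length, an interrogation position at the probe center, and each probe written 5′→3′ as the reverse complement of a reference window. `tiling_probe_sets` is a hypothetical name.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def tiling_probe_sets(reference: str, probe_len: int = 5) -> dict[int, dict[str, str]]:
    """Map each interrogated reference position to its four probes.

    The inner dict is keyed by the nucleotide at the interrogation position.
    The key equal to the complement of reference[i] is the first-set (perfect
    match) probe; the other three keys are the three additional probe sets.
    Assumptions (illustrative only): odd probe_len, interrogation position at
    the probe center, probes written as reverse complements of the reference.
    """
    if probe_len % 2 == 0:
        raise ValueError("probe_len must be odd so the center is well defined")
    half = probe_len // 2
    sets = {}
    for i in range(half, len(reference) - half):
        perfect = reverse_complement(reference[i - half:i + half + 1])
        # Reversing an odd-length window keeps its center aligned with reference[i].
        sets[i] = {b: perfect[:half] + b + perfect[half + 1:] for b in "ACGT"}
    return sets
```

For the reference `GATTACA` and 5-mer probes, position 2 (reference base `T`) is interrogated by `TAATC`, `TACTC`, `TAGTC` and `TATTC`. Of these, `TAATC` carries `A`, the complement of `T`, and is the first-set probe.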
Such an array is hybridized to a labelled target sequence, which may be the same as the reference sequence, or a variant thereof. The identity of any nucleotide of interest in the target sequence can be determined by comparing the hybridization intensities of the four probes having interrogation positions aligned with that nucleotide. The nucleotide in the target sequence is the complement of the nucleotide occupying the interrogation position of the probe with the highest hybridization intensity.
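The base-calling rule reduces to two steps: choose the brightest of the four probes, then take the complement of its interrogation nucleotide. The following is a minimal sketch. It assumes the intensities arrive as a mapping from the interrogation nucleotide to a measured value, which is a hypothetical input format; `call_base` is likewise a hypothetical name.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_base(intensities: dict[str, float]) -> str:
    """Infer one target nucleotide from the four probes sharing an interrogation position.

    `intensities` maps the nucleotide at each probe's interrogation position
    to that probe's hybridization intensity (hypothetical input format).
    """
    if set(intensities) != set("ACGT"):
        raise ValueError("expected exactly one intensity for each of A, C, G, T")
    brightest = max(intensities, key=intensities.get)
    # The target nucleotide is the complement of the brightest probe's interrogation base.
    return COMPLEMENT[brightest]
```

Consider the hypothetical intensities `{"A": 1200.0, "C": 90.0, "G": 150.0, "T": 110.0}`. The `A`-interrogation probe is brightest, so the call is `T`. The sketch does not handle ties or uniformly low signals.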
A further strategy for comparing a target sequence with a reference sequence is described in EP 717,113. In this strategy, an array contains overlapping probes spanning a region of interest in a reference sequence. The array is hybridized to a labelled target sequence, which may be the same as the reference sequence or a variant thereof. If the target sequence is a variant of the reference sequence, probes overlapping the site of variation show reduced hybridization intensity relative to other probes in the array. In arrays in which the probes are arranged in an ordered fashion stepping through the reference sequence (e.g., each successive probe has one fewer 5′ base and one more 3′ base than its predecessor), the loss of hybridization intensity is manifested as a “footprint” of probes approximately centered about the point of variation between the target sequence and reference sequence.
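The footprint can be reproduced with an idealized model. The sketch makes two assumptions. First, it handles substitution-only variants. Second, it uses a two-level signal: 1.0 for a perfect match and a hypothetical `mismatch_signal` for any mismatch. Real intensities vary with the position and type of the mismatch, which this model ignores. `footprint` is a hypothetical name.

```python
def footprint(reference: str, target: str, probe_len: int,
              mismatch_signal: float = 0.2) -> list[float]:
    """Idealized relative signal for each probe in a tiling array.

    Probe j is complementary to reference[j:j + probe_len], so successive
    probes step through the reference one base at a time.
    Assumptions (illustrative only): substitutions only, signal 1.0 for a
    perfect match and `mismatch_signal` for any mismatch.
    """
    if len(reference) != len(target):
        raise ValueError("this sketch handles substitutions only (equal lengths)")
    signals = []
    for j in range(len(reference) - probe_len + 1):
        perfect = reference[j:j + probe_len] == target[j:j + probe_len]
        signals.append(1.0 if perfect else mismatch_signal)
    return signals
```

Take 4-mer probes and a single substitution at reference position 5 (`ACGTACGTAC` → `ACGTAGGTAC`). Probes 2 through 5, the four probes overlapping that position, lose signal: `[1.0, 1.0, 0.2, 0.2, 0.2, 0.2, 1.0]`. The footprint spans as many probes as the probe length, and the reference region those probes cover is centered on the variant.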
In most of the array strategies described above, each probe present in an array occupies a unique cell or region of the array. In this arrangement, the signal from target bound to each probe is separately determinable. However, Bains & Smith, J. Theor. Biol. 135, 303-307 (1988) discuss a method of sequencing by hybridization employing an array of oligonucleotides six nucleotides long, in which the two central positions are occupied by pools of each of the four nucleotide bases. In other words, a cell of such an array is occupied by a mixture of sixteen probes of related sequence. The sixteen probes share the same four outer positions and differ at the two central positions. WO 95/11995 also describes some arrays containing pooled mixtures of probes. These pooled probes have component probes that are complementary to a common segment of a target sequence except at one or a few positions within the probe length at which the probes differ. Such probes can be used in several strategies to detect variations in a target sequence relative to a reference sequence. These pooling strategies can reduce the number of array cells required to analyze a given target sequence.
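A pooled cell of the Bains & Smith type can be enumerated by expanding its two degenerate central positions. In the sketch below, `N` marks a pooled position, and the pattern `ACNNGT` is a hypothetical example. The function names `expand_pool` and `cells_needed` are also hypothetical. The cell count compares a complete 6-mer array with one whose two central positions are pooled.

```python
from itertools import product


def expand_pool(pattern: str) -> list[str]:
    """Expand a degenerate probe ('N' = pooled position) into its component probes."""
    choices = ["ACGT" if c == "N" else c for c in pattern]
    return ["".join(p) for p in product(*choices)]


def cells_needed(probe_len: int, pooled_positions: int) -> int:
    """Cells in a complete array of probe_len-mers when `pooled_positions`
    positions per probe are pooled (each pooled cell holds 4**pooled_positions probes)."""
    return 4 ** (probe_len - pooled_positions)
```

`expand_pool("ACNNGT")` yields the 16 component probes, from `ACAAGT` to `ACTTGT`. Pooling the two central positions reduces a complete 6-mer array from 4^6 = 4096 cells to 4^4 = 256 cells, each holding a mixture of 16 probes.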