A variety of methods have been used to screen for mutations in genes. Usually, such methods begin with amplification of individual exons by DNA PCR or of the transcript by reverse transcription PCR. These methods include direct DNA sequencing, allele-specific probes, allele-specific primers and probe arrays. The design and use of allele-specific probes for analyzing polymorphisms is described by, e.g., Saiki et al., Nature 324, 163-166 (1986); Dattagupta, EP 235,726; Saiki, WO 89/11548. Allele-specific probes are typically used in pairs. One member of the pair shows perfect complementarity to a wildtype allele and the other member to a variant allele. Under idealized hybridization conditions with a homozygous target, such a pair shows an essentially binary response. That is, one member of the pair hybridizes and the other does not.
An allele-specific primer hybridizes to a site on target DNA overlapping a polymorphism and primes amplification of an allelic form to which the primer exhibits perfect complementarity. See Gibbs, Nucleic Acids Res. 17, 2427-2448 (1989). This primer is used in conjunction with a second primer which hybridizes at a distal site. Amplification proceeds from the two primers, leading to a detectable product signifying that the particular allelic form is present. A control is usually performed with a second pair of primers, one of which shows a single-base mismatch at the polymorphic site and the other of which exhibits perfect complementarity to a distal site. The single-base mismatch impairs amplification and little, if any, amplification product is generated.
Polymorphisms can also be identified by hybridization to oligonucleotide arrays as described in WO 95/11995 (incorporated by reference in its entirety for all purposes). Some such arrays include four probe sets. A first probe set includes overlapping probes spanning a region of interest in a reference sequence. Each probe in the first probe set has an interrogation position that corresponds to a nucleotide in the reference sequence. That is, the interrogation position is aligned with the corresponding nucleotide in the reference sequence when the probe and reference sequence are aligned to maximize complementarity between the two. For each probe in the first set, there are three corresponding probes from three additional probe sets. Thus, there are four probes corresponding to each nucleotide in the reference sequence. The probes from the three additional probe sets are identical to the corresponding probe from the first probe set except at the interrogation position, which occurs in the same position in each of the four corresponding probes from the four probe sets, and is occupied by a different nucleotide in the four probe sets.
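The four-probe-set design described above can be illustrated with a minimal Python sketch. The function name `tile_four_probe_sets`, the probe length of 5, and the placement of the interrogation position at the center of each probe are assumptions made for illustration only; they are not taken from WO 95/11995.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def tile_four_probe_sets(reference, probe_len=5):
    """Build four probes for each reference position that can sit at a probe center.

    Returns {reference_index: {interrogation_base: probe_sequence}}.
    Assumption: the interrogation position is the central base of the probe.
    """
    if probe_len % 2 == 0:
        raise ValueError("probe_len must be odd so the interrogation position is central")
    half = probe_len // 2
    tiling = {}
    for center in range(half, len(reference) - half):
        window = reference[center - half:center + half + 1]
        probes = {}
        for target_base in "ACGT":
            # Substitute each possible target base at the central position,
            # then take the reverse complement to obtain the probe.
            variant_window = window[:half] + target_base + window[half + 1:]
            probe = reverse_complement(variant_window)
            # Key each probe by the base occupying its interrogation position.
            probes[probe[half]] = probe
        tiling[center] = probes
    return tiling


tiling = tile_four_probe_sets("ACGTACG", probe_len=5)
# The first-set probe for reference index 2 (base G) carries C at its
# interrogation position; the other three probes differ only at that position.
first_set_probe = tiling[2][COMPLEMENT["G"]]
```

In this sketch the probe keyed by the complement of the reference base belongs to the first probe set, and the remaining three belong to the additional probe sets.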
Such an array is hybridized to a labelled target sequence, which may be the same as the reference sequence, or a variant thereof. The identity of any nucleotide of interest in the target sequence can be determined by comparing the hybridization intensities of the four probes having interrogation positions aligned with that nucleotide. The nucleotide in the target sequence is the complement of the nucleotide occupying the interrogation position of the probe with the highest hybridization intensity.
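The base-calling rule just described can be sketched as follows. The function name `call_base` and the intensity values are hypothetical and serve only to illustrate the comparison of four probes; real data would also require handling of ties and of ambiguous calls, which are not addressed here.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_base(intensities):
    """Call the target nucleotide from four probe intensities.

    `intensities` maps the nucleotide at each probe's interrogation
    position to that probe's hybridization intensity.
    """
    # Pick the probe with the highest hybridization intensity ...
    brightest = max(intensities, key=intensities.get)
    # ... and report the complement of its interrogation nucleotide.
    return COMPLEMENT[brightest]


# Hypothetical intensities for the four probes aligned with one nucleotide.
called = call_base({"A": 120.0, "C": 950.0, "G": 80.0, "T": 100.0})
```

Here the probe with C at its interrogation position is brightest, so the target nucleotide is called as G.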
WO 95/11995 also describes subarrays that are optimized for detection of variant forms of a precharacterized polymorphism. A subarray contains probes designed to be complementary to a second reference sequence, which can be an allelic variant of the first reference sequence. The second group of probes is designed by the same principles as above except that the probes exhibit complementarity to the second reference sequence. The inclusion of a second group can be particularly useful for analyzing short subsequences of the primary reference sequence in which multiple mutations are expected to occur within a short distance commensurate with the length of the probes (i.e., two or more mutations within 9 to 21 bases).
A further strategy for detecting a polymorphism using an array of probes is described in EP 717,113. In this strategy, an array contains overlapping probes spanning a region of interest in a reference sequence. The array is hybridized to a labelled target sequence, which may be the same as the reference sequence or a variant thereof. If the target sequence is a variant of the reference sequence, probes overlapping the site of variation show reduced hybridization intensity relative to other probes in the array. In arrays in which the probes are arranged in an ordered fashion stepping through the reference sequence (e.g., each successive probe has one fewer 5′ base and one more 3′ base than its predecessor), the loss of hybridization intensity is manifested as a “footprint” of probes approximately centered about the point of variation between the target sequence and reference sequence.
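The footprint can be located, for illustration, with a short Python sketch. The use of the array median as a baseline, the threshold of one half of that baseline, and the assumption that at most one footprint is present are choices made for this example and are not taken from EP 717,113.

```python
from statistics import median


def find_footprint(intensities, threshold=0.5):
    """Locate a run of low-intensity probes in an ordered tiling.

    `intensities` lists hybridization intensities in tiling order.
    Returns (first_index, last_index, center) or None if no probe falls
    below `threshold` times the median intensity.
    Assumption: at most one footprint is present in the tiling.
    """
    baseline = median(intensities)
    low = [i for i, value in enumerate(intensities) if value < threshold * baseline]
    if not low:
        return None
    first, last = low[0], low[-1]
    # The point of variation lies approximately at the center of the footprint.
    return first, last, (first + last) / 2


# Hypothetical intensities: probes 3 to 7 overlap the site of variation.
footprint = find_footprint([100, 98, 102, 40, 20, 15, 22, 45, 99, 101, 97])
```

With these hypothetical values the footprint spans probes 3 through 7, placing the point of variation near probe 5.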
In some methods, the set of probes referred to above is a component of an array of immobilized probes, in which other probes may be present. In such methods, the entire array is hybridized to the first labelled control sample and the second labelled target sample. In some methods, the reference allele is at least 100 bases long and the probe set comprises at least 100 overlapping probes spanning the reference allele. In some methods, the target sample is prepared by amplifying nucleic acids from a patient with a pair of primers flanking the locus. In such methods, the target sample can be labelled in the course of amplification. The control sample can be similarly prepared by amplification of the reference allele from a pair of primers flanking the locus, and label can be similarly incorporated in the course of amplification.
In some methods, the array of probes also includes a second set of probes spanning the locus and complementary to a selected variant allele. In such methods, the intensity of first and second label bound to each probe in the second probe set is determined and a normalized intensity ratio of first label to second label is calculated for each probe in the second set. An inverse normalized intensity ratio of probes in the second set overlapping the locus of between 1 and 2 indicates that one copy of the selected variant allele is present in the target sample. An inverse normalized intensity ratio of probes in the second set overlapping the locus of over 2 indicates that two variant alleles are present. One or more additional probe sets can be included in the array spanning the locus and complementary to one or more other variant alleles.
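The inverse-ratio test for the variant probe set can be sketched as follows. The `scale` parameter stands in for whatever normalization is applied and is a hypothetical placeholder, as are the function names and intensity values. Averaging over the probes overlapping the locus, assigning a value of exactly 2 to the lower band, and returning `None` for a value of 1 or less are choices made for this example; the passage above does not address those cases.

```python
from statistics import mean


def inverse_normalized_ratios(first_label, second_label, scale=1.0):
    """Per-probe inverse of the normalized first-to-second label ratio.

    `scale` is a hypothetical normalization factor. All intensities are
    assumed to be positive.
    """
    ratios = [scale * f / s for f, s in zip(first_label, second_label)]
    return [1.0 / r for r in ratios]


def variant_copies(inverse_ratios):
    """Interpret inverse ratios of variant-set probes overlapping the locus."""
    value = mean(inverse_ratios)
    if value > 2:
        return 2      # over 2: two variant alleles
    if value > 1:
        return 1      # between 1 and 2: one copy of the selected variant
    return None       # not addressed by the passage above


# Hypothetical label intensities for three probes overlapping the locus.
control = [100.0, 100.0, 100.0]   # first label (control sample)
target = [150.0, 160.0, 140.0]    # second label (target sample)
copies = variant_copies(inverse_normalized_ratios(control, target))
```

In this example the mean inverse ratio is about 1.5, so one copy of the selected variant allele is indicated.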
In some methods, the array of probes comprises first and second probe sets respectively comprising probes complementary to first and second strands of the reference allele, and third and fourth probe sets respectively comprising probes complementary to first and second strands of a selected variant allele. In such methods, the intensity of first and second label bound to each probe in the first and second sets is measured and a normalized intensity ratio of first label to second label for each probe in the first and second sets is calculated. The mean intensity ratio is normalized to about one when the target sample comprises the homozygous reference allele. The presence of a target sample containing one reference and one variant allele is indicated if the mean normalized intensity ratio of probes in the first probe set overlapping the locus is between 1 and 2 and/or the mean normalized intensity ratio of probes in the second probe set overlapping the locus is between 1 and 2. The presence of a target sample comprising at least two variant alleles is indicated if the mean normalized intensity ratio of probes in the first probe set and/or the mean normalized intensity ratio of probes in the second probe set overlapping the locus is greater than 2. The presence of a target sample containing one copy of the selected variant allele is indicated if the inverse mean normalized intensity ratio of probes in the third probe set overlapping the locus is between 1 and 2, and/or the inverse mean normalized intensity ratio of probes in the fourth probe set overlapping the locus is between 1 and 2. The presence of a target sample containing two copies of the selected variant allele is indicated if the inverse mean normalized intensity ratio of probes in the third probe set overlapping the locus is over 2, and/or the inverse mean normalized intensity ratio of probes in the fourth probe set overlapping the locus is over 2.
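The decision rules for the four probe sets can be combined, for illustration, as in the following Python sketch. The function names, the band labels, and the treatment of values at exactly 1 or exactly 2 are assumptions made for this example. The inputs are taken to be mean normalized ratios (for the first and second sets) and inverse mean normalized ratios (for the third and fourth sets) that have already been computed for probes overlapping the locus.

```python
def band(value):
    """Assign a ratio to one of the bands used in the decision rules."""
    if value > 2:
        return "over_2"
    if value > 1:
        return "between_1_and_2"
    return "about_1_or_less"


def interpret_locus(ref_strand1, ref_strand2, var_strand1_inv, var_strand2_inv):
    """Return the set of indications supported by the four probe sets.

    The "and/or" wording is implemented as: an indication is reported if
    either strand of the relevant pair of probe sets falls in the band.
    """
    ref_bands = {band(ref_strand1), band(ref_strand2)}
    var_bands = {band(var_strand1_inv), band(var_strand2_inv)}
    findings = set()
    if "between_1_and_2" in ref_bands:
        findings.add("one reference and one variant allele")
    if "over_2" in ref_bands:
        findings.add("at least two variant alleles")
    if "between_1_and_2" in var_bands:
        findings.add("one copy of selected variant allele")
    if "over_2" in var_bands:
        findings.add("two copies of selected variant allele")
    return findings


# Hypothetical ratios for a sample heterozygous for the selected variant.
findings = interpret_locus(1.5, 1.4, 1.6, 1.5)
```

An empty set corresponds to ratios of about one in all four probe sets, which is consistent with a homozygous reference sample under the normalization described above.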