The advent of DNA microarray technology makes it possible to build an array of hundreds of thousands of DNA sequences in a very small area, such as the size of a microscope slide. See, e.g., U.S. Pat. No. 6,375,903 and U.S. Pat. No. 5,143,854, each of which is hereby incorporated by reference in its entirety. The disclosure of U.S. Pat. No. 6,375,903 enables the construction of so-called maskless array synthesizer (MAS) instruments in which light is used to direct synthesis of the DNA sequences, the light direction being performed using a digital micromirror device (DMD). Using an MAS instrument, the selection of DNA sequences to be constructed in the microarray is under software control, so that individually customized arrays can be built to order. In general, MAS-based DNA microarray synthesis technology allows for the parallel synthesis of over 800,000 unique oligonucleotides in a very small area on a standard microscope slide. The microarrays are generally synthesized by using light to direct which oligonucleotides are synthesized at specific locations on an array; these locations are called features. A further advantage of the MAS microarray synthesis instrument is that the designation of probes is under computer software control. This permits custom microarrays to be designed and executed with maximum flexibility as to the design of array probes, since different probes can be made for each array, even for different arrays intended to assay the same genetic elements.
With the availability of the entire genomes of hundreds of organisms, for which a reference sequence has generally been deposited into a public database, microarrays are being used to perform sequence analysis on DNA isolated from such organisms. One technique that can be used to identify a genetic variant is to sequence the genomic DNA of an individual and then to compare that sequence to the reference sequence of that organism. It has been found that many differences in DNA sequence present as single-nucleotide variations, often referred to as single nucleotide polymorphisms or SNPs. Identifying the SNPs of a particular individual by capillary sequencing of the test genome and comparing the result to the reference genome of the species has been referred to as the brute force approach.
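The comparison step described above can be illustrated with a minimal sketch. It assumes the sample sequence has already been aligned to the reference and has the same length, with no insertions or deletions; the function name `find_snps` and the 0-based position convention are hypothetical choices made for this illustration only.

```python
def find_snps(reference, sample):
    """List single-base differences between two pre-aligned sequences.

    Assumption: both sequences are the same length and already aligned,
    so position i in the sample corresponds to position i in the reference.
    Returns (position, reference_base, sample_base) tuples, 0-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [
        (i, ref_base, alt_base)
        for i, (ref_base, alt_base) in enumerate(zip(reference.upper(), sample.upper()))
        if ref_base != alt_base
    ]


if __name__ == "__main__":
    # One substitution (A to T) at 0-based position 4.
    print(find_snps("ACGTACGT", "ACGTTCGT"))
```

A practical pipeline would also have to handle alignment, insertions, deletions, and sequencing error, none of which this sketch attempts.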
A more recent approach, and a key step in identifying genetic variations associated with disease, is the resequencing of candidate genes or other genomic regions of interest in patients and controls to identify those SNPs associated with a certain phenotype (see Sakai et al., (1989) PNAS 86:6230-6234). A resequencing approach that has shown significant results utilizes oligonucleotide microarray technology (Hacia et al., (1999) Nature Genetics, 21(1 Suppl):42-7).
In particular, this type of array-based resequencing (ABR) approach depends on the differential hybridization of genomic fragments to short perfect-match (PM) and mismatch (MM) oligonucleotides. Each nucleotide to be queried is located at a central position of an oligonucleotide. For each PM oligonucleotide, three MM probes, one for each possible substitution at the same central position, are also synthesized on the array. The difference in hybridization signal intensity between sequences that bind strongly to the PM oligonucleotide and those that bind poorly to the corresponding MM oligonucleotides makes it possible to discern the correct base at a given sequence position. Thus, in theory, any time a SNP is present, the mismatch probe representing this SNP will have a higher intensity signal than the corresponding probe that matches the reference sequence.
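The PM/MM probe layout and the idealized base call described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any cited reference: the flank length of 12 (giving 25-mer probes), the function names `probe_set` and `call_base`, and the `min_ratio` ambiguity threshold are all hypothetical.

```python
BASES = "ACGT"


def probe_set(reference, pos, flank=12):
    """Build the four probes that query 0-based position `pos`.

    The probe carrying the reference base at its center is the PM probe;
    the other three are the MM probes. `flank` (assumed value) is the
    number of bases kept on each side of the queried position.
    """
    if pos - flank < 0 or pos + flank >= len(reference):
        raise ValueError("position too close to the end of the reference")
    left = reference[pos - flank:pos]
    right = reference[pos + 1:pos + flank + 1]
    return {base: left + base + right for base in BASES}


def call_base(intensities, min_ratio=1.5):
    """Call the base whose probe gave the strongest signal.

    Returns 'N' when there is no positive signal, or when the strongest
    signal is less than `min_ratio` (assumed threshold) times the
    second strongest.
    """
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    (best, top), (_, second) = ranked[0], ranked[1]
    if top <= 0:
        return "N"
    if second > 0 and top / second < min_ratio:
        return "N"
    return best


if __name__ == "__main__":
    reference = "ACGTACGTACGTACGTACGTACGTACGTA"
    probes = probe_set(reference, pos=14)
    print(probes[reference[14]])  # the PM probe
    print(call_base({"A": 100, "C": 120, "G": 2400, "T": 90}))
```

A call that differs from the reference base at the queried position would be reported as a candidate SNP. The ratio test is one simple way to decline a call when two probes give similar signals.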
However, due to unpredictability in signal strength, varying hybridization efficiency, and various other sources of noise, this method typically results in many base positions whose identities are incorrectly predicted. For example, because all the array probes must be hybridized at the same temperature and hybridization stringency conditions, there can be problems with probes that have melting temperatures (Tm) that diverge significantly from the temperature at which the array is hybridized. For probes with a low Tm, the hybridized targets may be largely washed off the array surface, producing little or no signal. For probes with a high Tm, the single base mismatch may not be sufficiently destabilizing to provide adequate discrimination to make robust base calls.
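The Tm divergence problem described above can be illustrated by flagging probes whose estimated Tm falls outside a window around a single target value. The sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees C, which is only a rough approximation for short oligonucleotides and is chosen here for brevity rather than accuracy. The function names, the `target_tm` parameter, and the `tolerance` default are hypothetical.

```python
def wallace_tm(probe):
    """Rough Tm estimate in degrees C: 2 per A or T, 4 per G or C."""
    seq = probe.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("probe contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


def classify_probe(probe, target_tm, tolerance=5.0):
    """Label a probe 'low', 'high', or 'ok' relative to `target_tm`.

    'low'  : targets are likely to wash off, giving little or no signal.
    'high' : a single mismatch may not destabilize the duplex enough.
    `target_tm` and `tolerance` are assumed, user-chosen values.
    """
    tm = wallace_tm(probe)
    if tm < target_tm - tolerance:
        return "low"
    if tm > target_tm + tolerance:
        return "high"
    return "ok"


if __name__ == "__main__":
    for probe in ("A" * 20, "ATGC" * 5, "GC" * 10):
        print(probe, wallace_tm(probe), classify_probe(probe, target_tm=60))
```

Nearest-neighbor thermodynamic models generally give better estimates than the Wallace rule, particularly for longer probes, and could be substituted for `wallace_tm` without changing the classification logic.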
Another problem with single base discrimination is that the position of the mismatch can significantly alter the ability of a probe to provide sufficient discrimination to make robust base calls. For example, if the Tm of the portion of the probe on either side of the mismatch is very high, this portion of the probe may display robust hybridization independent of the mismatch, and thus not provide sufficient mismatch discrimination for base calling. As such, alternative approaches for increasing the efficiency and accuracy of array-based assays, such as DNA resequencing to identify mutations in the genomes of organisms, would be a desirable contribution to the art.