This invention relates to methods and means for rapid screening of target nucleic acid molecules for the presence of sequence signatures. In preferred embodiments, hybridization data is processed by a programmable digital computer.
Polynucleotide arrays, such as the GeneChip® array (Affymetrix, Inc., Santa Clara, Calif., USA), can contain many thousands of differently sequenced polynucleotide probes at feature densities greater than five hundred thousand per cm². Such arrays enable one to obtain nucleotide sequence information from target nucleic acid molecules. The information is obtained by performing a hybridization reaction between the target nucleic acid molecule and the polynucleotide probes on the polynucleotide array. The location and identity of the probes to which the target has hybridized, and the extent of hybridization, are determined. Because hybridization between nucleic acids is a function of their sequences, analysis of the sequence of the probes to which the target has hybridized, as well as the extent of hybridization, provides information about the sequence of the target molecule.
Because polynucleotide arrays can have many thousands of probes, hybridization reactions create large amounts of raw data for analysis. Already, several ways of processing such data have been developed. In one application, one examines hybridization between a target molecule and a set of probes that are based upon a reference nucleotide sequence. Probes in the set to which the target does not hybridize, or hybridizes only weakly, indicate sequences in which the target differs from the reference sequence. Nucleic acid arrays have been used in this way to interrogate single nucleotide differences between reference and target nucleic acid sequences. Examples include the identification of genetic variants of infectious agents, such as HIV, and of genes involved in genetic diseases, such as cystic fibrosis.
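The reference-based comparison described above can be illustrated with a minimal Python sketch. It is not the processing method of any particular array platform. The function names, the normalized intensity values, the 0.5 threshold, and the assumption of a single substitution are all hypothetical and chosen only for illustration. Probes are tiled along a reference sequence, probes with weak signal are flagged, and the reference positions shared by every weak probe are reported as the likely site of the difference.

```python
from typing import Dict, List, Optional, Tuple


def tile_probes(reference: str, probe_len: int) -> List[Tuple[int, str]]:
    """Return (start, subsequence) for every probe-length window of the reference."""
    return [(i, reference[i:i + probe_len])
            for i in range(len(reference) - probe_len + 1)]


def flag_weak_probes(intensities: Dict[int, float], threshold: float) -> List[int]:
    """Return sorted start positions of probes whose signal is below the threshold."""
    return sorted(pos for pos, signal in intensities.items() if signal < threshold)


def candidate_region(weak_starts: List[int], probe_len: int) -> Optional[Tuple[int, int]]:
    """Return the inclusive range of reference positions covered by every weak probe.

    Under the (hypothetical) single-substitution assumption, the differing base
    lies in the intersection of all weak probe windows. Returns None when there
    are no weak probes or when their windows do not all overlap.
    """
    if not weak_starts:
        return None
    first = max(weak_starts)                  # latest window start
    last = min(weak_starts) + probe_len - 1   # earliest window end
    return (first, last) if first <= last else None


# Hypothetical example: a 10-base reference tiled with 4-base probes.
reference = "ACGTACGTAC"
probe_len = 4
probes = tile_probes(reference, probe_len)    # 7 probes, starts 0..6

# Made-up normalized hybridization signals keyed by probe start position.
intensities = {0: 0.90, 1: 0.95, 2: 0.20, 3: 0.10, 4: 0.15, 5: 0.25, 6: 0.90}

weak = flag_weak_probes(intensities, threshold=0.5)   # [2, 3, 4, 5]
region = candidate_region(weak, probe_len)            # (5, 5)
```

In this made-up data set the four weak probes all cover reference position 5, so that position is reported as the candidate site. Real array data would also require background correction and normalization, which the sketch omits.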
Other ways of obtaining useful information from hybridization data would be of benefit to the scientific and medical communities.