The present invention relates to detecting differences in polymers. More specifically, the present invention relates to detecting polymorphisms in sample nucleic acid sequences by clustering hybridization affinity information.
Devices and computer systems for forming and using arrays of materials on a chip or substrate are known. For example, PCT applications WO92/10588 and 95/11995, both incorporated herein by reference for all purposes, describe techniques for sequencing or sequence checking nucleic acids and other materials. Arrays for performing these operations may be formed according to the methods of, for example, the pioneering techniques disclosed in U.S. Pat. Nos. 5,445,934, 5,384,261 and 5,571,639, each incorporated herein by reference for all purposes.
According to one aspect of the techniques described therein, an array of nucleic acid probes is fabricated at known locations on a chip. A labeled nucleic acid is then brought into contact with the chip and a scanner generates an image file indicating the locations where the labeled nucleic acids are bound to the chip. Based upon the image file and identities of the probes at specific locations, it becomes possible to extract information such as the nucleotide or monomer sequence of DNA or RNA. Such systems have been used to form, for example, arrays of DNA that may be used to study and detect mutations relevant to genetic diseases, cancers, infectious diseases, HIV, and other genetic characteristics.
The VLSIPS™ technology provides methods of making very large arrays of oligonucleotide probes on very small chips. See U.S. Pat. No. 5,143,854 and PCT patent publication Nos. WO 90/15070 and 92/10092, each of which is incorporated by reference for all purposes. The oligonucleotide probes on the DNA probe array are used to detect complementary nucleic acid sequences in a sample nucleic acid of interest (the "target" nucleic acid).
For sequence checking applications, the chip may be tiled for a specific target nucleic acid sequence. As an example, the chip may contain probes that are perfectly complementary to the target sequence and probes that differ from the target sequence by a single base mismatch. For de novo sequencing applications, the chip may include all the possible probes of a specific length. The probes are tiled on a chip in rows and columns of cells, where each cell includes multiple copies of a particular probe. Additionally, "blank" cells may be present on the chip which do not include any probes. As the blank cells contain no probes, labeled targets should not bind specifically to the chip in this area. Thus, a blank cell provides a measure of the background intensity.
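The tiling and background measurement described above can be illustrated with a minimal sketch. It is not the layout of any particular chip: the function names, the probe length of 5, and the choice to vary only the center base of each probe are assumptions made for illustration. For each interrogated position the sketch builds four probes, one perfectly complementary to the target and three that differ by a single base, and it estimates the background intensity as the mean intensity of the blank cells.

```python
# Illustrative sketch only: names, probe length, and the center-base
# substitution scheme are assumptions, not taken from the text above.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def tile_probes(target, probe_len=5):
    """Tile probes along a target sequence.

    Returns a list of (position, reference_base, probes) tuples, where
    probes maps each of A, C, G, T to the probe that would be perfectly
    complementary to the target if that base were at the interrogated
    (center) position. The probe keyed by reference_base is the perfect
    match; the other three are single-base mismatch probes.
    """
    mid = probe_len // 2
    tiles = []
    for start in range(len(target) - probe_len + 1):
        window = target[start:start + probe_len]
        probes = {}
        for base in "ACGT":
            variant = window[:mid] + base + window[mid + 1:]
            probes[base] = reverse_complement(variant)
        tiles.append((start + mid, window[mid], probes))
    return tiles


def background_corrected(intensities, blank_intensities):
    """Subtract the mean blank-cell intensity, clamping at zero."""
    background = sum(blank_intensities) / len(blank_intensities)
    return {cell: max(value - background, 0.0)
            for cell, value in intensities.items()}


if __name__ == "__main__":
    for position, ref_base, probes in tile_probes("ACGTAC"):
        print(position, ref_base, probes)
    print(background_corrected({"A": 10.0, "G": 110.0}, [8.0, 12.0]))
```

Placing the substitution at the center of the probe is a common choice because a central mismatch tends to destabilize the duplex more than a mismatch near either end, but other positions could be used.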
The interpretation of hybridization data from hybridized chips can encounter several difficulties. Random errors, such as physical defects on the chip, can cause individual probes or spatially related groups of probes to exhibit abnormal hybridization (e.g., abnormal fluorescence). Systematic errors, such as the formation of secondary structures in the probes or the target, can also produce reproducible but still misleading hybridization data.
For many applications, it is desirable to determine if there are differences between and among sample nucleic acid sequences, such as polymorphisms at a base position. It would be desirable to have systems and methods of detecting these differences in a way that is not overly affected by random and systematic errors.