The present invention relates to detecting differences in polymers. More specifically, the present invention relates to techniques for identifying, confirming, mapping, and genotyping sample nucleic acid sequences.
Devices and computer systems for forming and using arrays of materials on a chip or substrate are known. For example, PCT applications WO 92/10588 and 95/11995, both incorporated herein by reference for all purposes, describe techniques for sequencing or sequence checking nucleic acids and other materials. Arrays for performing these operations may be formed according to, for example, the pioneering techniques disclosed in U.S. Pat. Nos. 5,445,934, 5,384,261 and 5,571,639, each incorporated herein by reference for all purposes.
According to one aspect of the techniques described therein, an array of nucleic acid probes is fabricated at known locations on a chip. A labeled nucleic acid is then brought into contact with the chip and a scanner generates an image file indicating the locations where the labeled nucleic acids are bound to the chip. Based upon the image file and identities of the probes at specific locations, it becomes possible to extract information such as the nucleotide or monomer sequence of DNA or RNA. Such systems have been used to form, for example, arrays of DNA that may be used to study and detect mutations relevant to genetic diseases, cancers, infectious diseases, HIV, and other genetic characteristics.
The VLSIPS™ technology provides methods of making very large arrays of oligonucleotide probes on very small chips. See U.S. Pat. No. 5,143,854 and PCT patent publication Nos. WO 90/15070 and 92/10092, each of which is incorporated by reference for all purposes. The oligonucleotide probes on the DNA probe array are used to detect complementary nucleic acid sequences in a sample nucleic acid of interest (the “target” nucleic acid).
For sequence checking applications, the chip may be tiled for a specific target nucleic acid sequence. As an example, the chip may contain probes that are perfectly complementary to the target sequence and probes that differ from the target sequence by a single base mismatch. For de novo sequencing applications, the chip may include all the possible probes of a specific length. The probes are tiled on a chip in rows and columns of cells, where each cell includes multiple copies of a particular probe. Additionally, the chip may include “blank” cells that do not contain any probes. Because the blank cells contain no probes, labeled targets should not bind specifically to the chip in these areas. Thus, a blank cell provides a measure of the background intensity.
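The tiling and background ideas described above can be illustrated with a minimal sketch. It is not the layout of any particular chip. The function names (`tile_probes`, `subtract_background`), the probe length of 5, the choice to vary only the middle base of each probe, and the use of the mean blank-cell intensity as the background estimate are all assumptions made for illustration.

```python
from statistics import mean

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the strand that would hybridize to seq, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def tile_probes(target, probe_len=5):
    """Build one group of four probes per interrogated target position.

    For each window of the target, the middle base is replaced by A, C, G
    and T in turn. One of the four probes is perfectly complementary to the
    target; the other three carry a single base mismatch (assumed layout).
    """
    mid = probe_len // 2
    tiling = []
    for start in range(len(target) - probe_len + 1):
        window = target[start:start + probe_len]
        probes = {}
        for base in "ACGT":
            variant = window[:mid] + base + window[mid + 1:]
            probes[base] = reverse_complement(variant)
        # (interrogated position, reference base, probes keyed by target base)
        tiling.append((start + mid, window[mid], probes))
    return tiling


def subtract_background(cell_intensities, blank_intensities):
    """Subtract the mean blank-cell intensity, clamping results at zero."""
    background = mean(blank_intensities)
    return [max(0.0, value - background) for value in cell_intensities]


if __name__ == "__main__":
    for position, ref_base, probes in tile_probes("ACGTAC"):
        print(position, ref_base, probes)
    print(subtract_background([10.0, 3.0], [4.0, 6.0]))
```

In this sketch, each group is keyed by the target base that a probe would match, so the entry under the reference base is the perfect-match probe and the other three entries are the single-base mismatch probes.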
While the Human Genome Project is attempting to produce the first complete reference sequence of the human chromosomes, attention is already focusing on the sequence variations among individuals. This genetic diversity is of interest because it may explain the basis of heritable variation in disease susceptibility and may harbor a record of human genetic migrations.
The most common type of human genetic variation is the single-nucleotide polymorphism (SNP), which is a position where two alternative bases occur at appreciable frequency (e.g., greater than 1%) in the human population. SNPs have many uses, including serving as genetic markers for identifying disease genes by linkage studies in families, linkage disequilibrium in isolated populations, association analysis of patients and controls, and loss-of-heterozygosity studies in tumors, to name a few. It is believed that large collections of mapped SNPs would provide a powerful tool for human genetic studies. Although individual SNPs can be less informative than conventional genetic markers, SNPs can be more abundant and have a greater potential for automation.
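The frequency criterion in the SNP definition above can be made concrete with a short sketch. The function name `is_snp`, the representation of observations as a list of base calls at one position, and the reading of "appreciable frequency" as a less common base exceeding the 1% example threshold are assumptions for illustration only.

```python
from collections import Counter


def is_snp(observed_bases, min_freq=0.01):
    """Return True if exactly two bases occur and the rarer one exceeds min_freq.

    observed_bases is a list of base calls at a single genomic position,
    one per sampled chromosome (hypothetical input format).
    """
    counts = Counter(observed_bases)
    if len(counts) != 2:
        return False
    minor_allele_freq = min(counts.values()) / len(observed_bases)
    return minor_allele_freq > min_freq


if __name__ == "__main__":
    common_variant = ["A"] * 98 + ["G"] * 2   # rarer base at 2%
    rare_variant = ["A"] * 999 + ["G"]        # rarer base at 0.1%
    print(is_snp(common_variant))
    print(is_snp(rare_variant))
```

Under these assumptions, a position where the rarer base is seen in 2% of sampled chromosomes qualifies, while one seen in 0.1% does not.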
Accordingly, there is a need for innovative techniques for identifying, confirming, mapping, and categorizing polymers, such as nucleic acids.