1. Field of the Invention
The field of this invention is the simultaneous detection of large numbers of variations in DNA sequences.
2. Background
Individual susceptibility to disease and environmental toxins, prognosis of existing disease, efficacy of a particular drug, susceptibility to adverse drug reactions, etc., in humans and in domestic animals and plants are becoming increasingly predictable by genetic analysis. Many of these characteristics are associated with multiple genes and are not due to a single genetic abnormality. The extensive effort currently devoted to genome sequencing has revealed a strong correlation between sequence polymorphism, particularly base deletions and substitutions, and the occurrence of numerous genetic diseases. It has been estimated that single nucleotide polymorphisms occur at about 1 in every 400 nucleotides, an estimate that is continually revised as the human genome is unraveled. Single nucleotide polymorphisms therefore provide a valuable source of genetic markers for determining identity, establishing genetic linkages and predicting or diagnosing disease.
Analysis of single nucleotide polymorphisms (SNPs) is becoming a primary approach for the study of human sequence variation. Routine scoring of SNPs requires the detection of multiple single-nucleotide polymorphisms from one sample, and calls for methods allowing the detection of several single-nucleotide variations in a single analysis.
Polymorphisms in the coding regions of individual genes are often linked to polymorphisms in the neighboring introns. When polymorphisms occur that create or remove restriction sites, the pattern of lengths of DNA that are produced by one or more restriction enzymes is changed, giving rise to “restriction fragment length polymorphisms” (RFLP). Thus, changes in the pattern of these fragments, measured by electrophoresis, may reflect familial abnormalities in the adjacent gene. Although RFLPs are very useful for genetic mapping, they do not have 100% correlation with coding region polymorphisms and do not readily lend themselves to the evaluation of the large numbers of genes that may affect the particular phenotype. Accordingly, alternative methods that can meet this need are of interest.
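The effect of a polymorphism on a restriction pattern can be illustrated with a minimal sketch. The two allele sequences below are invented for illustration only, the molecule is assumed to be linear, and the function name fragment_lengths is hypothetical. The only outside fact relied on is that EcoRI recognizes GAATTC and cuts after the G.

```python
from typing import List


def fragment_lengths(sequence: str, site: str, cut_offset: int) -> List[int]:
    """Return the fragment lengths obtained by cutting a linear sequence
    at every occurrence of a recognition site.

    cut_offset is the position within the site at which the strand is cut.
    """
    cuts = []
    start = 0
    while True:
        idx = sequence.find(site, start)
        if idx == -1:
            break
        cuts.append(idx + cut_offset)
        start = idx + 1
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# EcoRI recognizes GAATTC and cuts between G and A (offset 1).
ECORI_SITE, ECORI_OFFSET = "GAATTC", 1

# Hypothetical alleles: allele_b differs from allele_a by a single A->G
# substitution that destroys the first EcoRI site.
allele_a = "ATGCC" + "GAATTC" + "TTAGGCATCG" + "GAATTC" + "AAGT"
allele_b = "ATGCC" + "GAGTTC" + "TTAGGCATCG" + "GAATTC" + "AAGT"

pattern_a = fragment_lengths(allele_a, ECORI_SITE, ECORI_OFFSET)
pattern_b = fragment_lengths(allele_b, ECORI_SITE, ECORI_OFFSET)
print(pattern_a)  # three fragments
print(pattern_b)  # two fragments: the first site is lost
```

A single base change outside any recognition site would leave both patterns identical, which is one reason RFLP patterns do not correlate fully with coding region polymorphisms.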
One of these methods is screening for large numbers of “single nucleotide polymorphisms” (SNP) either within coding or non-coding regions of the genome. Although the frequency of SNPs in coding regions is lower, their relevance arguably may be greater, at least when they produce a change in the amino acid sequence of the corresponding protein. However, methods for finding and screening for SNPs are still relatively primitive.
3. Description of the Related Art
Holland (Proc. Natl. Acad. Sci. USA (1991) 88:7276) discloses the use of the exonuclease activity of the thermostable enzyme Thermus aquaticus DNA polymerase in PCR amplification to generate a specific detectable signal concomitantly with amplification. The TAQMAN assay is discussed by Lee in Nucleic Acids Research (1993) 21(16):3761. Marino, Electrophoresis (1996) 17:1499, describes low-stringency-sequence specific PCR (LSSP-PCR). A PCR amplified sequence is subjected to single primer amplification under conditions of low stringency to produce a range of different length amplicons. Different patterns are obtained when there are differences in sequence. The patterns are unique to an individual and of possible value for identity testing.
Single strand conformational polymorphism (SSCP) yields similar results. In this method the PCR amplified DNA is denatured and sequence dependent conformations of the single strands are detected by their differing rates of migration during gel electrophoresis. As with LSSP-PCR above, different patterns are obtained that signal differences in sequence. However, neither LSSP-PCR nor SSCP gives specific sequence information and both depend on the questionable assumption that any base that is changed in a sequence will give rise to a conformational change that can be detected.
Pastinen, Clin. Chem. (1996) 42:1391 amplifies the target DNA and immobilizes the amplicons. Multiple primers are then allowed to hybridize to sites 3′ and contiguous to an SNP site of interest. Each primer has a different size that serves as a code. The hybridized primers are extended by one base using a fluorescently labeled dideoxynucleoside triphosphate. The size of each of the fluorescent products that is produced, determined by gel electrophoresis, indicates the sequence and, thus, the location of the SNP. The identity of the base at the SNP site is defined by the triphosphate that is used. A similar approach is taken by Haff, Nucleic Acids Res. (1997) 25:3749 except that the sizing is carried out by mass spectroscopy and thus avoids the need for a label. However, both methods have the serious limitation that screening for a large number of sites will require large, very pure primers that can have troublesome secondary structures and be very expensive to synthesize.
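The size-coded primer extension described above can be sketched roughly as follows. This is a simplified model and not the published protocol: sequences are invented, the sample is written as the strand identical to the primers (so the incorporated base is read directly from that strand rather than by complementing the template), and the names extended_base and genotype_by_size are hypothetical.

```python
from typing import Dict, Optional


def extended_base(sample: str, primer: str) -> Optional[str]:
    """Return the single base added to the primer's 3' end, or None if the
    primer does not match or no base follows it.

    Simplification: sample is the strand identical to the primer.
    """
    idx = sample.find(primer)
    if idx == -1:
        return None
    nxt = idx + len(primer)
    return sample[nxt] if nxt < len(sample) else None


def genotype_by_size(sample: str, primers: Dict[str, str]) -> Dict[str, str]:
    """Decode SNP calls from (product size, label) pairs, as a gel would report."""
    # Each primer length is the code for its site; sizes must be unique.
    size_to_site = {len(p) + 1: name for name, p in primers.items()}
    if len(size_to_site) != len(primers):
        raise ValueError("primer sizes must be unique to serve as a code")

    # Simulated reaction: one labeled terminator is added to each matched primer.
    products = []
    for primer in primers.values():
        base = extended_base(sample, primer)
        if base is not None:
            products.append((len(primer) + 1, base))

    # Readout uses only product size and label identity.
    return {size_to_site[size]: base for size, base in sorted(products)}


# Hypothetical primers of different lengths, each ending just before an SNP site.
primers = {"site1": "ACGTTGCA", "site2": "CTTACCGATAGG"}
sample_ref = "ACGTTGCA" + "A" + "GG" + "CTTACCGATAGG" + "C" + "TAA"
sample_var = "ACGTTGCA" + "G" + "GG" + "CTTACCGATAGG" + "C" + "TAA"

calls_ref = genotype_by_size(sample_ref, primers)
calls_var = genotype_by_size(sample_var, primers)
print(calls_ref, calls_var)
```

Because every additional site needs a primer of a new, resolvable length, the primers grow with the number of sites, which is the limitation noted above.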
Hacia, Nat. Genet. (1996) 14:441 uses a high density array of oligonucleotides. Labeled DNA samples were allowed to bind to 96,600 20-base oligonucleotides and the binding patterns produced from different individuals were compared. The method is attractive in that SNPs can be directly identified, but the cost of the arrays is high. Fan (Oct. 6-8, 1997, IBC, Annapolis Md.) has reported results of a large scale screening of human sequence-tagged sites. The accuracy of single nucleotide polymorphism screening was determined by conventional ABI resequencing. Allele specific oligonucleotide hybridization along with mass spectroscopy has been discussed by Ross in Anal. Chem. (1997) 69:4197.
Brenner and Lerner, PNAS (1992) 89:5381 suggested that compounds prepared by combinatorial synthesis can each be labeled with a characteristic DNA sequence. If a given compound proves of interest, the corresponding DNA label is amplified by PCR and sequenced, thereby identifying the compound.
W. Clark Still, in U.S. Pat. No. 5,565,324 and in Accounts of Chem. Res., (1996) 29:155, uses a releasable mixture of halocarbons on beads to code for a specific compound on the bead that is produced during synthesis of a combinatorial library. Beads bearing a compound of interest are treated to release the coding molecules and the mixture is analyzed by gas chromatography with flame ionization detection.
Methods and compositions are provided for substantially concurrently detecting a plurality of single nucleotide polymorphisms (SNPs) in DNA, where a plurality of SNPs is of interest. The method employs as reagents a mixture of particles, a template dependent polynucleotide polymerase, and at least one chain terminating labeled nucleoside triphosphate. The particles are characterized by having a primer nucleic acid sequence, where the particles will have a plurality of primer sequences for different sites associated with SNPs and a unique coding composition defining the primer sequence. Depending upon the particular protocol and the number of SNPs of interest, the coding composition will comprise one or more molecules. In carrying out the method, the reagents and sample are mixed and the primers having SNPs corresponding to the chain terminating nucleoside triphosphate are extended by one base. Again, depending upon the number of SNPs of interest and the protocol, the sample may be analyzed in one or more assay mixtures. The particles are identified by means of the label and the primer determined by means of the coding sequence, where with a single coding molecule, the determination can be made using electrokinetic separation. For large numbers of SNPs, the coding composition may be determined on an individual particle. By having different combinations of coding labels for each of the primers, very large numbers of SNPs may be determined in a single operation. Kits can be provided of the reagents for convenience to the user.
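The logic of the coded-particle assay summarized above can be sketched as follows, under several stated assumptions. The tag names, sequences and function names (assign_codes, next_base, labeled_sites) are hypothetical and do not come from the specification. Each site is assumed to receive a distinct non-empty combination of tags, so that n tags can distinguish up to 2^n - 1 primers. As a simplification, the sample is written as the strand identical to the primers, so the base that extends a primer is read directly from it.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional

TAGS = ["T1", "T2", "T3", "T4"]  # hypothetical coding molecules


def assign_codes(sites: List[str], tags: List[str]) -> Dict[str, FrozenSet[str]]:
    """Give each site a distinct non-empty combination of coding tags."""
    subsets = [
        frozenset(combo)
        for r in range(1, len(tags) + 1)
        for combo in combinations(tags, r)
    ]
    if len(sites) > len(subsets):
        raise ValueError("not enough tag combinations for the number of sites")
    return dict(zip(sites, subsets))


def next_base(sample: str, primer: str) -> Optional[str]:
    """Base that would extend the primer by one position (simplified model)."""
    idx = sample.find(primer)
    if idx == -1:
        return None
    nxt = idx + len(primer)
    return sample[nxt] if nxt < len(sample) else None


def labeled_sites(sample: str, primers: Dict[str, str],
                  codes: Dict[str, FrozenSet[str]], terminator: str) -> List[str]:
    """Sites whose particles acquire the label in one assay mixture.

    A particle is labeled only if its primer is extended by the single
    labeled chain terminator; the site is then read back from the tag code.
    """
    code_to_site = {code: site for site, code in codes.items()}
    labeled_codes = [
        codes[site]
        for site, primer in primers.items()
        if next_base(sample, primer) == terminator
    ]
    return sorted(code_to_site[code] for code in labeled_codes)


primers = {"site1": "ACGTTGCA", "site2": "CTTACCGATAGG"}
codes = assign_codes(list(primers), TAGS)
sample = "ACGTTGCA" + "A" + "GG" + "CTTACCGATAGG" + "C" + "TAA"

# One assay mixture per labeled terminator.
calls = {base: labeled_sites(sample, primers, codes, base) for base in "ACGT"}
print(calls)
```

In this sketch the primer length no longer carries any information, so primers need not grow as sites are added; only the number of distinguishable tag combinations limits how many sites can share one mixture.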