The present invention is directed to accelerating the identification of single nucleotide polymorphisms and the alignment of clones in genomic sequencing.
Introduction to Applications of SNPs
Accumulation of genetic changes affecting cell cycle control, cell differentiation, apoptosis, and DNA replication and repair leads to carcinogenesis (Bishop, J. M., "Molecular Themes In Oncogenesis," Cell, 64(2):235-48 (1991)). DNA alterations include large deletions which inactivate tumor suppressor genes, amplification to increase expression of oncogenes, and, most commonly, single nucleotide mutations or polymorphisms which impair gene expression or gene function or predispose an individual to further genomic instability (Table 1).
Rapid detection of germline mutations in individuals at risk and accurate characterization of genetic changes in individual tumors would provide opportunities to improve early detection, prevention, prognosis, and specific treatment. However, genetic detection poses the problem of identifying a predisposing polymorphism in the germline or an index mutation in a pre-malignant lesion or early cancer that may be present at many potential sites in many genes. Furthermore, quantification of allele copy number is necessary to detect gene amplification and deletion. Therefore, technologies are urgently needed that can rapidly detect mutation, allele deletion, and allele amplification in multiple genes. Single nucleotide polymorphisms ("SNPs") are potentially powerful genetic markers for early detection, diagnosis, and staging of human cancers.
Identification of DNA sequence polymorphisms is the cornerstone of modern genome mapping. Initially, maps were created using RFLP markers (Botstein, D., et al., "Construction Of A Genetic Linkage Map In Man Using Restriction Fragment Length Polymorphisms," Amer. J. Hum. Genet., 32:314-331 (1980)), and later by the more polymorphic dinucleotide repeat sequences (Weber, J. L. et al., "Abundant Class Of Human DNA Polymorphisms Which Can Be Typed Using The Polymerase Chain Reaction," Amer. J. Hum. Genet., 44:388-396 (1989) and Reed, P. W., et al., "Chromosome-Specific Microsatellite Sets For Fluorescence-Based, Semi-Automated Genome Mapping," Nat Genet, 7(3): 390-5 (1994)). Such sequence polymorphisms may also be used to detect inactivation of tumor suppressor genes via loss of heterozygosity (LOH) and activation of oncogenes via amplification. These genomic changes are currently being analyzed using conventional Southern hybridizations, competitive PCR, real-time PCR, microsatellite marker analysis, and comparative genome hybridization (CGH) (Ried, T., et al., "Comparative Genomic Hybridization Reveals A Specific Pattern Of Chromosomal Gains And Losses During The Genesis Of Colorectal Tumors," Genes, Chromosomes and Cancer, 15(4):234-45 (1996), Kallioniemi, et al., "ERBB2 Amplification In Breast Cancer Analyzed By Fluorescence In Situ Hybridization," Proc Natl Acad Sci USA, 89(12):5321-5 (1992), Kallioniemi, et al., "Comparative Genomic Hybridization: A Rapid New Method For Detecting And Mapping DNA Amplification In Tumors," Semin Cancer Biol, 4(1):41-6 (1993), Kallioniemi, et al., "Detection And Mapping Of Amplified DNA Sequences In Breast Cancer By Comparative Genomic Hybridization," Proc Natl Acad Sci USA, 91(6):2156-60 (1994), Kallioniemi, et al., "Identification Of Gains And Losses Of DNA Sequences In Primary Bladder Cancer By Comparative Genomic Hybridization," Genes Chromosom Cancer, 12(3):213-9 (1995), Schwab, M., et al., "Amplified DNA With Limited Homology To Myc Cellular Oncogene Is Shared By Human Neuroblastoma Cell Lines And A Neuroblastoma Tumour," Nature, 305(5931):245-8 (1983), Solomon, E., et al., "Chromosome 5 Allele Loss In Human Colorectal Carcinomas," Nature, 328(6131):616-9 (1987), Law, D. J., et al., "Concerted Nonsyntenic Allelic Loss In Human Colorectal Carcinoma," Science, 241(4868):961-5 (1988), Frye, R. A., et al., "Detection Of Amplified Oncogenes By Differential Polymerase Chain Reaction," Oncogene, 4(9):1153-7 (1989), Neubauer, A., et al., "Analysis Of Gene Amplification In Archival Tissue By Differential Polymerase Chain Reaction," Oncogene, 7(5):1019-25 (1992), Chiang, P. W., et al., "Use Of A Fluorescent-PCR Reaction To Detect Genomic Sequence Copy Number And Transcriptional Abundance," Genome Research, 6(10):1013-26 (1996), Heid, C. A., et al., "Real Time Quantitative PCR," Genome Research, 6(10):986-94 (1996), Lee, H. H., et al., "Rapid Detection Of Trisomy 21 By Homologous Gene Quantitative PCR (HGQ-PCR)," Human Genetics, 99(3):364-7 (1997), Boland, C. R., et al., "Microallelotyping Defines The Sequence And Tempo Of Allelic Losses At Tumour Suppressor Gene Loci During Colorectal Cancer Progression," Nature Medicine, 1(9):902-9 (1995), Cawkwell, L., et al., "Frequency Of Allele Loss Of DCC, p53, RB1, WT1, NF1, NM23 And APC/MCC In Colorectal Cancer Assayed By Fluorescent Multiplex Polymerase Chain Reaction," Br J Cancer, 70(5):813-8 (1994), and Hampton, G. M., et al., "Simultaneous Assessment Of Loss Of Heterozygosity At Multiple Microsatellite Loci Using Semi-Automated Fluorescence-Based Detection: Subregional Mapping Of Chromosome 4 In Cervical Carcinoma," Proc. Nat'l. Acad. Sci. USA, 93(13):6704-9 (1996)). Competitive and real-time PCR are considerably faster and require less material than Southern hybridization, although neither technique is amenable to multiplexing. Current multiplex microsatellite marker approaches require careful attention to primer concentrations and amplification conditions. While PCR products may be pooled in sets, this requires an initial run on agarose gels to approximate the amount of DNA in each band (Reed, P. W., et al., "Chromosome-Specific Microsatellite Sets For Fluorescence-Based, Semi-Automated Genome Mapping," Nat Genet, 7(3): 390-5 (1994), and Hampton, G. M., et al., "Simultaneous Assessment Of Loss Of Heterozygosity At Multiple Microsatellite Loci Using Semi-Automated Fluorescence-Based Detection: Subregional Mapping Of Chromosome 4 In Cervical Carcinoma," Proc. Nat'l. Acad. Sci. USA, 93(13):6704-9 (1996)). CGH provides a global assessment of LOH and amplification, but only at a resolution of about 20 Mb. To improve gene mapping and discovery, new techniques are urgently needed to allow for simultaneous detection of multiple genetic alterations.
Amplified fragment length polymorphism ("AFLP") technology is a powerful DNA fingerprinting technique originally developed to identify plant polymorphisms in genomic DNA. It is based on the selective amplification of restriction fragments from a total digest of genomic DNA.
The original technique involved three steps: (1) restriction of the genomic DNA, e.g., with EcoRI and MseI, and ligation of oligonucleotide adapters, (2) selective amplification of a subset of all the fragments in the total digest using primers whose 3' ends reach 1 to 3 bases into the fragment beyond the restriction site, and (3) gel-based analysis of the amplified fragments. Janssen, et al., "Evaluation of the DNA Fingerprinting Method AFLP as a New Tool in Bacterial Taxonomy," Microbiology, 142(Pt 7):1881-93 (1996); Thomas, et al., "Identification of Amplified Restriction Fragment Polymorphism (AFLP) Markers Tightly Linked to the Tomato Cf-9 Gene for Resistance to Cladosporium fulvum," Plant J, 8(5):785-94 (1995); Vos, et al., "AFLP: A New Technique for DNA Fingerprinting," Nucleic Acids Res, 23(21):4407-14 (1995); Bachem, et al., "Visualization of Differential Gene Expression Using a Novel Method of RNA Fingerprinting Based on AFLP: Analysis of Gene Expression During Potato Tuber Development," Plant J, 9(5):745-53 (1996); and Meksem, et al., "A High-Resolution Map of the Vicinity of the R1 Locus on Chromosome V of Potato Based on RFLP and AFLP Markers," Mol Gen Genet, 249(1):74-81 (1995), which are hereby incorporated by reference.
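The effect of the selective bases in step (2) can be illustrated with a little arithmetic. The sketch below is not taken from the cited AFLP protocols; it assumes, purely for illustration, that the bases flanking each restriction site are uniformly random and independent, so that each selective base retains about one quarter of the fragments. The function name is hypothetical.

```python
def expected_fraction_amplified(selective_bases_primer_1: int,
                                selective_bases_primer_2: int) -> float:
    """Expected fraction of adapter-ligated fragments a primer pair amplifies.

    Assumption (not from the source): each selective base matches the
    adjacent genomic base with probability 1/4, independently of the others.
    """
    if selective_bases_primer_1 < 0 or selective_bases_primer_2 < 0:
        raise ValueError("number of selective bases cannot be negative")
    return 0.25 ** (selective_bases_primer_1 + selective_bases_primer_2)


if __name__ == "__main__":
    for n in (1, 2, 3):
        fraction = expected_fraction_amplified(n, n)
        print(f"{n} selective base(s) per primer: about 1 in {round(1 / fraction)} fragments")
```

Under this assumption, primers reaching in by 1, 2, or 3 bases on each side would amplify roughly 1 in 16, 1 in 256, or 1 in 4096 fragments, respectively, which is how the selective bases keep the gel-based analysis tractable.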
AFLP differs substantially from the present invention because it: (i) uses palindromic enzymes, (ii) amplifies both desired EcoRI-MseI as well as unwanted MseI-MseI fragments, and (iii) does not identify both alleles when a SNP destroys a pre-existing restriction site. Further, AFLP does not identify SNPs which are outside restriction sites. AFLP does not create, and was not designed to create, a map of a genome.
Representational Difference Analysis (RDA) was developed by N. Lisitsyn and M. Wigler to isolate the differences between two genomes (Lisitsyn, et al., "Cloning the Differences Between Two Complex Genomes," Science, 259:946-951 (1993); Lisitsyn, et al., "Direct Isolation of Polymorphic Markers Linked to a Trait by Genetically Directed Representational Difference Analysis," Nat Genet, 6(1):57-63 (1994); Lisitsyn, et al., "Comparative Genomic Analysis of Tumors: Detection of DNA Losses and Amplification," Proc Natl Acad Sci USA, 92(1):151-5 (1995); Thiagalingam, et al., "Evaluation of the FHIT Gene in Colorectal Cancers," Cancer Res, 56(13):2936-9 (1996); Li, et al., "PTEN, a Putative Protein Tyrosine Phosphatase Gene Mutated in Human Brain, Breast, and Prostate Cancer," Science, 275(5308):1943-7 (1997); and Schutte, et al., "Identification by Representational Difference Analysis of a Homozygous Deletion in Pancreatic Carcinoma That Lies Within the BRCA2 Region," Proc Natl Acad Sci USA, 92(13):5950-4 (1995)). In this system, subtractive and kinetic enrichment is used to purify restriction endonuclease fragments present in one DNA sample, but not in another. The representational part is required to reduce the complexity of the DNA and generates "amplicons". This allows isolation of probes that detect viral sequences in human DNA, polymorphisms, losses of heterozygosity, gene amplifications, and genome rearrangements.
The principle is to subtract "tester" amplicons from an excess of "driver" amplicons. When the tester DNA is tumor DNA and the driver is normal DNA, one isolates gene amplifications. When the tester DNA is normal DNA and the driver is tumor DNA, one isolates genes which lose function (i.e., tumor suppressor genes).
A brief outline of the procedure is provided herein: (i) cleave both tester and driver DNA with the same restriction endonuclease, (ii) ligate unphosphorylated adapters to tester DNA, (iii) mix a 10-fold excess of driver to tester DNA, melt and hybridize, (iv) fill in ends, (v) add primer and PCR amplify, (vi) digest ssDNA with mung bean nuclease, (vii) PCR amplify, (viii) repeat steps (i) to (vii) for 2-3 rounds, (ix) clone fragments and sequence.
RDA differs substantially from the present invention because it: (i) is a very complex procedure, (ii) is used to identify only a few differences between a tester and driver sample, and (iii) does not identify both alleles when a SNP destroys a pre-existing restriction site. Further, RDA does not identify SNPs which are outside restriction sites. RDA does not create, and was not designed to create, a map of a genome.
The advent of DNA arrays has resulted in a paradigm shift in detecting vast numbers of sequence variations and gene expression levels on a genomic scale (Pease, A. C., et al., "Light-Generated Oligonucleotide Arrays For Rapid DNA Sequence Analysis," Proc Natl Acad Sci USA, 91(11):5022-6 (1994), Lipshutz, R. J., et al., "Using Oligonucleotide Probe Arrays To Access Genetic Diversity," Biotechniques, 19(3):442-7 (1995), Eggers, M., et al., "A Microchip For Quantitative Detection Of Molecules Utilizing Luminescent And Radioisotope Reporter Groups," Biotechniques, 17(3):516-25 (1994), Guo, Z., et al., "Direct Fluorescence Analysis Of Genetic Polymorphisms By Hybridization With Oligonucleotide Arrays On Glass Supports," Nucleic Acids Res, 22(24):5456-65 (1994), Beattie, K. L., et al., "Advances In Genosensor Research," Clinical Chemistry, 41(5):700-6 (1995), Hacia, J. G., et al., "Detection Of Heterozygous Mutations In BRCA1 Using High Density Oligonucleotide Arrays And Two-Colour Fluorescence Analysis," Nature Genetics, 14(4):441-7 (1996), Chee, M., et al., "Accessing Genetic Information With High-Density DNA Arrays," Science, 274(5287):610-4 (1996), Cronin, M. T., et al., "Cystic Fibrosis Mutation Detection By Hybridization To Light-Generated DNA Probe Arrays," Hum Mutat, 7(3):244-55 (1996), Drobyshev, A., et al., "Sequence Analysis By Hybridization With Oligonucleotide Microchip: Identification Of Beta-Thalassemia Mutations," Gene, 188(1):45-52 (1997), Kozal, M. J., et al., "Extensive Polymorphisms Observed In HIV-1 Clade B Protease Gene Using High-Density Oligonucleotide Arrays," Nature Medicine, 2(7):753-9 (1996), Yershov, G., et al., "DNA Analysis And Diagnostics On Oligonucleotide Microchips," Proc Natl Acad Sci USA, 93(10):4913-8 (1996), DeRisi, J., et al., "Use Of A cDNA Microarray To Analyse Gene Expression Patterns In Human Cancer," Nature Genetics, 14(4):457-60 (1996), Schena, M., et al., "Parallel Human Genome Analysis: Microarray-Based Expression Monitoring Of 1000 Genes," Proc. Nat'l. Acad. Sci. USA, 93(20):10614-9 (1996), Shalon, D., et al., "A DNA Microarray System For Analyzing Complex DNA Samples Using Two-Color Fluorescent Probe Hybridization," Genome Research, 6(7):639-45 (1996)). Determining deletions, amplifications, and mutations at the DNA level will complement the information obtained from expression profiling of tumors (DeRisi, J., et al., "Use Of A cDNA Microarray To Analyse Gene Expression Patterns In Human Cancer," Nature Genetics, 14(4):457-60 (1996), and Zhang, L., et al., "Gene Expression Profiles In Normal And Cancer Cells," Science, 276:1268-1272 (1997)). DNA chips designed to distinguish single nucleotide differences are generally based on the principle of "sequencing by hybridization" (Lipshutz, R. J., et al., "Using Oligonucleotide Probe Arrays To Access Genetic Diversity," Biotechniques, 19(3):442-7 (1995), Eggers, M., et al., "A Microchip For Quantitative Detection Of Molecules Utilizing Luminescent And Radioisotope Reporter Groups," Biotechniques, 17(3):516-25 (1994), Guo, Z., et al., "Direct Fluorescence Analysis Of Genetic Polymorphisms By Hybridization With Oligonucleotide Arrays On Glass Supports," Nucleic Acids Res, 22(24):5456-65 (1994), Beattie, K. L., et al., "Advances In Genosensor Research," Clinical Chemistry, 41(5):700-6 (1995), Hacia, J. G., et al., "Detection Of Heterozygous Mutations In BRCA1 Using High Density Oligonucleotide Arrays And Two-Colour Fluorescence Analysis," Nature Genetics, 14(4):441-7 (1996), Chee, M., et al., "Accessing Genetic Information With High-Density DNA Arrays," Science, 274(5287):610-4 (1996), Cronin, M. T., et al., "Cystic Fibrosis Mutation Detection By Hybridization To Light-Generated DNA Probe Arrays," Hum Mutat, 7(3):244-55 (1996), Drobyshev, A., et al., "Sequence Analysis By Hybridization With Oligonucleotide Microchip: Identification Of Beta-Thalassemia Mutations," Gene, 188(1):45-52 (1997), Kozal, M. J., et al., "Extensive Polymorphisms Observed In HIV-1 Clade B Protease Gene Using High-Density Oligonucleotide Arrays," Nature Medicine, 2(7):753-9 (1996), and Yershov, G., et al., "DNA Analysis And Diagnostics On Oligonucleotide Microchips," Proc Natl Acad Sci USA, 93(10):4913-8 (1996)), or polymerase extension of arrayed primers (Nikiforov, T. T., et al., "Genetic Bit Analysis: A Solid Phase Method For Typing Single Nucleotide Polymorphisms," Nucleic Acids Research, 22(20):4167-75 (1994), Shumaker, J. M., et al., "Mutation Detection By Solid Phase Primer Extension," Human Mutation, 7(4):346-54 (1996), Pastinen, T., et al., "Minisequencing: A Specific Tool For DNA Analysis And Diagnostics On Oligonucleotide Arrays," Genome Research, 7(6):606-14 (1997), and Lockley, A. K., et al., "Colorimetric Detection Of Immobilised PCR Products Generated On A Solid Support," Nucleic Acids Research, 25(6):1313-4 (1997) (See Table 2)).
While DNA chips can confirm a known sequence, similar hybridization profiles create ambiguities in distinguishing heterozygous from homozygous alleles (Eggers, M., et al., "A Microchip For Quantitative Detection Of Molecules Utilizing Luminescent And Radioisotope Reporter Groups," Biotechniques, 17(3):516-25 (1994), Beattie, K. L., et al., "Advances In Genosensor Research," Clinical Chemistry, 41(5):700-6 (1995), Chee, M., et al., "Accessing Genetic Information With High-Density DNA Arrays," Science, 274(5287):610-4 (1996), Kozal, M. J., et al., "Extensive Polymorphisms Observed In HIV-1 Clade B Protease Gene Using High-Density Oligonucleotide Arrays," Nature Medicine, 2(7):753-9 (1996), and Southern, E. M., "DNA Chips: Analysing Sequence By Hybridization To Oligonucleotides On A Large Scale," Trends in Genetics, 12(3):110-5 (1996)). Attempts to overcome this problem include using two-color fluorescence analysis (Hacia, J. G., et al., "Detection Of Heterozygous Mutations In BRCA1 Using High Density Oligonucleotide Arrays And Two-Colour Fluorescence Analysis," Nature Genetics, 14(4):441-7 (1996)), 40 overlapping addresses for each known polymorphism (Cronin, M. T., et al., "Cystic Fibrosis Mutation Detection By Hybridization To Light-Generated DNA Probe Arrays," Hum Mutat, 7(3):244-55 (1996)), nucleotide analogues in the array sequence (Guo, Z., et al., "Enhanced Discrimination Of Single Nucleotide Polymorphisms By Artificial Mismatch Hybridization," Nature Biotech., 15:331-335 (1997)), or adjacent co-hybridized oligonucleotides (Drobyshev, A., et al., "Sequence Analysis By Hybridization With Oligonucleotide Microchip: Identification Of Beta-Thalassemia Mutations," Gene, 188(1):45-52 (1997) and Yershov, G., et al., "DNA Analysis And Diagnostics On Oligonucleotide Microchips," Proc Natl Acad Sci USA, 93(10):4913-8 (1996)). In a side-by-side comparison, nucleotide discrimination using the hybridization chips fared an order of magnitude worse than using primer extension (Pastinen, T., et al., "Minisequencing: A Specific Tool For DNA Analysis And Diagnostics On Oligonucleotide Arrays," Genome Research, 7(6):606-14 (1997)). Nevertheless, solid phase primer extension also generates false positive signals from mononucleotide repeat sequences, template-dependent errors, and template-independent errors (Nikiforov, T. T., et al., "Genetic Bit Analysis: A Solid Phase Method For Typing Single Nucleotide Polymorphisms," Nucl. Acids Res., 22(20):4167-75 (1994) and Shumaker, J. M., et al., "Mutation Detection By Solid Phase Primer Extension," Human Mutation, 7(4):346-54 (1996)).
Over the past few years, an alternate strategy in DNA array design has been pursued. Combined with a solution-based polymerase chain reaction/ligase detection reaction (PCR/LDR) assay, this array allows for accurate quantification of each SNP allele (See Table 2).
For high throughput detection of specific multiplexed LDR products, unique addressable array-specific sequences on the LDR probes guide each LDR product to a designated address on a DNA array, analogous to molecular tags developed for bacterial and yeast genetics (Hensel, M., et al., "Simultaneous Identification Of Bacterial Virulence Genes By Negative Selection," Science, 269(5222):400-3 (1995) and Shoemaker, D. et al., "Quantitative Phenotypic Analysis Of Yeast Deletion Mutants Using A Highly Parallel Molecular Bar-Coding Strategy," Nat Genet, 14(4):450-6 (1996)). The specificity of this reaction is determined by a thermostable ligase which allows detection of (i) dozens to hundreds of polymorphisms in a single-tube multiplex format, (ii) small insertions and deletions in repeat sequences, and (iii) low level polymorphisms in a background of normal DNA. By uncoupling polymorphism identification from hybridization, each step may be optimized independently, thus allowing for quantitative assessment of allele imbalance even in the presence of stromal cell contamination. This approach has the potential to rapidly identify multiple gene deletions and amplifications associated with tumor progression, as well as lead to the discovery of new oncogenes and tumor suppressor genes. Further, the ability to score hundreds to thousands of SNPs has utility in linkage studies (Nickerson, D. A., et al., "Identification Of Clusters Of Biallelic Polymorphic Sequence-Tagged Sites (pSTSs) That Generate Highly Informative And Automatable Markers For Genetic Linkage Mapping," Genomics, 12(2):377-87 (1992), Lin, Z., et al., "Multiplex Genotype Determination At A Large Number Of Gene Loci," Proc Natl Acad Sci USA, 93(6):2582-7 (1996), Fanning, G. C., et al., "Polymerase Chain Reaction Haplotyping Using 3' Mismatches In The Forward And Reverse Primers: Application To The Biallelic Polymorphisms Of Tumor Necrosis Factor And Lymphotoxin Alpha," Tissue Antigens, 50(1):23-31 (1997), and Kruglyak, L., "The Use of a Genetic Map of Biallelic Markers in Linkage Studies," Nature Genetics, 17:21-24 (1997)), human identification (Delahunty, C., et al., "Testing The Feasibility Of DNA Typing For Human Identification By PCR And An Oligonucleotide Ligation Assay," Am. J. Hum. Gen., 58(6):1239-46 (1996) and Belgrader, P., et al., "A Multiplex PCR-Ligase Detection Reaction Assay For Human Identity Testing," Gen. Sci. and Tech., 1:77-87 (1996)), and mapping complex human diseases using association studies where SNPs are identical by descent (Collins, F. S., "Positional Cloning Moves From Perditional To Traditional," Nat Genet, 9(4):347-50 (1995), Lander, E. S., "The New Genomics: Global Views Of Biology," Science, 274(5287):536-9 (1996), Risch, N. et al., "The Future Of Genetic Studies Of Complex Human Diseases," Science, 273(5281):1516-7 (1996), Cheung, V. G. et al., "Genomic Mismatch Scanning Identifies Human Genomic DNA Shared Identical By Descent," Genomics, 47(1):1-6 (1998), Cheung, V. G., et al., "Linkage-Disequilibrium Mapping Without Genotyping," Nat Genet, 18(3):225-230 (1998), and McAllister, L., et al., "Enrichment For Loci Identical-By-Descent Between Pairs Of Mouse Or Human Genomes By Genomic Mismatch Scanning," Genomics, 47(1):7-11 (1998)).
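The addressing idea described above, in which a tag on the LDR probe rather than the SNP sequence itself decides where a product is captured, can be sketched as a pair of lookup tables. This is a minimal illustration only: the tag names, array coordinates, and loci are hypothetical and do not come from the source.

```python
from collections import Counter

# Hypothetical tag names and array coordinates, for illustration only.
ADDRESS_OF_TAG = {
    "tag-01": (0, 0),
    "tag-02": (0, 1),
    "tag-03": (1, 0),
    "tag-04": (1, 1),
}

# Each (locus, allele) LDR probe carries one addressable tag (hypothetical loci).
TAG_OF_PROBE = {
    ("locus-1", "C"): "tag-01",
    ("locus-1", "T"): "tag-02",
    ("locus-2", "A"): "tag-03",
    ("locus-2", "G"): "tag-04",
}


def array_signal(ligated_products):
    """Count ligated products captured at each array address.

    `ligated_products` is an iterable of (locus, allele) pairs, one per
    ligation event. The tag, not the SNP sequence, decides the address.
    """
    signal = Counter()
    for locus, allele in ligated_products:
        tag = TAG_OF_PROBE[(locus, allele)]
        signal[ADDRESS_OF_TAG[tag]] += 1
    return signal


if __name__ == "__main__":
    # A sample heterozygous at locus-1 and homozygous A at locus-2.
    products = [("locus-1", "C")] * 3 + [("locus-1", "T")] * 3 + [("locus-2", "A")] * 4
    print(dict(array_signal(products)))
```

Because the same tag set can be reused for any collection of SNPs in this sketch, the ligation step and the capture step can be tuned separately, which is the uncoupling the text describes.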
For 85% of epithelial cancers, loss of heterozygosity and gene amplification are the most frequently observed changes which inactivate the tumor suppressor genes and activate the oncogenes. Southern hybridizations, competitive PCR, real time PCR, microsatellite marker analysis, and comparative genome hybridization (CGH) have all been used to quantify changes in chromosome copy number (Ried, T., et al., "Comparative Genomic Hybridization Reveals A Specific Pattern Of Chromosomal Gains And Losses During The Genesis Of Colorectal Tumors," Genes, Chromosomes and Cancer, 15(4):234-45 (1996), Kallioniemi, et al., "ERBB2 Amplification In Breast Cancer Analyzed By Fluorescence In Situ Hybridization," Proc Natl Acad Sci USA, 89(12):5321-5 (1992), Kallioniemi, et al., "Comparative Genomic Hybridization: A Rapid New Method For Detecting And Mapping DNA Amplification In Tumors," Semin Cancer Biol, 4(1):41-6 (1993), Kallioniemi, et al., "Detection And Mapping Of Amplified DNA Sequences In Breast Cancer By Comparative Genomic Hybridization," Proc Natl Acad Sci USA, 91(6):2156-60 (1994), Kallioniemi, et al., "Identification Of Gains And Losses Of DNA Sequences In Primary Bladder Cancer By Comparative Genomic Hybridization," Genes Chromosom Cancer, 12(3):213-9 (1995), Schwab, M., et al., "Amplified DNA With Limited Homology To Myc Cellular Oncogene Is Shared By Human Neuroblastoma Cell Lines And A Neuroblastoma Tumour," Nature, 305(5931):245-8 (1983), Solomon, E., et al., "Chromosome 5 Allele Loss In Human Colorectal Carcinomas," Nature, 328(6131):616-9 (1987), Law, D. J., et al., "Concerted Nonsyntenic Allelic Loss In Human Colorectal Carcinoma," Science, 241(4868):961-5 (1988), Frye, R. A., et al., "Detection Of Amplified Oncogenes By Differential Polymerase Chain Reaction," Oncogene, 4(9):1153-7 (1989), Neubauer, A., et al., "Analysis Of Gene Amplification In Archival Tissue By Differential Polymerase Chain Reaction," Oncogene, 7(5):1019-25 (1992), Chiang, P. W., et al., "Use Of A Fluorescent-PCR Reaction To Detect Genomic Sequence Copy Number And Transcriptional Abundance," Genome Research, 6(10):1013-26 (1996), Heid, C. A., et al., "Real Time Quantitative PCR," Genome Research, 6(10):986-94 (1996), Lee, H. H., et al., "Rapid Detection Of Trisomy 21 By Homologous Gene Quantitative PCR (HGQ-PCR)," Human Genetics, 99(3):364-7 (1997), Boland, C. R., et al., "Microallelotyping Defines The Sequence And Tempo Of Allelic Losses At Tumour Suppressor Gene Loci During Colorectal Cancer Progression," Nature Medicine, 1(9):902-9 (1995), Cawkwell, L., et al., "Frequency Of Allele Loss Of DCC, p53, RB1, WT1, NF1, NM23 And APC/MCC In Colorectal Cancer Assayed By Fluorescent Multiplex Polymerase Chain Reaction," Br J Cancer, 70(5):813-8 (1994), and Hampton, G. M., et al., "Simultaneous Assessment Of Loss Of Heterozygosity At Multiple Microsatellite Loci Using Semi-Automated Fluorescence-Based Detection: Subregional Mapping Of Chromosome 4 In Cervical Carcinoma," Proc. Nat'l. Acad. Sci. USA, 93(13):6704-9 (1996)). Recently, a microarray of consecutive BACs from the long arm of chromosome 20 has been used to accurately quantify 5 regions of amplification and one region of LOH associated with development of breast cancer. This area was previously thought to contain only 3 regions of amplification (Tanner, M. et al., "Independent Amplification And Frequent Co-Amplification Of Three Nonsyntenic Regions On The Long Arm Of Chromosome 20 In Human Breast Cancer," Cancer Research, 56(15):3441-5 (1996)).
Although this approach will yield valuable information from cell lines, it is not clear that it will prove quantitative when starting with microdissected tissue, which requires PCR amplification. Competitive and real time PCR approaches require careful optimization to detect 2-fold differences (Frye, R. A., et al., "Detection Of Amplified Oncogenes By Differential Polymerase Chain Reaction," Oncogene, 4(9):1153-7 (1989), Neubauer, A., et al., "Analysis Of Gene Amplification In Archival Tissue By Differential Polymerase Chain Reaction," Oncogene, 7(5):1019-25 (1992), Chiang, P. W., et al., "Use Of A Fluorescent-PCR Reaction To Detect Genomic Sequence Copy Number And Transcriptional Abundance," Genome Research, 6(10):1013-26 (1996), Heid, C. A., et al., "Real Time Quantitative PCR," Genome Research, 6(10):986-94 (1996), and Lee, H. H., et al., "Rapid Detection Of Trisomy 21 By Homologous Gene Quantitative PCR (HGQ-PCR)," Human Genetics, 99(3):364-7 (1997)). Unfortunately, stromal contamination may reduce the ratio between tumor and normal chromosome copy number to less than 2-fold. By using quantitative SNP-DNA array detection, each allele can be distinguished independently, thus reducing the effect of stromal contamination by half. Further, by comparing the ratio of allele-specific LDR product formed from a tumor gene to that formed from a control gene, in both a tumor and a normal sample, it may be possible to distinguish gene amplification from loss of heterozygosity at multiple loci in a single reaction.
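The arithmetic behind per-allele quantification in a contaminated sample can be made concrete with a small model. The sketch below is an illustration under stated assumptions, not the assay itself: it assumes a locus that is heterozygous in normal cells (one copy of each allele), a signal proportional to copy number after normalization to a control gene, and a hypothetical tolerance of 0.15 for calling a change. All function names are invented for this example.

```python
def expected_allele_ratios(tumor_fraction, tumor_copies_a, tumor_copies_b):
    """Per-allele signal of a mixed sample relative to a normal sample.

    Assumes (for illustration) that stromal cells carry one copy of each
    allele and that signal scales linearly with copy number.
    """
    stromal_fraction = 1.0 - tumor_fraction
    ratio_a = tumor_fraction * tumor_copies_a + stromal_fraction * 1
    ratio_b = tumor_fraction * tumor_copies_b + stromal_fraction * 1
    return ratio_a, ratio_b


def classify(ratio_a, ratio_b, tolerance=0.15):
    """Call a change from two per-allele ratios (tolerance is hypothetical)."""
    if min(ratio_a, ratio_b) < 1.0 - tolerance:
        return "loss of heterozygosity"
    if max(ratio_a, ratio_b) > 1.0 + tolerance:
        return "amplification"
    return "no change detected"


if __name__ == "__main__":
    # Tumor cells that lost allele B, in a sample with 50% stromal cells.
    a, b = expected_allele_ratios(0.5, 1, 0)
    print(a, b, classify(a, b))          # allele B falls to 0.5 of normal
    print("combined:", (a + b) / 2)      # total copy number falls only to 0.75

    # Tumor cells with allele A amplified to 4 copies, same contamination.
    a, b = expected_allele_ratios(0.5, 4, 1)
    print(a, b, classify(a, b))
```

In this model, with half the cells being stromal, loss of one allele lowers the combined copy number to 0.75 of normal but lowers the affected allele to 0.5 of normal, so scoring alleles separately doubles the deviation that has to be detected.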
Using PCR/LDR to Detect SNPs.
The ligase detection reaction ("LDR") is ideal for multiplexed discrimination of single-base mutations or polymorphisms (Barany, F., et al., "Cloning, Overexpression, And Nucleotide Sequence Of A Thermostable DNA Ligase Gene," Gene, 109:1-11 (1991), Barany, F., "Genetic Disease Detection And DNA Amplification Using Cloned Thermostable Ligase," Proc. Natl. Acad. Sci. USA, 88:189-193 (1991), and Barany, F., "The Ligase Chain Reaction (LCR) In A PCR World," PCR Methods and Applications, 1:5-16 (1991)). Since there is no polymerization step, several probe sets can ligate along a gene without interference. The optimal multiplex detection scheme involves a primary PCR amplification, followed by either LDR (two probes, same strand) or ligase chain reaction ("LCR") (four probes, both strands) detection. This approach has been successfully applied for simultaneous multiplex detection of 61 cystic fibrosis alleles (Grossman, P. D., et al., "High-Density Multiplex Detection Of Nucleic Acid Sequences: Oligonucleotide Ligation Assay And Sequence-Coded Separation," Nucleic Acids Res., 22:4527-4534 (1994) and Eggerding, F. A., et al., "Fluorescence-Based Oligonucleotide Ligation Assay For Analysis Of Cystic Fibrosis Transmembrane Conductance Regulator Gene Mutations," Human Mutation, 5:153-165 (1995)), 6 hyperkalemic periodic paralysis alleles (Feero, W. T., et al., "Hyperkalemic Periodic Paralysis: Rapid Molecular Diagnosis And Relationship Of Genotype To Phenotype In 12 Families," Neurology, 43:668-673 (1993)), and 20 21-hydroxylase deficiency alleles (Day, D., et al., "Detection Of Steroid 21 Hydroxylase Alleles Using Gene-Specific PCR And A Multiplexed Ligation Detection Reaction," Genomics, 29:152-162 (1995) and Day, D. J., et al., "Identification Of Non-Amplifying CYP21 Genes When Using PCR-Based Diagnosis Of 21-Hydroxylase Deficiency In Congenital Adrenal Hyperplasia (CAH) Affected Pedigrees," Hum Mol Genet, 5(12):2039-48 (1996)).
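The two-probe logic of LDR can be sketched as a string-matching toy model. It is a deliberate simplification and not a description of ligase biochemistry: probes are compared directly against the strand they copy (rather than its complement), a product forms only when the allele-specific probe and the adjacent common probe match the template exactly and abut, and all sequences and names are hypothetical.

```python
# Hypothetical locus: flanking sequence with a C/T polymorphism in the middle.
UPSTREAM = "ACGTTGCAAGG"
DOWNSTREAM = "TTACCGATCGGA"
ALLELE_C = UPSTREAM + "C" + DOWNSTREAM
ALLELE_T = UPSTREAM + "T" + DOWNSTREAM

# Allele-specific probes end at the polymorphic base; one common probe
# begins at the next base. Each allele-specific probe would carry its own label.
DISCRIMINATING_PROBES = {
    "C": UPSTREAM[-8:] + "C",
    "T": UPSTREAM[-8:] + "T",
}
COMMON_PROBE = DOWNSTREAM[:8]


def ligates(template, discriminating_probe, common_probe):
    """True if both probes match the template perfectly and sit adjacent."""
    return (discriminating_probe + common_probe) in template


def genotype(templates):
    """Return the sorted alleles for which an LDR product would form."""
    found = set()
    for template in templates:
        for allele, probe in DISCRIMINATING_PROBES.items():
            if ligates(template, probe, COMMON_PROBE):
                found.add(allele)
    return sorted(found)


if __name__ == "__main__":
    print(genotype([ALLELE_C, ALLELE_C]))  # homozygous sample
    print(genotype([ALLELE_C, ALLELE_T]))  # heterozygous sample
```

Because both allele-specific probes are present in the same reaction in this sketch, a homozygous sample yields one product and a heterozygous sample yields two, and further probe sets for other loci could be added without interfering with one another.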
21-hydroxylase deficiency has the highest carrier rate of any genetic disease, with 6% of Ashkenazi Jews being carriers. Approximately 95% of mutations causing 21-hydroxylase deficiency are the result of recombinations between an inactive pseudogene termed CYP21P and the normally active gene termed CYP21, which share 98% sequence homology (White, P. C., et al., "Structure Of Human Steroid 21-Hydroxylase Genes," Proc. Natl. Acad. Sci. USA, 83:5111-5115 (1986)). PCR/LDR was developed to rapidly determine heterozygosity or homozygosity for any of the 10 common apparent gene conversions in CYP21. By using allele-specific PCR, defined regions of CYP21 are amplified without amplifying the CYP21P sequence. The presence of wild-type or pseudogene mutation is subsequently determined by fluorescent LDR. Discriminating oligonucleotides complementary to both CYP21 and CYP21P are included in equimolar amounts in a single reaction tube so that a signal for either active gene, pseudogene, or both is always obtained. PCR/LDR genotyping (of 82 samples) was able to readily type compound heterozygotes with multiple gene conversions in a multiplexed reaction, and was in complete agreement with direct sequencing/ASO analysis. This method was able to distinguish insertion of a single T nucleotide into a (T)7 tract, which cannot be achieved by allele-specific PCR alone (Day, D., et al., "Detection Of Steroid 21 Hydroxylase Alleles Using Gene-Specific PCR And A Multiplexed Ligation Detection Reaction," Genomics, 29:152-162 (1995)). A combination of PCR/LDR and microsatellite analysis revealed some unusual cases of PCR allele dropout (Day, D. J., et al., "Identification Of Non-Amplifying CYP21 Genes When Using PCR-Based Diagnosis Of 21-Hydroxylase Deficiency In Congenital Adrenal Hyperplasia (CAH) Affected Pedigrees," Hum Mol Genet, 5(12):2039-48 (1996)).
The LDR approach is a single-tube reaction which enables multiple samples to be analyzed on a single polyacrylamide gel.
A PCR/LDR assay has been developed to detect germline mutations, found at high frequency (3% total), in BRCA1 and BRCA2 genes in the Jewish population. The mutations are: BRCA1, exon 2 185delAG; BRCA1, exon 20 5382insC; BRCA2, exon 11 6174delT. These mutations are more difficult to detect than most germline mutations, as they involve slippage in short repeat regions. A preliminary screening of 20 samples using multiplex PCR of three exons and LDR of six alleles in a single tube assay has successfully detected the three Ashkenazi BRCA1 and BRCA2 mutations.
Multiplexed PCR for Amplifying Many Regions of Chromosomal DNA Simultaneously.
A coupled multiplex PCR/PCR/LDR assay was developed to identify armed forces personnel. Several hundred SNPs in known genes with heterozygosities greater than 0.4 are currently listed. Twelve of these were amplified in a single PCR reaction as follows: Long PCR primers were designed to have gene-specific 3′ ends and 5′ ends complementary to one of two sets of PCR primers. The upstream primers were synthesized with either FAM- or TET-fluorescent labels. These 24 gene-specific primers were pooled and used at low concentration in a 15 cycle PCR. After this, the two sets of primers were added at higher concentrations and the PCR was continued for an additional 25 cycles. The products were separated on an automated ABD 373A DNA Sequencer. The use of these primers produces similar amounts of multiplexed products without the need to carefully adjust gene-specific primer concentrations or PCR conditions (Belgrader, P., et al., "A Multiplex PCR-Ligase Detection Reaction Assay For Human Identity Testing," Genome Science and Technology, 1:77-87 (1996)). In a separate experiment, non-fluorescent PCR products were diluted into an LDR reaction containing 24 fluorescently labeled allele-specific LDR probes and 12 adjacent common LDR probes, with products separated on an automated DNA sequencer. LDR probe sets were designed in two ways: (i) allele-specific FAM- or TET-labeled LDR probes of uniform length, or (ii) allele-specific HEX-labeled LDR probes differing in length by two bases. A comparison of LDR profiles of several individuals demonstrated the ability of PCR/LDR to distinguish both homozygous and heterozygous genotypes at each locus (Id.).
The use of PCR/PCR in human identification to simultaneously amplify 26 loci has been validated (Lin, Z., et al., "Multiplex Genotype Determination At A Large Number Of Gene Loci," Proc Natl Acad Sci USA, 93(6):2582-7 (1996)), as has ligase-based detection to distinguish 32 alleles, although the latter was carried out in individual reactions (Nickerson, D. A., et al., "Identification Of Clusters Of Biallelic Polymorphic Sequence-Tagged Sites (pSTSs) That Generate Highly Informative And Automatable Markers For Genetic Linkage Mapping," Genomics, 12(2):377-87 (1992)). This study validates the ability to multiplex both PCR and LDR reactions in a single tube, which is a prerequisite for developing a high throughput method to simultaneously detect SNPs throughout the genome.
For the PCR/PCR/LDR approach, two long PCR primers are required for each SNP analyzed. A method which reduces the need for multiple PCR primers would give significant savings in time and cost of a large-scale SNP analysis. The present invention is directed to achieving this objective.
The present invention is directed to a method of assembling genomic maps of an organism's DNA or portions thereof. A library of an organism's DNA is provided where the individual genomic segments or sequences are found on more than one clone in the library. Representations of the genome are created, and nucleic acid sequence information is generated from the representations. The sequence information is analyzed to determine clone overlap from a representation. The clone overlap and sequence information from different representations is combined to assemble a genomic map of the organism.
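The overlap determination described above can be illustrated with a simplified sketch in which each clone is reduced to the set of short sequences read from its representation, and two clones are taken to overlap when they share at least one such sequence. The clone names, sequences, and the min_shared parameter are hypothetical, and real data would require tolerance for sequencing errors, which this sketch omits.

```python
from itertools import combinations

def clone_overlaps(islands_by_clone, min_shared=1):
    """Return {(clone_a, clone_b): shared_sequences} for overlapping clones.

    islands_by_clone maps a clone name to the set of sequence tags
    obtained from that clone's representation.
    """
    overlaps = {}
    for a, b in combinations(sorted(islands_by_clone), 2):
        shared = islands_by_clone[a] & islands_by_clone[b]
        if len(shared) >= min_shared:
            overlaps[(a, b)] = shared
    return overlaps

# Hypothetical tags for three clones; BAC1 and BAC2 share one tag.
example = {
    "BAC1": {"ACGTTGCA", "GGATCCAT"},
    "BAC2": {"GGATCCAT", "TTGACCGA"},
    "BAC3": {"CCATGGTA"},
}
```

Chaining such pairwise overlaps across the library is what would place the clones into an ordered contig.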
As explained in more detail infra, the representation can be created by selecting a subpopulation of genomic segments out of a larger set of the genomic segments in that clone. In particular, this is achieved by first subjecting an individual clone to a first restriction endonuclease under conditions effective to cleave DNA from the individual clone so that a degenerate overhang is created in the clone. Non-palindromic complementary linker adapters are added to the overhangs in the presence of ligase and the first restriction endonuclease to select or amplify particular fragments from the first restriction endonuclease digested clone as a representation. As a result, sufficient linker-genomic fragment products are formed to allow determination of a DNA sequence adjacent to the overhang. Although a number of first restriction endonucleases are suitable for use in this process, it is particularly desirable to use the enzyme DrdI to create the representation which comprises what are known as DrdI islands (i.e. the genomic segments which are produced when DrdI cleaves the genomic DNA in the clones).
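As an illustration of where the degenerate overhang arises, the sketch below scans a sequence for DrdI sites and reports the two central bases that would form the overhang. It assumes the recognition sequence commonly catalogued for DrdI, GACNNNN^NNGTC, which is not stated in the passage above; the function name and the demonstration sequence are hypothetical.

```python
import re

# Assumed DrdI site: GAC, six arbitrary bases, GTC. With cleavage after
# the fourth N on each strand, the two central Ns form the overhang.
# A lookahead is used so that overlapping sites are all reported.
DRDI_SITE = re.compile(r"(?=(GAC[ACGT]{2}([ACGT]{2})[ACGT]{2}GTC))")

def find_drdi_sites(seq):
    """Return (start, site, overhang) for each assumed DrdI site in seq."""
    seq = seq.upper()
    return [(m.start(), m.group(1), m.group(2))
            for m in DRDI_SITE.finditer(seq)]

# Hypothetical sequence containing a single site.
demo = "TTTGACAATGCCGTCAAA"
```

Since the two overhang bases are unconstrained, there are 16 possible overhang sequences, which is why only linker adapters matching a chosen overhang select a subpopulation of fragments.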
The procedure is amenable to automation and requires just a single extra reaction (simultaneous cleavage/ligation) compared to straight dideoxy sequencing. Use of from 4 to 8 additional linker adapters/primers is compatible with microtiter plate format for delivery of reagents. A step which destroys the primers after the PCR amplification allows for direct sequencing without purifying the PCR products.
A method is provided for analyzing sequencing data allowing for assignment of overlap between two or more clones. The method deconvolutes singlet, doublet, and triplet sequencing runs allowing for interpretation of the data. For sequencing runs which are difficult to interpret, sequencing primers containing an additional one or two bases on the 3′ end will generate a readable sequence. As an alternative to deconvoluting doublet and triplet sequencing runs, other enzymes may be used to create short representational fragments. Such fragments may be differentially enriched via ultrafiltration to provide dominant signal, or, alternatively, their differing length provides unique sequence signatures on a full length sequencing run.
About 200,000 to 300,000 DrdI islands are predicted in the human genome. The DrdI islands are a representation of 1/15th to 1/10th of the genome. With an average BAC size of 100-150 kb, a total of 20,000 to 30,000 BAC clones would cover the human genome, or 150,000 clones would provide 5-fold coverage. Using the DrdI island approach, 4-6 sequencing runs are required per clone, for a total of 600,000 to 900,000 sequencing reactions. New automated capillary sequencing machines (Perkin Elmer 3700 machine) can run 2,304 short (80-100 bp) sequencing reads per day. Thus, the DrdI approach for overlapping all BAC clones providing a 5-fold coverage of the human genome would require only 39 days using 10 of the new DNA sequencing machines.
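The throughput estimate can be checked with a short calculation using only the figures given above (150,000 clones, 4 to 6 runs per clone, 2,304 reads per machine per day, 10 machines). The function name is hypothetical, and the calculation assumes every machine runs at full capacity each day.

```python
def sequencing_days(clones, runs_per_clone, reads_per_machine_per_day, machines):
    """Return (total_reactions, days_needed) for the stated throughput."""
    reactions = clones * runs_per_clone
    days = reactions / (reads_per_machine_per_day * machines)
    return reactions, days

# Lower and upper bounds from the 4-6 runs per clone stated in the text.
low = sequencing_days(150_000, 4, 2_304, 10)
high = sequencing_days(150_000, 6, 2_304, 10)
```

The 39-day figure corresponds to the upper bound of 900,000 reactions; at 4 runs per clone the same calculation gives roughly 26 days.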
The above approach will provide a highly organized contig of the entire genome for just under a million sequencing reactions, or about 1/70th of the effort required by just random clone overlap. Subsequently, random sequencing will fill in the sequence information between DrdI islands. Since the islands are anchored in the contig, this will result in a 2- to 4-fold reduction in the amount of sequencing necessary to obtain a complete sequence of the genome.
Single nucleotide polymorphisms or SNPs have been proposed as valuable tools for gene mapping and discovering genes associated with common diseases. The present invention provides a rapid method to find mapped single nucleotide polymorphisms within genomes. A representation of the genomes of multiple individuals is cloned into a common vector. Sequence information generated from the representational library is analyzed to determine single nucleotide polymorphisms.
The present invention provides a method for large scale detection of single nucleotide polymorphisms ("SNPs") on a DNA array. This method involves creating a representation of a genome from a clinical sample. A plurality of oligonucleotide probe sets are provided with each set characterized by (a) a first oligonucleotide probe, having a target-specific portion and an addressable array-specific portion, and (b) a second oligonucleotide probe, having a target-specific portion and a detectable reporter label. The oligonucleotide probes in a particular set are suitable for ligation together when hybridized adjacent to one another on a corresponding target nucleotide sequence, but have a mismatch which interferes with such ligation when hybridized to any other nucleotide sequence present in the representation of the sample. A mixture is formed by blending the sample, the plurality of oligonucleotide probe sets, and a ligase. The mixture is subjected to one or more ligase detection reaction ("LDR") cycles comprising a denaturation treatment, where any hybridized oligonucleotides are separated from the target nucleotide sequences, and a hybridization treatment, where the oligonucleotide probe sets hybridize at adjacent positions in a base-specific manner to their respective target nucleotide sequences, if present in the sample, and ligate to one another to form a ligation product sequence containing (a) the addressable array-specific portion, (b) the target-specific portions connected together, and (c) the detectable reporter label. The oligonucleotide probe sets may hybridize to nucleotide sequences in the sample other than their respective target but do not ligate together due to the presence of one or more mismatches and individually separate during the denaturation treatment.
A solid support with different capture oligonucleotides immobilized at particular sites is provided where the capture oligonucleotides have nucleotide sequences complementary to the addressable array-specific portions. After subjecting the mixture to one or more ligase detection reaction cycles, the mixture is contacted with the solid support under conditions effective to hybridize the addressable array-specific portions to the capture oligonucleotides in a base-specific manner. As a result, the addressable array-specific portions are captured on the solid support at the site with the complementary capture oligonucleotide. Finally, the reporter labels of ligation product sequences captured on the solid support at particular sites are detected, which indicates the presence of single nucleotide polymorphisms.
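The readout step can be sketched as a lookup from array site to SNP allele: a reporter signal at a site implies that the ligation product carrying the corresponding addressable portion was formed. The sketch assumes, for illustration only, that each allele of each SNP is assigned its own array site; the site labels, SNP identifiers, intensity values, and the min_signal cutoff are all hypothetical.

```python
def read_array(signals, address_map, min_signal=100.0):
    """Translate per-site reporter signals into detected alleles per SNP.

    signals:     {site: measured reporter intensity}
    address_map: {site: (snp_id, allele)} for the capture oligonucleotides
    """
    calls = {}
    for site, intensity in signals.items():
        if intensity >= min_signal and site in address_map:
            snp_id, allele = address_map[site]
            calls.setdefault(snp_id, set()).add(allele)
    return calls

# Hypothetical layout: two sites per SNP, one for each allele.
address_map = {
    "A1": ("snp1", "A"), "A2": ("snp1", "G"),
    "B1": ("snp2", "C"), "B2": ("snp2", "T"),
}
signals = {"A1": 850.0, "A2": 790.0, "B1": 20.0, "B2": 910.0}
```

With these example values, snp1 shows both alleles (heterozygous) and snp2 shows only the T allele.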
It has been estimated that 30,000 to 300,000 SNPs will be needed to map the positions of genes which influence the major multivariate diseases in defined populations using association methods. Since the above SNP database is connected to a closed map of the entire genome, new genes may be rapidly discovered. Further, the representative PCR/LDR/universal array may be used to quantify allele imbalance. This allows for use of SNPs to discover new tumor suppressor genes, which undergo loss of heterozygosity, or oncogenes, which undergo amplification, in various cancers.