A primary goal of the Human Genome Project is to determine the entire DNA sequence for the genomes of human, model, and other useful organisms. A related goal is to construct ordered clone maps of DNA sequences at 100 kilobase (kb) resolution for these organisms (D. R. Cox, E. D. Green, E. S. Lander, D. Cohen, and R. M. Myers, "Assessing mapping progress in the Human Genome Project," Science, vol. 265, No. 5181, pp. 2031-2, 1994), incorporated by reference. Integrated maps that localize clones together with polymorphic genetic markers (J. Weber and P. May, "Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction," Am. J. Hum. Genet., vol. 44, pp. 388-396, 1989), incorporated by reference, are particularly useful for positionally cloning human disease genes (F. Collins, "Positional cloning: let's not call it reverse anymore," Nature Genet., vol. 1, No. 1, pp. 3-6, 1992), incorporated by reference. The greatest need, however, is for sequence-ready maps. Also useful are maps of expressed sequences. Human DNA sequences now exist as genomic libraries in a variety of small- and large-insert capacity cloning vectors, with yeast artificial chromosomes (YACs) (D. T. Burke, G. F. Carle, and M. V. Olson, "Cloning of large exogenous DNA into yeast by means of artificial chromosomes," Science, vol. 236, pp. 806-812, 1987), incorporated by reference, used extensively in mapping large regions. Efficient strategies for performing the requisite experimentation are critical for sequencing and mapping chromosomes or entire genomes.
The starting point for an effective sequencing method is a complete ordered clone map of a genome. Current strategies for ordering clones build contiguous sequences (contigs) using short-range comparison data. Sequence-tagged site (STS) (M. Olson, L. Hood, C. Cantor, and D. Botstein, "A common language for physical mapping of the human genome," Science, vol. 245, pp. 1434-35, 1989), incorporated by reference, comparisons with clones are used in STS-content mapping (SCM) (E. D. Green and P. Green, "Sequence-tagged site (STS) content mapping of human chromosomes: theoretical considerations and early experiences," PCR Methods and Applications, vol. 1, pp. 77-90, 1991), incorporated by reference. For chromosomal or genome-wide SCM, very large YACs (megaYACs) are required for the currently available STS densities (R. Arratia, E. S. Lander, S. Tavare, and M. S. Waterman, "Genomic mapping by anchoring random clones: a mathematical analysis," Genomics, vol. 11, pp. 806-827, 1991; W. J. Ewens, C. J. Bell, P. J. Donnelly, P. Dunn, E. Matallana, and J. R. Ecker, "Genome mapping with anchored clones: theoretical aspects," Genomics, vol. 11, pp. 799-805, 1991), incorporated by reference; these large YACs are often chimeric or contain gaps. Restriction fragment fingerprint mapping has been done with hybridization (C. Bellanne-Chantelot, B. Lacroix, P. Ougen, A. Billault, S. Beaufils, S. Bertrand, S. Georges, F. Glibert, I. Gros, G. Lucotte, L. Susini, J.-J. Codani, P. Gesnouin, S. Pook, G. Vaysseix, J. Lu-Kuo, T. Ried, D. Ward, I. Chumakov, D. Le Paslier, E. Barillot, and D. Cohen, "Mapping the whole genome by fingerprinting yeast artificial chromosomes," Cell, vol. 70, pp. 1059-1068, 1992; R. L. Stallings, D. C. Torney, C. E. Hildebrand, J. L. Longmire, L. L. Deaven, J. H. Jett, N. A. Doggett, and R. K. Moyzis, "Physical mapping of human chromosomes by repetitive sequence hybridization," Proc. Natl. Acad. Sci. USA, vol. 87, pp. 6218-6222, 1990), incorporated by reference, or without hybridization (A. Coulson, J. Sulston, S. Brenner, and J. Karn, "Toward a physical map of the genome of the nematode Caenorhabditis elegans," Proc. Natl. Acad. Sci. USA, vol. 83, pp. 7821-7825, 1986), incorporated by reference. With hybridization fingerprinting, path analysis of YAC fingerprints is not always reliable when constructing contigs. Hybridizing an internal clone sequence (e.g., end-clone sequence, Alu-PCR probes) against a library to determine neighboring sequences builds unpositioned YAC contigs (M. T. Ross and V. P. J. Stanton, "Screening large-insert libraries by hybridization," in Current Protocols in Human Genetics, vol. 1, N. J. Dracopoli, J. L. Haines, B. R. Korf, C. C. Morton, C. E. Seidman, J. G. Seidman, D. T. Moir, and D. Smith, ed. New York: John Wiley and Sons, 1995, pp. 5.6.1-5.6.34), incorporated by reference, although walking techniques are generally reserved for closing gaps.
The number of experiments needed for these short-range clone mapping approaches increases with the number of clones in the library. While considerable efficiency is gained by using multiplexed experiments with pooled reagents (G. A. Evans and K. A. Lewis, "Physical mapping of complex genomes by cosmid multiplex analysis," Proc. Natl. Acad. Sci. USA, vol. 86, No. 13, pp. 5030-4, 1989; E. D. Green and M. V. Olson, "Systematic screening of yeast artificial-chromosome libraries by use of the polymerase chain reaction," Proc. Natl. Acad. Sci. USA, vol. 87, No. 3, pp. 1213-7, 1990), incorporated by reference, the experimental requirements are at least proportional to the number of clones. A useful goal is to significantly reduce cost and increase throughput by achieving a number of required experiments largely independent of library size. One step toward this independence has been achieved by gridding an entire library onto nylon filters, and then hybridizing these filters with a set of probes (H. Lehrach, A. Drmanac, J. Hoheisel, Z. Larin, G. Lennon, A. P. Monaco, D. Nizetic, G. Zehetner, and A. Poustka, "Hybridization fingerprinting in genome mapping and sequencing," in Genetic and Physical Mapping I: Genome Analysis, K. E. Davies and S. M. Tilghman, ed. Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory, 1990, pp. 39-81; A. P. Monaco, V. M. S. Lam, G. Zehetner, G. G. Lennon, C. Douglas, D. Nizetic, P. N. Goodfellow, and H. Lehrach, "Mapping irradiation hybrids to cosmid and yeast artificial chromosome libraries by direct hybridization of Alu-PCR products," Nucleic Acids Res., vol. 19, No. 12, pp. 3315-3318, 1991), incorporated by reference. For example, contigs of small genomic regions have been constructed by oligonucleotide fingerprinting of gridded cosmid filters (A. G. Craig, D. Nizetic, J. D. Hoheisel, G. Zehetner, and H. Lehrach, "Ordering of cosmid clones covering the herpes simplex virus type I," Nucleic Acids Res., vol. 18, No. 9, pp. 2653-60, 1990; A. J. Cuticchia, J. Arnold, and W. E. Timberlake, "ODS: ordering DNA sequences, a physical mapping algorithm based on simulated annealing," CABIOS, vol. 9, No. 2, pp. 215-219, 1992), incorporated by reference.
To efficiently span larger genomic regions, radiation hybrid (RH) mapping (D. R. Cox, M. Burmeister, E. R. Price, S. Kim, and R. M. Myers, "Radiation hybrid mapping: a somatic cell genetic method for constructing high-resolution maps of mammalian chromosomes," Science, vol. 250, pp. 245-250, 1990), incorporated by reference, has been used to localize small DNA sequences (though not clones) into high-resolution bins. Relatively few PCR experiments with one 96-well plate library of RHs generally suffice for mapping STSs or genes to unique bins having 250 kb to 1 Mb average resolution. The very large multiple fragments in each RH clone efficiently cover much of a chromosome (or genome). Assaying a sequence for intersection against a set of RHs provides long-range relational information for localization much akin to somatic cell hybrid (SCH) mapping (M. C. Weiss and H. Green, "Human-mouse hybrid cell lines containing partial complements of human chromosomes and functioning human genes," Proc. Natl. Acad. Sci. USA, vol. 58, pp. 1104-1111, 1967), incorporated by reference. However, RH mapping offers much greater resolution than SCH or fluorescent in situ hybridization (FISH) mapping.
For highly optimized experimentation, it would be desirable to combine high-resolution long-range RH mapping with low-cost high-throughput filter hybridization techniques to map clones. One can serially probe a gridded clone library with a set of RHs (H. Lehrach, A. Drmanac, J. Hoheisel, Z. Larin, G. Lennon, A. P. Monaco, D. Nizetic, G. Zehetner, and A. Poustka, "Hybridization fingerprinting in genome mapping and sequencing," in Genetic and Physical Mapping I: Genome Analysis, K. E. Davies and S. M. Tilghman, ed. Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory, 1990, pp. 39-81), in principle requiring a number of experiments that is independent of the clone library size and logarithmically related to the desired map resolution. However, complex hybridization probes such as RHs (or their Alu-PCR products) generate data containing considerable noise. This inherent uncertainty, together with the large clone insert size (which complicates conventional RH analysis), has thus far precluded high-resolution mapping of clones using RHs (J. Kumlien, T. Labella, G. Zehetner, R. Vatcheva, D. Nizetic, and H. Lehrach, "Efficient identification and regional positioning of YAC and cosmid clones to human chromosome 21 by radiation fusion hybrids," Mammalian Genome, vol. 5, No. 6, pp. 365-71, 1994), incorporated by reference.
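The logarithmic relationship noted above can be illustrated with a small idealized calculation. The sketch below assumes noise-free, binary (hybridizes / does not hybridize) probe outcomes, so that n probes can distinguish at most 2^n bins; the function names and the 150 Mb region size are hypothetical and chosen only for illustration. The count depends on the number of bins, not on the number of clones on the filter, and noisy data would in practice call for more probes than this lower bound.

```python
def bins_for_resolution(region_kb: int, resolution_kb: int) -> int:
    """Number of bins needed to tile a region at a given resolution (ceiling division)."""
    if region_kb < 1 or resolution_kb < 1:
        raise ValueError("sizes must be positive")
    return -(-region_kb // resolution_kb)


def min_binary_probes(n_bins: int) -> int:
    """Idealized lower bound: fewest binary probes whose outcomes can label n_bins bins.

    Assumes error-free yes/no results; this is an assumption of the sketch,
    not a property of real RH hybridization data.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be positive")
    return (n_bins - 1).bit_length()


# Hypothetical 150 Mb chromosome, binned at two resolutions.
coarse = bins_for_resolution(150_000, 1_000)   # 150 bins at 1 Mb
fine = bins_for_resolution(150_000, 250)       # 600 bins at 250 kb
print(coarse, min_binary_probes(coarse))
print(fine, min_binary_probes(fine))
```

Quadrupling the resolution in this example adds only two probes to the idealized bound, which is the sense in which the experiment count grows logarithmically with resolution.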
Inner product mapping (IPM) is a hybridization-based method for achieving high-throughput, high-resolution RH mapping of clones (M. W. Perlin and A. Chakravarti, "Efficient construction of high-resolution physical maps from yeast artificial chromosomes using radiation hybrids: inner product mapping," Genomics, vol. 18, pp. 283-289, 1993), incorporated by reference, that overcomes this barrier. Experimental data have established that IPM is a highly rapid, inexpensive, accurate, and precise large-scale long-range mapping method, particularly when preexisting RH maps are available, and that IPM can replace or complement more conventional short-range mapping methods.
Improved mapping results can be obtained incrementally by gradually enlarging the data tables, a process which provides useful feedback to both experimentation and analysis. With additional RHs, the signal-to-noise characteristics of the clone profiles improve. This incremental process, and the relatively few RHs required for accurate mapping, follow from the logarithmic number of probes needed for IPM. For best mapping results, as many STS-typed RHs as feasible are used: with currently available high-throughput, robotically assisted hybridization methods, the localization benefits of performing many filter hybridizations outweigh the relatively low experimentation costs. The incremental construction also highlights IPM's indirect inference of map location: by contrast, STS-content mapping directly compares clones with STSs, and cannot map small-insert clones when the STSs are insufficiently dense.
IPM builds accurate maps from low-confidence data. IPM's partitioning of the experiments into two data tables of (A) clones vs. RHs and (B) RHs vs. STSs also partitions the data noise. Table B is formed from relatively noiseless PCR-based comparisons of STSs against RH DNA, and can thus accurately order and position the STS bins using combinatorial mapping procedures (M. Boehnke, "Radiation hybrid mapping by minimization of the number of obligate chromosome breaks," Genetic Analysis Workshop 7: Issues in Gene Mapping and the Detection of Major Genes. Cytogenet Cell Genet, vol. 59, pp. 96-98, 1992; M. Boehnke, K. Lange, and D. R. Cox, "Statistical methods for multipoint radiation hybrid mapping," Am. J. Hum. Genet., vol. 49, pp. 1174-1188, 1991), incorporated by reference. Table A is formed from inherently unreliable and inconsistently replicated hybridizations of complex RH probes against gridded filters. Inner product mapping uses the table B data matrix to ameliorate these data errors and robustly translate a clone's noisy RH signature vector (a row of table A) into a chromosomal profile, whose peak identifies the clone's bin.
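The translation of a row of table A into a chromosomal profile can be sketched as follows. This is a minimal illustration rather than the published procedure: the mean-centering step, the function names, and the toy tables are all assumptions made for the example. Each profile entry is the inner product of the clone's RH signature with one STS column of table B, and the clone is assigned to the STS bin where the profile peaks.

```python
def center(vec):
    """Subtract the mean so presence and absence contribute opposite signs (an assumed normalization)."""
    mean = sum(vec) / len(vec)
    return [v - mean for v in vec]


def inner_product_profile(clone_row, table_b):
    """Score a clone's RH signature (one row of table A) against each ordered STS.

    clone_row: hybridization score of the clone against each RH.
    table_b:   table_b[r][s] is 1 if RH r retains STS s, else 0; the STS
               columns are assumed to be already ordered along the chromosome.
    """
    a = center(clone_row)
    profile = []
    for s in range(len(table_b[0])):
        column = center([row[s] for row in table_b])
        profile.append(sum(x * y for x, y in zip(a, column)))
    return profile


def bin_clone(clone_row, table_b):
    """Return the index of the STS bin at the profile's peak."""
    profile = inner_product_profile(clone_row, table_b)
    return max(range(len(profile)), key=profile.__getitem__)


# Toy table B: 8 RHs (rows) typed against 4 ordered STSs (columns). Hypothetical data.
TABLE_B = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

# A clone lying near STS 2 would ideally match column 2: [0, 1, 1, 0, 0, 1, 1, 0].
# Here RH 3 is a false positive and RH 5 a false negative.
NOISY_CLONE = [0, 1, 1, 1, 0, 0, 1, 0]

print(inner_product_profile(NOISY_CLONE, TABLE_B))
print(bin_clone(NOISY_CLONE, TABLE_B))
```

In this toy case the profile still peaks at the correct bin despite two erroneous hybridization scores, because the remaining RHs agree with the table B column for that STS.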
IPM is a proven approach for mapping YACs (C. W. Richard III, D. J. Duggan, K. Davis, J. E. Farr, M. J. Higgins, S. Qin, L. Zhang, T. B. Shows, M. R. James, and M. W. Perlin, "Rapid construction of physical maps using inner product mapping: YAC coverage of chromosome 11," in Fourth Internat'l Conference on Human Chromosome 11, September 22-24, Oxford, England, 1994), incorporated by reference, and is a candidate method for mapping PACs (P. A. Ioannou, C. T. Amemiya, J. Garnes, P. M. Kroisel, H. Shizuya, C. Chen, M. A. Batzer, and P. J. de Jong, "A new bacteriophage P1-derived vector for the propagation of large human DNA fragments," Nature Genet., vol. 6, No. 1, pp. 84-89, 1994), incorporated by reference, cosmids, expressed sequences (M. D. Adams, J. M. Kelley, J. D. Gocayne, M. Dubnick, M. H. Polymeropoulos, H. Xiao, C. R. Merril, A. Wu, B. Olde, R. F. Moreno, A. R. Kerlavage, W. R. McCombie, and J. C. Venter, "Complementary DNA sequencing: Expressed sequence tags and human genome project," Science, vol. 252, pp. 1651-1656, 1991), incorporated by reference, and other physical reagents (J. D. McPherson, C. Wagner-McPherson, M. Perlin, and J. J. Wasmuth, "A physical map of human chromosome 5 (Abstract)," Amer. J. Hum. Genet., vol. 55, No. 3 Supplement, pp. A265, 1994), incorporated by reference. Hybridization efficiency for table A can be improved by using long and IRE-bubble PCR (D. J. Munroe, M. Haas, E. Bric, T. Whitton, H. Aburatani, K. Hunter, D. Ward, and D. E. Housman, "IRE-bubble PCR: a rapid method for efficient and representative amplification of human genomic DNA sequences from complex sources," Genomics, vol. 19, No. 3, pp. 506-14, 1994), incorporated by reference, to reduce false negative errors, providing controls and redundant DNA spotting for internal calibration, and directly acquiring signals (e.g., via a phosphorimager, Molecular Dynamics, Sunnyvale, Calif.) to facilitate automated scoring.
Current robotic technologies enable the high-throughput construction of gridded filters (A. Copeland and G. Lennon, "Rapid arrayed filter production using the `ORCA` robot," Nature, vol. 369, No. 6479, pp. 421-422, 1994), incorporated by reference; single use of these filters would reduce the time and error related to stripping and reprobing. Robots similarly provide high-throughput PCR comparisons for constructing table B. Alternatively, existing RH mapping data can be rapidly extended (at low cost) into inner product maps of libraries (U. Francke, E. Chang, K. Comeau, E.-M. Geigl, J. Giacalone, X. Li, J. Luna, A. Moon, S. Welch, and P. Wilgenbus, "A radiation hybrid map of human chromosome 18," Cytogenet. Cell Genet., vol. 66, pp. 196-213, 1994), incorporated by reference.
Whole human genome RH (WG-RH) libraries of 0.5 and 1.0 Mb resolution have been constructed (D. R. Cox, K. O'Connor, S. Hebert, M. Harris, R. Lee, B. Stewart, G. DiSibio, M. Boehnke, K. Lange, R. Goold, and R. M. Myers, "Construction and analysis of a panel of "whole genome" radiation hybrids (Abstract)," Amer. J. Hum. Genet., vol. 55, No. 3 Supplement, pp. A23, 1994; M. A. Walter, D. J. Spillett, P. Thomas, J. Weissenbach, and P. N. Goodfellow, "A method for constructing radiation hybrid maps of whole genomes," Nature Genet., vol. 7, No. 1, pp. 22-28, 1994), incorporated by reference, and have been characterized for the STSs used in the genome-wide CEPH megaYAC STS-content map (T. Hudson, S. Foote, S. Gerety, J. Ma, S.-h. Xu, X. Hu, J. Bae, J. Silva, J. Valle, S. Maitra, A. Colbert, L. Horton, M. Anderson, M. P. Reeve, M. Daly, A. Kaufman, C. Rosenberg, L. Stein, N. Goodman, J. Orlin, D. C. Page, and E. S. Lander, "Towards an STS-content map of the human genome (Abstract)," Amer. J. Hum. Genet., vol. 55, No. 3 Supplement, pp. A23, 1994), incorporated by reference. The availability of this WG-RH table B resource suggests a fast, accurate, and inexpensive approach to whole genome physical mapping: construct table A by performing hybridizations between species-specific (e.g., Alu-PCR) products of these RHs and gridded clones or expressed sequences, and then combine tables A and B to build a genome-wide inner product map. IPM has localized the components of chimeric YACs as distinct multiple peaks. IPM is therefore useful in verifying and extending current megaYAC mapping projects, and in multiplexed experimental designs that pool sequences from well-separated bins.
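The observation that chimeric YACs appear as distinct multiple peaks suggests a simple screening step on each clone's profile. The sketch below is illustrative only: the function name, the threshold value, and the toy profiles are hypothetical, and it does not check how far apart the peaks lie, which a practical screen would likely need in order to separate true chimeras from single large inserts spanning neighboring bins.

```python
def profile_peaks(profile, threshold):
    """Return indices of local maxima whose score is at or above a (hypothetical) threshold."""
    peaks = []
    for i, value in enumerate(profile):
        left = profile[i - 1] if i > 0 else float("-inf")
        right = profile[i + 1] if i + 1 < len(profile) else float("-inf")
        # Strictly above the left neighbor so a flat plateau is reported only once.
        if value >= threshold and value > left and value >= right:
            peaks.append(i)
    return peaks


# Toy chromosomal profiles (hypothetical scores across ordered bins).
single_insert = [0.0, 0.2, 1.0, 0.3]
possible_chimera = [0.1, 0.9, 0.2, 0.0, 0.1, 0.8, 0.3]

print(profile_peaks(single_insert, 0.5))
print(profile_peaks(possible_chimera, 0.5))
```

A clone whose profile yields more than one well-separated peak would be flagged for follow-up rather than assigned to a single bin.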
IPM provides long-range mapping information for DNA sequences relative to RH bins through DNA hybridization. This binning information can be complemented with short-range mapping data, such as oligonucleotide fingerprint hybridizations (H. Lehrach, A. Drmanac, J. Hoheisel, Z. Larin, G. Lennon, A. P. Monaco, D. Nizetic, G. Zehetner, and A. Poustka, "Hybridization fingerprinting in genome mapping and sequencing," in Genetic and Physical Mapping I: Genome Analysis, K. E. Davies and S. M. Tilghman, ed. Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory, 1990, pp. 39-81; R. Drmanac, Z. Strezoska, I. Labat, S. Drmanac, and R. Crkvenjakov, "Reliable hybridization of oligonucleotides as short as six nucleotides," DNA Cell Biol., vol. 9, No. 7, pp. 527-534, 1990), incorporated by reference. Combining the data from these two high-throughput hybridization studies enables a two-pass BIN-SORT (A. V. Aho, J. E. Hopcroft, and J. D. Ullman, Data Structures and Algorithms. Reading, Mass.: Addison-Wesley, 1983), incorporated by reference, strategy for high-resolution mapping: first use IPM to bin the clones, and then use short-range data to determine the orders and distances of clone subsets in proximate bins. This strategy can rapidly construct minimum-length paths of sequence-ready clones that tile the genome. Crucially, such IPM-derived contigs overcome the short-range limitations of all other known mapping methods, and enable the coordinated sequencing of the human genome, which is a well-recognized goal (F. Collins and D. Galas, "A new five-year plan for the U.S. Human Genome Project," Science, vol. 262, pp. 43-46, 1993), incorporated by reference. Such combination approaches can be highly effective for other purposes, such as using short-range proximity data to sharpen long-range inner product map results.
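The two-pass strategy described above can be sketched in a few lines. In this hedged example, the first pass distributes clones into buckets by their IPM bin, and the second pass orders the clones within each bucket. The `short_range_offset` values are a hypothetical stand-in for whatever within-bin position estimate the short-range data (e.g., oligonucleotide fingerprints) would supply; deriving such estimates, and resolving clones that straddle neighboring bins, are outside the scope of the sketch.

```python
from collections import defaultdict


def two_pass_bin_sort(clones, ipm_bin, short_range_offset):
    """Order clones by long-range bin first, then by short-range position within a bin.

    clones:             iterable of clone identifiers.
    ipm_bin:            dict mapping clone -> bin index from inner product mapping.
    short_range_offset: dict mapping clone -> within-bin position estimate
                        (hypothetical; stands in for fingerprint-derived data).
    """
    # Pass 1: distribute clones into buckets keyed by their long-range bin.
    buckets = defaultdict(list)
    for clone in clones:
        buckets[ipm_bin[clone]].append(clone)

    # Pass 2: order each bucket with the short-range data, then concatenate in bin order.
    ordered = []
    for bin_index in sorted(buckets):
        ordered.extend(sorted(buckets[bin_index], key=short_range_offset.__getitem__))
    return ordered


# Hypothetical clones, bins, and within-bin offsets.
clones = ["y1", "y2", "y3", "y4"]
ipm_bin = {"y1": 2, "y2": 0, "y3": 2, "y4": 1}
short_range_offset = {"y1": 0.7, "y2": 0.1, "y3": 0.2, "y4": 0.5}

print(two_pass_bin_sort(clones, ipm_bin, short_range_offset))
```

Because the expensive pairwise short-range comparisons are confined to clones that share a bin, the second pass operates on small subsets rather than on the whole library.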
IPM's experimental efficiencies enable effective determination of genome-wide DNA sequences, and the construction of high-resolution integrated genome maps for human, model organism, and agricultural species.
This invention pertains to determining the sequence of the genome of an organism or species through the use of a novel, unobvious, and highly effective clone mapping strategy. Such sequence information can be used for finding genes of known utility, determining structure/function properties of genes and their products, elucidating metabolic networks, understanding the growth and development of humans and other organisms, and making comparisons of genetic information between species. From these studies, diagnostic tests and pharmacological agents can be developed of great utility for preventing and treating human and other disease.