While the single-nucleotide polymorphism (SNP) is the most abundant and best-studied type of variant in the human genome, it is increasingly clear that the so-called “fine-structural-variations,” comprising alterations of copy number (insertions, deletions and duplications), inversions, translocations and other sequence rearrangements, are integral features of the human and other genomes. These types of variation appear to be present at much greater frequency in the general population than originally thought. Evidence is mounting that structural variants can comprise millions of nucleotides of heterogeneity within every genome. Understanding the role of fine-structural-variations in genome evolution, interaction with the environment, phenotypic diversity, and disease or disease susceptibility is among the most actively investigated areas of current genomic research. For review, refer to Bailey et al. (Science 297:1001 (2002)), Check (Nature 437:1094 (2005)), Cheng et al. (Nature 437:88 (2005)), Feuk et al. (Nat Reviews 7:85 (2006)), and Redon et al. (Nature 444:444 (2006)).
In comparison to the analysis of SNPs, efficient high-throughput methods for the analysis of fine-structural-variations are not well developed. An important first step is the technique of array comparative genomic hybridization (array CGH) (Pinkel et al. Nat Genet 20:207 (1998); Pinkel et al. U.S. Pat. Nos. 5,830,645 and 6,159,685), which enables the quantification of relative copy numbers between target deoxyribonucleic acid (DNA) and reference DNA. Array CGH allows reliable detection of DNA copy-number differences between DNA or genomic samples at the resolution of a single arrayed bacterial artificial chromosome (BAC) clone (Pinkel et al. Nat Genet 20:207 (1998); Albertson et al. Nat Genet 25:144 (2000); Snijders et al. Nat Genet 29:263 (2001)). The adaptation of array CGH to cDNA (Heiskanen et al. Cancer Res 60:799 (2000); Pollack et al. Nat Genet 23:41 (1999)) and to high-density oligonucleotide array platforms (Brennan et al. Cancer Res 64:4744 (2004); Lucito et al. Genome Res 13:2291 (2003); Bignell et al. Genome Res 14:287 (2004); Hung et al. Hum Genomics 1:287 (2004)) further extends the resolution and utility of this approach. Array CGH has led to the identification of gene copy number alterations that are associated with tumors (Inazawa et al. Cancer Sci 7:559 (2004); Pinkel and Albertson Nat Genet 37 suppl:S11 (2005); Pollack et al. Proc Natl Acad Sci USA 99:12963 (2002); Albertson and Pinkel Hum Mol Genet 12 Spec No 2:R145 (2003)) and with disease progression (Gonzalez et al. Science 307:5714 (2005)).
Despite its usefulness for copy number determination, array CGH is not suited to address the other types of genomic structural variation, most notably inversions, translocations and other types of nucleic acid rearrangements. Tuzun et al. (Nat Genet 37:727 (2005)) attempted to address these limitations with an approach termed “fosmid paired-end mapping.” This approach relies on the head-full mechanism of fosmid packaging to produce genomic DNA libraries from test subjects with reasonably uniform genomic inserts of ~40 kb. End-terminal sequencing of randomly selected ~40 kb library inserts produces pairs of short sequence tags, in which each tag-pair marks two genomic positions separated by approximately 40 kb along the length of the target DNA. The tag-pairs are then computationally aligned to a reference genomic assembly, and any discordance with either their expected orientation or their ~40 kb separation distance denotes the presence of at least one structural difference between target and reference nucleic acid spanning that region. Tag-pairs whose map positions are separated by more than 40 kb signify a deletion in the target DNA with respect to the reference; map positions separated by less than 40 kb signify an insertion of DNA in the target. Inconsistencies in the orientation of the pair of mapped tags denote potential DNA inversions or other complex chromosomal rearrangements. Chromosomal translocations are signified by assignment of the two tags of a pair to two different chromosomes on the reference sequence. Analysis of over 1.1 million fosmid clone inserts enabled Tuzun et al. (Nat Genet 37:727 (2005)) to identify nearly 300 sites of structural variation between the test subject and the reference genomic assembly.
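The discordance rules described above can be illustrated with a minimal Python sketch. It is not the implementation used by Tuzun et al. The `TagPair` record, the `classify_tag_pair` function, the 5 kb tolerance around the expected 40 kb separation, and the convention that a concordant pair maps with its left tag on the plus strand and its right tag on the minus strand are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass

EXPECTED_SPAN = 40_000  # approximate fosmid insert size, per the text
TOLERANCE = 5_000       # hypothetical allowance for insert-size variation


@dataclass(frozen=True)
class TagPair:
    """Reference-genome map positions of the two end tags of one insert."""
    chrom_a: str
    pos_a: int
    strand_a: str  # '+' or '-'
    chrom_b: str
    pos_b: int
    strand_b: str  # '+' or '-'


def classify_tag_pair(pair: TagPair,
                      expected: int = EXPECTED_SPAN,
                      tolerance: int = TOLERANCE) -> str:
    """Label one mapped tag-pair using the discordance rules in the text."""
    # Tags assigned to two different reference chromosomes.
    if pair.chrom_a != pair.chrom_b:
        return "translocation"

    # Order the two tags by reference coordinate.
    left, right = sorted([(pair.pos_a, pair.strand_a),
                          (pair.pos_b, pair.strand_b)])

    # Assumed concordant orientation: left tag '+', right tag '-'.
    if (left[1], right[1]) != ("+", "-"):
        return "inversion_or_rearrangement"

    span = right[0] - left[0]
    if span > expected + tolerance:
        # Reference holds more sequence than the insert: deletion in target.
        return "deletion"
    if span < expected - tolerance:
        # Reference holds less sequence than the insert: insertion in target.
        return "insertion"
    return "concordant"


if __name__ == "__main__":
    examples = [
        TagPair("chr1", 100_000, "+", "chr1", 140_000, "-"),
        TagPair("chr1", 100_000, "+", "chr1", 190_000, "-"),
        TagPair("chr1", 100_000, "+", "chr1", 120_000, "-"),
        TagPair("chr1", 100_000, "+", "chr1", 140_000, "+"),
        TagPair("chr1", 100_000, "+", "chr7", 140_000, "-"),
    ]
    for example in examples:
        print(example, classify_tag_pair(example))
```

In practice the tolerance would be derived from the observed insert-size distribution of the library rather than fixed, and a call would normally require support from several independent tag-pairs before a site is reported.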
While fosmid paired-end mapping is a useful start toward identifying fine-structural-variations in the human genome, the immense cost and logistical effort required to purify and sequence more than a million fosmid insert ends for each test subject preclude its use in broad population and cohort surveys to identify genomic variations that could be associated with complex disease, with responses to environmental factors, and the like. Furthermore, fosmid vectors and their variants generally propagate at very low copy numbers in host cells, making reliable automated DNA production and sequencing difficult to maintain. Hence, there is a need for an efficient, robust, high-throughput and low-cost method for the identification of fine-structural-variations for use in genomic and association studies that link these genetic elements to disease, disease progression and disease susceptibility. The present invention provides these and other substantial benefits.