1. Field of the Invention
Embodiments of the present invention relate in general to methods and compositions for sequencing nucleic acid molecules.
2. Description of Related Art
Current sequencing strategies are unable to address many regions of large, complex genomes because of the size and sequence composition of such regions. The longest published DNA sequencing read lengths are about 1000 base pairs (bp) (ABI/Sanger capillary methods and Roche FLX platforms). Such read lengths are not long enough to span the longest gaps in the human genome (and in many other genomes). These gaps range up to 34 Mbp in length and together account for about 7% of the human genome. They are likely to contain regions of medical significance (e.g., for multiple sclerosis (Reich et al. (2005) Nat. Genet. 37(10):1113)).
One strategy that has been considered for addressing gaps in genomic sequences calls for subcloning the regions in question prior to sequencing. Methods exist for subcloning fragments on the order of 100 kb in length (bacterial artificial chromosomes (BACs) and Complete Genomics Long Fragment Read (CGI LFR)). However, fragments of this length 1) are long enough that they pose internal assembly problems, 2) are too short to span the aforementioned gaps, and 3) introduce amplification artifacts (in vivo or in vitro) more frequently than do smaller fragments. Alternative methods that have been considered utilize haplotyping by either in situ sequencing (Zhang et al. (2006) Nat. Biotech. 24(6):680) or dilution amplification (Zhang et al. (2006) Nat. Genet. 38(3):382), but such methods cannot fill long gaps.