Large-scale sequence analysis of genomic DNA is central to understanding a wide range of biological phenomena related to states of health and disease, both in humans and in many economically important plants and animals, e.g., Collins et al (2003), Nature, 422: 835-847; Service, Science, 311: 1544-1546 (2006); Hirschhorn et al (2005), Nature Reviews Genetics, 6: 95-108; National Cancer Institute, Report of Working Group on Biomedical Technology, “Recommendation for a Human Cancer Genome Project,” (February, 2005); Tringe et al (2005), Nature Reviews Genetics, 6: 805-814. The need for low-cost, high-throughput sequencing and re-sequencing has led to the development of several new approaches that analyze many target DNA fragments in parallel, e.g., Drmanac et al., Scientia Yugoslavica, 16(1-2): 97-107 (1990); Margulies et al, Nature, 437: 376-380 (2005); Shendure et al (2005), Science, 309: 1728-1732; Metzker (2005), Genome Research, 15: 1767-1776; Shendure et al (2004), Nature Reviews Genetics, 5: 335-344; Lapidus et al, U.S. patent publication US 2006/0024711; Drmanac et al, U.S. patent publication US 2005/0191656; Brenner et al, Nature Biotechnology, 18: 630-634 (2000); and the like.
Such approaches reflect a variety of solutions for increasing target polynucleotide density in planar arrays and for obtaining increasing amounts of sequence information from each application of a sequence detection reaction.
Most traditional methods of sequence analysis can determine only a few tens of nucleotides before signals become significantly degraded, which limits overall sequencing efficiency. Such short sequence reads are particularly problematic in regions of a target sequence that contain long strings of repeating nucleotides or tandem repeats.
In view of such limitations, it would be advantageous for the field if methods and tools could be designed to increase the efficiency of sequencing reactions as well as the efficiency of assembling complete sequences from shorter read lengths.