Large-scale molecular analysis is central to understanding a wide range of biological phenomena related to states of health and disease both in humans and in a host of economically important plants and animals, e.g. Collins et al (2003), Nature, 422: 835-847; Hirschhorn et al (2005), Nature Reviews Genetics, 6: 95-108; National Cancer Institute, Report of Working Group on Biomedical Technology, “Recommendation for a Human Cancer Genome Project,” (February, 2005). Miniaturization has proved to be extremely important for increasing the scale and reducing the costs of such analyses, and an important route to miniaturization has been the use of microarrays of probes or analytes. Such arrays play a key role in most currently available, or emerging, large-scale genetic analysis and proteomic techniques, including those for single nucleotide polymorphism detection, copy number assessment, nucleic acid sequencing, and the like, e.g. Kennedy et al (2003), Nature Biotechnology, 21: 1233-1237; Gunderson et al (2005), Nature Genetics, 37: 549-554; Pinkel and Albertson (2005), Nature Genetics Supplement, 37: S11-S17; Leamon et al (2003), Electrophoresis, 24: 3769-3777; Shendure et al (2005), Science, 309: 1728-1732; Cowie et al (2004), Human Mutation, 24: 261-271; and the like. However, the scale of microarrays currently used in such techniques still falls short of that required for truly low-cost analyses that would make practical such operations as personal genome sequencing; environmental sequencing that uses changes in complex microbial communities as an indicator of states of health, either personal or environmental; and studies that associate genomic features with complex traits, such as susceptibilities to cancer, diabetes, cardiovascular disease, and the like, e.g. Collins et al (cited above); Hirschhorn et al (cited above); Tringe et al (2005), Nature Reviews Genetics, 6: 805-814; Service (2006), Science, 311: 1544-1546.
The nucleic acid hybridization process is widely used to characterize DNA/RNA samples. Antibodies or other proteins or compounds are used in various binding assays to characterize protein samples. For efficient, extensive analysis of a sample with many hybridization assays, arrays of gene/genomic fragments or synthetic oligonucleotides are prepared in various ways. For preparing arrays of gene/genome fragments, individual fragments are usually prepared in separate tubes/wells and then deposited on the substrate. This process is too laborious for preparing a large number of samples (e.g. close to or more than one million) and/or does not allow preparation of an array of small, high-density spots, especially below 10 micrometer dot size. For preparing high-density arrays of about 100,000 or more oligonucleotides, in situ chemical synthesis of DNA is usually performed.
Increasing the scale of analysis in array-based schemes for DNA sequencing is particularly challenging as the feature size of the array is decreased to molecular levels, since most schemes require not only a procedure for forming high density arrays, but also repeated cycles of complex biochemical steps that complicate the problems of array integrity, signal generation, signal detection, and the like, e.g. Metzker (2005), Genome Research, 15: 1767-1776; Shendure et al (2004), Nature Reviews Genetics, 5: 335-344; Weiss (1999), Science, 283: 1676-1683. Some approaches have employed high density arrays of unamplified target sequences, which present serious signal-to-noise challenges when “sequencing by synthesis” chemistries are used, e.g. Balasubramanian et al, U.S. Pat. No. 6,787,308. Other approaches have employed in situ amplification of randomly disposed target sequences, followed by application of “sequencing by synthesis” chemistries. Such approaches have also given rise to various difficulties, including (i) significant variability in the size of target sequence clusters, (ii) gradual loss of phase in extension steps carried out by polymerases, (iii) lack of sequencing cycle efficiency that limits read lengths, and the like, e.g. Kartalov et al, Nucleic Acids Research, 32: 2873-2879 (2004); Mitra et al, Anal. Biochem., 320: 55-65 (2003); Metzker (cited above).
In view of the above, it would be advantageous for the medical, life science, and agricultural fields if there were available molecular arrays and arraying techniques that permitted efficient and convenient analysis of large numbers of individual molecules, such as DNA fragments covering substantially an entire mammalian-sized genome, in parallel in a single analytical operation.
Whole genome DNA sequencing has revolutionized life sciences and drug development. However, sequencing complex genomes using capillary-based sequencers is still very expensive, and it takes months for large sequencing centers to complete one genome. High density gene-specific probe arrays (Patil et al., 2001, Science 294:1719-1723) provide an efficient way to re-sequence complex genomes for single base variation discovery. Still, the cost is close to one million dollars per genome. A more efficient DNA sequencing technology is needed if the ultimate goal is to sequence multiple human genomes in one day for less than $1000 per genome. The ability to routinely sequence complex genomes at this low cost will revolutionize studies of gene function and gene networks, as well as drug and diagnostic target discovery. Most importantly, it will provide a basis for comprehensive prognostics and diagnostics as critical components in developing and implementing preventive and predictive personalized medicine.
Several methods have been proposed to achieve this level of sequencing efficiency. Many new and developing technologies for whole genome sequencing rely upon the clonal isolation and amplification of genomic DNA fragments and preparation of DNA arrays. The most promising methods are based on the formation of random DNA arrays that are analyzed in many biochemical cycles using various chemistries such as base incorporation/sequencing by synthesis, step-wise degradation, probe hybridization, and combinatorial ligation of short probes. The concept of sequencing localized, amplified targets within a random array or matrix structure has been explored successfully by other researchers. Mitra et al., 2003 (Fluorescent in situ sequencing on polymerase colonies. Anal Biochem 320:55-65) have demonstrated the use of polymerase colonies (polonies) to generate PCR-based targets for sequencing with fluorescent nucleotide analogues.
Clonal PCR amplification on beads within emulsions has been employed by 454 Life Sciences to create a random array of genomic fragment clones that are then sequenced. Sequencing-by-synthesis is performed using a form of pyrosequencing that relies upon the release and detection of pyrophosphate after the addition of each nucleotide (Margulies et al., 2005. Genome sequencing in microfabricated high-density picoliter reactors. Nature 437:376-380). Also, a recent publication (Shendure et al., 2005, Science 309:1728-1732) describes ligation-based sequencing on random DNA arrays formed on 1 micron beads with clonally amplified DNA. The ligation chemistry used is similar to the ligation chemistry that has been used for many years for sequencing with arrays of probes.