Over the past two decades, the power of DNA sequencing has increased exponentially (Shendure et al. (2004) Nature Reviews Genetics 5:335, incorporated herein by reference in its entirety for all purposes), leading to the sequencing of the genomes of over 355 organisms (Kyrpides (1999) Bioinformatics 15:773, incorporated herein by reference in its entirety for all purposes). Nevertheless, much of the genetic diversity of the biosphere remains unsampled (Moreira and Lopez-Garcia (2002) Trends Microbiol. 10:31; Falkowski and de Vargas (2004) Science 304:58, each of which is incorporated herein by reference in its entirety for all purposes). Most of this diversity resides in microorganisms that cannot easily be cultured into the large clonal populations required for conventional genome sequencing.
Metagenomic approaches that do not require such culturing, such as environmental shotgun sequencing and large-insert library sequencing, have revealed the enormous biodiversity of environmental samples but suffer from two major drawbacks: (1) the difficulty of assembling contigs into discrete genomes, and (2) sampling that is biased toward abundant species (Tyson et al. (2004) Nature 428:37; Venter et al. (2004) Science 304:66; DeLong (2005) Nat. Rev. Microbiol. 3:459; Beja et al. (2002) Appl. Environ. Microbiol. 68:335; Tringe et al. (2005) Science 308:554; Riesenfeld et al. (2004) Ann. Rev. Genet. 38:525; Rodriguez-Valera (2004) FEMS Microbiol. Lett. 231:153, each of which is incorporated herein by reference in its entirety).
Isothermal multiple displacement amplification (MDA) is superior to PCR-based methods in yield, fidelity, and the absence of significant sequence-coverage bias, but it is known to produce a dominant “background” of undesired amplification products when the amount of template drops below nanogram levels (Dean et al. (2002) Proc. Natl. Acad. Sci. U.S.A. 99:5261; Telenius et al. (1992) Genomics 13:718; Zhang et al. (1992) Proc. Natl. Acad. Sci. U.S.A. 89:5847; Dietmaier et al. (1999) Am. J. Pathol. 154:83; Nelson et al. (2002) Biotechniques Suppl:44; Lage et al. (2003) Genome Res. 13:294, each of which is incorporated herein by reference in its entirety). Accordingly, mixed results have been reported for such amplifications from single human cells (Handyside et al. (2004) Mol. Hum. Reprod. 10:767; Hellani (2004) Mol. Hum. Reprod. 10:847; Sorensen (2004) Anal. Biochem. 324:312, each of which is incorporated herein by reference in its entirety).
Microorganisms with smaller genomes pose an even greater challenge, because the mass of a single genome is typically at the femtogram level, while the standard MDA protocol requires from about one to about ten nanograms of template DNA (Dean et al., supra, incorporated herein by reference in its entirety for all purposes). Genome sequencing of Xylella fastidiosa was possible only after MDA of DNA from approximately 1,000 cells (Detter et al. (2002) Genomics 80:691, incorporated herein by reference in its entirety for all purposes). Although initial success has been reported for genome amplification from single E. coli cells, it has been estimated that only 30% of the amplified DNA was specific to the target genome, the remainder being background amplification (Raghunathan et al. (2005) Appl. Environ. Microbiol. 71:3342, incorporated herein by reference in its entirety for all purposes). Reducing the reaction volume offers one way to reduce background amplification (Hutchison et al. (2005) Proc. Natl. Acad. Sci. U.S.A. 102:17332, incorporated herein by reference in its entirety for all purposes). However, before these methods can be applied to single-cell genome sequencing, a number of critical technical issues remain to be addressed, such as the quantification of background, amplification bias, and amplification error, and compatibility with current genome sequencing pipelines.
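The gap between femtogram and nanogram quantities can be made concrete with a back-of-envelope calculation. The following is a minimal sketch that assumes an average mass of about 650 g/mol per base pair of double-stranded DNA and an approximate E. coli genome size of 4.6 Mb; neither figure is taken from the text above, and the function names are hypothetical, chosen for illustration only.

```python
"""Back-of-envelope estimate of the mass of a single genome copy.

Assumptions (not from the source text): an average mass of ~650 g/mol
per base pair of double-stranded DNA, and one genome copy per cell.
The genome size used below is an illustrative, approximate value.
"""

AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)
BP_MOLAR_MASS = 650.0     # g/mol per base pair of dsDNA (approximate average)


def genome_mass_fg(genome_bp: float) -> float:
    """Return the mass of one double-stranded genome copy in femtograms."""
    grams = genome_bp * BP_MOLAR_MASS / AVOGADRO
    return grams * 1e15  # 1 g = 1e15 fg


def copies_in_template(template_ng: float, genome_bp: float) -> float:
    """Return the number of genome copies in a template of the given mass (ng)."""
    template_fg = template_ng * 1e6  # 1 ng = 1e6 fg
    return template_fg / genome_mass_fg(genome_bp)


if __name__ == "__main__":
    genome_bp = 4.6e6  # approximate E. coli genome size (assumption)
    print(f"one genome: {genome_mass_fg(genome_bp):.1f} fg")
    for ng in (1.0, 10.0):
        copies = copies_in_template(ng, genome_bp)
        print(f"{ng:g} ng of template ~ {copies:,.0f} genome copies")
```

Under these assumptions, one E. coli genome weighs roughly 5 fg, so the 1 to 10 ng of template called for by the standard MDA protocol corresponds to roughly 2 × 10^5 to 2 × 10^6 genome copies, several orders of magnitude more than a single cell provides.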
The ability to sequence an entire genome from a single uncultured cell opens a window onto genomic information not evident with current metagenomic or population-based methods. Such an ability is highly desirable for charting the largely unmapped genomic biosphere, allowing genomic analyses not feasible with current methods, such as: (1) the characterization of genetic heterogeneity in a population of cells; (2) the revelation of cis-relationships between sequences more than 200 kb apart (beyond the reach of BAC/fosmid cloning); (3) the study of trans-interactions between host and parasitic genomes (e.g., phage and viral genomes) or of cell-cell interactions (e.g., predator-prey, symbiont, and commensal relationships); and/or (4) the identification of rare species for genome sequencing.