Many human clinical DNA samples, sample libraries such as cDNA libraries derived from RNA, and extracted DNA samples taken from tissue, fluids, or other host material contain highly abundant sequences that have little informative value and increase the cost of sequencing. While methods have been developed to deplete these sequences (e.g., via hybridization capture), these methods are often time-consuming and can be inefficient. Moreover, hybridization capture typically seeks to capture the DNA sequences of interest while discarding the remaining sequences. As a result, depletion by hybridization capture is not a viable option when the DNA sequences of interest are not known in advance, e.g., when screening a sample to study all microbial or non-host DNA sequences.
While shotgun sequencing of human samples to study microbial DNA can be done, the low levels of microbial DNA in many samples have precluded the shotgun sequencing of many complex and/or interesting samples due to cost. This is true of, for example, metagenomic analysis of a sample that contains more than one species of organism (eukaryotic, prokaryotic, or viral). For example, DNA libraries derived from whole human blood often contain >99% human DNA. Therefore, to detect an infectious agent circulating in human blood by shotgun sequencing, one would need to sequence to very high depth to ensure sufficient coverage of the infectious agent's genome. Thus, much of the cost of sequencing samples with high human DNA content yields relatively little metagenomic data. As a result, many human tissue DNA samples are considered unsuitable for metagenomic sequencing merely because the data yield is low compared to the resources required. Thus, there is a need in the art to increase microbial DNA yield in high host DNA samples, and specifically to increase the percentage of microbial DNA being sequenced when sequencing high host endogenous (HHE) DNA samples.
Recent developments in DNA extraction have advanced some sequencing techniques to the point that the field of metagenomics has transitioned from focusing on PCR-amplified 16S ribosomal RNA markers to shotgun sequencing of the whole metagenome. However, shotgun sequencing can yield less than desirable results when sequencing HHE DNA samples because of the low percentage of microbial DNA in the overall sample material. Moreover, marker-based sequencing often fails to provide enough information for accurate resolution in metagenomic analysis, especially when the selected molecules (e.g., 16S ribosomal RNA) represent only a single lineage. Furthermore, 16S ribosomal RNA lineages often cannot differentiate pathogenic from non-pathogenic strains of closely related bacteria, a key goal of clinical metagenomic analysis.
Instead, the use of whole genome DNA and RNA sequences is preferred for metagenomic analysis because it provides information from the entire metagenome. Thus, there is a need in the art for a DNA and RNA sequencing technique for metagenomic analysis that provides improved resolution. For example, whole genome analysis of metagenomes from the fecal material of obese and normal-weight patients has revealed highly reproducible differences in microbial community structure. Fecal samples tend to have very high microbial DNA content (>99% microbial and <1% human).
In contrast, sequencing libraries derived from many other tissues, including human blood, vagina, nasal mucosal membrane, and lung, typically contain >90% human and <10% microbial DNA. While samples with <10% microbial DNA can still, with sufficient sequencing, yield enough information for metagenomic analyses, the amount of sequencing required for specimens with less target DNA is costly and thus untenable for many researchers.
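The cost argument above can be illustrated with a back-of-the-envelope calculation. The following is a minimal sketch, not part of any disclosed method: the function name `total_reads_needed` and the target of 1,000,000 microbial reads are hypothetical, and the sketch assumes that reads are sampled in proportion to the fraction of microbial DNA in the library, ignoring library bias and genome size.

```python
import math


def total_reads_needed(target_microbial_reads: int, microbial_fraction: float) -> int:
    """Total reads to sequence so the expected microbial read count reaches the target.

    Assumes reads are drawn in proportion to the microbial DNA fraction
    (an illustrative simplification).
    """
    if not 0 < microbial_fraction <= 1:
        raise ValueError("microbial_fraction must be in (0, 1]")
    return math.ceil(target_microbial_reads / microbial_fraction)


# Hypothetical target chosen only for illustration.
TARGET_MICROBIAL_READS = 1_000_000

# Microbial fractions taken from the ranges discussed in the text.
examples = [
    ("fecal material (99% microbial)", 0.99),
    ("many tissues (10% microbial)", 0.10),
    ("whole blood (1% microbial)", 0.01),
]

for label, fraction in examples:
    reads = total_reads_needed(TARGET_MICROBIAL_READS, fraction)
    print(f"{label}: {reads:,} total reads")
```

Under these assumptions, a library with 1% microbial DNA requires roughly 100 times as many total reads as a library that is almost entirely microbial to reach the same expected number of microbial reads, which is the source of the cost burden described above.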
Thus, there exists a need in the art for low-cost, efficient methods and compositions for metagenomic analyses. Such methods and compositions are provided herein.
All patents, patent applications, publications, documents, web links, and articles cited herein are incorporated herein by reference in their entireties.