A typical healthy individual's body is inhabited by trillions of microbes, which form communities called microbiomes at various body sites. Examples of microbiome sites include the skin, intestines, stomach, gut, oral cavity, conjunctiva, and vagina. To better understand the role of these microbiomes and how they affect physiology and disease state, we can analyze which microbes comprise a microbiome and how they correlate with or affect the health status and clinical response of an individual.
For example, the human gut microbiome is known to play a key role in many health conditions, including obesity, gastrointestinal health, nutrient absorption, and drug metabolism, among others. Owing to such discoveries, the NIH is investing $150 million in the Human Microbiome Project over the next 5 years to analyze the microbial composition of various human body sites.
Despite this awareness of the interrelation between microbiomes and health, the complexity of the microbiome, together with the difficulty of categorizing and characterizing its constituents, has made these relationships challenging to understand. Consequently, these challenges have presented hurdles in the development of diagnostic and therapeutic applications.
Metagenomic approaches to understanding the microbiome stand to further illuminate the roles of microbiomes and have only recently been enabled by “next-generation” sequencing technologies. While the information uncovered by these studies will become increasingly valuable to those interested in targeting the microbiome for therapeutic interventions and consumer products, transforming this large amount of data into meaningful information that can be used to develop diagnostics and therapeutics presents a significant hurdle. Two apparent bottlenecks in harnessing the power of the microbiome are the cost of undertaking these analyses and the intrinsic complexity of metagenomic analysis mentioned above.
The current gold standard in the field for taxonomic classification of bacterial species is DNA sequencing of the 16S ribosomal RNA (rRNA) subunit. The 16S rRNA subunit was chosen as an “ideal” target for classification because it is present in all bacteria and contains nine variable regions that can be used to distinguish taxa. However, focusing solely on the 16S rRNA subunit presents its own technical challenges, because some bacteria share the same variable-region sequences, which results in misclassification.
Furthermore, the current “second-generation” sequencing technologies used to sequence the 16S rRNA subunit have read lengths that often yield incomplete coverage of these variable regions. For example, sequencing by 454 gives average read lengths of 500 bp, and Illumina's MiSeq and HiSeq platforms give average read lengths of 100-450 bp. With these read lengths, bacterial classification often suffers from issues of accuracy, especially in a complex metagenomic sample such as a microbiome sample.
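As a rough illustration of the read-length limitation described above, the following sketch counts how many whole variable regions a single read of a given length could span. The region coordinates are approximate E. coli positions assumed here for illustration only (they do not come from this text), and the function names are hypothetical.

```python
# Approximate (start, end) positions of the nine 16S variable regions,
# in E. coli numbering. ASSUMPTION: illustrative values, not from the source.
VARIABLE_REGIONS = {
    "V1": (69, 99),
    "V2": (137, 242),
    "V3": (433, 497),
    "V4": (576, 682),
    "V5": (822, 879),
    "V6": (986, 1043),
    "V7": (1117, 1173),
    "V8": (1243, 1294),
    "V9": (1435, 1465),
}


def regions_covered(read_start, read_length):
    """Return the names of variable regions fully contained in one read."""
    read_end = read_start + read_length - 1
    return [
        name
        for name, (start, end) in VARIABLE_REGIONS.items()
        if start >= read_start and end <= read_end
    ]


def max_regions_per_read(read_length):
    """Largest number of whole variable regions any single read can span.

    Only reads starting at the first base of a region need to be checked:
    shifting a read right until it starts at a region boundary never
    drops a region that was fully covered.
    """
    return max(
        len(regions_covered(start, read_length))
        for start, _ in VARIABLE_REGIONS.values()
    )


if __name__ == "__main__":
    for length in (100, 450, 500, 1500):
        print(length, "bp read:", max_regions_per_read(length), "region(s) at most")
```

Under these assumed coordinates, a single read in the 100-500 bp range spans only a subset of the nine variable regions, whereas a read covering roughly the full length of the 16S sequence would span all of them.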
The present disclosure provides solutions to these limitations by providing methods, systems, compositions, and kits that yield more accurate information and hence more accurate classification of a microbiome. Such information allows for multiplexing and efficiency advantages over the current technology and for the development of consumer products such as diagnostic tests, therapeutics, and probiotic therapies.