Metagenomics is the study of genetic material recovered directly from an environmental sample. The genetic material is sequenced without first isolating and cultivating individual organisms. Metagenomics provides information pertaining to the taxonomic diversity, physiology, and complex interactions of the various microorganisms present in the environmental sample.
The genetic material obtained directly from the environmental sample is sequenced to yield a plurality of sequences, called metagenomic sequences. Each of these metagenomic sequences is then classified, or cataloged, into a taxonomic group at a given taxonomic level, such as kingdom, phylum, class, order, family, genus, or species. This process of classifying metagenomic sequences is called taxonomic classification or binning.
Taxonomic classification of metagenomic sequences, as described above, helps in reconstructing the microbial composition of the environmental sample. It also provides information regarding the evolutionary history and previously unrecognized physiological abilities of microbial communities specialized to live in a given environmental niche. Taxonomic classification not only catalogs known organisms, but also assigns new organisms to corresponding taxonomic groups for subsequent analyses. Precise taxonomic classification of metagenomic sequences is important because incorrectly classified sequences may affect the accuracy of several downstream analyses, for example, sequence assembly, gene prediction, and functional annotation.
Researchers typically employ a variety of taxonomic classification techniques to classify metagenomic sequences. Conventional taxonomic classification techniques associate a sequence with a taxon if a feature of the sequence, such as sequence similarity or composition, is similar to that of reference sequences belonging to that taxon. However, such taxonomic classification techniques are either time-consuming or prevent users from assessing the taxonomic diversity of environmental samples at appropriate taxonomic levels; that is, such classification techniques lack specificity and accuracy.
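
By way of illustration only, a conventional composition-based technique of the kind described above may be sketched as follows. The sketch assumes that composition is measured as a k-mer frequency profile and that similarity is measured as cosine similarity; the function names, the k-mer length, the threshold value, and the toy reference sequences are hypothetical and are not taken from any particular classification tool.

```python
from collections import Counter
from itertools import product
from math import sqrt

K = 3  # assumed k-mer length, chosen small for the toy sequences below


def kmer_profile(seq, k=K):
    """Return the normalized k-mer frequency vector of a DNA sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(read, references, min_similarity=0.5):
    """Assign a read to the taxon whose reference composition is most similar.

    `references` maps a taxon name to one reference sequence (hypothetical
    data layout). Reads whose best similarity is below `min_similarity`
    (an assumed cutoff) are left unassigned.
    """
    profile = kmer_profile(read)
    best_taxon, best_score = "unassigned", 0.0
    for taxon, ref_seq in references.items():
        score = cosine(profile, kmer_profile(ref_seq))
        if score > best_score:
            best_taxon, best_score = taxon, score
    if best_score < min_similarity:
        return "unassigned", best_score
    return best_taxon, best_score


# Toy reference set, made up purely for illustration.
references = {
    "taxon_A": "ATATATATATATATATATAT",
    "taxon_B": "GCGCGCGCGCGCGCGCGCGC",
}
for read in ("ATATATATAT", "GCGCGCGC", "AAAAAAAA"):
    print(read, classify(read, references)[0])
```

In this sketch every metagenomic sequence is compared against every reference, which indicates why such techniques become time-consuming as reference collections grow, and a single fixed cutoff gives no indication of the taxonomic level at which an assignment is reliable.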