Ribosomal DNA sequences, such as 16S rDNA, are present in all bacterial and archaeal species and are conserved across them. Ribosomal DNA sequences are therefore commonly analyzed to estimate the taxonomic diversity of a given environmental sample, such as a metagenome. After this analysis, counting the ribosomal DNA sequences assigned to each taxonomic group, such as a species, genus, family, order, class, or phylum, helps quantify the relative abundance of the various organisms or taxa present in the environmental sample.
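The enumeration step amounts to counting the sequences assigned to each taxon at a chosen rank and dividing by the total number assigned at that rank. The following is a minimal Python sketch. It assumes the taxonomic classification has already been done. The input format (one dictionary of rank-to-taxon assignments per ribosomal DNA sequence), the function name `relative_abundance`, and the toy taxon labels are all hypothetical.

```python
from collections import Counter


def relative_abundance(assignments, rank):
    """Fraction of rDNA sequences assigned to each taxon at `rank`.

    `assignments` is a hypothetical input format: one dict per ribosomal DNA
    sequence, mapping rank names (e.g. "phylum", "genus") to taxon names.
    Sequences with no assignment at `rank` are left out of the count.
    """
    # Enumerate the sequences assigned to each taxon at the requested rank.
    counts = Counter(a[rank] for a in assignments if a.get(rank))
    total = sum(counts.values())
    if total == 0:
        return {}
    # Relative abundance = sequences assigned to the taxon / all assigned sequences.
    return {taxon: n / total for taxon, n in counts.items()}


# Toy assignments for four ribosomal DNA sequences (illustrative only).
assignments = [
    {"phylum": "Proteobacteria", "genus": "Escherichia"},
    {"phylum": "Proteobacteria", "genus": "Pseudomonas"},
    {"phylum": "Firmicutes", "genus": "Bacillus"},
    {"phylum": "Proteobacteria"},  # not resolved to genus
]

print(relative_abundance(assignments, "phylum"))
# {'Proteobacteria': 0.75, 'Firmicutes': 0.25}
```

This sketch excludes sequences that are not resolved at the requested rank from the denominator. Another reasonable convention is to count them in an explicit "unclassified" bin.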
Since the analysis of ribosomal DNA sequences is expected to provide a comprehensive snapshot of taxonomic diversity, most projects spend considerable resources (time, cost, and labor) on experiments that amplify, clone, and sequence the ribosomal DNA sequences present in a given environmental sample. The ribosomal DNA sequences obtained from these experiments are then analyzed to estimate taxonomic diversity. To further characterize the environmental sample, its entire genomic content is subsequently extracted, fragmented, and sequenced. This yields millions of DNA sequences originating from the genomes of the various microbes in the sample. Because the entire genomic content is fragmented and sequenced, a subset of these sequenced DNA fragments corresponds to partial or complete portions of the ribosomal genes of the various organisms in that sample. This subset of DNA fragments is referred to herein as ribosomal DNA fragments.
With recent advances in technology and the availability of faster and cheaper sequencing techniques, the taxonomic diversity of an environmental sample can alternatively be ascertained by identifying and then analyzing these ribosomal DNA fragments. This alternative approach therefore does not depend on experimental procedures for amplifying, cloning, and sequencing ribosomal DNA sequences. Instead, it depends on two factors. The first is the cost of fragmenting and sequencing the entire genomic content of the environmental sample. The second is the efficiency of the ‘in silico’ method used to identify ribosomal DNA fragments from amongst the entire set of DNA fragments obtained by fragmenting and sequencing that genomic content. Given the current availability of efficient and cost-effective sequencing technologies, the applicability of the alternative approach depends largely on the availability of in silico techniques that can efficiently identify ribosomal DNA fragments from amongst the entire set of DNA fragments. Employing such in silico techniques is therefore expected to save considerable time, effort, and cost.
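To make the in silico identification step concrete, the following Python sketch uses a naive k-mer screen. A read is flagged as a candidate ribosomal DNA fragment when a sufficient fraction of its k-mers also occur in a set of reference ribosomal sequences or in their reverse complements. This is a swapped-in illustrative technique, not the method of this document. The function names, the toy reference string, and the values of `k` and `min_fraction` are all hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string; non-ACGT characters are kept as-is."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmers(seq, k):
    """Set of all overlapping substrings of length k (empty if len(seq) < k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_reference_index(reference_seqs, k):
    """Collect k-mers from each reference rDNA sequence and its reverse
    complement, so that fragments sequenced from either strand can match."""
    index = set()
    for ref in reference_seqs:
        index |= kmers(ref, k)
        index |= kmers(reverse_complement(ref), k)
    return index


def is_candidate_rdna(read, index, k, min_fraction=0.5):
    """Flag a read whose k-mers overlap the reference index by >= min_fraction."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return False  # read shorter than k: nothing to compare
    shared = sum(1 for km in read_kmers if km in index)
    return shared / len(read_kmers) >= min_fraction


# Hypothetical toy reference and reads (not real rDNA sequences).
K = 8
reference = ["GATTACAGGCCTTAAGCTAGCATCGGATCCA"]
index = build_reference_index(reference, K)

reads = {
    "forward_fragment": reference[0][5:25],
    "reverse_strand_fragment": reverse_complement(reference[0][3:23]),
    "unrelated": "T" * 20,
}
for name, read in reads.items():
    print(name, is_candidate_rdna(read, index, K))
# expected: True, True, False
```

The values of `k` and `min_fraction` are arbitrary here. In general, lowering them admits more reads (higher sensitivity) at the cost of more false positives, so both would need tuning against real data.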
However, currently available in silico techniques for identifying ribosomal DNA sequences amongst millions of DNA sequences are inefficient, both in computational time and in sensitivity. Consequently, these techniques have found little or no application in projects for the direct identification of ribosomal DNA sequences from the sequenced genomic content of a given environmental sample.