As indicated above, incorporating two UMI sequences, one on either side of a taxonomically relevant genomic sequence, can be advantageous because it increases the complexity of the UMI tagging and allows for the detection of aberrant chimera formation that can occur during library formation and sequencing. However, as described in the Examples above, in certain implementations, a relatively low concentration of tagging polynucleotides (incorporating the UMI domains) is preferable. A major challenge to incorporating two flanking UMIs for taxonomic profiling (and chimera detection) is that both UMI-containing polynucleotides should be present at limiting dilution. This can impose a severe population bottleneck on the number of molecules that receive two UMIs, because tagging of a single molecule by both polynucleotides is typically a low-probability event.
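The scale of this bottleneck can be illustrated with a minimal sketch. It assumes, purely for illustration, that the two tagging events are independent, so that the fraction of templates receiving both UMIs is the product of the two per-template tagging probabilities. The function names and the probabilities used below are hypothetical and are not taken from the Examples.

```python
def double_tag_fraction(p_first: float, p_second: float) -> float:
    """Fraction of templates tagged at both sides.

    Assumption: the two tagging events are independent, so the joint
    probability is the product of the individual probabilities.
    """
    for p in (p_first, p_second):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_first * p_second


def expected_double_tagged(n_templates: int, p_first: float, p_second: float) -> float:
    """Expected number of templates that receive both UMIs."""
    return n_templates * double_tag_fraction(p_first, p_second)


if __name__ == "__main__":
    # Hypothetical values: each tagging polynucleotide tags 1% of templates.
    n = 1_000_000
    print(expected_double_tagged(n, 0.01, 0.01))
```

Under these illustrative assumptions, lowering either tagging probability reduces the doubly tagged population multiplicatively, which is the effect the single-molecule probe described below is intended to avoid.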
Method
To enhance the incorporation rate of two UMI sequences, i.e., one on either side of the target taxonomically relevant genomic sequence, a molecular inversion probe can be employed that includes two tagging polynucleotides, each with a UMI domain, within a single molecule. FIG. 10 provides a schematic illustration of an embodiment using such a molecular inversion probe that targets the 16S rRNA gene V4 region. At the top of the scheme, the linear molecular inversion probe is shown with the first tagging polynucleotide at the 3′-end and the second tagging polynucleotide at the 5′-end. The first tagging polynucleotide contains the first linker sequence, the first UMI domain, and the first targeting arm, which anneals to or near the taxonomically relevant genomic sequence and serves as the primer for extension. The second tagging polynucleotide contains the second linker sequence, the second UMI domain, and the second targeting arm, which anneals to or near the taxonomically relevant genomic sequence. Both the first targeting arm and the second targeting arm anneal to the target genomic molecule, at locations on either side of the desired taxonomically relevant genomic sequence (here, 16S V4). The gap between the first targeting arm and the second targeting arm is filled by extension from the first targeting arm, which serves as the primer. Finally, the extended 3′-end reaches the location of the second targeting arm at the 5′-end, whereupon the two ends are ligated to produce a circularized, single-stranded template. This circularized, single-stranded molecular inversion probe template can then be treated with an exonuclease to eliminate the remaining linear single-stranded species in solution, thereby reducing background in subsequent amplification, indexing, and sequencing steps, which are performed as described above.
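The anneal, extend, and ligate sequence described above can be sketched in silico as follows. This is an illustrative model only: the class and function names are hypothetical, the placement of each targeting arm at the extreme end of the probe with the linkers innermost is an assumption about the layout in FIG. 10, any backbone between the two linkers is omitted, and the short sequences in the usage note are placeholders rather than real 16S primer sites.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class TaggingPolynucleotide:
    linker: str
    umi: str
    targeting_arm: str


def linear_probe(first: TaggingPolynucleotide, second: TaggingPolynucleotide) -> str:
    """Probe written 5' -> 3'.

    Assumed layout: second arm at the 5'-end, first arm at the 3'-end,
    with the UMIs and linkers between them.
    """
    return (second.targeting_arm + second.umi + second.linker
            + first.linker + first.umi + first.targeting_arm)


def gap_fill_and_ligate(first: TaggingPolynucleotide,
                        second: TaggingPolynucleotide,
                        template: str) -> str:
    """Simulate extension from the first arm and ligation to the second arm.

    The template is given 5' -> 3'. Returns the circularized product written
    5' -> 3' starting at the second arm; the last base joins back to the first.
    """
    site_second = reverse_complement(second.targeting_arm)
    site_first = reverse_complement(first.targeting_arm)
    start = template.find(site_second)
    if start == -1:
        raise ValueError("second targeting arm does not anneal to the template")
    gap_start = start + len(site_second)
    gap_end = template.find(site_first, gap_start)
    if gap_end == -1:
        raise ValueError("first targeting arm does not anneal downstream of the second")
    # Extension copies the template between the two annealing sites.
    gap_fill = reverse_complement(template[gap_start:gap_end])
    return linear_probe(first, second) + gap_fill
```

For example, with the placeholder parts `TaggingPolynucleotide("GATC", "ACAC", "AAGG")` and `TaggingPolynucleotide("CTAG", "TGTG", "TTCA")` and the toy template `"GGTGAAACGTACCCTTCC"`, the returned circle carries the copied target region between the two targeting arms, so that both UMI domains flank it within a single molecule.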
While illustrative embodiments have been illustrated and described, it will be appreciated that various changes can be made therein without departing from the spirit and scope of the invention.