Biopolymer sequencing refers to the determination of the order of nucleotide bases (adenine, guanine, cytosine, and thymine, or uracil in place of thymine in RNA) in a biomolecule, e.g., a DNA or RNA molecule, or portion thereof. Biomolecule sequencing has numerous applications, for example, in diagnostics, biotechnology, forensic biology, and drug development. Various techniques have been developed for biopolymer sequencing.
Sequencing by Hybridization (SBH) is a method for biomolecule sequencing in which a set of single-stranded fragments or probes (generally, all 4^k possible oligonucleotides of length k) is attached to a substrate or hybridization array. The array is exposed to a solution of single-stranded fragments of DNA. Hybridization between the probes and the DNA reveals the spectrum of the DNA, i.e., the set of all k-mers that occur at least once in the sequence. Determining a sequence using SBH involves finding an Eulerian path (a path that traverses every edge exactly once) of a graph representing the spectrum of detected k-mers. Convergence on a single solution occurs when only one sequence for the k-mers is consistent with the spectrum. Ambiguous sequencing occurs when more than one sequence for the hybridized k-mers is consistent with the spectrum. Current SBH techniques are limited because, in any sufficiently dense graph that admits a solution, multiple equally well-supported solutions exist.
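The ambiguity described above can be illustrated with a minimal sketch. It assumes an error-free spectrum in which every k-mer occurs exactly once in the target, and it represents each k-mer as an edge from its (k-1)-base prefix to its (k-1)-base suffix. The function names `spectrum` and `reconstructions` are hypothetical, and the exhaustive search is chosen for clarity only; it is not presented as the method of any particular SBH system.

```python
from collections import defaultdict


def spectrum(seq, k):
    """Return the set of all k-mers that occur at least once in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reconstructions(kmers):
    """Enumerate every sequence that uses each k-mer of the spectrum exactly once.

    Assumption: no k-mer repeats in the target, so each edge is traversed once.
    """
    # Each k-mer is an edge from its (k-1)-prefix to its (k-1)-suffix.
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer)
    graph = dict(graph)
    total = len(kmers)
    results = set()

    def walk(node, used, seq):
        # A walk that has consumed every edge is an Eulerian path.
        if len(used) == total:
            results.add(seq)
            return
        for kmer in graph.get(node, []):
            if kmer not in used:
                walk(kmer[1:], used | {kmer}, seq + kmer[-1])

    # Try every node as a starting point; most starts reach a dead end.
    for start in graph:
        walk(start, frozenset(), start)
    return sorted(results)


if __name__ == "__main__":
    # One consistent sequence: convergence on a single solution.
    print(reconstructions(spectrum("ATGCA", 3)))
    # prints ['ATGCA']
    # Two consistent sequences: ambiguous sequencing.
    print(reconstructions(spectrum("ATGGCGTGCA", 3)))
    # prints ['ATGCGTGGCA', 'ATGGCGTGCA']
```

In the second call, the nodes TG and GC each have two outgoing edges, so the same spectrum supports two distinct Eulerian paths and hybridization data alone cannot distinguish between them.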
Hybridization Assisted Nanopore Sequencing (HANS) is a method for sequencing genomic lengths of DNA and other biomolecules, involving the use of one or more nanopores, or alternatively, nanochannels, micropores, or microchannels. HANS involves hybridizing long fragments of the unknown target with short probes of known sequence. The method relies on detecting the position of hybridization of the probes on specific portions of the biomolecule (e.g., DNA) to be sequenced or characterized. The probes bind to the target DNA wherever they find their complementary sequence. The distance between these binding events is determined by translocating the target fragments through a nanopore (or nanochannel, micropore, or microchannel). By reading the current or voltage across the nanopore, it is possible to distinguish the unlabeled backbone of the target DNA from the points on the backbone that are binding sites for probes. Since DNA translocates at an approximately constant velocity, a time course of such current or voltage measurements provides a measurement of the relative distance between probe binding sites on the target DNA.
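The conversion from a time course to relative probe spacing can be sketched as follows. The sketch makes several assumptions that do not come from the description above: the signal is a uniformly sampled list of values, a probe-bound region is any contiguous run of samples at or above a fixed threshold, and the translocation velocity is a known constant. The names `probe_event_times` and `probe_spacings` and all parameter names are hypothetical.

```python
def probe_event_times(trace, sample_interval, threshold):
    """Return the midpoint time of each run of samples at or above threshold.

    Assumption: samples at or above threshold mark a probe binding site,
    and samples below it mark the unlabeled backbone.
    """
    events = []
    start = None
    for i, value in enumerate(trace):
        if value >= threshold and start is None:
            start = i  # entering a probe-bound region
        elif value < threshold and start is not None:
            # leaving the region; the last sample in the run is index i - 1
            events.append((start + i - 1) / 2 * sample_interval)
            start = None
    if start is not None:
        # the trace ended while still inside a probe-bound region
        events.append((start + len(trace) - 1) / 2 * sample_interval)
    return events


def probe_spacings(event_times, velocity):
    """Convert event times into distances between neighbouring binding sites.

    Assumption: velocity is constant, so distance = elapsed time * velocity.
    """
    times = sorted(event_times)
    return [(later - earlier) * velocity
            for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    # Illustrative trace only: two probe signals separated by backbone.
    trace = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0]
    times = probe_event_times(trace, sample_interval=1.0, threshold=0.5)
    print(times)
    print(probe_spacings(times, velocity=2.0))
```

Because the result is a product of elapsed time and velocity, any deviation from constant velocity translates directly into an error in the inferred spacing.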
After performing these measurements for each kind of probe, one at a time, the DNA sequence is determined by analyzing the probe position data and matching up overlapping portions of probes. However, due to inaccuracies associated with measuring absolute probe positions using HANS, sequencing ambiguities may still arise.
There is a need for improved methods for sequencing biomolecules that are able to avoid or resolve the ambiguities encountered with current SBH, HANS, and other sequencing techniques.