Extensive research and funding are being invested to develop a method to sequence DNA (Human Genome Project) by recording the signal of each base as the polymer is passed in a base-by-base manner through a recording system. Such a system could offer a rapid and low-cost alternative to present methods based on chemical reactions with probing analytes and, as a result, might usher in a revolution in medicine.
Research in this area to date has focused on developing a measurement system that can record a sufficient signal from each monomer to distinguish one monomer from another. In the case of DNA, the monomers are the well-known bases: adenine (A), cytosine (C), guanine (G), and thymine (T). It is necessary that the signal produced by each base be: a) different from that of the other bases, and b) different by an amount that is substantially larger than the internal noise of the measurement system. This aspect of sequencing is fundamentally limited by two factors: the specific property of the polymer that is probed in order to differentiate the monomers, and the signal-to-noise ratio (SNR) of the measurement device used to probe it.
A separate question is the order in which the monomers are measured. In order to know which monomer (or group of monomers) is being measured, it is necessary to localize the polymer to a precision comparable in length to the monomer itself. Controlling the polymer position and motion at such short length scales is challenging. In particular, the polymer is subject to diffusive (Brownian) motion due to the impact of other molecules in solution.
One popular method to limit the polymer motion is to pass it through a nanopore, an approximately cylindrical cavity in a solid substrate with a diameter equal to or a little larger than that of the polymer of interest. For such a nanopore, the polymer motion is effectively in one dimension (1-D) along the axis of the pore, but is still subject to stochastic variations in this 1-D motion due to Brownian effects. Specifically, Brownian motion results in a “random walk” such that the mean square displacement in a given time t is given by 2Dt for a polymer of diffusion constant D. This random motion is added to the imposed translocation motion, resulting in an inherent uncertainty in the number of bases that have passed through the measurement device. For example, for DNA confined within an alpha-hemolysin (αHL) protein pore at 15 °C, the mean net 1-D motion due to diffusion alone in 100 microseconds (μs) is approximately 5 bases. Thus, in a notional example in which a given base is measured for 100 μs, the DNA would on average have moved approximately 5 bases away from its desired position, in either direction, due to diffusion alone, so that a segment of the DNA would be re-measured or skipped. Such positional errors can occur no matter how sensitive the measurement system that identifies each base.
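The relation between the diffusion constant and the positional uncertainty can be illustrated with a short numerical sketch. The sketch below is illustrative only and rests on stated assumptions: it interprets the figure of approximately 5 bases in 100 μs as a root-mean-square displacement, from which a diffusion constant of 0.125 bases²/μs follows via the 2Dt relation. That value of D, the function names, and the step and walker counts are assumptions introduced here and are not measured values.

```python
import math
import random


def rms_displacement(d_coeff, t):
    """RMS 1-D displacement after time t, from mean square displacement 2*D*t."""
    return math.sqrt(2.0 * d_coeff * t)


def implied_diffusion_constant(rms, t):
    """Invert 2*D*t = rms**2 to obtain D (assumes rms is a root-mean-square value)."""
    return rms ** 2 / (2.0 * t)


def simulate_rms(d_coeff, t, n_steps=100, n_walkers=5000, seed=0):
    """Estimate the RMS displacement of a 1-D random walk by direct simulation.

    Each walker takes n_steps Gaussian steps of variance 2*D*dt, so the
    variances add up to 2*D*t over the full interval.
    """
    rng = random.Random(seed)
    dt = t / n_steps
    sigma = math.sqrt(2.0 * d_coeff * dt)
    total_sq = 0.0
    for _ in range(n_walkers):
        x = 0.0
        for _ in range(n_steps):
            x += rng.gauss(0.0, sigma)
        total_sq += x * x
    return math.sqrt(total_sq / n_walkers)


# Assumed value: D implied by ~5 bases of RMS motion in 100 microseconds.
D_ASSUMED = implied_diffusion_constant(5.0, 100.0)  # units: bases^2 per microsecond

analytic = rms_displacement(D_ASSUMED, 100.0)
simulated = simulate_rms(D_ASSUMED, 100.0)
print(f"D = {D_ASSUMED} bases^2/us, analytic RMS = {analytic:.2f} bases, "
      f"simulated RMS = {simulated:.2f} bases")
```

Because the displacement scales as the square root of the measurement time, shortening the measurement interval by a factor of 100 under these assumptions reduces the diffusive uncertainty only by a factor of 10.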
Recent discoveries have shown that physical changes such as cooling the electrolyte and changing the viscosity of the electrolyte can reduce the diffusion constant of DNA in αHL by a considerable factor. However, even with such measures, methods proposed to sequence DNA by recording the signal of each base in a serial manner are still expected to have sequence-order error rates too high to meet the current benchmark target of 99.99% accuracy. Accordingly, what is needed in order to develop a practical polymer sequencing system from such new approaches is a method of processing data that reduces the effect of stochastic variations in the polymer position.