Sequencing is routinely performed by the method of chain termination and gel separation, essentially as described by Sanger, F., S. Nicklen, and A. Coulson (Proc Natl Acad Sci USA, 1977. 74(12); p. 5463–7). The method relies on the generation of a mixed population of DNA fragments representing terminations at each base in the sequence. The sequence is then determined by electrophoretic separation of these fragments.
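The principle of reading a chain-termination ladder can be shown with a toy model. This is an illustration only, not a model of the chemistry. It assumes the mixture contains one fragment terminating at every position of the newly synthesised strand (the complement of the template), each tagged with its terminal base. Sorting on fragment length stands in for electrophoretic separation. The function names are hypothetical.

```python
import random

def termination_ladder(synthesised: str) -> list[tuple[int, str]]:
    """Return one (length, terminal base) pair per position of the newly
    synthesised strand, i.e. a fragment chain-terminated at every base."""
    return [(i + 1, base) for i, base in enumerate(synthesised)]

def read_ladder(fragments: list[tuple[int, str]]) -> str:
    """Stand-in for electrophoretic separation: order fragments by length
    (shortest first) and read off the terminating base of each one."""
    return "".join(base for _, base in sorted(fragments))

ladder = termination_ladder("GATTACA")
random.shuffle(ladder)       # the reaction yields an unordered mixture
print(read_ladder(ladder))   # GATTACA
```

The order of the fragments in the mixture does not matter. The sequence is recovered entirely from the size ordering, which is the step that the methods discussed next seek to eliminate.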
Recent efforts to increase sequencing throughput have led to alternative methods that eliminate the electrophoretic separation step. A number of these methods utilise base extension (i.e. base addition) and have been described, for example, in WO 93/21340 and U.S. Pat. Nos. 5,302,509 and 5,547,839. In these methods, the templates or primers are immobilised on a solid surface before exposure to the sequencing reagents. The immobilised molecules are incubated with nucleotide analogues that carry a modification at the 3′ carbon of the sugar residue, and this modification reversibly blocks the hydroxyl group at that position. Because a polymerase cannot extend beyond such a modified nucleotide once it has been incorporated, only one nucleotide is added during each cycle of base extension. The added base is then detected by means of a label incorporated into the 3′ blocking group. Following detection, the blocking group is removed (or ‘cleaved’), typically by photochemical means, to expose a free hydroxyl group that is available for base addition in the next cycle.
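The cycle of incorporation, detection and cleavage described above can be sketched as a small state machine. This is a simplified illustration under stated assumptions: every step succeeds, the label is taken to identify the added base directly, and the names `PrimedStrand` and `sequence_by_extension` are hypothetical. The `blocked` flag models the reversible 3′ block that limits extension to one nucleotide per cycle.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Watson-Crick pairing used to choose the nucleotide the polymerase adds.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

@dataclass
class PrimedStrand:
    template: str                        # template bases in the order they are copied
    added: List[str] = field(default_factory=list)
    blocked: bool = False                # True while a 3'-blocking group is attached

    def incorporate(self) -> Optional[str]:
        """Add one labelled, 3'-blocked nucleotide and return its label (the base)."""
        if self.blocked or len(self.added) == len(self.template):
            return None                  # blocked 3' end or end of template: no extension
        base = COMPLEMENT[self.template[len(self.added)]]
        self.added.append(base)
        self.blocked = True              # the block prevents a second addition this cycle
        return base

    def cleave(self) -> None:
        """Remove the blocking group (and the label it carries), exposing a free 3'-OH."""
        self.blocked = False

def sequence_by_extension(template: str) -> str:
    strand = PrimedStrand(template)
    calls = []
    while True:
        label = strand.incorporate()     # 1. single-base extension
        if label is None:
            break                        # end of template reached
        calls.append(label)              # 2. detect the label in the blocking group
        strand.cleave()                  # 3. unblock for the next cycle
    return "".join(calls)

print(sequence_by_extension("TACG"))     # ATGC
```

Calling `incorporate()` twice without an intervening `cleave()` returns `None` on the second call. This is the property that restricts each cycle to a single base.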
Generally, non-separation-based approaches rely on the presence of large numbers of template molecules for each target sequence, from which a consensus sequence for that target is generated. Thus, for example, base extension reactions may be applied to multiple templates by interrogating discrete spots of nucleic acid, each comprising a multiplicity of molecules, immobilised in a spatially addressable array.
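The way a consensus emerges from many molecules in one spot can be sketched as a per-cycle majority vote. This is a simplification: in a real spot only the summed signal is observed, so the per-molecule calls below are a hypothetical decomposition. If each molecule is assumed to contribute an equal unit of signal in the channel of the base it incorporated, calling the brightest channel is equivalent to this vote. The function name `consensus` is illustrative.

```python
from collections import Counter
from typing import List

def consensus(per_molecule_calls: List[str]) -> str:
    """Call one base per cycle for a spot by majority vote over its molecules.

    '-' marks a cycle in which a molecule produced no signal; a cycle with no
    signal from any molecule is called 'N'. Ties go to the base seen first.
    """
    n_cycles = max(len(calls) for calls in per_molecule_calls)
    result = []
    for cycle in range(n_cycles):
        votes = Counter(calls[cycle] for calls in per_molecule_calls
                        if cycle < len(calls) and calls[cycle] != "-")
        result.append(votes.most_common(1)[0][0] if votes else "N")
    return "".join(result)

spot = ["ACGT", "ACGT", "AC-T", "AGGT", "ACGT"]  # one mis-incorporation, one dropout
print(consensus(spot))  # ACGT
```

The isolated mis-incorporation and the missing signal are both outvoted, which is why ensemble methods tolerate occasional errors in individual molecules.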
However, terminator incorporation/cleavage and base excision reactions are prone to errors. For example, as described above, base extension strategies have generally utilised nucleotide analogues that combine the function of a reporter molecule, usually a fluor, with that of a terminator occupying the 3′ position of the sugar moiety. The bulky nature of this group and its position make these compounds highly inefficient substrates for polymerases. In addition, cleavage of the terminator group to permit subsequent additions is itself inefficient. With thousands, or preferably millions, of molecules for each target, even modest per-cycle error rates of less than 5% cause the many strands representing each target to lose synchrony cumulatively within a small number of cycles. Thus, with each cycle of sequencing, background noise increases progressively and the specific signal deteriorates. As a result, only a limited number of bases of sequence data can be obtained before the specific signal becomes indistinguishable from background.
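A rough calculation shows why even small per-cycle error rates limit read length. As a simplifying assumption, each strand independently falls out of step with probability e per cycle and never recovers. Under that assumption, the fraction of strands still in phase after n cycles is (1 - e)^n. The function names are hypothetical.

```python
import math

def in_phase_fraction(error_rate: float, cycles: int) -> float:
    """Fraction of strands in a spot still in synchrony after `cycles`,
    assuming each strand independently drops out of step with probability
    `error_rate` per cycle and never recovers."""
    return (1.0 - error_rate) ** cycles

def cycles_until(error_rate: float, threshold: float) -> int:
    """Smallest number of cycles after which the in-phase fraction falls
    strictly below `threshold`."""
    return math.floor(math.log(threshold) / math.log(1.0 - error_rate)) + 1

print(round(in_phase_fraction(0.05, 14), 3))  # 0.488
print(cycles_until(0.05, 0.5))                # 14
print(cycles_until(0.01, 0.5))                # 69
```

Under this model, a 5% per-cycle error rate leaves fewer than half of the strands in phase after 14 cycles. Even at 1%, this point is reached after 69 cycles. The out-of-phase strands contribute to the background, so the specific signal falls and the background rises together.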
Recent advances in single molecule detection (described, for example, in Trabesinger, W., et al., Anal Chem., 1999. 71(1); p. 279–83 and WO 00/06770) make it possible to apply sequencing strategies to single molecules. However, sequencing reactions are stochastic: when they are applied to a clonal population of molecules, some molecules undergo the reaction while others remain unmodified. In conventional sequencing methods, errors such as mis-incorporations are therefore not normally of serious significance, because the large number of molecules present ensures that a consensus signal is obtained. When the same reactions are applied to single molecules, however, the outcomes are effectively quantized: each molecule either reacts or it does not.
One such single molecule sequencing method, based on base excision, is described, for example, in Hawkins, G. and L. Hoffman (Nature Biotechnology, 1997. 15; p. 803–804) and U.S. Pat. No. 5,674,743. In this strategy, single template molecules are generated in which every base is labelled with an appropriate reporter. The template molecules are digested with an exonuclease, and the excised bases are monitored and identified. Because these methods use highly processive enzymes such as Lambda exonuclease, templates several kilobases in length could potentially be analysed. However, the excised bases from each template molecule must be monitored continuously and in real time, which limits the number of molecules that can be analysed in parallel. In addition, it is difficult to generate a template in which every base carries a reporter that allows the excised base to be detected on the basis of its intrinsic optical or chemical properties.
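The serial, real-time character of exonuclease-based reading can be sketched as follows. The sketch makes three assumptions. Bases are excised one at a time from the 5′ end of a fully labelled strand, as with a 5′→3′ exonuclease such as Lambda exonuclease. Each freed base is detected independently with probability `p_detect`. The function name is hypothetical.

```python
import random

def exonuclease_read(labelled_strand: str, p_detect: float, seed: int = 0) -> str:
    """Digest a fully labelled strand base by base from its 5' end and return
    the labels of the excised bases that were detected, in excision order."""
    rng = random.Random(seed)
    detected = []
    for base in labelled_strand:      # excision order follows 5'->3' digestion
        if rng.random() < p_detect:   # label of the freed base is observed
            detected.append(base)
        # otherwise the base is lost: once excised it cannot be read again
    return "".join(detected)

strand = "GATTACAGATTACA"
print(exonuclease_read(strand, p_detect=1.0))  # GATTACAGATTACA
print(exonuclease_read(strand, p_detect=0.8))  # may be missing some bases
```

Each molecule produces a single stream of events that must be watched for its whole lifetime. Any missed detection becomes a permanent deletion, because the template is consumed as it is read.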
Methods based on base extension (such as BASS) have also been adapted to a single molecule approach.
However, these techniques are prone to errors. In particular, incorporation of a modified nucleotide can fail, for example because polymerases act less efficiently on modified nucleotides. Where the reporter molecule is fluorescent, errors can also arise from failure of fluorescence because the fluor is lost, damaged, bleached, or not excited. At the single molecule level, failures such as these result in a failure to obtain adequate sequence.
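The two failure modes above can be contrasted in a toy simulation of one immobilised molecule. This sketch is not a method from the source. It assumes that incorporation succeeds with probability `p_incorporate` in each cycle and that an incorporated base's fluor is detected with probability `p_fluoresce`. `expected_read` is a hypothetical name for the bases the polymerase would add.

```python
import random
from typing import List, Optional, Tuple

def single_molecule_run(expected_read: str, cycles: int,
                        p_incorporate: float, p_fluoresce: float,
                        seed: int = 0) -> Tuple[List[Optional[str]], str]:
    """Simulate one molecule through `cycles` extension cycles.

    Returns (signal seen in each cycle, bases actually added). None in the
    signal list means no fluorescence was observed in that cycle.
    """
    rng = random.Random(seed)
    signals: List[Optional[str]] = []
    added: List[str] = []
    for _ in range(cycles):
        pos = len(added)
        if pos == len(expected_read):
            signals.append(None)      # nothing left to copy
            continue
        if rng.random() >= p_incorporate:
            signals.append(None)      # incorporation failed: strand not extended
            continue
        base = expected_read[pos]
        added.append(base)            # base is now part of the strand...
        if rng.random() < p_fluoresce:
            signals.append(base)      # ...and its fluor was detected
        else:
            signals.append(None)      # ...but its fluor was lost, bleached or unexcited
    return signals, "".join(added)

signals, added = single_molecule_run("ATGCAT", cycles=8,
                                     p_incorporate=0.9, p_fluoresce=0.9, seed=1)
apparent = "".join(s for s in signals if s is not None)
print(added, apparent)
```

A failed incorporation only delays the molecule by a cycle. A failed fluorescence, by contrast, silently drops a base that was actually added. Yet both appear as the same empty cycle in `signals`, so from the signal alone the apparent read can differ from `added` without the error being evident.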
It is an object of the present invention to provide a sequencing method that enables errors to be detected. It is a further object of the present invention to allow errors to be analysed and prevented or corrected by monitoring the fate of individual molecules through the sequencing reactions.