Nucleic acid sequencing is routinely performed by the method of chain termination and gel separation, essentially as described by Sanger, F., S. Nicklen, and A. Coulson (Proc Natl Acad Sci U S A, 1977. 74(12); p. 5463–7). The method relies on the generation of a mixed population of nucleic acid fragments representing terminations at each base in the sequence. The sequence is then determined by electrophoretic separation of these fragments.
Recent efforts to increase the throughput of sequencing have resulted in the development of alternative methods that eliminate the electrophoretic separation step. A number of these methods utilise base extension (i.e. base addition) and have been described, for example, in WO 93/21340, U.S. Pat. No. 5,302,509 and U.S. Pat. No. 5,547,839. In these methods, the templates or primers are immobilised on a solid surface before exposure to reagents for sequencing. The immobilised molecules are incubated in the presence of nucleotide analogues that have a modification at the 3′ carbon of the sugar residue that reversibly blocks the hydroxyl group at that position. The incorporation of such modified nucleotides by a polymerase ensures that only one nucleotide is added during each cycle of base extension. The added base is then detected by virtue of a label that has been incorporated into the 3′ blocking group. Following detection, the blocking group is removed (or ‘cleaved’), typically by photochemical means, to expose a free hydroxyl group that is available for base addition during the next cycle.
Another approach to parallel sequencing has been the use of sequential elimination of nucleotides by type IIS restriction digestion (see, for example, U.S. Pat. Nos. 5,856,093, 5,599,675 and U.S. Pat. No. 5,715,330). With this method, the template is rendered suitable for cohesive-end ligation. An adapter that is substantially double stranded and contains a type IIS restriction enzyme recognition motif is ligated to the template. The termini of these adapters that participate in ligation have one of the four bases at their end, and the identity of that base is indicated by a corresponding fluor on the adapter. The ligation step is dependent upon terminal base complementarity and is therefore the discriminating step. Following ligation, the fluorescence is detected and the terminal base identified. The position of the type IIS recognition motif is such that cleavage by the restriction enzyme is effected one base downstream from the ligation site, exposing the next base for ligation and subsequent identification.
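The ligation, detection and cleavage cycle described above can be sketched as a toy model. This is an idealised, error-free illustration only: the fluor colours, the dictionary names and the function `read_by_ligation_and_cleavage` are hypothetical and are not taken from the cited patents, and the template is reduced to a plain string of bases.

```python
# Watson-Crick pairing used to decide which adapter can ligate.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical fluor assignment: one label per adapter terminal base.
ADAPTER_FLUOR = {"A": "red", "C": "green", "G": "blue", "T": "yellow"}
FLUOR_TO_ADAPTER_BASE = {fluor: base for base, fluor in ADAPTER_FLUOR.items()}


def read_by_ligation_and_cleavage(template: str) -> str:
    """Call each template base in turn with an idealised ligate/detect/cleave cycle."""
    remaining = template.upper()
    calls = []
    while remaining:
        exposed = remaining[0]
        # Ligation: only the adapter whose terminal base is complementary
        # to the exposed template base is assumed to ligate.
        adapter_base = COMPLEMENT[exposed]
        # Detection: the fluor on the ligated adapter identifies its terminal
        # base, and hence (by complementarity) the template base.
        fluor = ADAPTER_FLUOR[adapter_base]
        calls.append(COMPLEMENT[FLUOR_TO_ADAPTER_BASE[fluor]])
        # Cleavage: the type IIS cut one base downstream removes the adapter
        # together with the base just called, exposing the next base.
        remaining = remaining[1:]
    return "".join(calls)
```

In this sketch every cycle succeeds, so the calls simply reproduce the template. It makes no attempt to model incomplete ligation or cleavage.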
Generally, non-separation based approaches rely on the presence of large numbers of template molecules for each target sequence to generate a consensus sequence from a given target. Thus, for example, base extension reactions may be applied to multiple templates by interrogating discrete spots of nucleic acid, each comprising a multiplicity of molecules, immobilised in a spatially addressable array.
However, reactions of terminator incorporation/cleavage or base excision are prone to errors. For example, as described above, base extension strategies have generally utilised nucleotide analogues that combine the functions of a reporter molecule, usually a fluor, with that of a terminator occupying the 3′ position on the sugar moiety. The bulky nature of the group and its position render these compounds highly inefficient substrates for polymerases. In addition, the cleavage of the terminator group to permit subsequent additions is also subject to inefficiencies. In the presence of thousands, or preferably millions, of molecules for each target, even modest error rates of less than 5% result in a cumulative loss of synchrony between the multiple strands representing each molecule within a small number of cycles. Thus, the background noise increases progressively with each cycle of sequencing, with a consequent deterioration of signal. This limits the number of bases of sequence data that can be obtained before the specific signal becomes indistinguishable from background.
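The cumulative effect of a small per-cycle error can be illustrated with a simple calculation. The sketch below rests on assumptions that are not stated in the text: each strand fails a given cycle independently with a fixed probability, and a strand that has fallen out of register never recovers. The function names are hypothetical.

```python
def in_phase_fraction(per_cycle_error: float, cycles: int) -> float:
    """Expected fraction of strands still in register after `cycles` cycles.

    Assumes independent failures with a fixed per-cycle probability and
    no recovery once a strand is out of register.
    """
    if not 0.0 <= per_cycle_error <= 1.0:
        raise ValueError("per_cycle_error must lie between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return (1.0 - per_cycle_error) ** cycles


def cycles_until_below(per_cycle_error: float, threshold: float = 0.5) -> int:
    """Smallest number of cycles after which the in-phase fraction is below `threshold`."""
    if not 0.0 < per_cycle_error <= 1.0:
        raise ValueError("per_cycle_error must be greater than 0 and at most 1")
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be greater than 0 and at most 1")
    n = 0
    while in_phase_fraction(per_cycle_error, n) >= threshold:
        n += 1
    return n
```

Under these assumptions, a 5% per-cycle error leaves roughly 36% of strands in register after 20 cycles, and fewer than half remain in register after 14 cycles.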
Recent advances in methods of single molecule detection (described, for example, in Trabesinger, W., et al., Anal Chem., 1999. 71(1); p. 279–83 and WO 00/06770) make it possible to apply sequencing strategies to single molecules. However, sequencing, when applied to clonal populations of molecules, is a stochastic process that results in some molecules undergoing reactions while others remain unmodified. Thus, in conventional sequencing methods, errors such as mis-incorporations are not normally of serious significance, as the large numbers of molecules present ensure that a consensus signal is obtained. When these reactions are applied to single molecules, the outcomes are effectively quantized, since each molecule either undergoes a given reaction or remains unmodified.
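The contrast between a consensus signal and a single-molecule outcome can be made concrete with a binomial calculation. The sketch below assumes, purely for illustration, that each molecule independently reports either the correct base or an incorrect one, and that the consensus is correct whenever more than half of the molecules are correct. The function name and the probabilities used are hypothetical.

```python
from math import comb


def majority_correct_probability(n_molecules: int, p_correct: float) -> float:
    """Probability that strictly more than half of the molecules report the correct base.

    Assumes independent molecules with an identical per-molecule probability
    of a correct report. Intended for modest values of n_molecules.
    """
    if n_molecules < 1:
        raise ValueError("n_molecules must be at least 1")
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must lie between 0 and 1")
    needed = n_molecules // 2 + 1  # smallest strict majority
    return sum(
        comb(n_molecules, k) * p_correct**k * (1.0 - p_correct) ** (n_molecules - k)
        for k in range(needed, n_molecules + 1)
    )
```

With a single molecule the result is simply the per-molecule probability, so a 10% error rate yields a wrong call one time in ten. With 101 molecules at the same error rate the majority is almost always correct, which is the sense in which large populations mask individual failures.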
One such single molecule sequencing method is based on base excision and described, for example, in Hawkins, G. and L. Hoffman, Nature Biotechnology, 1997. vol. 15; p. 803–804 and U.S. Pat. No. 5,674,743. With this strategy, single template molecules are generated such that every base is labelled with an appropriate reporter. The template molecules are digested with exonuclease and the excised bases are monitored and identified. As these methods use highly processive enzymes such as Lambda exonuclease, there is the potential for analysing large templates of several kilobases in length. However, the continuous monitoring of excised bases from each template molecule in real time limits the number of molecules that can be analysed in parallel. In addition, there are difficulties in generating a template where every base is labelled with an appropriate reporter such that excised bases can be detected on the basis of intrinsic optical or chemical properties.
Methods based on base extension (such as BASS) have also been adapted to a single molecule approach.
However, these techniques are prone to errors. In particular, incorporation of modified nucleotides can fail, for example, as a result of the decreased efficiency of polymerase action with modified nucleotides. Where the reporter molecule is a fluorescent molecule, errors can also occur through failure of fluorescence because the fluor is lost, damaged, bleached, or unexcited. Importantly, failure to eliminate a reporter molecule before the next cycle of sequencing begins may result in carryover of a reporter from a preceding cycle, leading to a false base call. This can occur through failure to remove a terminator and/or reporter molecule (e.g. in a cleavage reaction). At the single molecule level, failures such as these prevent adequate sequence from being obtained.