Two original methods for DNA sequencing were published: the chemical approach of Maxam and Gilbert (A. M. Maxam and W. Gilbert, PNAS, 1977, 74, 560-564) and the enzyme-based method of Sanger (F. Sanger et al., PNAS, 1977, 74, 5463-5467). The dominant method currently used is based on Sanger's approach, so-called termination or dideoxy sequencing. This methodology relies on partial termination of a growing DNA chain to produce a ladder of labelled, terminated fragments, which must be separated by size before the sequence can be read. Various improvements have been made since its inception: the labels have changed from traditional radioactive nucleotides to fluorescent dyes, and capillaries (often polymer filled) rather than flat gel electrophoresis are commonly used in high-throughput applications. There has also been a massive drive towards parallelism, such as the simultaneous running of 96 capillaries and the use of four dyes with analysis in a single channel, while enhancements in fluorescence sensitivity and more efficient polymerases have allowed longer sequencing runs. However, the inherent limitations of this method are evident from the colossal effort that was required to sequence the human genome. A number of newer approaches have been reported over the past few years, and these fall into three categories: (i) sequencing by repetitive single base addition, (ii) pyrosequencing and (iii) restriction enzyme mediated cleavage or kinase ligation with deconvolution/decoding.
(i). Sequencing by repetitive single base addition. There are a number of reports in this area (Z. M. Li et al., PNAS, 2003, 100, 414-419; T. S. Seo et al., PNAS, 2005, 102, 5926-5931; L. R. Bi et al., J. Am. Chem. Soc., 2006, 128, 2542). This approach relies on the enzyme-mediated addition of a single base to a growing, primed DNA strand. Single base addition is controlled by a modification of the triphosphate that makes multiple additions impossible. This can be a physical block on the nucleotide or a chemical block (e.g. an ester on the 3′-OH group). The approach uses fluorogenically labelled building blocks and typically, when the blocking group is removed, the fluorogenic reporter is also cleaved, allowing another cycle of reactions to be carried out. There are a number of issues with this approach, notably the need for enzymes and complex triphosphates. In addition, the cleavage and termination chemistries need to be essentially quantitative in order to allow reasonable read lengths.
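The requirement for essentially quantitative chemistry can be made concrete with a small calculation. The following Python sketch assumes a simplified model in which every cycle succeeds independently with the same efficiency, so that the fraction of strands still in phase after n cycles is the efficiency raised to the power n. The function names and the 50% cut-off are illustrative assumptions and are not taken from the cited reports.

```python
import math


def in_phase_fraction(efficiency, cycles):
    """Fraction of strands still in register after `cycles` cycles.

    Assumes each cycle (addition, cleavage, deblocking taken together)
    succeeds independently with probability `efficiency`.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return efficiency ** cycles


def max_read_length(efficiency, min_fraction=0.5):
    """Largest number of cycles keeping at least `min_fraction` in phase.

    `min_fraction` is a hypothetical threshold for a usable signal.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    if not 0.0 < min_fraction <= 1.0:
        raise ValueError("min_fraction must be in (0, 1]")
    if efficiency == 1.0:
        return math.inf  # perfect chemistry never loses phase in this model
    return math.floor(math.log(min_fraction) / math.log(efficiency))
```

Under this model a per-cycle efficiency of 99% leaves about 37% of the strands in phase after 100 cycles, and a 90% efficiency falls below the 50% cut-off after only 6 cycles, which is why near-quantitative steps are needed for useful read lengths.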
(ii). Pyrosequencing (P. Nyren et al., Anal. Biochem., 1996, 242, 84-89). In this approach a growing, primed DNA strand is treated with an enzyme and one of the four triphosphates. If the base is incorporated, pyrophosphate is liberated; if there is no incorporation, no pyrophosphate is generated. A sulfurylase converts the pyrophosphate, in the presence of APS (adenosine-5′-phosphosulfate), to ATP, which is then used by another enzyme (luciferase) to generate light. It is this light which is used to determine whether or not a specific base has been added to the growing DNA strand. If two or more bases of the same type are added at one time then more light is generated, and this can be quantified. The process is then repeated with the next type of triphosphate, allowing sequences to be generated. There are a number of issues, including the fact that reliable quantification of light emission is not always possible, so long stretches of a single base are essentially impossible to read (e.g. 14 and 15 bases of one type cannot be distinguished because of emission variations). This was the approach used in a recent paper describing sequencing from millions of beads arrayed in microwells (M. Margulies et al., Nature, 2005, 437, 376-380).
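The cyclic dispensing scheme described above can be sketched in Python as follows. This is an idealised, noise-free illustration only: it assumes that the light signal is exactly proportional to the number of bases incorporated in each flow, and the function names and the A, C, G, T dispensation order are assumptions made for the example.

```python
def flowgram(read, dispensation_order="ACGT"):
    """Idealised pyrosequencing signal for the strand being synthesised.

    `read` is the sequence added to the primer (the complement of the
    template). Nucleotides are dispensed one at a time in a repeating
    order; each flow records how many bases were incorporated, which in
    this model stands in for the amount of light emitted.
    """
    unknown = set(read) - set(dispensation_order)
    if unknown:
        raise ValueError(f"bases never dispensed: {sorted(unknown)}")
    flows = []
    pos = 0
    flow_index = 0
    while pos < len(read):
        nucleotide = dispensation_order[flow_index % len(dispensation_order)]
        count = 0
        # A homopolymer run is consumed within a single flow.
        while pos < len(read) and read[pos] == nucleotide:
            count += 1
            pos += 1
        flows.append((nucleotide, count))
        flow_index += 1
    return flows


def decode(flows):
    """Rebuild the read from (nucleotide, count) pairs."""
    return "".join(nucleotide * count for nucleotide, count in flows)
```

In this sketch a run of identical bases appears as a single flow with a larger count. Decoding is exact only because the counts are noise free; with real emission variations the difference between counts such as 14 and 15 becomes smaller than the measurement error, which is the homopolymer limitation noted above.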
(iii). Restriction enzyme mediated cleavage or ligation with deconvolution/decoding (S. Brenner et al., Nature Biotech., 2000, 18, 630-634). In this approach, sequences are cleaved with a restriction enzyme to give an overhanging sequence. These overhangs are then decoded using a series of 16 encoded adapters. The adapters are then cleaved themselves, which exposes the next set of bases to be decoded. A similar approach is possible using ligation. There are again a number of problems: each deconvolution requires multiple steps; labelled probes and a variety of enzymes are still needed; and cleavage may be incomplete or may occur at unwanted sites. This was the approach used by Brenner (S. Brenner et al., Nature Biotech., 2000, 18, 630-634) as well as by Shendure and Church in their massively parallel chip-based sequencing (beads trapped in a polyacrylamide gel; J. Shendure et al., Science, 2005, 308, 1728-1732 and R. D. Mitra et al., PNAS, 2003, 100, 5926-5931).
Single Nucleotide Polymorphisms: Another area, related to sequencing, is that of Single Nucleotide Polymorphisms (A-C. Syvanen et al., Nature Genetics, 2005, 37, S5-S10). Indeed, SNP analysis can be viewed as sequencing a single base. Single nucleotide polymorphisms (SNPs) are found on average every 300-1000 bases in humans and represent as much as 90% of all genetic variation between individuals. A SNP can constitute a genetic risk factor (or indeed an advantage) for specific disease states, and can also underlie a host of physical features. SNP analysis methods are many and varied but generally consist of primer extension reactions using polymerases and fluorescently labelled triphosphates, although the methods of capture and analysis vary considerably. SNP analysis is in some respects a simple form of DNA sequencing, in that the identity of a single base is the major concern (although its context is of course crucial).
DNA Directed Ligations and Reactions. DNA and peptide nucleic acid (PNA) have been used in a number of ligation-based chemical approaches to synthesis, notably in the work of D. R. Liu and O. Seitz (X. Li and D. R. Liu, Angew. Chem. Int. Ed., 2004, 43, 4848-4870; S. Ficht et al., ChemBioChem, 2005, 6, 2098-2103). Non-enzymatic ligation has also been achieved in a DNA-DNA sense by Kool and Richert (N. Griesang et al., Angew. Chem. Int. Ed., 2006, 45, 6144-6148 and references therein, e.g. P. Hagenbuch et al., Angew. Chem. Int. Ed., 2005, 44, 6588-6592), who used classical nucleophilic addition chemistry to ligate DNA strands (a 3′-phosphorothioate reacting with a 5′-iodothymidine) or monomers (e.g. a 3′-aminonucleotide reacting with an activated phosphate). The first approach could be used for color detection of RNA and DNA point mutations; however, it requires large primers on both the nucleophile and the electrophile. Richert's approach, although monomer based, required so-called helper primers, such that two primers spanning the single-base gap were needed to direct incorporation. Liu used dynamic chemistry to make polymers of PNA using a DNA template (D. R. Liu et al., J. Am. Chem. Soc., 2003, 125, 13924-13925).
PNAs have previously been used as genetic probes (see the review by P. Paulosova and F. Pellestor, Ann. Genetique, 2004, 47, 349-358) owing to their accurate recognition of complementary DNA or RNA sequences. However, because they are not recognised by polymerases, their use as a tool for genetic analysis has been very limited.
Dynamic chemistry: Over the past decade there has been intense activity in the area of dynamic (combinatorial) libraries (P. T. Corbett et al., Chem. Rev., 2006, 106, 3652-3711; J. M. Lehn, Chem. Eur. J., 1999, 5, 2455-2463; O. Ramström and J. M. Lehn, Nat. Rev. Drug Discov., 2002, 1, 26-36). A “dynamic library” can be prepared by mixing together in solution two complementary components, such as a selection of aldehydes and an amine, or diols and boronic acids, or thiols and disulfides, in the presence of a template. Owing to the dynamic equilibrium set up in the system (e.g. amine/aldehyde/imine), the most strongly bound ligand will predominate, and thus in essence the template “builds” and “concentrates” its own partner. Recently, Dawson et al. (J. Am. Chem. Soc., 2006, 128, 15602-15603) showed that the equilibration of dynamic processes can also be accelerated by catalysts such as aniline.
As can be seen above, all the newer methods of DNA analysis (and the older methods) have a variety of issues and problems associated with their application, not least the use of enzymes and often expensive triphosphates.
The object of the present invention is to obviate or mitigate at least one of the aforementioned problems.