The ability to determine the sequence of a polynucleotide is of great scientific importance, as demonstrated by the Human Genome Project, which has now determined the entire sequence of the three billion bases of the human genome. However, this sequence information represents an average human and there is a considerable need to understand the differences between individuals at a genetic level.
The principal method in general use for large-scale DNA sequencing is the chain termination method. This method was first developed by Sanger and Coulson (Sanger et al. Proc. Natl. Acad. Sci. USA 1977; 74: 5463-5467), and relies on the use of dideoxy derivatives of the four nucleoside triphosphates, which are incorporated into the nascent polynucleotide chain in a polymerase reaction. Upon incorporation, the dideoxy derivatives terminate the polymerase reaction and the products are then separated by gel electrophoresis and analysed to reveal the position at which the particular dideoxy derivative was incorporated into the chain.
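The read-out logic of the chain termination method can be illustrated with a minimal Python sketch. It assumes an idealised reaction in which termination is observed once at every possible position, whereas real reactions terminate at random and yield a population of fragments. The function names and the example sequence are hypothetical and are not taken from the cited work.

```python
def termination_lengths(synthesised, base):
    """Lengths of the fragments that end in the dideoxy form of `base`.

    Assumption: every occurrence of `base` in the synthesised strand
    produces exactly one terminated fragment.
    """
    return [i + 1 for i, b in enumerate(synthesised) if b == base]


def read_gel(lanes):
    """Recover the sequence by ordering fragments from shortest to longest."""
    by_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(by_length[n] for n in sorted(by_length))


# Hypothetical synthesised strand, one reaction ("lane") per dideoxy derivative.
synthesised = "GATTACA"
lanes = {base: termination_lengths(synthesised, base) for base in "ACGT"}

print(lanes)
print(read_gel(lanes))
```

Each lane records only where one base occurs. The sequence emerges only when the fragments from all four lanes are ordered by length, which is the role played by gel electrophoresis in the method described above.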
Although this method is used widely and produces reliable results, it is recognised that it is slow, labour-intensive and expensive. Furthermore, it is not an effective method for detecting the differences between two sequences, which may often consist of a single base change (known as a Single Nucleotide Polymorphism, or SNP).
Nucleic acid arrays have recently become a preferred method of determining polynucleotide sequences and SNPs, usually in the context of hybridisation events (Mirzabekov, Trends in Biotechnology (1994) 12:27-32). A large number of array-based sequencing procedures utilise labelled nucleotides in order to obtain the identity of the added (hybridised) bases. These arrays rely on the stepwise identification of suitably labelled bases, referred to in U.S. Pat. No. 5,634,413 as “single base” sequencing methods. Such “single base” procedures utilise two types of label: the radiolabel and the fluorescent label. The radiolabelling of nucleotides has the advantages of high sensitivity and low background. However, radiolabelling suffers from poor resolution.
Fluorescently-labelled nucleotides are now used widely in many techniques. Such nucleotides can be incorporated into the nascent polynucleotide chain in a stepwise manner by a polymerase reaction. Each of the different nucleotides (A, T, G and C) incorporates a unique fluorophore at the 3′ position which can be detected using a sensitive fluorescent detector, e.g. a charge-coupled device (CCD). The fluorophore often also acts as a “blocking group”, which removes the ability of the incorporated nucleotide to serve as a substrate for further nucleotide addition and therefore prevents uncontrolled polymerisation. Often, a “removable blocking group” is used, which can be removed by a specific treatment that results in cleavage of the covalent bond between a nucleotide and the blocking group, allowing the sequencing reaction to continue.
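The cycle of incorporation, detection and removal of the blocking group can be sketched as follows. This is a simplified illustration only: the class, the function and the dye names are hypothetical, strand orientation is ignored, and the template is simply read in the order in which the polymerase is assumed to traverse it.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hypothetical dye names: one unique fluorophore per nucleotide.
DYE_FOR_BASE = {"A": "dye_1", "T": "dye_2", "G": "dye_3", "C": "dye_4"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_BASE.items()}


class GrowingStrand:
    """Nascent chain whose last nucleotide may carry a blocking group."""

    def __init__(self):
        self.bases = []
        self.blocked = False

    def incorporate(self, base):
        """Add one blocked nucleotide; refused while a block is present."""
        if self.blocked:
            return False
        self.bases.append(base)
        self.blocked = True
        return True

    def detect(self):
        """Report the dye associated with the most recently added nucleotide."""
        return DYE_FOR_BASE[self.bases[-1]]

    def cleave_block(self):
        """Remove the blocking group so that the next cycle can proceed."""
        self.blocked = False


def sequence_stepwise(template):
    """Run one incorporate/detect/cleave cycle per template base."""
    strand = GrowingStrand()
    calls = []
    for template_base in template:
        strand.incorporate(COMPLEMENT[template_base])
        calls.append(BASE_FOR_DYE[strand.detect()])
        strand.cleave_block()
    return "".join(calls)


print(sequence_stepwise("TACG"))
```

Because `incorporate` refuses to extend a blocked strand, exactly one base is added and identified per cycle, which is the purpose of the blocking group described above.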
Removable blocking groups rely on a number of possible removal strategies, for example, a photochemical, chemical or enzymatic treatment. However, these have been shown to be difficult to control and apply. Differences in local environments, for example within an array, can result in the removal of an entire nucleotide, or even several nucleotides, instead of just the intended label. Such occurrences have serious consequences for the fidelity of the sequencing method, as uncontrolled removal of nucleotides puts the sequencing data out of phase and can leave it corrupted or unusable.
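The effect of such uncontrolled removal on phasing can be illustrated with a small deterministic sketch. It assumes, purely for illustration, a group of identical strands read together, of which a chosen subset loses its newly added nucleotide at one cycle. The function name and parameters are hypothetical.

```python
from collections import Counter


def cycle_signals(expected, n_strands, fault_cycle, faulty_strands):
    """Count the bases reported at each cycle by a group of identical strands.

    `expected` is the sequence of bases that a fault-free strand would add.
    At `fault_cycle`, the removal treatment strips the whole nucleotide
    (not just its label) from the strands listed in `faulty_strands`,
    so those strands fall one position behind the others.
    """
    positions = [0] * n_strands
    signals = []
    for cycle in range(len(expected)):
        counts = Counter()
        for s in range(n_strands):
            counts[expected[positions[s]]] += 1  # base added and detected
            positions[s] += 1
            if cycle == fault_cycle and s in faulty_strands:
                positions[s] -= 1  # nucleotide lost together with its label
        signals.append(dict(counts))
    return signals


for cycle, signal in enumerate(cycle_signals("ACGT", 4, 1, {0})):
    print(cycle, signal)
```

After the fault, every later cycle reports a mixture of two bases rather than a single one, which is how out-of-phase strands degrade the sequence data.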
A further disadvantage of both labelling methods is that repeat sequences can lead to ambiguity of results. This problem is recognised in Automation Technologies for Genome Characterisation, Wiley-Interscience (1997), ed. T. J. Beugelsdijk, Chapter 10:205-225.
There is therefore a need for an improved method for identifying the sequence of a polynucleotide, in particular for detecting variations within a polynucleotide sequence, e.g. for detecting SNPs, which combines the high sensitivity and low background of radiolabelled nucleotides with the high resolution of fluorescently-labelled nucleotides. Further, the method should be capable of being carried out by high-throughput, automated processes, reducing the cost associated with existing methods.