Advances in the study of biological molecules have been led, in part, by improvement in technologies used to characterise the molecules or their biological reactions. In particular, the study of the nucleic acids DNA and RNA has benefited from developing technologies used for sequence analysis.
U.S. Pat. No. 5,302,509 describes a method for sequencing a polynucleotide template which involves performing multiple extension reactions using a DNA polymerase or DNA ligase to successively incorporate labelled polynucleotides complementary to a template strand. In such a “sequencing by synthesis” reaction, a new polynucleotide strand base-paired to the template strand is built up in the 5′ to 3′ direction by successive incorporation of individual nucleotides complementary to the template strand. The substrate nucleoside triphosphates used in the sequencing reaction are labelled at the 3′ position with different 3′ labels, permitting determination of the identity of the incorporated nucleotide as successive nucleotides are added.
In order to carry out accurate sequencing a reversible chain-terminating structural modification or “blocking group” may be added to the substrate nucleosides to ensure that nucleotides are incorporated one at a time in a controlled manner. As each single nucleotide is incorporated, the blocking group prevents any further nucleotide incorporation into the polynucleotide chain. Once the identity of the last-incorporated labelled nucleotide has been determined the label moiety and blocking group are removed, allowing the next blocked, labelled nucleotide to be incorporated in a subsequent round of sequencing.
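The incorporate, detect and deblock cycle described above can be illustrated with a minimal sketch. This is a toy model only, not an implementation of any method described in the cited documents: the function name `sequence_by_synthesis`, the `blocked` flag and the representation of the template as a string written 3′ to 5′ are all assumptions made for illustration.

```python
# Toy model of sequencing by synthesis with reversibly blocked, labelled
# nucleotides. All names here are hypothetical and chosen for illustration.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(template, cycles):
    """Return the bases read over at most `cycles` rounds of incorporation.

    `template` is assumed to be written 3'->5', so walking it left to right
    corresponds to the new strand growing in the 5'->3' direction.
    """
    read = []
    blocked = False  # True while the last-incorporated nucleotide is blocked
    for position in range(min(cycles, len(template))):
        # A blocked 3' end would prevent any further incorporation.
        if blocked:
            raise RuntimeError("blocking group was not removed")
        # Incorporate exactly one complementary, labelled, blocked nucleotide.
        base = COMPLEMENT[template[position]]
        blocked = True
        # Detect the label to identify the incorporated nucleotide.
        read.append(base)
        # Remove label and blocking group so the next cycle can proceed.
        blocked = False
    return "".join(read)


if __name__ == "__main__":
    print(sequence_by_synthesis("TACGGA", cycles=4))
```

Because one base is read per cycle, the read length in this sketch is bounded by the number of cycles performed, which mirrors the way run length is expressed in cycles of incorporation.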
In certain circumstances the amount of sequence data that can be reliably obtained with the use of sequencing-by-synthesis techniques, particularly when using blocked, labelled nucleotides, may be limited. In some circumstances it is preferred to limit the sequencing “run” to a number of bases that permits sequence realignment with the human genome, typically around 25-30 cycles of incorporation. Whilst sequencing runs of this length are extremely useful, particularly in applications such as, for example, SNP analysis and genotyping, it would be advantageous in many circumstances to be able to reliably obtain further sequence data for the same template molecule.
The technique of “paired-end” or “pairwise” sequencing is generally known in the art of molecular biology, particularly in the context of whole-genomic shotgun sequencing (Siegel A. F. et al., Genomics. 2000, 68: 237-246; Roach J. C. et al., Genomics. 1995, 26: 345-353). Paired-end sequencing allows the determination of two “reads” of sequence from two places on a single polynucleotide template. The advantage of the paired-end approach is that there is significantly more information to be gained from sequencing two stretches each of “n” bases from a single template than from sequencing “n” bases from each of two independent templates in a random fashion. With the use of appropriate software tools for the assembly of sequence information (Millikin S. C. et al., Genome Res. 2003, 13: 81-90; Kent, W. J. et al., Genome Res. 2001, 11: 1541-8) it is possible to make use of the knowledge that the “paired-end” sequences are not completely random, but are known to occur on a single template, and are therefore linked or paired in the genome. This information has been shown to greatly aid the assembly of whole genome sequences into a consensus sequence.
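The linkage between the two reads of a pair can be illustrated with a short sketch. It is a simplified illustration under stated assumptions, not one of the assembly tools cited above: the helper names `paired_end_reads` and `implied_span` are hypothetical, the reads are taken from opposite strands at the two ends of a single fragment, and each read is assumed to occur exactly once in the reference.

```python
# Toy illustration of paired-end reads taken from a single template.
# All function names are hypothetical and chosen for illustration.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def paired_end_reads(fragment, n):
    """Return two reads of n bases, one from each end of the fragment.

    The first read is taken from the given strand; the second is taken
    from the complementary strand, reading inwards from the other end.
    """
    read1 = fragment[:n]
    read2 = reverse_complement(fragment)[:n]
    return read1, read2


def implied_span(reference, read1, read2):
    """Return the length of reference spanned by a read pair, or None.

    Assumes both reads match the reference exactly and uniquely, with
    read2 matching on the opposite strand.
    """
    start = reference.find(read1)
    mate_start = reference.find(reverse_complement(read2))
    if start == -1 or mate_start == -1:
        return None
    return mate_start + len(read2) - start


if __name__ == "__main__":
    fragment = "AACCGGTTAC"
    reference = "TTT" + fragment + "GGG"
    r1, r2 = paired_end_reads(fragment, 3)
    print(r1, r2, implied_span(reference, r1, r2))
```

In this sketch the two reads together fix both ends of the fragment on the reference, which is the additional information that two unlinked reads of the same length would not provide.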
Paired-end sequencing has typically been performed by making use of specialized circular shotgun cloning vectors known in the art. After cutting the vector at a specific single site, the template DNA to be sequenced (typically genomic DNA) is inserted into the vector and the ends resealed to form a new construct. The vector sequences flanking the insert DNA include binding sites for sequencing primers which permit sequencing of the insert DNA on opposite strands.
A disadvantage of this approach is that it requires time-consuming cloning of the DNA templates to be sequenced into an appropriate sequencing vector. Furthermore, because the DNA template must be cloned into a vector in order to position binding sites for sequencing primers at both ends of the template fragment, it is extremely difficult to make use of array-based sequencing techniques. With array-based techniques it is generally only possible to sequence from one end of a nucleotide template, this often being the end proximal to the point of attachment to the array.
WO 2004/070005 describes a method for double-ended sequencing of a polynucleotide template which can be carried out on a solid support. The method relies on simultaneous hybridisation of two or more primers to a target polynucleotide in a single primer hybridisation step. Following the hybridisation step, all of the primers hybridised to the template are blocked except for one, which has a free 3′ hydroxyl group which serves as an initiation point for a first sequencing reaction. Sequencing proceeds until no further chain elongation is possible, or else the sequencing reaction is terminated. Then one of the blocked primers is unblocked to give a free 3′ hydroxyl and a second sequencing reaction is performed from this initiation point. Thus, the template remains intact and attached to the solid support throughout.
A major drawback of this approach, based on hybridisation of blocked and unblocked primers, is that if it is desired to sequence two regions on complementary strands of a double-stranded nucleic acid template then it is necessary to hybridise primers to both complementary strands of the template in a single hybridisation step. Since both strands of the template remain intact and attached to the solid support, hybridisation of the primers to cognate sequences in the template strands will generally be unfavourable compared with formation of a duplex by annealing of the two complementary strands of the template.
WO 98/44151 and WO 00/18957 both describe methods of nucleic acid amplification which allow amplification products to be immobilised on a solid support in order to form arrays comprised of clusters or “colonies” formed from a plurality of identical immobilised polynucleotide strands and a plurality of identical immobilised complementary strands. The nucleic acid molecules present in DNA colonies on the clustered arrays prepared according to these methods can provide templates for sequencing reactions, for example as described in WO 98/44152, but to date only a single sequencing read can be obtained from one type of immobilised strand in each colony.
The present inventors have now developed a method for paired-end sequencing of double-stranded polynucleotide templates, including double-stranded templates present on clustered arrays, such as those described in WO 98/44151 and WO 00/18957. The method permits sequencing of two distinct regions on complementary strands of a target polynucleotide duplex and is based on controlled formation of single-stranded templates which permit hybridisation of a sequencing primer. Using the method of the invention it is possible to obtain two linked or paired reads of sequence information from each double-stranded template on a clustered array, rather than just a single sequencing read as can be obtained with prior art methods.