Advances in the study of biological molecules have been led, in part, by improvement in technologies used to characterise the molecules or their biological reactions. In particular, the study of the nucleic acids DNA and RNA has benefited from developing technologies used for sequence analysis.
U.S. Pat. No. 5,302,509 describes a method for sequencing a polynucleotide template which involves performing multiple extension reactions using a DNA polymerase or DNA ligase to successively incorporate labelled polynucleotides complementary to a template strand. In such a “sequencing by synthesis” reaction, a new polynucleotide strand base-paired to the template strand is built up in the 5′ to 3′ direction by successive incorporation of individual nucleotides complementary to the template strand. The substrate nucleoside triphosphates used in the sequencing reaction are labelled at the 3′ position with different 3′ labels, permitting determination of the identity of the incorporated nucleotide as successive nucleotides are added.
In order to carry out accurate sequencing, a reversible chain-terminating structural modification or “blocking group” may be added to the substrate nucleosides to ensure that nucleotides are incorporated one at a time in a controlled manner. As each single nucleotide is incorporated, the blocking group prevents any further nucleotide incorporation into the polynucleotide chain. Once the identity of the last-incorporated labelled nucleotide has been determined, the label moiety and blocking group are removed, allowing the next blocked, labelled nucleotide to be incorporated in a subsequent round of sequencing.
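The incorporate, detect and deblock cycle described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions: incorporation is error-free, the template is supplied in 3′ to 5′ order so that the new strand grows 5′ to 3′, and the names `sequence_by_synthesis` and `max_cycles` are hypothetical and do not come from any cited method.

```python
# Toy model of sequencing by synthesis with reversibly blocked, labelled nucleotides.
# Assumption: error-free incorporation; all names here are illustrative only.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(template_3_to_5, max_cycles=25):
    """Return the bases read from a template given in 3'->5' order.

    Each loop iteration models one cycle: a single blocked, labelled
    nucleotide is incorporated, its label is read, and the label and
    blocking group are then removed so the next cycle can proceed.
    """
    read = []
    for template_base in template_3_to_5[:max_cycles]:
        # 1. Incorporate exactly one nucleotide; the block stops further extension.
        nucleotide = {
            "base": COMPLEMENT[template_base],
            "blocked": True,
            "labelled": True,
        }
        # 2. Detect the label to identify the incorporated base.
        read.append(nucleotide["base"])
        # 3. Remove label and blocking group, freeing the 3' end for the next cycle.
        nucleotide["labelled"] = False
        nucleotide["blocked"] = False
    return "".join(read)


print(sequence_by_synthesis("TACGGA", max_cycles=4))
```

The `max_cycles` cap stands in for the practical limit on the number of cycles that can be performed reliably; the read simply stops once that limit is reached, regardless of how much template remains.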
In certain circumstances the amount of sequence data that can be reliably obtained with the use of sequencing-by-synthesis techniques, particularly when using blocked, labelled nucleotides, may be limited, typically to around 25-30 cycles of incorporation. Whilst sequencing “runs” of this length are extremely useful, particularly in applications such as, for example, SNP analysis and genotyping, it would be advantageous in many circumstances to be able to reliably obtain further sequence data for the same template molecule.
The technique of “paired-end” or “pairwise” sequencing is generally known in the art of molecular biology, particularly in the context of whole-genomic shotgun sequencing (Siegel A. F. et al., Genomics. 2000, 68: 237-246; Roach J. C. et al., Genomics. 1995, 26: 345-353). Paired-end sequencing allows the determination of two “reads” of sequence from two places on a single polynucleotide template. The advantage of the paired-end approach is that there is significantly more information to be gained from sequencing two stretches each of “n” bases from a single template than from sequencing “n” bases from each of two independent templates in a random fashion. With the use of appropriate software tools for the assembly of sequence information (Mullikin et al., Genome Res. 2003, 13: 81-90; Kent, W. J. et al., Genome Res. 2001, 11: 1541-8) it is possible to make use of the knowledge that the “paired-end” sequences are not completely random, but are known to occur on a single template, and are therefore very close in the genome. This information has been shown to greatly aid the assembly of whole genome sequence into a consensus sequence.
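The benefit of knowing that two reads come from the same template can be illustrated with a small placement example. This is a sketch only, not the algorithm of any of the cited assembly tools: it assumes a toy reference sequence, exact matching, both reads given on the same strand, and a known range for the distance between the two read starts. The helper names `find_all` and `place_pair` are hypothetical.

```python
# Toy illustration: a read that falls in a repeat is ambiguous on its own,
# but its mate from the same template resolves the placement.
# Assumptions: exact matches, same-strand reads, known spacing range.


def find_all(genome, read):
    """Return every start position at which read occurs in genome."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


def place_pair(genome, read1, read2, min_gap, max_gap):
    """Return (pos1, pos2) placements whose start-to-start spacing is allowed."""
    return [
        (p1, p2)
        for p1 in find_all(genome, read1)
        for p2 in find_all(genome, read2)
        if min_gap <= p2 - p1 <= max_gap
    ]


# The repeat "GGGCCC" occurs twice; "CTAGGTC" occurs once.
genome = "TTGACCA" + "GGGCCC" + "ATATCGA" + "GGGCCC" + "CTAGGTC"

print(find_all(genome, "GGGCCC"))                      # two candidate positions
print(place_pair(genome, "GGGCCC", "CTAGGTC", 4, 10))  # one consistent placement
```

Taken alone, the first read matches two positions in the toy sequence. Requiring its mate to lie within the allowed spacing leaves a single consistent placement, which is the kind of constraint that assembly software can exploit.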
Paired-end sequencing has typically been performed by making use of specialized circular shotgun cloning vectors known in the art. After cutting the vector at a specific single site, the template DNA to be sequenced (typically genomic DNA) is inserted into the vector and the ends resealed to form a new construct. The vector sequences flanking the insert DNA include binding sites for sequencing primers which permit sequencing of the insert DNA on opposite strands.
A disadvantage of this approach is that it requires time-consuming cloning of the DNA templates to be sequenced into an appropriate sequencing vector.
Furthermore, because of the need to clone the DNA template into a vector in order to position binding sites for sequencing primers at both ends of the template fragment it is extremely difficult to make use of array-based sequencing techniques. With array-based techniques it is generally only possible to sequence from one end of a nucleotide template, this often being the end proximal to the point of attachment to the array.
With the use of hairpin nucleic acid anchors or double stranded nucleic acid anchors (such as those described in the applicant's co-pending International application published as WO 01/57248), one end of a template immobilised on an array may be “covalently closed” giving a free 3′ end which permits sequencing of the 5′ overhanging template strand by successive incorporation of nucleotides. However, given that the distal portions (distal from the point of attachment to the array) of such immobilised templates are generally single-stranded and that the sequence of the template is usually unknown prior to immobilisation on the array, it is not straightforward to devise means for determining the sequence of the distal end of the immobilised template, beyond the first “run” of sequence that can be obtained from the free 3′ end provided by the anchor. It is not possible simply to design a sequencing primer complementary to a region of the template whose sequence is unknown.
WO 2004/070005 describes a method for double-ended sequencing of a polynucleotide template which can be carried out on a solid support. The method relies on simultaneous hybridisation of two or more primers to a target polynucleotide in a single primer hybridisation step. Following the hybridisation step, all of the primers hybridised to the template are blocked except for one, which has a free 3′ hydroxyl group which serves as an initiation point for a first sequencing reaction. Sequencing proceeds until no further chain elongation is possible, or else the sequencing reaction is terminated. Then one of the blocked primers is unblocked to give a free 3′ hydroxyl and a second sequencing reaction is performed from this initiation point. An advantage of this approach is that there is no need to perform any denaturation and re-hybridisation steps between the first and second sequencing reactions, as the two primers providing the initiation points are annealed in a single hybridisation step. Thus, the template remains intact and attached to the solid support throughout.
A major drawback of this approach, based on hybridisation of blocked and unblocked primers, is that it is necessary to know the sequence of at least two regions of the polynucleotide template to be sequenced in order to design two or more suitable primers capable of binding to the target. If the method is to be used to sequence polynucleotides of unknown sequence, then sample preparation steps must first be carried out to add regions of known sequence to the polynucleotide, thereby providing the necessary primer-binding sites. This can be achieved, for example, by amplification or by sub-cloning a template of unknown sequence into a vector in order to add known sequences onto the 5′ and 3′ ends of the template.
The present inventors have sought to develop techniques which generally permit the paired-end or pairwise sequencing approach to be used without any knowledge of the sequence at the distal end of the template and without the need for any intermediate cloning of the template into a vector. Such techniques would permit pairwise sequencing to be used in conjunction with a wide range of array-based sequencing technologies, including single molecule arrays as well as clustered arrays.