Several publications and patent documents are referenced in this application in order to more fully describe the state of the art to which this invention pertains. The disclosure of each of these publications and documents is incorporated by reference herein.
Advances in the study of biological molecules have been led, in part, by improvements in the technologies used to characterise the molecules or their biological reactions. In particular, the study of the nucleic acids DNA and RNA has benefited from developments in the technologies used for sequence analysis.
One method for sequencing a polynucleotide template involves performing multiple extension reactions in which a DNA polymerase successively incorporates labelled nucleotides into a strand complementary to the template. In such a “sequencing by synthesis” reaction, a new nucleotide strand base-paired to the template strand is built up in the 5′ to 3′ direction by successive incorporation of individual nucleotides complementary to the template strand. If the substrate nucleoside triphosphates used in the sequencing reaction are supplied simultaneously, they may be blocked to prevent over-incorporation and labelled differently from one another, permitting determination of the identity of each incorporated nucleotide as successive nucleotides are added.
In order to carry out accurate sequencing, a reversible chain-terminating structural modification or “blocking group” may be added to the substrate nucleotides to ensure that nucleotides are incorporated one at a time in a controlled manner. As each single nucleotide is incorporated, the blocking group prevents any further nucleotide incorporation into the polynucleotide chain. Once the identity of the last-incorporated labelled nucleotide has been determined, the label moiety and blocking group are removed, allowing the next blocked, labelled nucleotide to be incorporated in a subsequent round of sequencing.
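The controlled, one-base-per-cycle logic described above can be illustrated with a minimal Python simulation. It is a sketch under stated assumptions, not an implementation of any particular instrument or chemistry: the dye names in `LABELS`, the `Nucleotide` record and the function `sequence_by_synthesis` are hypothetical, the sequencing primer is assumed to anneal at the 3′ end of the template, and each cycle is reduced to bookkeeping (incorporation of one blocked, labelled nucleotide, detection of its label, then removal of the label and blocking group).

```python
from dataclasses import dataclass

# Watson-Crick pairing: the polymerase incorporates the complement of each template base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hypothetical assignment of one distinguishable label to each base.
LABELS = {"A": "dye_A", "C": "dye_C", "G": "dye_G", "T": "dye_T"}
BASE_FOR_LABEL = {label: base for base, label in LABELS.items()}


@dataclass
class Nucleotide:
    base: str
    label: str            # empty once the label moiety has been removed
    blocked: bool = True  # reversible 3' blocking group present


def sequence_by_synthesis(template_5to3: str, cycles: int) -> str:
    """Simulate cyclic sequencing by synthesis with blocked, labelled nucleotides.

    Assumes the primer anneals at the 3' end of the template, so the template
    is read 3'->5' while the new strand grows 5'->3'.
    """
    template_3to5 = template_5to3[::-1]
    strand = []  # the nascent strand, as a list of Nucleotide records
    calls = []
    for i in range(min(cycles, len(template_3to5))):
        # A still-blocked 3' end would prevent any further incorporation.
        if strand and strand[-1].blocked:
            raise RuntimeError("previous nucleotide is still blocked")
        # Incorporation: exactly one blocked, labelled nucleotide is added.
        base = COMPLEMENT[template_3to5[i]]
        strand.append(Nucleotide(base=base, label=LABELS[base]))
        # Detection: the identity of the new base is read from its label.
        calls.append(BASE_FOR_LABEL[strand[-1].label])
        # Removal of label and blocking group before the next cycle.
        strand[-1].label = ""
        strand[-1].blocked = False
    return "".join(calls)
```

For a template written 5′-GATTACA-3′, three cycles give `TGT`, and running to the end of the template gives `TGTAATC`, the reverse complement of the template, because the new strand is antiparallel to it.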
In certain circumstances the amount of sequence data that can be reliably obtained using sequencing-by-synthesis techniques, particularly when using blocked, labelled nucleotides, may be limited. In some circumstances it is preferred to limit the sequencing “run” to a number of bases sufficient to permit alignment of the sequence to the human genome, typically around 25-30 cycles of incorporation. Whilst sequencing runs of this length are extremely useful, particularly in applications such as SNP analysis and genotyping, it would be advantageous in many circumstances to be able to reliably obtain further sequence data for the same template molecule.
The technique of “paired-end” or “pairwise” sequencing is generally known in the art of molecular biology, particularly in the context of whole-genome shotgun sequencing. Paired-end sequencing allows the determination of two “reads” of sequence from two places on a single polynucleotide duplex. The advantage of the paired-end approach is that significantly more information is gained from sequencing two stretches of “n” bases each from a single template than from sequencing “n” bases from each of two independent templates in a random fashion. With the use of appropriate software tools for the assembly of sequence information, it is possible to make use of the knowledge that the “paired-end” sequences are not completely random, but are known to occur on a single duplex, and are therefore linked or paired in the genome. This information has been shown to greatly aid the assembly of whole-genome sequences into a consensus sequence.
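The extra information carried by a read pair can be seen in a small example. The Python sketch below is illustrative only: the functions `find_all`, `reverse_complement` and `place_pair` are hypothetical, reads are matched exactly rather than aligned with tolerance for errors, and the second read is assumed to be sequenced from the opposite strand of the same fragment, so it is reverse-complemented before being placed. A read falling in a repeat matches several positions on its own, but requiring its mate to lie downstream within an expected insert-size window can leave a single consistent placement.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def find_all(genome: str, read: str) -> list:
    """Return every 0-based start position at which read occurs exactly."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits


def place_pair(genome: str, read1: str, read2: str,
               min_insert: int, max_insert: int) -> list:
    """Return (pos1, pos2) placements consistent with the pairing constraint.

    read2 is assumed to come from the opposite strand of the same fragment,
    so its reverse complement must lie downstream of read1 on the forward
    strand, with the whole fragment spanning min_insert..max_insert bases.
    """
    read2_fwd = reverse_complement(read2)
    placements = []
    for p1 in find_all(genome, read1):
        for p2 in find_all(genome, read2_fwd):
            insert_size = p2 + len(read2_fwd) - p1
            if p2 >= p1 and min_insert <= insert_size <= max_insert:
                placements.append((p1, p2))
    return placements
```

With `genome = "AAAA" + "CGTACG" + "TTTT" + "GGCCAT" + "CCCC" + "CGTACG" + "GAGA"`, the read `CGTACG` alone matches positions 4 and 24, whereas `place_pair(genome, "CGTACG", "ATGGCC", 10, 20)` returns only `[(4, 14)]`: the pairing constraint resolves the repeat.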
Paired-end sequencing has typically been performed by making use of specialized circular shotgun cloning vectors. After the vector is cut at a single specific site, the template DNA to be sequenced (typically genomic DNA) is inserted into the vector and the ends are resealed to form a new construct. The vector sequences flanking the insert DNA include binding sites for sequencing primers, which permit sequencing of the insert DNA from both ends, on opposite strands. However, the need for sequencing primers at both ends of the template fragment makes the use of array-based sequencing techniques extremely difficult. With array-based techniques, which usually rely on a single-stranded template, it is generally only possible to sequence from one end of a polynucleotide template, as the complementary strand is not attached to the surface.
A number of methods for double-ended sequencing of a polynucleotide template which can be carried out on a solid support have been reported, for example in US20060024681, US20060292611, WO06110855, WO06135342, WO03074734, WO07010252, WO07091077 and WO00179553.
WO 98/44151 and WO 00/18957 both describe methods of nucleic acid amplification which allow amplification products to be immobilised on a solid support in order to form arrays composed of clusters or “colonies”, each formed from a plurality of identical immobilised polynucleotide strands and a plurality of identical immobilised complementary strands. The nucleic acid molecules present in DNA colonies on the clustered arrays prepared according to these methods can provide templates for sequencing reactions, for example as described in WO 98/44152. It is advantageous to enable the efficient sequencing of both strands of such clusters, using the methods described in detail herein.