Several publications and patent documents are referenced in this application in order to more fully describe the state of the art to which this invention pertains. The disclosure of each of these publications and documents is incorporated by reference herein.
The ability to acquire and analyse DNA sequence data has increased phenomenally over the past few years. As a result, nucleic acid analysis has become increasingly important in many areas of biology, biotechnology and medicine. Molecular biology and pharmaceutical drug development now make intensive use of nucleic acid analysis. The most challenging areas are whole genome sequencing, single nucleotide polymorphism detection, screening and gene expression monitoring, which typically require generation and analysis of large amounts of nucleic acid sequence data.
One area of technology which revolutionised the study of nucleic acids was the development of nucleic acid amplification techniques, such as the polymerase chain reaction (PCR). Amplification reactions, such as PCR, can enable the user to specifically and selectively amplify a particular target nucleic acid of interest from a complex mixture of nucleic acids. However, there is also an ongoing need for nucleic acid amplification techniques which enable simultaneous amplification of complex mixtures of templates of diverse sequence, such as genomic DNA fragments (e.g. ‘whole genome’ amplification) or cDNA libraries, in a single amplification reaction.
PCR amplification cannot occur unless the forward and reverse amplification primers anneal to primer binding sequences in the template to be amplified under the conditions of the annealing steps of the PCR reaction; that is, it cannot occur if there is insufficient complementarity between primers and template. Some prior knowledge of the sequence of the template is therefore required before one can carry out a PCR reaction to amplify a specific template, unless random primers are used with a consequential loss of specificity. The user must usually know the sequence of at least the primer-binding sites in the template in advance so that appropriate primers can be designed, although the remaining sequence of the template may be unknown. The need for prior knowledge of the sequence of the template increases the complexity and cost of PCR amplification of complex mixtures of templates, such as genomic DNA fragments.
Several of the new methods employed for high throughput DNA sequencing (Nature. 437, 376-380 (2005); Science. 309, 5741, 1728-1732 (2005)) rely on a universal amplification reaction, whereby a DNA sample is randomly fragmented, then treated such that the ends of the different fragments all contain the same DNA sequence. Fragments with universal ends can be amplified in a single reaction with a single pair of amplification primers. Separation of the library of fragments to the single molecule level prior to amplification ensures that the amplified molecules form discrete populations that can then be further analysed. Such separations can be performed either in emulsions (Nature. 437, 376-380 (2005); Science. 309, 5741, 1728-1732 (2005)), or on a surface (Nucleic Acids Research 27, e34 (1999); Nucleic Acids Research 15, e87 (2000)).
WO 98/44151 and WO 00/18957 both describe methods of forming polynucleotide arrays based on ‘solid-phase’ nucleic acid amplification, which is a bridging amplification reaction wherein the amplification products are immobilised on a solid support in order to form arrays comprised of nucleic acid clusters or ‘colonies’. Each cluster or colony on such an array is formed from a plurality of identical immobilised polynucleotide strands and a plurality of identical immobilised complementary polynucleotide strands. The arrays so-formed are generally referred to herein as ‘clustered arrays’ and their general features will be further understood by reference to WO 98/44151 or WO 00/18957, the contents of both documents being incorporated herein in their entirety by reference.
In common with all amplification techniques, solid-phase bridging amplification requires the use of forward and reverse amplification primers which include ‘template-specific’ nucleotide sequences which are capable of annealing to sequences in the template to be amplified, or the complement thereof, under the conditions of the annealing steps of the amplification reaction. The sequences in the template to which the primers anneal under conditions of the amplification reaction may be referred to herein as ‘primer-binding’ sequences.
Certain embodiments of the methods described in WO 98/44151 and WO 00/18957 make use of ‘universal’ primers to amplify templates comprising a variable template portion that it is desired to amplify, flanked 5′ and 3′ by common or ‘universal’ primer binding sequences. The ‘universal’ forward and reverse primers include sequences capable of annealing to the ‘universal’ primer binding sequences in the template construct. The variable template portion, or ‘target’, may itself be of known, unknown or partially known sequence. This approach has the advantage that it is not necessary to design a specific pair of primers for each target sequence to be amplified; the same primers can be used for amplification of different templates provided that each template is modified by addition of the same universal primer-binding sequences to its 5′ and 3′ ends. The variable target sequence can therefore be any DNA fragment of interest. An analogous approach can be used to amplify a mixture of templates (targets with known ends), such as a plurality or library of target nucleic acid molecules (e.g. genomic DNA fragments), using a single pair of universal forward and reverse primers, provided that each template molecule in the mixture is modified by the addition of the same universal primer-binding sequences.
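By way of illustration only, the universal primer principle can be sketched in a few lines of Python. The flanking sequences below are hypothetical and chosen arbitrarily, and exact string matching is used as a crude stand-in for primer annealing; neither is taken from WO 98/44151 or WO 00/18957.

```python
# Illustrative sketch: exact string matching stands in for primer annealing.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical universal primer-binding sequences (arbitrary, for illustration).
UNIVERSAL_5P = "AAGCTTGGCACC"
UNIVERSAL_3P = "GGTACCTCAGCA"

# The forward primer has the same sequence as the 5' flank of the top strand,
# so it anneals to the complementary strand. The reverse primer is the reverse
# complement of the 3' flank, so it anneals to the top strand itself.
FORWARD_PRIMER = UNIVERSAL_5P
REVERSE_PRIMER = reverse_complement(UNIVERSAL_3P)


def add_universal_ends(target: str) -> str:
    """Flank a target of any sequence with the same universal ends."""
    return UNIVERSAL_5P + target + UNIVERSAL_3P


def is_amplifiable(template: str) -> bool:
    """True if both universal primer-binding sites are present at the ends."""
    forward_site = template.startswith(FORWARD_PRIMER)
    reverse_site = template.endswith(reverse_complement(REVERSE_PRIMER))
    return forward_site and reverse_site


# Targets of diverse sequence; none needs to be known to design the primers.
targets = ["ATGCCGTA", "GGGTTTAAACCCGGG", "TTTT"]
library = [add_universal_ends(t) for t in targets]

print(all(is_amplifiable(t) for t in library))   # every modified template
print(any(is_amplifiable(t) for t in targets))   # unmodified targets
```

In this sketch a single primer pair matches every modified template regardless of the target sequence, whereas none of the unmodified targets carries the primer-binding sites.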
Such ‘universal primer’ approaches to PCR amplification, and in particular solid-phase bridging amplification, are advantageous since they enable multiple template molecules of the same or different, known or unknown sequence to be amplified in a single amplification reaction, which may be carried out on a solid support bearing a single pair of ‘universal’ primers. Simultaneous amplification of a mixture of templates of different sequences would otherwise require a plurality of primer pairs, one pair being complementary to each unique template in the mixture. The generation of a separate primer pair for each individual template is not a viable option for complex mixtures of templates.
The addition of universal priming sequences onto the ends of targets to be amplified by PCR can be achieved by a variety of methods known to those skilled in the art. For example, a universal primer consisting of a universal sequence at its 5′ end and a degenerate sequence at its 3′ end can be used in a PCR (DOP-PCR, e.g., PNAS 1996 vol 93 pg 14676-14679) to amplify fragments randomly from a complex target sequence or a complex mixture of target sequences. The degenerate 3′ portion of the primer anneals at random positions on DNA and can be extended to generate a copy of the target that has the universal sequence at its 5′ end.
Alternatively, adaptors that contain universal priming sequences can be ligated onto the ends of the target sequences. The adaptors may be single-stranded or double-stranded. If double-stranded, they may have overhanging ends that are complementary to overhanging ends on the target molecules that may have been generated by digestion with a restriction endonuclease, or added with a DNA polymerase or terminal transferase. Alternatively, the double-stranded adaptors may be blunt, in which case the targets are also blunt ended. The blunt ends of the targets may have been formed during a process to shear the DNA into fragments, or they may have been formed by an end repair reaction, as would be well known to those skilled in the art.
A single adaptor or two different adaptors may be used in a ligation reaction with target sequences. If a target has been manipulated such that its ends are the same, i.e. both are blunt or both have the same overhang, then ligation of a single compatible adaptor will generate a template with that adaptor on both ends. However, if two compatible adaptors, adaptor A and adaptor B, are used, then three permutations of ligated products are formed: template with adaptor A on both ends, template with adaptor B on both ends, and template with adaptor A on one end and adaptor B on the other end. This last product is, under some circumstances, the only desired product from the ligation reaction and consequently additional purification steps are necessary following the ligation reaction to purify it from the ligation products that have the same adaptor at both ends.
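The three ligation products described above can be enumerated with a short Python sketch. It assumes, purely for illustration, that each end of a target is ligated independently and that all compatible adaptors ligate with equal efficiency; real ligation reactions need not behave this way. The function name is hypothetical.

```python
from collections import Counter
from itertools import product


def ligation_products(adaptors):
    """Expected fraction of each ligated product type.

    Assumes (for illustration only) independent ligation at each end and
    equal efficiency for every adaptor. Orientation is ignored, so A-B and
    B-A are counted as the same product.
    """
    counts = Counter()
    for left, right in product(adaptors, repeat=2):
        counts[tuple(sorted((left, right)))] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


# A single adaptor gives one product; two adaptors give three.
print(ligation_products(["A"]))
print(ligation_products(["A", "B"]))
```

Under these simplifying assumptions only half of the ligated templates carry adaptor A on one end and adaptor B on the other, which is consistent with the need for a purification step when that product is the only one desired.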
In the preparation of libraries for universal amplification, it is advantageous to make the insert region as short as possible. Such ‘short insert libraries’, where the primary nucleic acid sample is fragmented so that the average length of the target inserts is less than 150 base pairs, are advantageous both for minimising the amount of sample DNA required in short read sequencing and for minimising the size of amplified clusters. For example, sequencing 25 bases of a 100 base pair fragment means that 25% of the sample DNA is sequenced, whereas if the length of the fragments is 250 base pairs, only 10% of the material is sequenced, and more DNA is required to obtain the same amount of information about the sample. Longer fragments also give larger clusters when amplified on a surface, reducing the number of features that can be packed into an array of a finite size.
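The arithmetic in the example above can be reproduced with a minimal Python sketch. The function name is hypothetical, and the calculation assumes a single read of fixed length from one end of each fragment.

```python
def fraction_sequenced(read_length: int, fragment_length: int) -> float:
    """Fraction of a fragment covered by one read of the given length."""
    if fragment_length <= 0:
        raise ValueError("fragment_length must be positive")
    # A read cannot cover more bases than the fragment contains.
    return min(read_length, fragment_length) / fragment_length


for fragment_length in (100, 250):
    share = fraction_sequenced(25, fragment_length)
    print(f"{fragment_length} bp fragment, 25 base read: {share:.0%} sequenced")
```

For a 25 base read this gives 25% for a 100 base pair fragment and 10% for a 250 base pair fragment, matching the figures quoted above.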
A major drawback of current methods for library construction is that the steps of the method are carried out under conditions that can denature some of the target fragments. This problem becomes acute when the fragments are short, and can result in the loss from the library of fragments with a high proportion of A/T base pairs (which are weaker than G/C base pairs). The resulting library is therefore not representative of the original sample.
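The relative instability of A/T-rich duplexes can be illustrated with the Wallace rule, a rough rule of thumb that estimates the melting temperature of a short oligonucleotide duplex as 2 °C per A or T and 4 °C per G or C. This rule is not taken from the present disclosure, is only approximate, and does not model the conditions of any particular library preparation step; the sequences below are hypothetical.

```python
def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule.

    Tm ~ 2 * (A + T) + 4 * (G + C). An approximation intended for short
    oligonucleotides only; used here purely as an illustration.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Two hypothetical 16-base sequences of equal length but different composition.
at_rich = "AATTAATTAATTAATT"
gc_rich = "GGCCGGCCGGCCGGCC"

print(wallace_tm(at_rich), wallace_tm(gc_rich))
```

For fragments of equal length, the A/T-rich sequence has the lower estimated melting temperature, so it is the more likely to denature under a given set of conditions.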