Several publications and patent documents are referenced in this application in order to more fully describe the state of the art to which this invention pertains. The disclosure of each of these publications and documents is incorporated by reference herein.
The ability to acquire and analyse DNA sequence data has increased phenomenally over the past few years. As a result, nucleic acid analysis has become increasingly important in many areas of biology, biotechnology and medicine. Molecular biology and pharmaceutical drug development now make intensive use of nucleic acid analysis. The most challenging areas are whole genome sequencing, single nucleotide polymorphism detection, screening and gene expression monitoring, which typically require generation and analysis of large amounts of nucleic acid sequence data.
One area of technology which revolutionised the study of nucleic acids was the development of nucleic acid amplification techniques, such as the polymerase chain reaction (PCR). Amplification reactions, such as PCR, can enable the user to specifically and selectively amplify a particular target nucleic acid of interest from a complex mixture of nucleic acids. However, there is also an ongoing need for nucleic acid amplification techniques which enable simultaneous amplification of complex mixtures of templates of diverse sequence, such as genomic DNA fragments (e.g. ‘whole genome’ amplification) or cDNA libraries, in a single amplification reaction.
PCR amplification cannot occur unless the forward and reverse amplification primers anneal to primer-binding sequences in the template under the conditions of the annealing steps of the PCR reaction; if there is insufficient complementarity between primers and template, no amplification takes place. Some prior knowledge of the template sequence is therefore required before a PCR reaction can be carried out to amplify a specific template, unless random primers are used, with a consequent loss of specificity. The user must usually know in advance the sequence of at least the primer-binding sites in the template so that appropriate primers can be designed, although the remainder of the template sequence may be unknown. This requirement for prior sequence knowledge increases the complexity and cost of PCR amplification of complex mixtures of templates, such as genomic DNA fragments.
Several of the new methods employed for high throughput DNA sequencing (Nature. 437, 376-380 (2005); Science. 309, 5741, 1728-1732 (2005)) rely on a universal amplification reaction, whereby a DNA sample is randomly fragmented and then treated such that the ends of the different fragments all contain the same DNA sequence. Fragments with universal ends can be amplified in a single reaction with a single pair of amplification primers. Separation of the library of fragments to the single-molecule level prior to amplification ensures that the amplified molecules form discrete populations that can then be further analysed. Such separations can be performed either in emulsions (Nature. 437, 376-380 (2005); Science. 309, 5741, 1728-1732 (2005)), or on a surface (Nucleic Acids Research 27, e34 (1999); Nucleic Acids Research 15, e87 (2000)).
WO 98/44151 and WO 00/18957 both describe methods of forming polynucleotide arrays based on ‘solid-phase’ nucleic acid amplification, which is a bridging amplification reaction wherein the amplification products are immobilised on a solid support in order to form arrays composed of nucleic acid clusters or ‘colonies’. Each cluster or colony on such an array is formed from a plurality of identical immobilised polynucleotide strands and a plurality of identical immobilised complementary polynucleotide strands. The arrays so formed are generally referred to herein as ‘clustered arrays’ and their general features will be further understood by reference to WO 98/44151 or WO 00/18957, the contents of both documents being incorporated herein in their entirety by reference.
In common with all amplification techniques, solid-phase bridging amplification requires the use of forward and reverse amplification primers which include ‘template-specific’ nucleotide sequences which are capable of annealing to sequences in the template to be amplified, or the complement thereof, under the conditions of the annealing steps of the amplification reaction. The sequences in the template to which the primers anneal under conditions of the amplification reaction may be referred to herein as ‘primer-binding’ sequences.
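To make the notion of a ‘primer-binding’ sequence concrete, the following is a minimal sketch that checks whether a forward and reverse primer pair both find binding sites on a double-stranded template. It assumes exact base matching only; real annealing tolerates some mismatches and depends on temperature and salt, and the function names are hypothetical illustrations rather than any established tool.

```python
from typing import Optional, Tuple

# Complement table for A/C/G/T (IUPAC ambiguity codes are not handled here).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_binding_sites(
    template: str, forward: str, reverse: str
) -> Optional[Tuple[int, int]]:
    """Return the (start, end) span amplified by a primer pair, or None.

    Exact matching is a simplifying assumption. The forward primer anneals to
    the bottom strand, so its sequence appears verbatim in the top strand; the
    reverse primer anneals to the top strand, so its reverse complement
    appears in the top strand downstream of the forward site.
    """
    top = template.upper()
    f = top.find(forward.upper())
    r = top.find(reverse_complement(reverse))
    if f == -1 or r == -1 or r < f:
        return None  # a primer-binding sequence is missing or out of order
    return f, r + len(reverse)
```

For example, on the arbitrary template `"GATTACAGCGCGCGCCCGGTTAA"` the pair `"GATTACA"` / `"TTAACCGG"` spans the whole 23-base template, whereas a reverse primer with no complementary site returns `None`, mirroring the statement that amplification cannot proceed without annealing.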
Certain embodiments of the methods described in WO 98/44151 and WO 00/18957 make use of ‘universal’ primers to amplify templates comprising a variable template portion, which it is desired to amplify, flanked at its 5′ and 3′ ends by common or ‘universal’ primer-binding sequences. The ‘universal’ forward and reverse primers include sequences capable of annealing to the ‘universal’ primer-binding sequences in the template construct. The variable template portion, or ‘target’, may itself be of known, unknown or partially known sequence. This approach has the advantage that it is not necessary to design a specific pair of primers for each target sequence to be amplified; the same primers can be used for amplification of different templates provided that each template is modified by addition of the same universal primer-binding sequences to its 5′ and 3′ ends. The variable target sequence can therefore be any DNA fragment of interest. An analogous approach can be used to amplify a mixture of templates (targets with known ends), such as a plurality or library of target nucleic acid molecules (e.g. genomic DNA fragments), using a single pair of universal forward and reverse primers, provided that each template molecule in the mixture is modified by the addition of the same universal primer-binding sequences.
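The universal-primer principle can be illustrated with a short sketch: targets of unrelated sequence are each flanked by the same two universal primer-binding sequences, after which one forward/reverse primer pair has binding sites on every construct. The flanking sequences, example targets and helper names below are hypothetical and chosen arbitrarily; they are not taken from WO 98/44151 or WO 00/18957.

```python
# Hypothetical universal primer-binding sequences, chosen arbitrarily.
UNIVERSAL_5 = "CCTATCCCCT"
UNIVERSAL_3 = "GAGACGTTAG"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def add_universal_ends(target: str) -> str:
    """Flank a target of any sequence with the universal binding sequences."""
    return UNIVERSAL_5 + target.upper() + UNIVERSAL_3


# A single universal primer pair: the forward primer matches the 5' flank and
# the reverse primer is the reverse complement of the 3' flank.
FORWARD_PRIMER = UNIVERSAL_5
REVERSE_PRIMER = reverse_complement(UNIVERSAL_3)


def amplifiable_with_universal_pair(template: str) -> bool:
    """True if both universal primers have binding sites at the template ends."""
    t = template.upper()
    return t.startswith(FORWARD_PRIMER) and t.endswith(
        reverse_complement(REVERSE_PRIMER)
    )


targets = ["ACGTTGCA", "GGGAAATTTCCC", "TTAGGC"]  # arbitrary example fragments
constructs = [add_universal_ends(t) for t in targets]
```

Every entry of `constructs` satisfies `amplifiable_with_universal_pair`, while an unmodified target does not, which is the sense in which the variable portion can be any DNA fragment of interest.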
Such ‘universal primer’ approaches to PCR amplification, and in particular to solid-phase bridging amplification, are advantageous since they enable multiple template molecules of the same or different, known or unknown sequence to be amplified in a single amplification reaction, which may be carried out on a solid support bearing a single pair of ‘universal’ primers. Simultaneous amplification of a mixture of templates of different sequences would otherwise require a plurality of primer pairs, each pair being complementary to a unique template in the mixture. Generating a separate primer pair for each individual template is not a viable option for complex mixtures of templates.
The addition of universal priming sequences onto the ends of targets to be amplified by PCR can be achieved by a variety of methods known to those skilled in the art. For example, a universal primer consisting of a universal sequence at its 5′ end and a degenerate sequence at its 3′ end can be used in a PCR (DOP-PCR, e.g. PNAS 1996 vol 93 pg 14676-14679) to amplify fragments randomly from a complex target sequence or a complex mixture of target sequences. The degenerate 3′ portion of the primer anneals at random positions on the DNA and can be extended to generate a copy of the target that has the universal sequence at its 5′ end.
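A degenerate primer of this kind is, in effect, a pool of concrete primers. The sketch below expands a primer written with the IUPAC code ‘N’ (any base) into that pool, and models one extension event: the variant whose 3′ portion pairs with a template window is extended by copying the template toward its 5′ end, so the new strand carries the universal sequence at its own 5′ end. The tag sequence, degenerate length and function names are hypothetical, and exact pairing is assumed.

```python
from itertools import product
from typing import List

UNIVERSAL_TAG = "CCGACTCGAG"  # hypothetical 5' universal portion
DEGENERATE_LEN = 6             # hypothetical length of the degenerate 3' portion

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def expand_degenerate(primer: str) -> List[str]:
    """List every concrete sequence an 'N'-containing primer represents."""
    choices = ["ACGT" if base == "N" else base for base in primer.upper()]
    return ["".join(bases) for bases in product(*choices)]


def dop_extension_product(template: str, site: int, k: int = DEGENERATE_LEN) -> str:
    """New strand made when the 3' portion anneals at template[site:site+k].

    The pairing primer variant is TAG + revcomp(window). The polymerase extends
    its 3' end by copying the template toward the template's 5' end, so the
    copy is TAG + revcomp(template[:site + k]).
    """
    if site < 0 or site + k > len(template):
        raise ValueError("annealing window falls outside the template")
    window = template[site:site + k]
    primer = UNIVERSAL_TAG + reverse_complement(window)
    return primer + reverse_complement(template[:site])
```

With six degenerate positions the pool contains 4^6 = 4096 variants, which is why such a primer can anneal at essentially random positions on a complex target.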
Alternatively, adaptors that contain universal priming sequences can be ligated onto the ends of the target sequences. The adaptors may be single-stranded or double-stranded. If double-stranded, they may have overhanging ends that are complementary to overhanging ends on the target molecules that may have been generated with a restriction endonuclease, or added with a DNA polymerase or terminal transferase. Alternatively, the double-stranded adaptors may be blunt, in which case the targets are also blunt ended. The blunt ends of the targets may have been formed during a process to shear the DNA into fragments, or they may have been formed by an end repair reaction, as would be well known to those skilled in the art.
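The end-compatibility rules described above can be sketched as a small predicate: blunt ends join only blunt ends, and overhanging ends join when they have the same polarity and their single-stranded bases (each written 5′ to 3′) are reverse complements of one another. The `End` representation and `can_ligate` name are hypothetical; the examples use the 5′-AATT overhang left by EcoRI and the single-base 3′ A overhang that a DNA polymerase can add to a blunt end.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


@dataclass(frozen=True)
class End:
    """A duplex end (hypothetical representation).

    kind: "blunt", "5'" (5' overhang) or "3'" (3' overhang).
    overhang: the single-stranded bases, written 5'->3'.
    """
    kind: str
    overhang: str = ""


def can_ligate(a: End, b: End) -> bool:
    """Blunt ends join blunt ends; overhangs must share polarity and base-pair."""
    if a.kind == "blunt" or b.kind == "blunt":
        return a.kind == b.kind
    return a.kind == b.kind and a.overhang.upper() == reverse_complement(b.overhang)


ecori_end = End("5'", "AATT")  # EcoRI (G^AATTC) leaves a 5'-AATT overhang
a_tailed_target = End("3'", "A")
t_overhang_adaptor = End("3'", "T")
```

Under this rule an A-tailed target end pairs with a T-overhang adaptor end but not with another A-tailed end, while the palindromic EcoRI overhang is compatible with itself.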
A single adaptor or two different adaptors may be used in a ligation reaction with target sequences. If a target has been manipulated such that its ends are the same, i.e. both are blunt or both have the same overhang, then ligation of a single compatible adaptor will generate a template with that adaptor on both ends. However, if two compatible adaptors, adaptor A and adaptor B, are used, then three types of ligated product are formed: template with adaptor A on both ends, template with adaptor B on both ends, and template with adaptor A on one end and adaptor B on the other end. This last product is, under some circumstances, the only desired product from the ligation reaction, and consequently additional purification steps are necessary following the ligation reaction to separate it from the ligation products that have the same adaptor at both ends.
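Why purification is needed can be seen from a simple probability sketch. Assuming, as a simplification, that each of the two target ends independently receives adaptor A with probability `p_a` and adaptor B otherwise, with no ligation bias, the expected shares of the three product types are p², (1 − p)² and 2p(1 − p). The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def expected_product_fractions(p_a):
    """Expected share of each ligation product among fully ligated templates.

    Assumes each of the two target ends independently receives adaptor A with
    probability p_a and adaptor B otherwise, with no ligation bias.
    """
    probs = {"A": p_a, "B": 1 - p_a}
    shares = {"A-A": 0, "B-B": 0, "A-B": 0}
    for left, right in product("AB", repeat=2):
        # A...B and B...A are the same product type (one adaptor of each kind).
        key = "A-B" if left != right else left + "-" + right
        shares[key] += probs[left] * probs[right]
    return shares
```

With equal amounts of the two adaptors (`p_a = Fraction(1, 2)`) the shares are 1/4, 1/4 and 1/2, and because 2p(1 − p) never exceeds 1/2, under these assumptions at most half of the ligated templates carry one adaptor of each kind.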
A major drawback in preparing nucleic acid fragment libraries by ligating adaptors to the ends of template nucleic acid fragments is the formation of adaptor-dimers. Adaptor-dimers are formed by the ligation of two adaptors directly to each other, such that they do not contain a template nucleic acid fragment as an insert. Such molecules are undesirable because, during any amplification step, for example a universal amplification reaction, adaptor-dimers are amplified alongside the nucleic acid fragment library. Since adaptor-dimers are generally smaller than the fragments contained in the library, they amplify and accumulate at a faster rate. This reduces the efficiency of the amplification reaction, limiting amplification of the library fragments by depleting components of the reaction such as dNTPs and primers. A more serious concern is that, when such amplified fragments are sequenced, they do not give useful sequence information since they contain no insert. In the case of clustered arrays, a significant population of clusters that have no target DNA sequence is undesirable because it lowers the density of real sequence data obtained from a chip of finite size. Hence the efficiency of sequencing can be significantly reduced. Thus, the preparation of libraries with a low level of adaptor-dimers is highly advantageous in the sequencing of polynucleotides, particularly in high-throughput processes.
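The effect of a shorter species amplifying faster can be illustrated with an idealised exponential model, in which each molecule type is multiplied by (1 + e) per cycle for a per-cycle efficiency e. The efficiencies and starting amounts below are hypothetical values chosen only for illustration, and the model deliberately ignores the reagent depletion and plateau described above.

```python
from typing import List


def dimer_fraction_by_cycle(
    library0: float,
    dimer0: float,
    eff_library: float,
    eff_dimer: float,
    cycles: int,
) -> List[float]:
    """Fraction of adaptor-dimer molecules after each cycle (cycle 0 included).

    Idealised model: every molecule type grows by (1 + efficiency) per cycle,
    with no reagent depletion, so only the efficiency difference matters.
    """
    lib, dim = float(library0), float(dimer0)
    fractions = []
    for _ in range(cycles + 1):
        fractions.append(dim / (lib + dim))
        lib *= 1 + eff_library
        dim *= 1 + eff_dimer
    return fractions


# Hypothetical scenario: 1% dimers at the start, dimers amplifying more efficiently.
trajectory = dimer_fraction_by_cycle(99, 1, eff_library=0.80, eff_dimer=0.95, cycles=20)
```

Even a modest efficiency advantage makes the dimer fraction rise every cycle; in this hypothetical scenario it grows from 1% to roughly 5% over 20 cycles, before any depletion effects are considered.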
The invention presented herein is directed to a method of generating a library of template polynucleotides using a single adaptor construct in a ligation reaction which reduces and/or prevents the formation of adaptor-dimers. The method can be applied to preparing simple or complex populations of templates for amplification, for example on a solid surface, using primer sequences, with no prior knowledge of the target sequences. The invention is applicable to the preparation of templates from complex samples such as whole genomes or mixtures of cDNAs, as well as mono-template applications.