Several publications and patent documents are referenced in this application in order to more fully describe the state of the art to which this invention pertains. The disclosure of each of these publications and documents is incorporated by reference herein.
The Polymerase Chain Reaction, or PCR (Saiki et al. 1985, Science 230:1350), has become a standard molecular biology technique that allows amplification of nucleic acid molecules. This in vitro method is a powerful tool both for the detection and analysis of small quantities of nucleic acids and for other recombinant nucleic acid technologies.
Briefly, PCR typically utilizes a number of components: a target nucleic acid molecule; a molar excess of forward and reverse primers that bind to the target nucleic acid molecule; deoxyribonucleoside triphosphates (dATP, dTTP, dCTP and dGTP); and a polymerase enzyme.
PCR is a DNA synthesis reaction that depends on the extension of forward and reverse primers annealed to opposite strands of a double-stranded DNA (dsDNA) template that has been denatured (melted apart) at high temperature (90° C. to 100° C.). Through repeated cycles of melting, annealing and extension, usually carried out at different temperatures, copies of the original template DNA are generated.
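The cycling arithmetic can be made concrete with a simple, idealized model: if each cycle converts a fraction E of the templates present into new copies (E = 1 for perfect doubling), a reaction that starts from N0 templates holds roughly N0(1 + E)^n copies after n cycles. The sketch below assumes this model and ignores the plateau that real reactions reach when primers or nucleotides run out; the function name `copies_after` is hypothetical.

```python
def copies_after(n0: float, efficiency: float, cycles: int) -> float:
    """Idealized PCR yield: each cycle multiplies the template count by (1 + efficiency).

    efficiency = 1.0 models perfect doubling; values below 1.0 model incomplete
    denaturation, annealing or extension. Plateau effects are ignored (assumption).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return n0 * (1.0 + efficiency) ** cycles


# A single template doubled in each of 10 cycles gives 2**10 copies.
print(copies_after(1, 1.0, 10))  # → 1024.0
# At 90% per-cycle efficiency the same 10 cycles give 1.9**10, about 613 copies.
print(round(copies_after(1, 0.9, 10)))  # → 613
```

Because the growth is exponential, small per-cycle differences in efficiency become large differences in final yield, which is the root of the representation problems discussed below.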
Amplification of template sequences by PCR typically draws on knowledge of the template sequence so that primers can be annealed specifically to the template. The use of multiple different primer pairs to amplify different regions of a sample simultaneously is known as multiplex PCR. Multiplex PCR suffers from numerous limitations, including high levels of primer dimerisation and the loss of sample representation due to the different amplification efficiencies of the different regions.
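The loss of representation compounds with every cycle. Under an idealized model in which each region is multiplied by (1 + E) per cycle and all regions start at equal abundance (both simplifying assumptions), two regions amplifying with efficiencies E1 and E2 end up in the ratio ((1 + E1)/(1 + E2))^n after n cycles. The hypothetical helpers below compute that ratio and each region's share of the total product; the efficiency values are illustrative, not measured.

```python
def representation_ratio(eff_a: float, eff_b: float, cycles: int) -> float:
    """Abundance of region A relative to region B after `cycles` rounds,
    assuming equal starting abundance and constant per-cycle efficiencies."""
    return ((1.0 + eff_a) / (1.0 + eff_b)) ** cycles


def relative_fractions(efficiencies: dict[str, float], cycles: int) -> dict[str, float]:
    """Fraction of the total product contributed by each region, assuming every
    region starts from the same number of template molecules."""
    yields = {name: (1.0 + e) ** cycles for name, e in efficiencies.items()}
    total = sum(yields.values())
    return {name: y / total for name, y in yields.items()}


# Hypothetical efficiencies for two regions amplified in one multiplex reaction.
print(round(representation_ratio(0.95, 0.85, 30), 2))
print(relative_fractions({"region_1": 0.95, "region_2": 0.85}, 30))
```

With these illustrative numbers, a ten-point efficiency gap leaves the first region nearly five-fold over-represented after 30 cycles, even though both regions started at the same abundance.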
For the multiplex analysis of large numbers of target fragments, it is often desirable to perform a simultaneous amplification reaction for all the targets in the mixture using a single pair of primers for all the targets. In certain embodiments, one or more of the primers may be immobilised on a solid support. Such universal amplification reactions are described more fully in application US2005/0100900 (Method of Nucleic Acid Amplification), the contents of which are incorporated herein by reference in their entirety. Isothermal methods for nucleic acid amplification are described in US2008/0009420, the contents of which are incorporated herein by reference in their entirety. These methods may rely on the attachment of universal adapter regions, which allow all nucleic acid templates to be amplified from a single pair of primers. However, the universal amplification reaction can still suffer from limitations in amplification efficiency related to the sequences of the templates. One manifestation of this limitation is that the mass or size of different nucleic acid clusters varies in a sequence-dependent manner. For example, AT rich clusters can gain more mass or become larger than GC rich clusters, so that analysis of the different clusters may be biased. In applications where clusters are analyzed using sequencing-by-synthesis techniques, for example, the GC rich clusters may appear smaller or dimmer and are therefore detected less efficiently than the larger and brighter (more intense) AT rich clusters. This results in lower representation, and less accurate sequence determination, for the GC rich templates, an effect which may be termed GC bias.
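One common way to quantify GC bias (an assumption here, not a method taken from this document) is to bin templates or genomic windows by GC fraction and divide the mean signal in each bin, whether read count or cluster intensity, by the overall mean signal. In an unbiased run every bin sits near 1.0; GC bias shows up as values below 1.0 in the high-GC bins. The function names and the toy input below are hypothetical, and the signal values are invented for illustration only.

```python
from collections import defaultdict


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def normalized_signal_by_gc(windows: list[tuple[str, float]], n_bins: int = 10) -> dict[float, float]:
    """Group (sequence, signal) pairs into GC bins and return, for each bin
    (keyed by its lower bound), the bin's mean signal divided by the overall mean."""
    overall_mean = sum(signal for _, signal in windows) / len(windows)
    bins: dict[int, list[float]] = defaultdict(list)
    for seq, signal in windows:
        gc_count = sum(1 for base in seq.upper() if base in "GC")
        # Integer arithmetic avoids floating-point bin-edge errors; clamp so that
        # 100% GC falls into the top bin instead of opening a new one.
        idx = min(gc_count * n_bins // len(seq), n_bins - 1)
        bins[idx].append(signal)
    return {
        idx / n_bins: (sum(vals) / len(vals)) / overall_mean
        for idx, vals in sorted(bins.items())
    }


# Toy (sequence, intensity) pairs: invented values, not measured data.
toy = [("ATATATAT", 120.0), ("AATTATTA", 110.0), ("GCGCGCGC", 40.0),
       ("GGCCGCGC", 50.0), ("ATGCATGC", 80.0)]
print(normalized_signal_by_gc(toy))  # → {0.0: 1.4375, 0.5: 1.0, 0.9: 0.5625}
```

In this toy input the high-GC bin reaches only about half the average signal while the low-GC bin is well above it, the pattern that would be described as GC bias.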
The presence of sequence-specific bias during amplification makes it difficult to determine the sequence of certain regions of the genome, for example GC rich regions such as CpG islands in promoter regions. The resulting under-representation, in the sequence data, of clusters with particular GC compositions translates into data analysis problems such as an increased number of gaps in the analyzed sequence; shorter contigs, giving rise to a lower quality de novo assembly; and a need for increased coverage to sequence a genome, which increases the cost of sequencing genomes.
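To illustrate what makes CpG islands hard targets, the sketch below applies the widely used Gardiner-Garden and Frommer thresholds: a length of at least 200 bp, a GC content above 50%, and an observed/expected CpG ratio above 0.6, where the ratio is (number of CG dinucleotides × length) / (number of C × number of G). These defaults are the classic heuristic rather than anything specified in this document, and the function names are hypothetical.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG dinucleotides * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the full dinucleotide count.
    return seq.count("CG") * len(seq) / (c * g)


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply the classic length, GC-content and CpG obs/exp thresholds."""
    seq = seq.upper()
    if len(seq) < min_len:
        return False
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return gc > min_gc and cpg_obs_exp(seq) > min_ratio


print(looks_like_cpg_island("CG" * 150))    # 300 bp, all CpG → True
print(looks_like_cpg_island("GGCCA" * 50))  # GC rich but no CpG → False
print(looks_like_cpg_island("AT" * 150))    # AT rich → False
```

The second example shows why the ratio criterion is needed alongside GC content: a sequence can be 80% G+C yet contain no CpG dinucleotides at all.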
In particular embodiments, the methods and compositions presented herein are aimed at limiting the sequence-specific biases found in nucleic acid amplification reactions. In certain embodiments, the methods of amplification normalise the intensity of nucleic acid clusters of different sequences and minimise the variance in population size between amplified nucleic acid species having different sequences. The problem of bias may be more acute when the density of clusters on the solid support is high. In certain situations, as the clusters grow, all of the amplification primers on the solid support become extended, and adjacent clusters therefore cannot expand over the top of each other because no amplification primers remain available. Over-amplification of AT rich sequences causes rapid consumption of the primers on the surface and hence reduces the ability of the GC rich sequences to amplify. The amplification methods described herein are therefore particularly useful for obtaining a high cluster density on a solid support where different clusters contain AT and GC rich sequences.