The present invention relates generally to genetic analysis and, in particular, to genomics. More specifically, the invention relates to a method of amplification of a nucleic acid (or a nucleotide sequence of interest) which may be applied to sequencing in general and, in particular, to genome sequencing.
The development of methods for automated DNA sequence analysis, together with advances in bioinformatics, has revolutionised biology and medicine and ushered in the new field of genomics, the study of genes and genomes. These techniques have been used to decipher the entire genomes of a number of bacteria (5, 7, 9, 10, 20), archaea (3) and eukaryotes (6, 11).
The traditional approach to sequencing large genomes, including the human genome, uses a three-stage divide-and-conquer strategy (29). The first stage involves the construction of a number of clone libraries of the study organism's DNA by randomly cutting the DNA into fragments, separating these into differing size classes, and then inserting the fragments into appropriate vectors capable of propagation in a yeast or bacterial host.
The second stage involves (a) construction of a low-resolution physical map by identification of shared chromosomal landmarks on overlapping yeast or bacterial artificial chromosome (YAC or BAC) clones (the landmarks may be, for example, unique sites that can be amplified by polymerase chain reaction (PCR), known as sequence-tagged sites or STSs, or restriction-enzyme digestion sites); and (b) construction of high-resolution (sequence-ready) maps by randomly subcloning YAC or BAC inserts into cosmid vectors and identifying their landmark overlaps.
The third and final stage involves selecting a minimally overlapping set of cosmid clones, randomly fragmenting each into small pieces, and subcloning into M13 phage or plasmid vectors. For each cosmid, approximately 800 M13 phage clones are sequenced and assembled to construct the sequence of the 40-kilobase (kb) cosmid insert. This random (shotgun) approach is redundant, as every nucleotide is sequenced about eight times.
The complexity and cost of the “divide-and-conquer” approach has driven the development of new strategies. The Institute for Genomic Research (TIGR) has pioneered the direct shotgun sequencing of megabase-sized (Mbp) genomes. In this approach, the small fragments of chromosomal DNA are cloned directly into the M13 vector. Clones are randomly sequenced and the chromosome sequence constructed by direct assembly. This whole-genome random sequencing strategy has been applied to the sequencing of a number of bacterial and archaeal genomes, including the 1.9 Mbp genome of Haemophilus influenzae (9), the 0.58 Mbp genome of Mycoplasma genitalium (10), and the 1.66 Mbp genome of Methanococcus jannaschii (3). This approach eliminates the need for any prior physical mapping, significantly reducing the overall per base pair cost of producing a finished sequence. However, as with all random sequencing approaches, the inherent problem is the requirement for a high level of sequence redundancy. In other words, every nucleotide has to be sequenced numerous times until, by computer alignment, sequence contigs (clusters of aligned sequences) can be constructed. The initial shotgun assembly of the H. influenzae genome, for example, involved the generation of 11.6 Mbp of random sequence data (greater than 6-fold genome coverage), and yet still contained 140 contig gaps requiring labour-intensive closure (9).
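The redundancy figures quoted above can be checked with simple arithmetic. The following is a minimal sketch only; the function name is illustrative, and the average read length is not stated for the cosmid case, so it is derived here from the quoted figures as an assumption.

```python
def fold_coverage(total_sequenced_bp: float, target_bp: float) -> float:
    """Average number of times each base of the target is sequenced."""
    return total_sequenced_bp / target_bp


# Whole-genome shotgun figures quoted in the text for H. influenzae:
# 11.6 Mbp of random sequence data against a 1.9 Mbp genome.
h_influenzae = fold_coverage(11.6e6, 1.9e6)
print(f"H. influenzae coverage: {h_influenzae:.1f}-fold")

# Cosmid figures quoted in the text: about 800 clones, a 40 kb insert,
# and roughly eight-fold redundancy. The average read length is NOT given
# in the text; it is back-calculated here from those three numbers.
clones, insert_bp, redundancy = 800, 40_000, 8
implied_read_length = redundancy * insert_bp / clones
print(f"Implied average read length: {implied_read_length:.0f} bp")
```

The implied read length is of the same order as the 400 to 500 bp primer spacing mentioned for primer walking, which suggests the quoted figures are mutually consistent.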
An alternative to the inherent inefficiencies of random shotgun sequencing is primer walking (25). In this procedure, a primer designed from a known sequence is used to extend sequence information into the flanking unknown region. The new sequence information is used to design the next primer, and the process is continued until the entire sequence of the region of interest is determined. Although the primer walking strategy appears attractive for large-scale sequencing projects, the need for time-consuming and expensive synthesis of individual primers every 400 to 500 bp makes it impracticable. The use of a presynthesized library of short primers would avoid the requirement for the synthesis of each new primer. Unfortunately, libraries of even relatively short primers are enormous: for example, a complete octamer library contains 65,536 primers, while a complete decamer library contains over a million individual primers.
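The library sizes quoted above follow from the four-letter DNA alphabet: a complete library of primers of length k contains 4^k members. A minimal sketch (the function name is illustrative):

```python
def complete_library_size(k: int) -> int:
    """Number of distinct DNA sequences of length k over A, C, G, T."""
    return 4 ** k


for name, k in (("octamer", 8), ("nonamer", 9), ("decamer", 10)):
    print(f"complete {name} library: {complete_library_size(k):,} primers")
```

Each additional base multiplies the library size by four, which is why even modest increases in primer length make a complete presynthesized library impractical.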
Two basic solutions have been proposed to enable primer walking and yet avoid the synthesis of large primer libraries. The first involves reducing the size of the primer libraries by selecting an optimised subset of useful octamers, nonamers, or decamers (4, 12, 24, 26). The second, Sequential Primer Elongation by Ligation of 6-mers (SPEL-6), involves the assembly of large primers (18 bp or longer) by the annealing of at least three contiguous complementary hexamers (drawn from a presynthesized library of the full set of all 4096 hexamers or 1024 singly degenerated hexamers) to a single-stranded DNA template. The annealed hexamers are joined by ligation and a standard sequencing reaction performed (15-19, 27). A number of related techniques based on this approach have been developed, including the use of hexamers but omitting ligation (21, 22), or based on the ligation of self-complementary hexamer strings (8).
A large number of technical difficulties exist with both approaches, which have prevented their widespread use. Simulation studies of large sequencing projects have suggested that reduction of primer sets by more than 80% to 90% affects priming flexibility and general utility (1, 26). In the case of an octamer primer library, this results in library sets containing 6,000 to 12,000 primers, with a nonamer primer library requiring four times as many primers. While primer libraries of this size are technically possible, they would be both expensive to construct and unwieldy to use. A number of investigators have designed smaller octamer and nonamer primer sets containing 1000 to 3000 primers; however, these sets are limited in use to protein coding sequences with little G-C variability (12, 14, 24). Of a more fundamental nature is the failure of many short oligonucleotides to successfully prime sequencing reactions; for example, in one report only approximately one half of 121 nonamer primers worked (2). This common problem appears linked to the formation of template secondary structures which prevent efficient binding of the primer to the correct site (18).
The complexity of the SPEL-6 hexamer ligation strategy has limited its utility for large-scale sequencing projects. In addition to a complete hexamer primer library (containing 4096 primers), this technique requires: (1) enzymatic phosphorylation of the hexamer primers, (2) a single-stranded DNA template or chemical denaturation of double-stranded DNA, (3) a DNA ligation reaction in the presence of single-stranded binding protein, (4) a deproteination step before sequencing, and (5) the use of the Sequenase enzyme (18). In addition, sequencing failures are common, as the low annealing temperature required for hexamer primer annealing also promotes the formation of template secondary hairpin structures that prevent efficient primer annealing. Finally, both the reduced library and the SPEL-6 approaches are unable to use fluorescent-labelled primers, and are thus limited in the use of sequencing hardware and chemistries.
It is an object of the present invention to overcome or ameliorate at least one of the disadvantages of the prior art, or to provide a useful alternative.
The present invention relates to a method of amplification of a nucleotide sequence utilising interlaced nesting primers. The method may be used for sequencing purposes and, in particular, for genome sequencing. The method when applied to sequencing has been coined Amplification and Sequencing by Interlaced Nesting (ASIN).
According to a first aspect, the present invention provides a method of amplifying a nucleotide sequence of interest wherein the nucleotide sequence comprises at least one region of known sequence, wherein the region of known sequence comprises, 3′ to 5′: a first known region and a second known region wherein the first and second known regions are immediately adjacent each other and wherein the method comprises:
1) a first amplification step comprising at least
(a) as a template, a sequence comprising at least the nucleotide sequence of interest; and
(b) one first primer having, 5′ to 3′: a 5′ tag sequence; a degenerate sequence corresponding to the first known region; and a sequence complementary to the second known region of the nucleotide of interest;
which amplification step generates a first amplification product; and
2) a second amplification step comprising at least
(a) as a template, the first amplification product; and
(b) one second primer having, 5′ to 3′: a sequence which hybridises to the complementary strand of the 5′ tag sequence of the first primer and a sequence complementary to the first known region of the nucleotide sequence of interest;
which amplification step generates a second amplification product.
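The layout of the two primers described in the first aspect can be illustrated with a short sketch. This is an illustration under stated assumptions, not the specification's own implementation: the tag and template sequences are hypothetical placeholders, the degenerate sequence is modelled as fully degenerate (N at every position), and the second primer is assumed to carry the same 5′ tag as the first primer.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement(seq: str) -> str:
    """Base-by-base complement, without reversing the sequence."""
    return "".join(COMPLEMENT[base] for base in seq)


def first_primer(tag: str, first_known: str, second_known: str) -> str:
    """Tag + degenerate stretch + complement of the second known region.

    The known regions are written 3'->5', as in the text. A primer written
    5'->3' pairs antiparallel with them, so a plain (non-reversed)
    complement gives the primer sequence in 5'->3' order.
    """
    degenerate = "N" * len(first_known)  # assumption: fully degenerate
    return tag + degenerate + complement(second_known)


def second_primer(tag: str, first_known: str) -> str:
    """Tag + complement of the first known region (5'->3')."""
    return tag + complement(first_known)


# Hypothetical placeholder sequences, chosen only for illustration.
TAG = "ACTGACCTGAAGTCA"   # arbitrary 15-base tag, not a real vector sequence
FIRST_KNOWN = "TAC"       # first known region, written 3'->5'
SECOND_KNOWN = "GGATCA"   # second known region, written 3'->5'

p1 = first_primer(TAG, FIRST_KNOWN, SECOND_KNOWN)
p2 = second_primer(TAG, FIRST_KNOWN)
print(p1)
print(p2)
```

In this sketch the second primer's 3′ bases occupy the positions that were degenerate in the first primer, which is one way to picture the interlaced nesting of the two amplification steps.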
Preferably, the amplification step is a polymerase chain reaction (PCR).
Preferably, in the first and second primers respectively, the 5′ tag sequence and the sequence which hybridises to the complementary strand of the 5′ tag sequence of the first primer, are the same.
Preferably, the 5′ tag sequence is a vector specific sequence. More preferably, the 5′ tag sequence is derived from M13 phage, pUC18, pBR322, pGEM®, BLUESCRIPT, or pBELOBACII. However, the skilled addressee will, of course, understand that any appropriate 5′ tag sequence can be used.
Preferably, the 5′ tag sequence is 10 to 25 bases in length. More preferably, the 5′ tag sequence is 15 to 18 bases in length. Most preferably, the 5′ tag sequence is 15 bases in length.
Preferably, in the first primer, the sequence complementary to the second known region is 4 to 8 bases in length. More preferably, in the first primer, the sequence complementary to the second known region is 6 bases in length.
Preferably, in the first primer, the degenerate sequence together with the sequence complementary to the second known region of the nucleotide of interest is 6 to 12 bases in length. Most preferably, in the first primer, the degenerate sequence together with the sequence complementary to the second known region of the nucleotide of interest is 9 bases in length.
Preferably, in the second primer, the sequence complementary to the first known region of the nucleotide sequence of interest is 1 to 5 bases in length. More preferably, in the second primer, the sequence complementary to the first known region of the nucleotide sequence of interest is 3 bases in length.
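The preferred length ranges stated above can be collected into a small consistency check. This is a minimal sketch only; the class and function names are hypothetical, and the default degenerate length of 3 is derived from the preferred combined length of 9 bases less the preferred 6 complementary bases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimerLengths:
    """Segment lengths in bases; defaults are the most preferred values."""
    tag: int = 15            # 5' tag sequence
    degenerate: int = 3      # derived: 9 combined minus 6 complementary
    second_region: int = 6   # first primer, complementary to second known region
    first_region: int = 3    # second primer, complementary to first known region


def within_preferred_ranges(p: PrimerLengths) -> bool:
    """True if every length falls inside the preferred ranges in the text."""
    return (
        10 <= p.tag <= 25
        and 4 <= p.second_region <= 8
        and 6 <= p.degenerate + p.second_region <= 12
        and 1 <= p.first_region <= 5
    )


design = PrimerLengths()
print(within_preferred_ranges(design))
print("first primer length:", design.tag + design.degenerate + design.second_region)
print("second primer length:", design.tag + design.first_region)
```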
Preferably, the 5′ tag sequence, or a sequence which hybridises to the complementary sequence of the 5′ tag sequence, is used as a sequencing primer. Alternatively, the second primer is used as a sequencing primer.
In one embodiment, the first and second amplifications comprise two first primers and two second primers respectively, wherein one of the first primers acts as a reverse primer in the first amplification step and one of the second primers acts as a reverse primer in the second amplification step. It will be clear to the skilled addressee that the 5′ tag sequence on each of the first primers may be the same or different.
It will also be clear to the skilled addressee that steps 1) and 2) may be performed sequentially or simultaneously.
According to a third aspect, the present invention provides a product when obtained by a method according to the first or second aspect.
According to a fourth aspect, the present invention provides a primer having, 5′ to 3′: a 5′ tag sequence; a degenerate sequence; and a predetermined sequence, wherein the sequences are immediately adjacent to each other. Preferably, the predetermined sequence is 4 to 8 bases in length and more preferably it is 6 bases in length. Preferably, the degenerate sequence together with the predetermined sequence is 6 to 12 bases in length and more preferably, it is 9 bases in length.
According to a fifth aspect, the present invention provides a primer having, 5′ to 3′: a sequence which hybridises to the complementary strand of the 5′ tag sequence of a primer according to the fourth aspect, and a predetermined sequence. Preferably, the predetermined sequence is 1 to 5 bases in length and more preferably it is 3 bases in length.
The 5′ tag sequence of the primers of the fourth and fifth aspects may be a vector specific sequence such as that derived from M13 phage, pUC18, pBR322, pGEM, BLUESCRIPT, or pBELOBACII. Preferably the primers of the fourth and fifth aspects are 10 to 25 bases in length, more preferably 15 to 18 bases in length and most preferably 15 bases in length.
According to a sixth aspect, the present invention provides a primer set comprising at least two primers wherein the primers are selected from primers according to the fourth or fifth aspect. The primer set may be a primer set in which at least one primer is a primer according to the fourth aspect and at least one primer is a primer according to the fifth aspect.
According to a seventh aspect, the present invention provides a primer library comprising primers according to the fourth aspect.
According to an eighth aspect, the present invention provides a primer library comprising primers according to the fifth aspect.
According to a ninth aspect, the present invention provides a primer library comprising primers according to the fourth or fifth aspect.
According to a tenth aspect, the present invention provides a primer library comprising primer sets according to the sixth aspect.
The libraries may be produced by any means known to persons skilled in the art. Preferably, the libraries are produced by chemical synthesis using the phosphoramidite method.
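To give a sense of scale, the sketch below enumerates the predetermined 3′ sequences that a complete library would need at the preferred lengths of 6 bases (fourth aspect) and 3 bases (fifth aspect). It assumes a complete library covering every possible predetermined sequence, which the text does not require; the function name is illustrative.

```python
from itertools import product


def all_kmers(k: int) -> list[str]:
    """Every DNA sequence of length k, in lexicographic order."""
    return ["".join(bases) for bases in product("ACGT", repeat=k)]


# Predetermined sequences for fourth-aspect primers (6 bases) and
# fifth-aspect primers (3 bases), assuming complete coverage.
fourth_aspect_ends = all_kmers(6)
fifth_aspect_ends = all_kmers(3)

print(len(fourth_aspect_ends), "fourth-aspect primers")
print(len(fifth_aspect_ends), "fifth-aspect primers")
```

Under this assumption, the two libraries together are far smaller than the 65,536 members of a complete octamer library.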
According to an eleventh aspect, the present invention provides a kit comprising a primer according to the fourth or fifth aspect.
According to a twelfth aspect, the present invention provides a kit comprising a primer set according to the sixth aspect.
According to a thirteenth aspect, the present invention provides a kit comprising a primer library according to any one of the seventh to tenth aspects.
Preferably, the kit is used in the method of the first or second aspect. It will be appreciated that the kit may further include optional buffers, diluents, enzymes, and a sample of one or more suitable reverse primers.
According to a fourteenth aspect, the present invention provides a method of amplifying and sequencing a nucleotide sequence of interest wherein the nucleotide sequence comprises at least one region of known sequence, wherein the region of known sequence comprises, 3′ to 5′: a first known region of 3 bases and a second known region of 6 bases wherein the first and second known regions are immediately adjacent each other and wherein the method comprises:
1) a first amplification step comprising at least
(a) as a template, a sequence comprising at least the nucleotide sequence of interest; and
(b) one first primer having, 5′ to 3′: a 5′ tag sequence; a degenerate sequence corresponding to the first known region; and a sequence complementary to the second known region of the nucleotide of interest;
which amplification step generates a first amplification product; and
2) a second amplification step comprising at least
(a) as a template, the first amplification product; and
(b) one second primer having, 5′ to 3′: the same 5′ tag sequence as the first primer and a sequence complementary to the first known region of the nucleotide sequence of interest;
which amplification step generates a second amplification product; and
3) sequencing the second amplification product using the 5′ tag sequence as a sequencing primer.
According to a fifteenth aspect, the present invention provides a method of amplifying and sequencing a nucleotide sequence of interest wherein the nucleotide sequence comprises at least one region of known sequence, wherein the region of known sequence comprises, 3′ to 5′: a first known region and a second known region wherein the first and second known regions are immediately adjacent each other and wherein the method comprises:
1) an amplification step comprising at least
(a) as a template, a sequence comprising at least the nucleotide sequence of interest; and
(b) one primer having, 5′ to 3′: a 5′ tag sequence; a degenerate sequence corresponding to the first known region; and a sequence complementary to the second known region of the nucleotide of interest;
which amplification step generates an amplification product; and
2) a sequencing step in which the amplification product is sequenced using a sequencing primer.
Preferably, the sequencing step of the fifteenth aspect comprises at least
(a) as a template, the amplification product; and
(b) the sequencing primer having, 5′ to 3′: a sequence which hybridises to the complementary strand of the 5′ tag sequence of the primer utilised in the amplification step, and a sequence complementary to the first known region of the nucleotide of interest.
According to a sixteenth aspect, the present invention provides a method of amplifying and sequencing a nucleotide sequence of interest wherein the nucleotide sequence comprises at least one region of known sequence, wherein the region of known sequence comprises, 5′ to 3′: a first known region and a second known region wherein the first and second known regions are immediately adjacent each other and wherein the method comprises:
1) an amplification step comprising at least
(a) as a template, a sequence comprising at least the nucleotide sequence of interest; and
(b) one primer having, 5′ to 3′: a 5′ tag sequence; a degenerate sequence corresponding to the first known region; and a sequence complementary to the second known region of the nucleotide of interest;
which amplification step generates an amplification product; and
2) a sequencing step in which the amplification product is sequenced using a sequencing primer.
Preferably, the sequencing step of the sixteenth aspect comprises at least
(a) as a template, the amplification product; and
(b) the sequencing primer having, 5′ to 3′: a sequence which hybridises to the complementary strand of the 5′ tag sequence of the primer utilised in the amplification step, and a sequence complementary to the first known region of the nucleotide of interest.
Preferably, the amplification step of the sixteenth aspect is a polymerase chain reaction. In the primer utilised in the amplification step and in the sequencing primers respectively, the 5′ tag sequence and the sequence which hybridises to the complementary strand of the 5′ tag sequence of the primer utilised in the amplification step, may be the same or different. The 5′ tag sequence may be a vector specific sequence such as that derived from M13 phage, pUC18, pBR322, pGEM, BLUESCRIPT, or pBELOBACII. Preferably, the 5′ tag sequence is 10 to 25 bases in length, more preferably 15 to 18 bases in length and most preferably 15 bases in length. In the method of the sixteenth aspect, in the primer utilised in the amplification step, the sequence complementary to the second known region may be 4 to 8 bases in length and more preferably, 6 bases in length. In the primer utilised in the amplification step, the degenerate sequence together with the sequence complementary to the second known region of the nucleotide of interest is 6 to 12 bases in length, more preferably 9 bases in length. In the sequencing primer, the sequence complementary to the first known region of the nucleotide sequence of interest may be 1 to 5 bases in length, more preferably 3 bases in length.
It will be clear to the skilled addressee that, since the primers of the invention can be used as forward and reverse primers, the methods described above can be used to amplify (and sequence) any desired nucleotide sequence. It is also clear that, since large parts of the genome sequence of some organisms, and the entire genome sequence of others, are known, the invention could be used to generate a library or an array of sequences of, say, open reading frames of an organism.
According to a seventeenth aspect, the present invention provides a product produced by a method according to any one of the fourteenth to the seventeenth aspects.
According to an eighteenth aspect, the present invention provides a kit comprising a product of the third or eighteenth aspects.
According to a nineteenth aspect, the present invention provides a method according to any one of the first or second aspects or the fourteenth to seventeenth aspects when used in a method of primer walking.
According to a twentieth aspect, the present invention provides a product according to the third aspect, when used in a method of primer walking.
According to a twenty-first aspect, the present invention provides a primer according to the fourth or fifth aspect, when used in a method of primer walking.
According to a twenty-second aspect, the present invention provides a primer set according to the sixth aspect, when used in a method of primer walking.
According to a twenty-third aspect, the present invention provides a primer library according to any one of the seventh to tenth aspects, when used in a method of primer walking.
According to a twenty-fourth aspect, the present invention provides a kit according to any one of the eleventh to thirteenth aspects, when used in a method of primer walking.
General laboratory procedures not specifically described in this specification can be found in the general molecular biology texts including, for example, Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory: Cold Spring Harbor, N.Y.
In the context of the present specification the terms “polymerase chain reaction” and its acronym “PCR” are used according to their ordinary meaning as understood by those skilled in the art. Examples of PCR methods can be found in common molecular biology textbooks and reference manuals used in the art, for example, PCR Technology: Principles and Applications for DNA Amplification (1989), Ed. H. A. Erlich, Stockton Press, New York.
In order to optimise the PCR amplification, the primers can be used at different concentrations and ratios. Selection of these and other variables would be appreciated and obtainable by persons skilled in the art.
In the context of the present invention, the terms “to amplify” and “amplification” should be construed in the sense of “to produce at least one copy of” and “the production of at least one copy of”.
In the context of the present invention, the term “sequencing primer” should be construed in the sense of any primer which can initiate a sequencing reaction.
In the context of the present invention, the terms “oligonucleotide”, “nucleic acid”, “nucleotide sequence” and “template” should be construed according to their ordinary meaning as understood by the skilled addressee.
In the context of the present specification, the term “primer walking” should be construed according to its ordinary meaning as understood by those skilled in the art, i.e. in the sense of utilising a known nucleotide sequence to obtain nucleotide sequence from an unknown flanking region and utilising the nucleotide sequence thus obtained to obtain further nucleotide sequence from a further flanking region. The process can be repeated any number of times.
Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise”, “comprising”, and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to”.
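The iterative nature of primer walking as defined above can be sketched as a simple loop over a string. This is a toy simulation under stated assumptions, not a laboratory protocol: the template, read length, and primer length are hypothetical, each read is assumed to succeed and to extend exactly `read_length` bases, and the primer is taken from the last bases of the currently known sequence on the same strand.

```python
def primer_walk(template: str, known_end: int, read_length: int,
                primer_length: int) -> list[tuple[str, int]]:
    """Walk along `template`, returning (primer, new_known_end) per step.

    `known_end` is the number of bases already known from the 5' end.
    """
    if known_end < primer_length:
        raise ValueError("known region is shorter than the primer")
    if read_length <= 0:
        raise ValueError("read_length must be positive")

    steps = []
    while known_end < len(template):
        # Design the next primer from the end of the known sequence.
        primer = template[known_end - primer_length:known_end]
        # Sequence into the flanking unknown region (idealised read).
        known_end = min(known_end + read_length, len(template))
        steps.append((primer, known_end))
    return steps


# Hypothetical 1,200-base template with the first 20 bases known.
template = "ACGT" * 300
steps = primer_walk(template, known_end=20, read_length=400, primer_length=9)
for primer, end in steps:
    print(primer, end)
```

Each iteration uses sequence obtained in the previous step to design the next primer, so the number of steps scales with the length of the unknown region divided by the read length.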