A full bibliographic citation of the references cited in this application can be found in the section preceding the claims.
The invention relates to molecular biology methods. In particular, the invention relates to nucleic acid sequencing methods.
Billions of DNA bases must be sequenced to meet the goals of the Human Genome Program. Technology must advance so that the number of bases determined per unit of time is significantly increased, the data are highly accurate, and the cost per base is significantly decreased. Such technological advancements would enhance large sequencing projects, such as the Human Genome Project, and would benefit other types of research such as discovering and genotyping single nucleotide polymorphisms (SNPs) and gene-based drug discovery.
The current approach used in most large-scale sequencing projects is that of random sequencing of cloned shot-gun DNA fragments. In this procedure, randomly cut, overlapping nucleic acid fragments are cloned to form a library of random clones. These are sequenced. Sequence data from the library is aligned to form contiguous sequences (contigs). An 8-10 fold coverage is required to obtain sufficient overlap matching to obtain a contig. The gaps between the contigs are then filled in using primer-walking. Obtaining the gap sequences (sequences which constitute only the final few percent of the total desired sequence) requires a disproportionate effort compared to the number of nucleotides sequenced within the gaps.
Instead of using shot-gun clones, it would be very advantageous to develop a high-throughput, primer-based DNA sequencing strategy that uses primers selected from a pre-synthesized primer library. Conventional primer-based DNA sequencing requires the synthesis of a vast number of full-length primers for implementing a full-fledged primer-walking procedure. For example, a complete library for conventional primer walking using 16-base-long primers requires the synthesis of 4^16 (approximately 4.3 billion) primers. If a library of shorter primers could be used for this purpose, it would greatly reduce the number of primers needed for primer walking.
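The arithmetic behind this comparison can be sketched as follows. A complete library of primers of length n contains 4^n distinct sequences, since each position can be any of four bases. The function name `full_library_size` is hypothetical and is used here only for illustration.

```python
def full_library_size(length):
    """Number of distinct DNA oligonucleotides of a given length.

    Each position can hold one of four bases (A, C, G, T), so a complete
    library of n-mers contains 4**n sequences.
    """
    return 4 ** length


# Compare the short-primer libraries discussed in the text with 16-mers.
sizes = {n: full_library_size(n) for n in (8, 9, 10, 16)}
for n, count in sizes.items():
    print(f"{n}-mer library: {count:,} primers")
```

A complete octamer library has 65,536 members, whereas a complete 16-mer library has 4,294,967,296, which is why pre-synthesizing every full-length primer is impractical.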
In 1989, Studier proposed a strategy for high-volume sequencing of cosmid DNAs using a primer library composed of 8-, 9-, or 10-mers. Others have proposed synthesizing a library containing a subset of useful octamers or nonamers (Siemieniak and Slightom, 1990; Burbelo and Iadarola, 1994). The use of ligated or non-ligated pentamer/hexamer strings has also been proposed (Kaczorowski and Szybalski, 1994; Kieleczawa, et al., 1992). A reduced library of selected nonamers has also been proposed (Siemieniak and Slightom, 1990). Several reports have demonstrated limited success with using short primer strings to prime fluorescence-based sequence reactions (Hon and Smith, 1994; Kolter, L., et al., 1994; McCombie and Kieleczawa, 1994). Bock and Slightom (1995) reported fluorescence-based cycle-sequencing with primers selected from a nonamer library. With the “PRISM”-brand T7 DNA polymerase, a commercial kit available from Perkin Elmer/Applied Biosystems, Inc. (PE/ABI) (Foster City, Calif.), Bock and Slightom reported a complete lack of success. Although reasonable results were obtained using standard oligomers (21-mers), no sequence information was generated with nonamer primers (using the same template DNA) even after testing several different template and nonamer concentrations. Bock and Slightom also used the PE/ABI cycle-sequencing procedure, which gave some weak results. However, even after optimizing reaction conditions for sequencing to suit the nonamers, this procedure had a success rate of only about 50%. The modified PE/ABI cycle-sequencing procedure contained some very unusual steps. For example, the use of linear and pre-denatured plasmid DNA was a must even for this low success rate. Other peculiarities associated with the procedure included the use of a low annealing temperature (20° C. for 5 min) followed by a 5-min ramp to the 60° C. extension temperature and the use of 50 cycles.
According to the authors themselves, this level of success is somewhat disappointing, as they have only partially satisfied the goal of a primer library-based DNA sequencing strategy. Thus, additional improvements are needed before such a strategy can be considered practical for large-scale genome-type sequencing.
In addition to the nonamer-based cycle-sequencing method, both (1) Hardin, et al., (1996) and (2) Jones and Hardin (1998) made efforts at carrying out octamer-primed cycle-sequencing. However, as in the case of the nonamers, this approach is not effective for large-scale sequencing. When octamers from a 50% GC library were assayed, only five out of fourteen primers produced sequence information, resulting in an unacceptable 35.7% reaction success rate. Optimized conditions had to be used for sequencing a particular DNA template, and an optimized 75% GC library had to be selected, which gave a success rate of ~73%. For this success rate, a low annealing temperature of 40° C. had to be employed, and the reaction had to be cycled for 99 rounds (instead of the usual 30 cycles). Ball, et al., (1998) have extended the use of octamer primers by tailing the primers with modified bases. The authors used, among other modified bases, 5-nitroindole in a tail, which was expected to stabilize the primers while behaving indiscriminately in base-pairing. Although this process improved the signal intensity, there were limitations. For example, only a maximum of four 5-nitroindole residues could be added. Longer tails (greater than 6 residues) were detrimental, as they loop back on themselves, destabilizing the primer. Additionally, longer runs of 5-nitroindole residues can form secondary structures. The optimum length for the 5-nitroindole tail is 3-4 residues. This study also showed that a considerable percentage of cases required the addition of a tail to an octamer for obtaining any sequence data. A very low annealing temperature of 30° C. had to be used.
While these studies indicated that shorter oligonucleotides such as nonamers or octamers could be used for sequencing in some situations, it is clear that these approaches have severe limitations. It would be very advantageous to develop a method by which considerably longer oligonucleotides can be provided as primers without compromising the ease of availability of primers. What is needed is a method using longer, full-length primers for cycle-sequencing when little or no sequence information of the template DNA is available. What is also needed is a method using the longer, full-length primers in combination with both (1) shot-gun sequencing for obtaining the majority of the sequence and (2) primer walking for closing the gaps. This method should avoid random fragmenting and sub-cloning of the DNA and avoid the need for preparing new full-length primers.
The present invention utilizes primers in which a region of the primer sequence is fixed, and, in the preferred embodiment, the remainder of the primer sequence is randomized, thereby providing an array of all the possible sequences. Accordingly, a full-length primer species will be available to bind to a particular sequence in the template DNA.
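The idea of a partially fixed, partially randomized primer can be illustrated with a minimal sketch. It assumes, purely for illustration, a 6-base fixed region with 3 randomized bases on its 5′ side; the sequences, the lengths, and the function names `primer_pool` and `matching_species` are all hypothetical and are not taken from the specification. Primers are written 5′ to 3′, and a primer identical to a stretch of the strand shown is taken to anneal to the complementary strand at that site.

```python
from itertools import product


def primer_pool(fixed, n_random):
    """All primer species: n_random randomized bases 5' of the fixed region.

    Placing the randomized region on the 5' side is an assumption made
    for this sketch only.
    """
    return ["".join(p) + fixed for p in product("ACGT", repeat=n_random)]


def matching_species(strand, fixed, n_random):
    """Find sites where one pool member matches the strand over its full length.

    Returns (start_index, primer) pairs. Wherever the fixed region occurs
    with at least n_random bases upstream, exactly one species of the
    randomized pool matches the whole site.
    """
    hits = []
    start = strand.find(fixed)
    while start != -1:
        if start >= n_random:
            site = strand[start - n_random:start + len(fixed)]
            hits.append((start - n_random, site))
        start = strand.find(fixed, start + 1)
    return hits


strand = "TTGACCATGGCTAGCAAGGT"  # hypothetical template-derived strand
fixed = "GGCTAG"                 # hypothetical fixed region
pool = primer_pool(fixed, 3)     # 4**3 = 64 primer species
hits = matching_species(strand, fixed, 3)
print(len(pool), hits)
```

Because the pool contains every possible sequence in the randomized positions, the full-length species found at each site is guaranteed to be present in the pool without any prior knowledge of the flanking template sequence.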
It is a principal aim of the present invention to provide a method for sequencing a long DNA molecule without fragmenting or sub-cloning the long DNA molecule.
It is a further aim of the present invention to provide a method for PCR amplifying a DNA fragment with a long-fixed sequence degenerate primer and a short-fixed sequence degenerate primer.
Yet a further aim of the present invention is to provide a method for sequencing a long DNA molecule with a primer having an arbitrary sequence handle. The handle improves the sequencing reaction.
Yet a further aim of the present invention is to provide a method for amplifying a long DNA molecule with a primer having an arbitrary sequence handle. The handle improves the amplification reaction.
The invention is directed to a method of sequencing a nucleic acid template. The method comprises the steps of: (a) providing a plurality of first primers, each first primer comprising (i) a region of fixed nucleotide sequence and (ii) a region of randomized nucleotide sequence located 5′ to, 3′ to, flanking, or interspersed within the region of fixed nucleotide sequence; and then (b) annealing the first plurality to a nucleic acid template, wherein at least one primer anneals to the template. The annealed first primer is then (c) extended with a mixture of dNTPs and ddNTPs to generate a series of nucleic acid fragments. The nucleotide sequence of a first region of the template is then (d) determined from the series of nucleic acid fragments.
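Steps (c) and (d) can be pictured with an idealized sketch of chain-termination sequencing. It assumes that a ddNTP terminates extension once at every position and that fragments can be ordered perfectly by length; real reactions and separations are not this clean. The sequence and the function names `termination_fragments` and `read_sequence` are hypothetical.

```python
def termination_fragments(extension):
    """Idealized products of extension with dNTPs and ddNTPs.

    Incorporation of a ddNTP stops the chain, so in this idealization
    there is one fragment ending at every position of the extended strand.
    """
    return [extension[:i] for i in range(1, len(extension) + 1)]


def read_sequence(fragments):
    """Recover the sequence from the fragments.

    Ordering the fragments by length and reading the terminal base of
    each one gives the sequence of the newly synthesized strand.
    """
    return "".join(frag[-1] for frag in sorted(fragments, key=len))


extension = "GCAAGGTTCA"                    # hypothetical strand made from the primer
fragments = termination_fragments(extension)
shuffled = list(reversed(fragments))        # arrival order does not matter
print(read_sequence(shuffled))
```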
In the preferred embodiment, the invention further comprises the steps of providing a plurality of second primers, each second primer also comprising (i) a region of fixed nucleotide sequence and (ii) a region of random nucleotide sequence located 5′ to, 3′ to, flanking, or within the region of fixed nucleotide sequence. Steps (b)-(d), above, are then repeated for the second plurality of primers to thereby determine the nucleotide sequence of a second region of the template. The first sequenced region and the second sequenced region of the template nucleic acid are then assembled to form a first contig. These steps can then be repeated ad infinitum to form additional contigs.
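Assembling two sequenced regions into a contig can be sketched as a simple exact-overlap merge. This is only an illustration under stated assumptions: the reads are error-free, they lie on the same strand, and the minimum overlap of 4 bases is an arbitrary choice. The function name `merge_overlap` and the read sequences are hypothetical, and practical assemblers handle sequencing errors and both strand orientations.

```python
def merge_overlap(left, right, min_overlap=4):
    """Merge two reads if the end of `left` overlaps the start of `right`.

    Tries the longest possible overlap first. Returns the merged contig,
    or None when no exact overlap of at least min_overlap bases exists.
    """
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return None


first_region = "ACGTTGCA"    # hypothetical read from the first primers
second_region = "TGCAGGTC"   # hypothetical read from the second primers
contig = merge_overlap(first_region, second_region)
print(contig)
```

When `merge_overlap` returns `None`, the two regions do not overlap and a gap remains between them.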
Sequence gaps between contigs can then be determined by providing a plurality of third primers, each third primer comprising (i) a region of fixed nucleotide sequence and (ii) a region of random nucleotide sequence located 5′ to, 3′ to, flanking, or within the region of fixed nucleotide sequence, and annealing the plurality of third primers to the nucleic acid template, wherein at least one primer from the third plurality anneals to the template near a terminus of one of the first or second contigs. The annealed third primer is then extended with a mixture of dNTPs and ddNTPs to generate a series of nucleic acid fragments. The sequence of the template between the first and second contigs is then determined from the series of nucleic acid fragments.
The process of the invention can be repeated as often as desired to sequence the entire length of the target nucleic acid molecule.
The invention is further drawn to a method for amplifying (as opposed to sequencing) a nucleic acid template. Here, the method comprises providing a plurality of first primers, each first primer comprising (i) a region of fixed nucleotide sequence and (ii) a region of randomized nucleotide sequence located 5′ to, 3′ to, flanking, or within the region of fixed nucleotide sequence; providing a plurality of second primers, each second primer comprising (i) a region of fixed nucleotide sequence and (ii) a region of randomized nucleotide sequence located 5′ to, 3′ to, flanking, or within the region of fixed nucleotide sequence, wherein the region of fixed nucleotide sequence of the second plurality of primers is shorter than the region of fixed nucleotide sequence of the first plurality of primers; and then amplifying the nucleic acid template with the first and second plurality of primers, wherein at least one primer from each of the first and second pluralities anneals to the template.
Further aims, objects, and advantages of the invention will become apparent upon a complete reading of the Detailed Description that follows.