1. Field of the Invention
The present invention relates generally to the field of nucleic acid analysis. More particularly, it concerns the sequencing and mapping of double-stranded nucleic acid templates.
2. Description of Related Art
An aggressive research effort to sequence the entire human genome is proceeding in the laboratories of genetic researchers throughout the country. The project is called the Human Genome Project (HGP). It is a daunting task given that it involves the complete characterization of the archetypal human genome sequence, which comprises 3×10⁹ DNA nucleotide base pairs. Early estimates for completing the task within fifteen years hinged on the expectation that new technology would be developed in response to the pressing need for faster methods of DNA sequencing and improved DNA mapping techniques.
Currently, physical mapping is used to identify overlapping clones of DNA so that all of the DNA in a particular region can be sequenced or otherwise studied. There are two basic techniques of physical mapping. First, all candidate overlapping clones can be restricted with a series of restriction enzymes and the restriction fragments separated by gel electrophoresis. Overlapping clones will share some DNA sequences and thus some common restriction fragments. By comparing the restriction fragment lengths from a number of clones, the extent of overlap between any two clones can be determined. This process is very tedious and can only evaluate a limited number of candidate clones. Second, if a large number of sequence tagged sites are known in the region studied, the DNA from those sequence tagged sites can be labeled and hybridized to the candidate clones. Clones that hybridize to the same sequence tagged sites are identified as overlapping. If many sequence tagged sites are shared between two clones, it is assumed that the overlap is extensive. Sequence tagged sites give a lot of information from a limited number of hybridization reactions; however, most regions of most genomes do not have extensive sequence tagged site resources. Both methods suffer from a lack of direct correspondence between the sequence and the restriction sites or sequence tagged site locations.
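The fragment-comparison step of the first mapping technique can be illustrated with a minimal sketch. It assumes each clone is summarized as a list of restriction fragment lengths in base pairs, and that two fragments are "common" when their lengths agree within a measurement tolerance. The function names, the tolerance parameter and the overlap score are hypothetical choices made for illustration; they are not taken from the source.

```python
def shared_fragments(clone_a, clone_b, tolerance=0):
    """Return the fragment lengths of clone_a that have a partner in clone_b.

    Each fragment in clone_b can be matched at most once. `tolerance` is an
    assumed allowance (in bp) for gel sizing error.
    """
    remaining = sorted(clone_b)
    shared = []
    for length in sorted(clone_a):
        for i, other in enumerate(remaining):
            if abs(length - other) <= tolerance:
                shared.append(length)
                del remaining[i]  # consume the matched fragment
                break
    return shared


def overlap_fraction(clone_a, clone_b, tolerance=0):
    """Illustrative score: shared bp divided by the bp of the smaller clone."""
    shared = shared_fragments(clone_a, clone_b, tolerance)
    smaller = min(sum(clone_a), sum(clone_b))
    return sum(shared) / smaller if smaller else 0.0


if __name__ == "__main__":
    a = [1200, 800, 450, 300]
    b = [800, 450, 2000, 150]
    print(shared_fragments(a, b))            # [450, 800]
    print(round(overlap_fraction(a, b), 3))  # 0.455
```

Matching on length alone cannot tell whether two equally sized fragments carry the same sequence, which is one reason the approach lacks a direct correspondence with the underlying sequence.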
Current DNA sequencing approaches generally incorporate the fundamentals of either the Sanger sequencing method or the Maxam and Gilbert sequencing method, two techniques that were first introduced in the 1970s (Sanger et al., 1977; Maxam and Gilbert, 1977). In the Sanger method, a short oligonucleotide or primer is annealed to a single-stranded template containing the DNA to be sequenced. The primer provides a 3′ hydroxyl group which allows the polymerization of a chain of DNA when a polymerase enzyme and dNTPs are provided. The Sanger method is an enzymatic reaction that utilizes chain-terminating dideoxynucleotides (ddNTPs). ddNTPs are chain-terminating because they lack a 3′-hydroxyl residue, which prevents formation of a phosphodiester bond with a succeeding deoxyribonucleotide (dNTP). A small amount of one ddNTP is included with the four conventional dNTPs in a polymerization reaction. Polymerization or DNA synthesis is catalyzed by a DNA polymerase. There is competition between extension of the chain by incorporation of the conventional dNTPs and termination of the chain by incorporation of a ddNTP.
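The chain-termination principle can be sketched as a small idealized model. It assumes the template is written 3′ to 5′ so that the new strand grows left to right, that the primer covers the first few template positions, and that every possible termination product is recovered; real reactions are stochastic and noisy. All names are hypothetical and chosen only for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_lengths(template_3to5, primer_len, ddntp):
    """Lengths of products ending in `ddntp` (primer included in the length).

    A product can terminate wherever the base to be incorporated, i.e. the
    complement of the template base, equals the dideoxynucleotide supplied.
    """
    lengths = []
    for i in range(primer_len, len(template_3to5)):
        if COMPLEMENT[template_3to5[i]] == ddntp:
            lengths.append(i + 1)
    return lengths


def read_sequence(template_3to5, primer_len):
    """Rebuild the synthesized sequence by ordering all four ladders by size."""
    by_length = {}
    for base in "ACGT":
        for n in terminated_lengths(template_3to5, primer_len, base):
            by_length[n] = base
    return "".join(by_length[n] for n in sorted(by_length))


if __name__ == "__main__":
    template = "TTACGGAT"  # written 3'->5'; the primer pairs with "TT"
    print(terminated_lengths(template, 2, "C"))  # [5, 6]
    print(read_sequence(template, 2))            # TGCCTA
```

Sorting the four single-base ladders by product length mirrors reading a gel from the shortest fragment upward.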
The original version of the Sanger method utilized the E. coli DNA polymerase I (“pol I”), which has a polymerization activity, a 3′-5′ exonuclease proofreading activity, and a 5′-3′ exonuclease activity. Later, an improvement to the method was made by using Klenow fragment instead of pol I; Klenow lacks the 5′-3′ exonuclease activity that is detrimental to the sequencing reaction because it leads to partial degradation of template and product DNA. The Klenow fragment has several limitations when used for enzymatic sequencing. One limitation is the low processivity of the enzyme, which generates a high background of fragments that terminate by the random dissociation of the enzyme from the template rather than by the desired termination due to incorporation of a ddNTP. The low processivity also means that the enzyme cannot be used to sequence nucleotides that appear more than ~250 nucleotides from the 5′ end of the primer. A second limitation is that Klenow cannot efficiently utilize templates which have homopolymer tracts or regions of high secondary structure. The problems caused by secondary structure in the template can be reduced by running the polymerization reaction at 55° C. (Gomer and Firtel, 1985).
Improvements to the original Sanger method include the use of polymerases other than the Klenow fragment. Reverse transcriptase has been used to sequence templates that have homopolymeric tracts (Karanthanasis, 1982; Graham et al., 1986). Reverse transcriptase is somewhat better than the Klenow enzyme at utilizing templates containing homopolymer tracts.
The use of a modified T7 DNA polymerase (Sequenase™) was a significant improvement to the Sanger method (Sambrook et al., 1989; Hunkapiller, 1991). T7 DNA polymerase does not have any inherent 5′-3′ exonuclease activity and has a reduced selectivity against incorporation of ddNTP. However, the 3′-5′ exonuclease activity leads to degradation of some of the oligonucleotide primers. Sequenase™ is a chemically-modified T7 DNA polymerase that has reduced 3′ to 5′ exonuclease activity (Tabor et al., 1987). Sequenase™ version 2.0 is a genetically engineered form of the T7 polymerase which completely lacks 3′ to 5′ exonuclease activity. Sequenase™ has a very high processivity and high rate of polymerization. It can efficiently incorporate nucleotide analogs such as dITP and 7-deaza-dGTP which are used to resolve regions of compression in sequencing gels. In regions of DNA containing a high G+C content, Hoogsteen bond formation can occur which leads to compressions in the DNA. These compressions result in aberrant migration patterns of oligonucleotide strands on sequencing gels. Because these base analogs pair weakly with conventional nucleotides, intrastrand secondary structures during electrophoresis are alleviated. In contrast, Klenow does not incorporate these analogs as efficiently.
The use of Taq DNA polymerase and mutants thereof is a more recent addition to the improvements of the Sanger method (U.S. Pat. No. 5,075,216). Taq polymerase is a thermostable enzyme which works efficiently at 70-75° C. The ability to catalyze DNA synthesis at elevated temperature makes Taq polymerase useful for sequencing templates which have extensive secondary structures at 37° C. (the standard temperature used for Klenow and Sequenase™ reactions). Taq polymerase, like Sequenase™, has a high degree of processivity and, like Sequenase™ 2.0, it lacks 3′ to 5′ nuclease activity. The thermal stability of Taq and related enzymes (such as Tth and Thermosequenase™) provides an advantage over T7 polymerase (and all mutants thereof) in that these thermally stable enzymes can be used for cycle sequencing, which amplifies the DNA during the sequencing reaction, thus allowing sequencing to be performed on smaller amounts of DNA. Optimization of the use of Taq in the standard Sanger method has focused on modifying Taq to eliminate the intrinsic 5′-3′ exonuclease activity and to increase its ability to incorporate ddNTPs (EP 0 655 506 B1).
Both the Sanger and the Maxam-Gilbert methods produce populations of radiolabelled or fluorescently labeled polynucleotides of differing lengths which are separated according to size by polyacrylamide gel electrophoresis (PAGE). The nucleotide sequence is determined by analyzing the pattern of size-separated labeled polynucleotides in the gel. The Maxam-Gilbert method involves degrading DNA at a specific base using chemical reagents. The DNA strands terminating at a particular base are denatured and electrophoresed to determine the positions of the particular base. By combining the information from fragments terminating at different bases or combinations of bases, the entire DNA sequence can be reconstructed. However, the Maxam-Gilbert method involves dangerous chemicals, and is time- and labor-intensive. Thus, it is no longer used for most applications.
The current limitations to conventional applications of the Sanger method include 1) the limited resolving power of polyacrylamide gel electrophoresis, 2) the formation of intermolecular and intramolecular secondary structure of the denatured template in the reaction mixture, which can cause any of the polymerases to prematurely terminate synthesis at specific sites or misincorporate ddNTPs at inappropriate sites, 3) secondary structure of the DNA on the sequencing gels, which can give rise to compressions of the electrophoretic ladder at specific locations in the sequence, 4) cleavage of the template, primers and products by the 5′-3′ or 3′-5′ exonuclease activities of the polymerases, and 5) mispriming of synthesis due to hybridization of the oligonucleotide primers to multiple sites on the denatured template DNA. The formation of intermolecular and intramolecular secondary structure produces artificial terminations that are incorrectly “read” as the wrong base, gives rise to bands across four lanes (BAFLs) that produce ambiguities in base reading, and decreases the intensity and thus the signal-to-noise ratio of the bands. Secondary structure of the DNA on the gels can largely be solved by incorporation of dITP or 7-deaza-dGTP into the synthesized DNA; DNA containing such modified NTPs is less likely to form urea-resistant secondary structure during electrophoresis. Cleavage of the template, primers or products reduces the intensity of bands terminating at the correct positions and increases the background. Mispriming gives rise to background in the gel lanes.
The net result is that, although the inherent resolution of polyacrylamide gel electrophoresis alone is as much as 1000 nucleotides, it is common to be able to correctly read only 400-600 nucleotides of a sequence (and sometimes much less) using the conventional Sanger method, even when using optimized polymerase design and reaction conditions. Some sequences, such as repetitive DNA, strings of identical bases (especially guanines), GC-rich sequences and many unique sequences, cannot be sequenced without a high degree of error or uncertainty.
In the absence of any methods to consistently sequence DNA longer than about 1000 bases, investigators must subclone the DNA into small fragments and sequence these small fragments. The procedures for doing this in a logical way are very labor intensive, cannot be automated, and are therefore impractical. The most popular technique for large-scale sequencing, the “shotgun” method, involves cloning and sequencing of hundreds or thousands of overlapping DNA fragments. Many of these methods are automated, but require sequencing 5-10 times as many bases as minimally necessary, leave gaps in the sequence information that must be filled in manually, and have difficulty determining sequences with repetitive DNA.
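The cost of this redundancy can be put in rough numbers using figures already given above: a genome of 3×10⁹ base pairs, reads of 400-600 nucleotides, and 5- to 10-fold oversampling. The sketch below is back-of-the-envelope arithmetic only; the function name and the choice of 500 nucleotides as a representative read length are assumptions made for illustration.

```python
def reads_required(target_bases, redundancy, read_length):
    """Number of reads needed to sequence `target_bases` `redundancy` times.

    Uses integer ceiling division so that a partial read counts as a full one.
    """
    total_bases = target_bases * redundancy
    return (total_bases + read_length - 1) // read_length


if __name__ == "__main__":
    genome = 3_000_000_000  # 3 x 10^9 bp, as stated for the human genome
    read_length = 500       # assumed midpoint of the 400-600 nt range
    print(reads_required(genome, 5, read_length))   # 30000000
    print(reads_required(genome, 10, read_length))  # 60000000
```

Under these assumptions a shotgun approach needs on the order of 30 to 60 million reads, compared with 6 million if every base were read exactly once.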
Thus, the goal of placing rapid sequencing techniques and improved mapping techniques in the hands of many researchers is yet to be achieved. New approaches are needed that eliminate the above-described limitations.
The present invention overcomes these and other drawbacks inherent in the prior art by providing methods and compositions for the analysis of nucleic acids, in particular for sequencing and mapping nucleic acids using double-stranded strand replacement reactions. These methods result in accurate sequencing reactions, in certain aspects due to very short extension reactions, and thus produce more useful sequence data from large templates, which overcome the problems inherent in single-stranded sequencing techniques. The present invention also provides new and powerful techniques for analyzing telomere length, telomere and subtelomeric sequence information, and quantitating the length and number of single-stranded overhangs present in telomeres.
First provided are methods of creating or selecting one or more nucleic acid products that terminate with at least a first selected base. These terminated nucleic acid products and populations thereof may be used in a wide variety of embodiments, including, but not limited to, nucleic acid sequencing, nucleic acid mapping, and telomere analysis.
The methods of creating one or more nucleic acid products that terminate with at least a first selected base generally comprise contacting at least a first substantially double stranded nucleic acid template comprising at least a first break on at least one strand with at least a first effective polymerase and a terminating composition comprising at least a first terminating nucleotide, the base of which corresponds to the selected base, under conditions effective to produce a nucleic acid product terminated at the selected base.
The methods may first involve the synthesis, construction, creation or generation of the substantially double stranded nucleic acid template that comprises at least a first break on at least one strand. In that case, “contacting” the template with the effective polymerase and terminating composition forms the second part of the method.
The term “template,” as used herein, refers to a nucleic acid that is to be acted upon, generally nucleic acid that is to be contacted or admixed with at least a first effective polymerase and at least a first nucleotide substrate composition under conditions effective to allow the incorporation of at least one more nucleotide or base into the nucleic acid to form a nucleic acid product. In many embodiments of the present invention, the nucleic acid product generated is a nucleic acid product that terminates with at least a first selected base. In some cases “template” means the target nucleic acids intended to be separated or sorted out from other nucleic acid sequences within a mixed population.
“Substantially or essentially double stranded” nucleic acids or nucleic acid templates, as used herein, are generally nucleic acids that are double-stranded except for a proportionately small area or length of their overall sequence or length. The “proportionately small area” is an area lacking double stranded sequence integrity. The “proportionately small area lacking double stranded sequence integrity” may be as small as a single broken bond in only one strand of the nucleic acid, i.e., a break or “nick” within the double stranded nucleic acid molecule.
The “proportionately small area lacking double stranded sequence integrity” may also be a gap produced within the double stranded nucleic acid molecule by excision or removal of at least one base or nucleotide. In these cases, the “substantially double stranded nucleic acids” may be described as being double-stranded except for a proportionately small area of single-stranded nucleic acid. “Proportionately small areas of single-stranded nucleic acids” are those corresponding to single-stranded areas, stretches or lengths of one, two, three, four, five, six, seven, eight, nine or about ten bases or nucleotides, as may be produced by creating a gap within the double stranded nucleic acid molecule by excision or removal of one, two, three, four, five, six, seven, eight, nine or about ten bases or nucleotides.
In certain aspects of the invention, larger “proportionately small areas of single-stranded nucleic acids” are preferred, for example those corresponding to single-stranded areas, stretches or lengths of 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or about 100 bases or nucleotides, as may be produced by creating a gap within the double stranded nucleic acid molecule by excision or removal of 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or about 100 bases or nucleotides. In particular embodiments, even larger gaps may be created.
The “proportionately small area of single-stranded nucleic acid” within a substantially double stranded nucleic acid may occur at any point within the substantially double stranded nucleic acid molecule or template, i.e., it may be terminal or integral. “Terminal portions of single-stranded nucleic acid” within a substantially double stranded nucleic acid are generally “overhangs”. Such “overhangs” may be naturally occurring overhangs, such as the area defined at the ends of telomeric DNA. “Overhangs” may also be engineered, i.e., created by the hand of man, using one or more of the techniques described herein and known to those of skill in the art. “Integral portions of single-stranded nucleic acid” within substantially double stranded nucleic acids, as used herein, will generally be engineered by the hand of man, again using one or more of the techniques described herein and known to those of skill in the art.
The term “double stranded”, as applied to nucleic acids and nucleic acid templates, is generally reserved for nucleic acids that are completely double-stranded and that have no break, gap or single-stranded region. This allows “substantially double stranded” to be generally reserved for broken, nicked and/or gapped substantially double stranded nucleic acids and templates and substantially double stranded nucleic acids and templates that comprise at least a first single-stranded nucleic acid overhang.
The templates for use in the invention may be in virtually any form, including covalently closed circular templates and linear templates. Both “native or natural” and “recombinant” nucleic acids and nucleic acid templates may be employed. “Recombinant nucleic acids”, as used herein, are generally nucleic acids that are comprised of segments of nucleic acids joined together by means of molecular biological techniques, i.e., by the hand of man. Although the nucleic acids for use in the methods will generally have been subjected to at least some isolation, and are thus not free from man's intervention, “native and natural” nucleic acids and nucleic acid templates are intended to mean nucleic acids that have undergone less molecular biological manipulation and correspond more closely to the genomic DNA or fractions or fragments thereof.
The templates may also be derived from any initial nucleic acid molecule, sample or source including, but not limited to, cloning vectors, viruses, plasmids, cosmids, yeast artificial chromosomes (YACs), bacterial artificial chromosomes (BACs) and chromosomal and extrachromosomal nucleic acids isolated from eukaryotic organisms, including, but not limited to, yeast, Drosophila and mammals, including, but not limited to, mice, rabbits, sheep, rats, goats, cattle, pigs, and primates such as humans, chimpanzees and apes.
In certain embodiments, the template may be created by cleavage from a precursor nucleic acid molecule. This generally involves treatment of the precursor molecule with enzymes that specifically cleave the nucleic acid at specific locations. Examples of such enzymes include, but are not limited to, restriction endonucleases, intron-encoded endonucleases, and DNA-based cleavage methods, such as triplex and hybrid formation methods, that rely on the specific hybridization of a nucleic acid segment to localize a cleavage agent to a specific location in the nucleic acid molecule.
In other embodiments, the template may be created by amplifying the template from a precursor nucleic acid molecule or sample. The amplified templates generally include a region to be analyzed, i.e. sequenced, and can be relatively small, or quite large in various embodiments.
In general, “amplification” may be considered as a particular example of nucleic acid replication involving template specificity. Amplification may be contrasted with non-specific template replication, i.e., replication that is template-dependent but not dependent on a specific template. “Template specificity” is here distinguished from fidelity of replication, i.e., synthesis of the proper polynucleotide sequence, and nucleotide (ribo- or deoxyribo-) specificity. “Template specificity” is frequently described in terms of “target” specificity. Target sequences are “targets” in the sense that they are desired to be separated or sorted out from other nucleic acids. Amplification techniques have been designed primarily for this “sorting out”.
Amplification reactions generally require an initial nucleic acid sample or template, appropriate primers, an amplification enzyme and amplification reagents, such as deoxyribonucleotide triphosphates, buffers, and the like. In the sense of this application, a template for amplification (or “an amplification template”) refers to an initial nucleic acid sample or template, and does not refer to the “substantially double stranded nucleic acid template comprising at least a first break on at least one strand”. Therefore, as used herein, “an amplification template” is a “pre-template”.
As used herein, the terms “amplifiable and amplified nucleic acids” are used in reference to any nucleic acid that may be amplified, or that has been amplified, by any amplification method including, but not limited to, PCR™, LCR, and isothermal amplification methods. Thus, the “substantially double stranded nucleic acid templates that comprise at least a first break on at least one strand” may be amplified nucleic acids or amplified nucleic acid products as well as templates for the methods of the invention.
Widely used methods for amplifying nucleic acids are those that involve temperature cycling amplification, such as PCR™. Isothermal amplification methods such as strand displacement amplification are also routinely employed to amplify nucleic acids. All such amplification methods are appropriate to amplify “templates” for use in the invention from precursor nucleic acids or “pre-templates”.
As used herein, the term “PCR™” (“polymerase chain reaction”) generally refers to methods for increasing the concentration of a segment of a template sequence in a mixture of genomic DNA without cloning or purification, as described in U.S. Pat. No. 4,683,195 and U.S. Pat. No. 4,683,202, each incorporated herein by reference. The process generally comprises introducing at least two oligonucleotide primers to a DNA mixture containing the desired template sequence, followed by a sequence of “thermal cycling” in the presence of a suitable DNA polymerase. The two primers are complementary to their respective strands of the double stranded template sequence. To effect amplification, the mixture is denatured and the primers then annealed to their complementary sequences within the template molecule. Following annealing, the primers are extended with a polymerase so as to form a new pair of complementary strands.
In PCR™, the steps of denaturation, primer annealing and polymerase extension are generally repeated many times, such that “denaturation, annealing and extension” constitute one “cycle”. Thus, “thermal cycling” means the execution of numerous “cycles” to obtain a high concentration of an amplified segment of the desired template sequence. As the desired amplified segments of the template sequence become the predominant sequences in the mixture, in terms of concentration, they are said to be “PCR™ amplified”.
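The effect of repeated cycles on concentration can be illustrated with a simple growth model. The sketch assumes that each cycle multiplies the number of target copies by (1 + efficiency), where an efficiency of 1.0 corresponds to ideal doubling; this model, the function name and the efficiency parameter are illustrative assumptions rather than part of the methods described here.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Expected copy number after `cycles` rounds of thermal cycling.

    `efficiency` is the assumed fraction of templates copied per cycle
    (1.0 means every template is duplicated in every cycle).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    print(pcr_copies(1, 30))  # 1073741824.0 (2**30 with ideal doubling)
    print(pcr_copies(1, 30, efficiency=0.9) < pcr_copies(1, 30))  # True
```

Even at less than ideal efficiency, the geometric growth explains why the amplified segment quickly becomes the predominant sequence in the mixture.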
As used herein, the terms “PCR™ product”, “PCR™ fragment” and “amplification product” refer to the resultant mixture of compounds after two or more cycles of the PCR™ steps of denaturation, annealing and extension are complete. These terms encompass the case where there has been amplification of one or more segments of one or more target sequences. “PCR™ products and fragments” can naturally act as the broken, nicked or gapped substantially double stranded nucleic acid templates for use in the invention.
Once a suitable or desired nucleic acid precursor, pre-template or sample composition has been obtained, a wide variety of substantially double stranded nucleic acid templates may be created for use in the claimed methods. In certain embodiments, even double stranded nucleic acid templates may be generated that comprise at least a first break substantially at the same position on both strands of the template. The most evident utility of this aspect of the invention is in producing nucleic acid fragments of a manageable size for further analysis, wherein such fragmentation is required.
In certain of the preferred sequencing and mapping embodiments, the substantially double stranded nucleic acid template will comprise at least a first break on only one of the two strands. This is advantageous in that the product or products are generated from the same strand, leading to more direct and rapid analysis. In certain of the sequencing and mapping aspects of the invention, having the strand replacement start at a defined point on one strand is advantageous, particularly where analysis of the size of the products of the reaction, particularly the differential size of a population of products, is necessary.
However, in a most general sense, creating a break on only one strand operably means that only one break is present in the region or target region of the individual nucleic acid molecule being analyzed or utilized. The target region is defined as a region of sufficient length to yield useful information and yet to allow the required volume of data to be generated in relation to the original nucleic acid subjected to the analysis. Thus, breaks at a distant region of the same nucleic acid molecule, outside of the target region, or breaks in the same general target region of a population of nucleic acid molecules, can exist and yet the target will still be considered to contain a “functional break” on only one strand.
In any event, in most aspects of the invention, the presence of additional breaks or nicks is not a drawback, so long as a 3′ hydroxyl group can be generated in the presence of a template strand that can support the incorporation of at least one complementary base. The presence of multiple breaks on both strands is either useful, as one can initiate synthesis at a plurality of points as only the “first-encountered” break forms the functional break for extension and/or termination, or non-functional, and thus irrelevant, in most aspects of the invention. For example, although synthesis products may be produced from breaks on both strands, when the labeling techniques are used in conjunction with the isolation or immobilization techniques disclosed herein, only products from one strand and closest to the detectable label are detected in the final analysis step, thus eliminating the requirement for a break on only one strand in the most rigid sense.
In general, the complexity of the nicking or breaking reaction is directly correlated with the complexity of the labeling and/or isolation or immobilization procedures. In aspects wherein a nick or break is generated at a single position in a population of identical templates, only a single detectable label is required to analyze the products of the extending and/or terminating reaction. The presence of additional breaks or nicks is made most useful when employed with additional labels and/or the isolation of a subset of the nucleic acid products prior to analysis.
Although by no means limiting, in substantially double stranded nucleic acid templates that comprise at least a first integral break or gap on only one strand, it is convenient to identify the intact or “unbroken” strand as the “template strand”, and the strand that comprises at least a first integral break or gap as the “non-template strand”. In those methods of the invention that encompass sequencing, the template strand will generally act as the guideline for the incorporation of one or more complementary bases or nucleotides into the “non-template strand”, which is herein defined as the “extension of the non-template strand”.
The “extension” of the non-template strand may be an extension by a single base or nucleotide only, in which case the “extension” is inherently an “extension and termination”. The single base or nucleotide incorporated into the non-template strand is thus a “terminating base or nucleotide”. This allows the broken, nicked or gapped strand to also be referred to as “the terminated strand”.
Alternatively, the “extension” of the non-template strand may be an extension by two, three or more, or a plurality of, bases or nucleotides, and/or an extension to create a population of extended non-template strands each including a different number of incorporated bases or nucleotides. In these cases, “termination” is not co-extensive with “extension”, and termination may even be delayed until after the incorporation of a significant number of “extending” bases or nucleotides. Thus, the broken, nicked or gapped strand that formed the starting point for the two, three or multiple base extension may also be termed “the synthesized strand”.
In contrast, in substantially double stranded nucleic acid templates that comprise a terminal single-stranded portion or “overhang”, it may be more convenient to identify the single-stranded overhang portion as the template strand. This is essentially because the art uses an existing “hybridizable” nucleic acid portion as a “template”, e.g., in the sense that a sufficiently complementary probe or primer can hybridize to the template.
As used herein, the term “probe” refers to an oligonucleotide, i.e., a contiguous sequence of nucleotides, whether occurring naturally as in a purified restriction digest or produced synthetically, recombinantly or by PCR™ amplification, that is capable of hybridizing to a nucleic acid of interest or portion thereof. Although probes may be single-stranded or double-stranded, the hybridizing probe described above in reference to binding to a nucleic acid overhang will generally be single-stranded. Probes are often labeled with a detectable label or “reporter molecule” that is detectable in a detection system, including, but not limited to, fluorescent, enzyme (e.g., ELISA), radioactive, and luminescent systems.
The term “primer”, as used herein, refers to an oligonucleotide, whether occurring naturally as in a purified restriction digest or produced synthetically, that is capable of acting as a point of initiation of nucleic acid synthesis when placed under conditions in which the synthesis of a primer extension product that is complementary to a nucleic acid strand of interest is induced, e.g., in the presence of nucleotides and an inducing agent such as DNA polymerase and at a suitable temperature and pH. A primer must be sufficiently long to prime the synthesis of extension products in the presence of the inducing agent. The exact length of an effective primer depends on factors such as temperature of extension, source of primer and the particular extension method. Primers are preferably single stranded for maximum efficiency in amplification (but may be double stranded if first treated to separate the strands before use in preparing extension products). Primers are often preferably oligodeoxyribonucleotides.
The invention further provides various methods for generating the substantially double stranded, broken nucleic acid templates. Certain of the template-generation methods are generic to the creation of various types of template sought. For example, methods are disclosed that are capable of creating substantially double stranded nucleic acid templates in which either only one or both of the template strands are broken. Equally, distinct methods are provided for creating substantially double stranded nucleic acid templates in which both template strands are broken versus those for creating substantially double stranded nucleic acid templates in which only one of the template strands is broken.
Enzymatic methods are provided that are universally applicable to creating substantially double stranded nucleic acid templates in which either only one or both of the template strands are broken. Such methods generally comprise creating the template by contacting a double-stranded or substantially double-stranded nucleic acid with a combined effective amount of at least a first and second breaking enzyme combination. A “combined effective amount of at least a first and second breaking enzyme combination” is a combined amount of at least a first and second enzyme effective to create a substantially double stranded nucleic acid template in which either only one or both of the template strands comprise at least a first break.
Examples of broadly effective “enzymatic breaking combinations” are uracil DNA glycosylase in combination with an effectively matched endonuclease, such as endonuclease IV or endonuclease V. In light of the present disclosure, those of ordinary skill in the art will understand that the use of a uracil DNA glycosylase-endonuclease combination is predicated on the prior incorporation of at least a first uracil base or residue into the nucleic acid molecule that is to form the template.
Accordingly, in certain embodiments, the invention provides for the creation of a template by generating a double-stranded or substantially double-stranded nucleic acid molecule comprising at least a first uracil base or residue and contacting the uracil-containing nucleic acid molecule with a combined effective amount of a first, uracil DNA glycosylase enzyme and a second, endonuclease IV enzyme or endonuclease V enzyme. The use of endonuclease V in the combination is generally preferred. A “combined effective amount of a first, uracil DNA glycosylase enzyme and a second, endonuclease IV or V enzyme” is a combined amount of the enzymes effective to create a substantially double stranded nucleic acid template comprising at least a first gap corresponding in position to the position of the at least a first uracil base or residue incorporated into the uracil-containing nucleic acid molecule.
The incorporation of at least a first uracil base or residue into a double-stranded or substantially double-stranded nucleic acid molecule is generally achieved by incorporation of a dUTP residue in the nucleic acid synthesis reaction. In certain aspects of the invention it is desired to incorporate a single uracil base or residue into a specific location near the 5′ end of the nucleic acid template. In a general sense, this may be accomplished by methods comprising contacting a precursor molecule with at least a first and a second primer that amplify the template when used in conjunction with a polymerase chain reaction, wherein at least one of the first or second primers comprises at least a first uracil base, and conducting a polymerase chain reaction to create an amplified template containing a single uracil residue corresponding to the location of the uracil base in the uracil-containing primer. In certain aspects, both primers contain uracil, to produce an amplified template that contains a uracil residue near the 5′ end of both strands. In other embodiments, dUTP will be used in the synthesis of the template strand, thus incorporating multiple uracil residues into the template.
Incorporation of at least a first uracil base or residue only into one of the strands of the nucleic acid molecule allows for the subsequent generation of a substantially double stranded nucleic acid template in which only one of the template strands is broken, whereas incorporation of at least a first uracil base or residue into each of the strands of the nucleic acid molecule allows for the subsequent generation of a substantially double stranded nucleic acid template in which both of the template strands are broken.
Certain chemical cleavage compositions are also appropriate for creating substantially double stranded nucleic acid templates in which either only one or both of the template strands are broken. Such methods generally comprise creating the template by contacting a double-stranded or substantially double-stranded nucleic acid with an effective amount of an appropriate chemically-based nucleic acid cleavage composition. An “effective amount of an appropriate chemically-based nucleic acid cleavage composition” is an amount of the composition effective to create a substantially double stranded nucleic acid template in which either only one or both of the template strands comprise at least a first break.
In yet further embodiments, substantially double stranded nucleic acid templates in which either only one or both of the template strands are broken may be created by contacting a substantially double-stranded nucleic acid with an effective amount of at least a first appropriate nuclease enzyme. An “effective amount of at least a first appropriate nuclease enzyme” is an amount of the nuclease enzyme effective to create a substantially double stranded nucleic acid template in which either only one or both of the template strands comprise at least a first break.
In different embodiments, the invention provides methods for making and using substantially double stranded nucleic acid templates in which the one or more breaks or gaps are either located at a specific point or points along the nucleic acid template, or in which the one or more breaks or gaps are located at a random location or locations along the nucleic acid template. These may be referred to as “specifically broken, nicked or gapped templates” and “randomly broken, nicked or gapped templates”, respectively. The methods for generating the specifically and randomly manipulated templates are generally different in principle and execution, although both nucleases and non-nuclease-based chemical or biological components may be used in various of the methods.
In certain embodiments, a substantially double stranded nucleic acid template comprising at least a first break or gap at a specific point on at least one strand of the template is created by contacting a double stranded or substantially double-stranded nucleic acid with an effective amount of at least a first specific nuclease enzyme. Exemplary specific nuclease enzymes are f1 endonuclease, fd endonuclease or a restriction endonuclease. A preferred specific nuclease enzyme is f1 endonuclease. An “effective amount of at least a first specific nuclease enzyme” is an amount of the nuclease enzyme effective to create a substantially double stranded nucleic acid template that comprises at least a first break or gap at a specific point on at least one strand of the template.
In other embodiments, the specific-type template is created by contacting a double-stranded or substantially double-stranded nucleic acid with an effective amount of an appropriate specific chemical cleavage composition. An exemplary embodiment is wherein the specific chemical cleavage composition comprises a nucleic acid segment, such as a hybrid or triple helix forming composition, that is linked to a metal ion chelating agent. The chelating agent binds a metal ion, and in the presence of a peroxide and a reducing agent, produces a hydroxyl radical that can nick or break a nucleic acid. The specificity of the cleavage is provided from the nucleic acid segment, which only hybridizes to or forms a triple helix at a specific location in the nucleic acid molecule to be broken or nicked. In certain cases, the hydroxyl radicals produced can diffuse, and thus a small region is broken or nicked, producing a gap. An “effective amount of at least a first specific chemical cleavage or triple helix-forming composition” is an amount of the composition effective to create a substantially double stranded nucleic acid template that comprises at least a first break or gap at a specific point on at least one strand of the template.
For use in certain embodiments, particularly the random break incorporation and random break degradation sequencing embodiments, the creation of a substantially double stranded nucleic acid template comprising at least a first random break or gap on at least one strand will be preferred. Templates with one or more breaks or nicks located at one or more random points or locations along the nucleic acid template are termed “randomly nicked templates”. Suitable processes for creating such randomly nicked templates, or populations thereof, are collectively termed “random nicking”.
“Random nicking” generally refers to a process or processes effective to generate a substantially double stranded nucleic acid template that comprises at least a first broken bond located at at least a first random position within the sugar-phosphate backbone of at least one of the two strands of the nucleic acid template. As used herein, a “randomly nicked template” is intended to mean “at least a randomly nicked template”. This signifies that at least one randomly-located broken bond is present, which broken bond may form the starting point or “substrate” for further manipulations, e.g., to convert the nick into a gap.
A process of random nicking that creates at least a first randomly positioned broken bond in a strand of the template may then be extended to create a gap at that random point or position by excising at least the first base or nucleotide proximal to the broken bond. This then becomes a process of “random gapping” effective to prepare a “random gap template”, or a population thereof, comprising one or more gaps of at least a nucleotide in length positioned randomly within the nucleic acid template.
In certain embodiments, particularly certain mapping and sequencing aspects, the creation of a substantially double stranded nucleic acid template comprising at least a first random break or gap on only one strand will be preferred. This is generally for ease of analysis of the information generated from a strand replacement reaction, but also has advantages as detailed above.
Suitable methods that may be adapted to create a substantially double stranded nucleic acid template comprising at least a first random break or gap on at least one, or only one, strand are provided herein. The optimization of the random nicking methods for mono-stranded or dual-stranded nicking is generally based upon the correlation between the breaking or nicking agent, enzyme, chemical or composition and the time and conditions used to produce the break or nick. Agents that produce a given break or nick under one set of conditions can produce a completely different break under different conditions. For example, a breaking or nicking agent that produces a single break or nick under one reaction condition can, in certain embodiments, produce a plurality of breaks or nicks under a second, distinct reaction condition. Thus, the double stranded nucleic acid template comprising at least a first random break or gap on at least one, or only one, strand that is produced depends not only on the breaking or nicking agent used, but also on the conditions used to conduct the breaking or nicking reaction.
In one embodiment, the at least randomly nicked template is created by generating a double-stranded or substantially double-stranded nucleic acid comprising at least a first randomly positioned exonuclease-resistant nucleotide, and contacting the nucleic acid with an effective amount of an exonuclease. Exemplary exonuclease-resistant nucleotides include, but are not limited to deoxyribonucleotide phosphorothioates and deoxyribonucleotide boranophosphates. The preferred effectively matched exonuclease is exonuclease III. In these embodiments, an “effective amount of an exonuclease” is an amount of the exonuclease effective to degrade the strand containing the exonuclease-resistant base to the position of the resistant base.
The incorporation of at least a first randomly positioned exonuclease-resistant nucleotide into a double-stranded or substantially double-stranded nucleic acid molecule is generally achieved by utilizing extendable deoxynucleotides comprising the exonuclease-resistant feature during the synthesis of the nucleic acid precursor or template. The amount of exonuclease-resistant nucleotide incorporated into the nucleic acid template can be controlled by adjusting the ratio of the extendable deoxynucleotides with and without the exonuclease-resistant feature used in the synthesis reaction.
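By way of illustration only, the relationship between the nucleotide ratio and the amount of resistant nucleotide incorporated may be sketched as follows. The sketch assumes, as a simplification not stated in this disclosure, that the polymerase incorporates resistant and ordinary nucleotides with equal efficiency and independently at each position; the function names are hypothetical.

```python
def expected_resistant_count(strand_length, resistant_fraction):
    """Expected number of exonuclease-resistant nucleotides in one strand.

    Assumption (illustrative only): each position receives a resistant
    nucleotide independently with probability `resistant_fraction`.
    """
    if not 0.0 <= resistant_fraction <= 1.0:
        raise ValueError("resistant_fraction must lie between 0 and 1")
    return strand_length * resistant_fraction


def probability_at_least_one(strand_length, resistant_fraction):
    """Probability that a strand carries one or more resistant nucleotides."""
    if not 0.0 <= resistant_fraction <= 1.0:
        raise ValueError("resistant_fraction must lie between 0 and 1")
    return 1.0 - (1.0 - resistant_fraction) ** strand_length


if __name__ == "__main__":
    # Hypothetical numbers: a 1000-nucleotide strand, 1 resistant per 1000.
    print(expected_resistant_count(1000, 0.001))
    print(probability_at_least_one(1000, 0.001))
```

Under these assumptions, lowering the resistant fraction reduces the average number of resistant positions per strand, and hence moves the average exonuclease stopping point further from the end at which degradation begins.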
In alternate aspects of the present invention, the at least randomly nicked template is created by contacting a double-stranded or substantially double-stranded nucleic acid with an effective amount of at least a first randomly-nicking or -breaking nuclease enzyme. Exemplary randomly-breaking nuclease enzymes are deoxyribonuclease I and CviJI restriction endonuclease. An “effective amount of at least a first randomly-nicking or -breaking nuclease enzyme” is an amount of the nuclease enzyme effective to create a substantially double stranded nucleic acid template in which either only one or both of the template strands comprise at least a first randomly located broken bond within the template backbone.
In yet a further aspect of the invention, the at least randomly nicked template is created by contacting a double-stranded or substantially double-stranded nucleic acid with a combined effective amount of at least a first and second randomly-breaking nuclease enzyme combination. Exemplary randomly-breaking enzymes for use as the first or second nuclease enzymes are the frequent-cutting restriction endonucleases Tsp509I, MaeII, TaiI, AluI, CviJI, NlaIII, MspI, HpaII, BstUI, BfaI, DpnII, MboI, Sau3AI, DpnI, ChaI, HinPI, HhaI, HaeIII, Csp6I, RsaI, TaqI and MseI, which may be used in any combination.
A “combined effective amount of at least a first and second randomly-breaking nuclease enzyme combination or frequent-cutting restriction endonuclease combination” is a combined amount of the nuclease enzymes effective to create a substantially double stranded nucleic acid template in which either only one or both of the template strands comprise at least a first randomly located broken bond within the template backbone.
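As a hedged illustration of why a combination of frequent-cutting restriction endonucleases yields closely spaced candidate break positions, the following sketch locates recognition sites for a few of the listed enzymes in a sequence. The recognition sequences in the table are taken from general knowledge of these enzymes rather than from this disclosure, the example sequence is invented, and the function name is hypothetical. The sketch reports only where each site begins; it does not model the cut position within the site or partial digestion.

```python
# Recognition sequences (general knowledge, not recited in this text).
# All are palindromic, so scanning one strand finds sites on both strands.
RECOGNITION_SITES = {
    "AluI": "AGCT",
    "HaeIII": "GGCC",
    "MseI": "TTAA",
    "TaqI": "TCGA",
    "RsaI": "GTAC",
    "Sau3AI": "GATC",
}


def find_sites(sequence, enzymes):
    """Return sorted (start_index, enzyme) pairs for every recognition site."""
    sequence = sequence.upper()
    hits = []
    for name in enzymes:
        site = RECOGNITION_SITES[name]
        start = sequence.find(site)
        while start != -1:
            hits.append((start, name))
            # Step by one so overlapping occurrences are also reported.
            start = sequence.find(site, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    # Invented example sequence.
    print(find_sites("AAAGCTTTAAGGCC", ["AluI", "MseI", "HaeIII"]))
    # prints [(2, 'AluI'), (6, 'MseI'), (10, 'HaeIII')]
```

Adding further enzymes to the combination increases the density of candidate sites, which is the sense in which limited digestion with such a combination approximates breakage at essentially random locations.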
As used herein, the terms “nucleases”, “restriction endonucleases” and “restriction enzymes” refer to enzymes, generally bacterial enzymes, that cut nucleic acids. Mostly, the enzymes cut nucleic acids at or near specific nucleotide sequences, but certain enzymes, such as DNAase I, produce essentially random cuts or breaks.
Further embodiments of randomly-nicked template creation rely on contacting a double-stranded or substantially double-stranded nucleic acid with an effective amount of a randomly-nicking or -breaking chemical cleavage composition.
Throughout the variety of randomly-nicking or -breaking chemical cleavage compositions that may be employed, an “effective amount” is an amount of the chemical cleavage composition effective to create a substantially double stranded nucleic acid template in which either only one or both of the template strands comprise at least a first randomly located broken bond within the template backbone.
In preferred embodiments, the random chemical cleavage compositions will comprise or react to produce a hydroxyl radical. Certain suitable randomly-breaking chemical cleavage compositions comprise a chelating agent, a metal ion, a reducing agent and a peroxide, as exemplified by compositions that comprise EDTA, an Fe2+ ion, sodium ascorbate and hydrogen peroxide. In other embodiments, the randomly-breaking chemical cleavage composition comprises a compound, generally a dye, that produces a hydroxyl radical upon contact with a defined or specified wavelength(s) of light.
Randomly-nicked templates may also be created by effectively irradiating with gamma irradiation, i.e., by contacting a double-stranded or substantially double-stranded nucleic acid with an effective amount of gamma irradiation.
Effective application of one or more mechanical breaking processes may also be employed to create the randomly broken or nicked templates. Exemplary mechanical breaking processes include subjecting double-stranded or substantially double-stranded nucleic acids to effective amounts of: hydrodynamic forces, sonication, nebulization and/or freezing and thawing.
In the methods of creating nucleic acid products that terminate with at least a first selected base, the at least nicked nucleic acid template is contacted with at least a first effective polymerase and at least a first effective terminating composition comprising at least a first terminating nucleotide, wherein the base of the terminating nucleotide corresponds to the selected base desired for nucleic acid incorporation and termination, “under conditions effective to produce a nucleic acid product terminated at the selected base”.
“Under conditions effective to produce a nucleic acid product terminated at the selected base” means that the conditions are effective to permit at least one round of nucleotide extension and termination, thus incorporating at least one additional base or nucleotide (the selected base or corresponding nucleotide) into the nucleic acid product. The “effective conditions” are thus “product-generating conditions”, “nucleotide extension and termination-permissive conditions” or “at least nucleotide extending and terminating conditions”.
Fundamental aspects of the “effective, product-generating conditions” include conditions permissive or favorable to the necessary biological reactions, i.e., appropriate conditions of temperature, pH, ionic strength, and the like. The term “under conditions effective to produce a nucleic acid product terminated at the selected base” also means, in and of itself, “under conditions suitable and for a period of time effective to produce a nucleic acid product terminated at the selected base”.
According to the intended use(s) of the selected base-terminated nucleic acid products, or populations thereof, the “effective, product-generating conditions and times” may also be termed “effective nucleic acid sequencing conditions” and/or “effective nucleic acid mapping conditions”.
The “effective, product-generating conditions and times” will vary depending on the type of nucleic acid product or products that one wishes to generate: e.g., products in which the at least nicked nucleic acid template strand is extended with only a single base or nucleotide; or with only two selected bases or nucleotides; or with only three selected bases or nucleotides; or in which the at least nicked nucleic acid template strand is extended with a plurality of bases or nucleotides; and/or in which the at least nicked nucleic acid template is used to prime the synthesis of a population of extended nucleic acid strands, each terminated at a different point.
Inherent in the term “effective, product-generating conditions” is the concept that the “at least a first effective polymerase” will be a polymerase that is effective to generate the type of nucleic acid product or products desired under the extending or polymerizing conditions applied. Equally, the “at least a first effective terminating composition” will be a terminating composition effective to generate the type of terminated nucleic acid product or products desired under the termination conditions applied.
Also inherent in the term “effective, product-generating conditions” is the concept that the “effective polymerase” is a polymerase that is effective to act on the precise type of nick, break or gap in the template under the extending or polymerizing conditions applied. This means that the polymerase has synthetic activity under the chosen conditions, i.e., the polymerase is capable of catalyzing the addition of the desired type and number of bases or nucleotides using the nick, break or gap in the template as the “priming substrate”. The type of nick, break or gap in the template thus forms an “effective matched pair” with the selected polymerase.
DNA molecules have “5′ and 3′ ends”, meaning that mononucleotides have been reacted to make oligonucleotides or polynucleotides in a manner such that the 5′ phosphate of one mononucleotide pentose ring is attached to the 3′ oxygen (from the original hydroxyl) of its neighbor in one direction via a phosphodiester linkage. Therefore, an end of an oligonucleotide or polynucleotide is referred to as the “5′ end” if its 5′ phosphate is not linked to the 3′ oxygen of a mononucleotide pentose ring and as the “3′ end” if its 3′ oxygen is not linked to a 5′ phosphate of a subsequent mononucleotide pentose ring.
As used herein, a nucleic acid sequence, even if internal to a larger oligonucleotide or polynucleotide, may also be said to have 5′ and 3′ ends. In either a linear or circular DNA molecule, discrete elements are referred to as being “upstream” or 5′ of the “downstream” or 3′ elements. This terminology reflects the fact that transcription proceeds in a 5′ to 3′ fashion along the DNA strand.
In embodiments where the break in the substantially double stranded nucleic acid template is a nick that comprises, or is reacted to comprise, a 3′ hydroxyl group, the effective polymerase will generally either have 5′ to 3′ exonuclease activity or strand displacement activity, or both.
Effective polymerases in these categories include, for example, E. coli DNA polymerase I, Taq DNA polymerase, S. pneumoniae DNA polymerase I, Tfl DNA polymerase, D. radiodurans DNA polymerase I, Tth DNA polymerase, Tth XL DNA polymerase, M. tuberculosis DNA polymerase I, M. thermoautotrophicum DNA polymerase I, Herpes simplex-1 DNA polymerase, E. coli DNA polymerase I Klenow fragment, vent DNA polymerase, thermosequenase and wild-type or modified T7 DNA polymerases. In preferred embodiments, the effective polymerase will be E. coli DNA polymerase I, M. tuberculosis DNA polymerase I or Taq DNA polymerase.
Where the break in the substantially double stranded nucleic acid template is a gap of at least a base or nucleotide in length that comprises, or is reacted to comprise, a 3′ hydroxyl group, the range of effective polymerases that may be used is even broader. In such aspects, the effective polymerase may be, for example, E. coli DNA polymerase I, Taq DNA polymerase, S. pneumoniae DNA polymerase I, Tfl DNA polymerase, D. radiodurans DNA polymerase I, Tth DNA polymerase, Tth XL DNA polymerase, M. tuberculosis DNA polymerase I, M. thermoautotrophicum DNA polymerase I, Herpes simplex-1 DNA polymerase, E. coli DNA polymerase I Klenow fragment, T4 DNA polymerase, vent DNA polymerase, thermosequenase or a wild-type or modified T7 DNA polymerase. In preferred aspects, the effective polymerase will be E. coli DNA polymerase I, M. tuberculosis DNA polymerase I, Taq DNA polymerase or T4 DNA polymerase.
In those embodiments in which either the nicked or broken template does not initially comprise a 3′ hydroxyl group, such as when the template is generated by hydroxyl radicals (in certain instances) or certain physical or mechanical processes, the nicked template may still be manipulated or reacted to comprise the desired 3′ hydroxyl group. Methods for achieving this generally involve “conditioning” the non-3′ hydroxyl group containing position. In a preferred aspect of the invention, the “conditioning” involves exonuclease III treatment to remove the base or position lacking a 3′ hydroxyl group, leaving a 3′ hydroxyl group as a product of the removal reaction.
Various methods are also available for terminating the nucleic acid extension to produce the one or more terminated nucleic acid products. For example, the terminating composition may simply comprise a terminating dideoxynucleotide triphosphate, the base of which corresponds to the selected base. Extension with a single base and termination thus occur simultaneously as the dideoxynucleotide triphosphate is incorporated into the template at the break or nick, preventing further addition or extension due to the absence of an available -OH group.
In other embodiments, the terminating composition comprises a terminating deoxynucleotide triphosphate, the base of which corresponds to the selected base. Extension of the nicked strand with a single type of base and termination with that base still occur essentially simultaneously as only one type of deoxynucleotide triphosphate is available for incorporation into the template at the break or nick (with the number of bases incorporated into the nicked strand depending on the number of complementary bases in the corresponding or template strand), thus preventing further addition or extension due to the absence of other nucleotides.
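The difference between the two terminating compositions just described may be illustrated with a minimal sketch. It assumes an idealized, error-free polymerase and a template strand written in the order in which it is read downstream of the nick; the function name and example sequences are hypothetical.

```python
# Watson-Crick pairing between a template base and the incoming base.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def bases_incorporated(template_downstream, supplied_base, dideoxy=False):
    """Count bases added at the nick when only one nucleotide type is supplied.

    template_downstream: template bases in the order the polymerase reads them.
    supplied_base: the single base available for incorporation.
    dideoxy: True for a dideoxynucleotide (stops after one incorporation),
             False for a deoxynucleotide (stops when the template calls for
             a base that is not available).
    """
    count = 0
    for template_base in template_downstream.upper():
        if COMPLEMENT[template_base] != supplied_base:
            break  # required nucleotide is absent, so extension halts
        count += 1
        if dideoxy:
            break  # no 3' hydroxyl remains for further extension
    return count


if __name__ == "__main__":
    print(bases_incorporated("TTTGCA", "A"))                # prints 3
    print(bases_incorporated("TTTGCA", "A", dideoxy=True))  # prints 1
    print(bases_incorporated("GTTT", "A"))                  # prints 0
```

With the deoxynucleotide, the run length reflects the number of consecutive complementary bases in the template strand, whereas the dideoxynucleotide always yields at most a single added base.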
Where detection of the nucleic acid product or products is desired, the product or products will preferably comprise a detectable label or isolation tag. Inherent in the term “under conditions effective to produce a nucleic acid product terminated at the selected base” is the concept that the “effective terminating composition” is effective to incorporate a detectable label into the nucleic acid product or products under the terminating conditions applied, should such labeling be necessary or preferable for subsequent detection or execution of related sequencing or mapping techniques. The type of terminating composition and the type of label or tag in the nucleic acid product or products thus also form an “effective matched pair”.
Accordingly, in any of the methods of the invention, the at least a first terminating nucleotide or nucleotides may comprise a detectable label or an isolation tag that is incorporated into the nucleic acid product or products. In certain aspects, the substantially double stranded nucleic acid template may comprise a detectable label or isolation tag incorporated into the template, and hence into the subsequent nucleic acid product or products, at a point other than the termination point. In other aspects, both the template and the terminating nucleotide or nucleotides may each comprise a detectable label or an isolation tag.
Preferred aspects of the invention require the detection of the terminated nucleic acid product or products generated by the foregoing methods. In certain embodiments, the nucleic acid product or products will be separated, e.g., by electrophoresis, mass spectroscopy, FPLC or HPLC, prior to detection.
The nucleic acid product or products will generally comprise a detectable label, and the nucleic acid product or products are detected by detecting the label. In certain aspects, the nucleic acid product or products will comprise an isolation tag, and the nucleic acid product or products are purified using the isolation tag, optionally prior to more precise detection or differentiation techniques. Suitable detectable labels and isolation tags are exemplified by radioactive, enzymatic and fluorescent labels; and biotin, avidin and streptavidin isolation tags.
Detection is generally integral to the use of the invention in methods for sequencing nucleic acids, wherein the methods comprise detecting the nucleic acid product or products under conditions effective to determine the nucleic acid sequence of at least a portion of the nucleic acid.
In certain embodiments, the introduction or incorporation of the at least a first selected base at the break or nick in the template allows for direct nucleic acid sequencing. These methods generally rely on the generation of a population of nucleic acid products randomly terminated at four selected bases, as exemplified by:
a) creating a population of substantially double-stranded nucleic acid templates from a nucleic acid molecule to be sequenced, each of the templates comprising at least a first random break, preferably only on one strand;
b) contacting the population of templates with an effective polymerase and a terminating composition comprising four distinct labeled or tagged terminating nucleotides, under conditions effective to produce a population of terminated nucleic acid products randomly terminated at four selected bases;
c) detecting the population of randomly terminated nucleic acid products under conditions effective to determine the nucleic acid sequence of at least a portion of the original nucleic acid molecule.
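Steps a) through c) may be illustrated, purely as an idealized sketch, by the following simulation. It assumes that each template carries exactly one nick on the same strand, that every product is measured from a common fixed 5′ end, that the polymerase adds exactly one labeled terminator per nick without error, and that products are resolved to single-base precision; none of these simplifications is asserted to reflect practical conditions, and the function names are hypothetical.

```python
import random


def simulate_terminated_products(nicked_strand, n_templates, seed=0):
    """Map product length to the label of its terminating base.

    nicked_strand: sequence (5' to 3') of the strand that receives the nick.
    A nick leaving `kept` bases on the 5' side, followed by incorporation of
    one terminator, gives a product of length kept + 1. The terminator is
    complementary to the intact template strand, so it matches the base
    originally found at that position of the nicked strand.
    """
    rng = random.Random(seed)
    products = {}
    for _ in range(n_templates):
        kept = rng.randrange(1, len(nicked_strand))  # random nick position
        products[kept + 1] = nicked_strand[kept]
    return products


def read_sequence(products):
    """Read the labels in order of increasing product length."""
    return "".join(products[length] for length in sorted(products))


if __name__ == "__main__":
    strand = "ACGTTGCA"  # invented example
    products = simulate_terminated_products(strand, n_templates=500)
    print(read_sequence(products))
```

When enough templates are sampled that every position is nicked at least once, the ordered labels reproduce the strand from its second base onward; positions that are never nicked simply leave gaps in the ladder of products.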
In certain embodiments, the population of templates is contacted with the terminating composition in four distinct reactions, or wells, each of the reactions comprising only one of the four distinct labeled or tagged terminating nucleotides.
In other embodiments, the population of templates is contacted with the terminating composition in a single reaction, or well, wherein each of the four terminating nucleotides comprises a distinct, fluorescent label.
In further sequencing embodiments, the introduction or incorporation of the at least a first selected base at the break or nick in the template acts as a primer for other, non-direct nucleic acid sequencing methods. An exemplary method is “Sanger”-based sequencing, originating at the nick or gap in the double-stranded template. Such a method may comprise:
a) creating at least a first substantially double-stranded nucleic acid template from the nucleic acid molecule to be sequenced, the template comprising at least a first random break, preferably only on one strand;
b) contacting the at least a first template with an effective polymerase and at least a first extending and terminating composition comprising four extending deoxynucleotide triphosphates and a labeled or tagged terminating dideoxynucleotide triphosphate, under conditions effective to produce a population of terminated nucleic acid products, each originating from the random break;
c) detecting the terminated nucleic acid products under conditions effective to determine the nucleic acid sequence of at least a portion of the original nucleic acid molecule.
Again, the four terminating bases may comprise distinct fluorescent labels.
In addition to “Sanger-like” methods, still further analytical and sequencing methods also require the introduction or incorporation of at least one further base at the break or gap in the template in addition to the selected base. Thus, a first and a second selected base may be incorporated; or this may be described as incorporating a “specified base” in addition to the selected base. Production of a nucleic acid product comprising at least one specified base prior to termination at the selected base requires contacting the template with an effective polymerase and extending and terminating composition, wherein the extending composition comprises the extending specified base.
These methods may be further defined as methods for identifying a selected dinucleotide sequence in the template strand of the nucleic acid template, the dinucleotide sequence being the complement of the specified and selected base incorporated into the non-template, or synthesized strand that originally contained the nick or gap. Such methods comprise:
a) blocking the at least nicked template by contacting the at least nicked template with a first blocking composition comprising the three dideoxynucleotide triphosphates that do not contain the specified base, to create a blocked template;
b) removing the first blocking composition from contact with the blocked template;
c) contacting the blocked template with at least a first extending and terminating composition comprising an extending deoxynucleotide triphosphate containing the specified base, and a tagged or labeled terminating dideoxynucleotide triphosphate containing the selected base, under conditions effective to produce a nucleic acid product terminating with a dinucleotide sequence of the specified and selected base; and
d) detecting the nucleic acid product under conditions effective to identify the selected dinucleotide sequence in the template strand of the nucleic acid template.
Defining the selected dinucleotide sequence as a first and second base in a template strand of a nucleic acid template, such methods are defined as comprising:
a) blocking the at least nicked template by contacting with a first blocking composition comprising three dideoxynucleotide triphosphates that do not contain the complement of the first base, to create a blocked template;
b) removing the first blocking composition from contact with the blocked template;
c) contacting the blocked template with at least a first extending and terminating composition comprising an extending deoxynucleotide triphosphate containing the complement of the first base, and a tagged or labeled terminating dideoxynucleotide triphosphate containing the complement of the second base, under conditions effective to produce a nucleic acid product terminating with a dinucleotide sequence complementary to the first and second base; and
d) detecting the nucleic acid product under conditions effective to identify the selected dinucleotide sequence in the nucleic acid template.
In such methods, step (c) may be conducted as a single extending and terminating step, comprising contacting with a composition that comprises both the extending deoxynucleotide triphosphate and the terminating dideoxynucleotide triphosphate.
Step (c) may also be conducted as at least two distinct extending and terminating steps, comprising first contacting the template with an extending composition that comprises the extending deoxynucleotide triphosphate, and then contacting the template with a distinct terminating composition that comprises the terminating dideoxynucleotide triphosphate. Step (c) may comprise, in sequence, contacting the template with an extending composition that comprises the extending deoxynucleotide triphosphate, removing the extending composition from contact with the template, and contacting the template with a distinct terminating composition that comprises the terminating dideoxynucleotide triphosphate.
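The block, extend and terminate logic for dinucleotide identification can be sketched as follows. This is an idealized model, not a protocol: it assumes complete blocking, a single combined extending and terminating step, and a specified base that differs from the selected base. The function name `labeled_dinucleotide_sites` is hypothetical.

```python
def labeled_dinucleotide_sites(strand, specified, selected):
    """Return indices in `strand` (5'->3') that receive a labeled terminator.

    A nick at index i leaves a free 3' end after strand[i-1]; strand[i] is
    the next base the polymerase would incorporate.
    """
    if specified == selected:
        raise ValueError("this sketch assumes specified != selected")
    sites = set()
    for i in range(1, len(strand)):
        if strand[i] != specified:
            # Blocking step: one of the three unlabeled ddNTPs is
            # incorporated, so this nick cannot be extended further.
            continue
        j = i
        # Extending step: the single dNTP is added across a run of the
        # specified base.
        while j < len(strand) and strand[j] == specified:
            j += 1
        # Terminating step: the labeled ddNTP is incorporated only if the
        # next base is the selected base.
        if j < len(strand) and strand[j] == selected:
            sites.add(j)
    return sorted(sites)


print(labeled_dinucleotide_sites("ACGTACCGT", "C", "G"))  # [2, 7]
```

In this model a run of the specified base (here `CCG`) also yields a labeled product, because the combined composition allows repeated extension before termination.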
The non-Sanger analytical and sequencing methods may also require the introduction or incorporation of at least two further bases at the break or gap in the template in addition to the selected base. Thus, the nicked template is subjected to a series of blocking and washing, and extending and washing reactions prior to contact with the terminating composition, thereby producing an extended nucleic acid product comprising two, three or a series of additional bases preceding the selected, terminating base.
Such methods allow for the identification of a selected trinucleotide sequence in a nucleic acid template, the trinucleotide sequence being the complement of the first and second specified bases and the selected base, the method comprising:
a) blocking the at least nicked template by contacting with a first blocking composition comprising three dideoxynucleotide triphosphates that do not contain the first specified base, to create a first-blocked template;
b) removing the first blocking composition from contact with the first-blocked template;
c) extending the first-blocked template by contacting with a first extending composition comprising an extending deoxynucleotide triphosphate containing the first specified base, to create a first-extended template;
d) removing the first extending composition from contact with the first-extended template;
e) blocking the first-extended template by contacting with a second blocking composition comprising three dideoxynucleotide triphosphates that do not contain the second specified base to create a second-blocked template;
f) removing the second blocking composition from contact with the second-blocked template;
g) contacting the second-blocked template with at least a first extending and terminating composition comprising an extending deoxynucleotide triphosphate containing the second specified base, and a tagged or labeled terminating dideoxynucleotide triphosphate containing the selected base, under conditions effective to produce a nucleic acid product terminating with a trinucleotide sequence of the first and second specified bases and the selected base; and
h) detecting the nucleic acid product under conditions effective to identify a selected trinucleotide sequence in the nucleic acid sample.
Defining the selected trinucleotide sequence as a first, second and third base in a template strand of a nucleic acid template, the foregoing methods are defined as comprising:
a) blocking the at least nicked template by contacting with a first blocking composition comprising three dideoxynucleotide triphosphates that do not contain the complement of the first base to create a first-blocked template;
b) removing the first blocking composition from contact with the first-blocked template;
c) extending the first-blocked template by contacting with a first extending composition comprising an extending deoxynucleotide triphosphate containing the complement of the first base to create a first-extended template;
d) removing the first extending composition from contact with the first-extended template;
e) blocking the first-extended template by contacting with a second blocking composition comprising three dideoxynucleotide triphosphates that do not contain the complement of the second base to create a second-blocked template;
f) removing the second blocking composition from contact with the second-blocked template;
g) contacting the second-blocked template with at least a first extending and terminating composition comprising an extending deoxynucleotide triphosphate containing the complement of the second base, and a tagged or labeled terminating dideoxynucleotide triphosphate containing the complement of the third base, under conditions effective to produce a nucleic acid product terminating with a trinucleotide sequence complementary to the first, second and third bases; and
h) detecting the nucleic acid product under conditions effective to identify the selected trinucleotide sequence in the nucleic acid sample.
These methods may comprise:
a) blocking the at least nicked template by contacting with a first blocking composition comprising three dideoxynucleotide triphosphates that do not contain the complement of the first base to create a first-blocked template;
b) removing the first blocking composition from contact with the first-blocked template;
c) extending the first-blocked template by contacting with a first extending composition comprising an extending deoxynucleotide triphosphate containing the complement of the first base to create a first-extended template;
d) removing the first extending composition from contact with the first-extended template;
e) blocking the first-extended template by contacting with a second blocking composition comprising three dideoxynucleotide triphosphates that do not contain the complement of the second base to create a second-blocked template;
f) removing the second blocking composition from contact with the second-blocked template;
g) further extending the second-blocked template by contacting with a second extending composition comprising an extending deoxynucleotide triphosphate containing the complement of the second base to create a second-extended template;
h) terminating the reaction by contacting the second-extended template with a terminating composition comprising a tagged or labeled terminating dideoxynucleotide triphosphate containing the complement of the third base, under conditions effective to produce a nucleic acid product terminating with a trinucleotide sequence complementary to the first, second and third bases; and
i) detecting the nucleic acid product under conditions effective to identify a selected trinucleotide sequence in the nucleic acid sample.
The methods of di- and tri-nucleotide identification may further be used as methods for sequencing a nucleic acid molecule by identifying selected di- or tri-nucleotide sequences, wherein the identification of the selected di- or tri-nucleotide sequences is followed by the compilation of the identified di- or tri-nucleotide sequences to determine the contiguous nucleic acid sequence of at least a portion of the nucleic acid molecule.
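The compilation step can be sketched as simple bookkeeping. The example assumes that each identified di- or tri-nucleotide comes with a start position on a common coordinate axis (for instance, derived from product length); the function name `compile_kmers` and the use of `N` for uncovered positions are choices made for this illustration only.

```python
def compile_kmers(observations):
    """Merge (start, kmer) observations into one contiguous sequence.

    Positions not covered by any observation are reported as 'N';
    contradictory base calls at the same position raise an error.
    """
    bases = {}
    for start, kmer in observations:
        for offset, base in enumerate(kmer):
            pos = start + offset
            if bases.setdefault(pos, base) != base:
                raise ValueError(f"conflicting calls at position {pos}")
    if not bases:
        return ""
    lo, hi = min(bases), max(bases)
    return "".join(bases.get(p, "N") for p in range(lo, hi + 1))


calls = [(0, "GAT"), (1, "ATT"), (2, "TTA"), (5, "CA")]
print(compile_kmers(calls))  # GATTACA
```

Overlapping observations act as a consistency check: two calls that disagree at a shared position are reported rather than silently merged.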
The methods of selecting at least a first nucleic acid product terminated with at least a first selected base generally comprise creating a substantially double stranded nucleic acid template comprising at least a first break on at least one strand, and contacting the template with an effective polymerase and a terminating composition comprising at least a first terminating nucleotide, wherein the base of the terminating nucleotide corresponds to the selected base, under conditions effective to produce a nucleic acid product terminated at a selected base, or an effective polymerase and an extending composition under conditions effective to produce a fully extended product only from a template that terminates at the selected base. The methods may first involve creating a substantially double stranded nucleic acid template comprising at least a first random double stranded break.
The methods may be further defined as methods for determining the position of at least a first selected dinucleotide sequence of at least a first and at least a second base in at least a first nucleic acid template. The methods may comprise:
a) ligating a double-stranded nucleic acid segment to the double-stranded break, the double-stranded nucleic acid segment comprising an upper strand comprising a 5′ end comprising a phosphate group and a blocked 3′ end and a lower strand comprising a blocked 5′ end and a 3′ end comprising a hydroxyl group;
b) blocking the template by contacting with a first blocking composition comprising three dideoxynucleotide triphosphates that do not contain the complement of the first base;
c) removing the first blocking composition from contact with the template;
d) extending the template by contacting with a first extending composition comprising an extending deoxynucleotide triphosphate containing the complement of the first base;
e) removing the first extending composition from contact with the template;
f) blocking the template by contacting with a second blocking composition comprising three dideoxynucleotide triphosphates that do not contain the complement of the second base;
g) removing the second blocking composition from contact with the template;
h) contacting the template with at least a second extending composition comprising four extending deoxynucleotide triphosphates, at least one of the extending deoxynucleotide triphosphates containing a tagged or labeled base, under conditions effective to produce a fully extended tagged or labeled nucleic acid product with a dinucleotide sequence complementary to the first and second bases; and
i) detecting the nucleic acid product under conditions effective to determine the position of the selected dinucleotide sequence in the nucleic acid sample.
The methods of determining the position of at least a first selected dinucleotide sequence comprising at least a first base and a second base in one or more nucleic acid templates may alternatively comprise:
a) attaching a double-stranded nucleic acid segment to the double-stranded break, the double-stranded nucleic acid segment comprising an upper strand comprising a 5′ end comprising a phosphate group and a blocked 3′ end and a lower strand comprising a blocked 5′ end and a blocked 3′ end;
b) heating the template at a temperature effective to disassociate the lower strand of the adaptor;
c) annealing a single-stranded oligonucleotide comprising a 3′ hydroxyl group to the template, the first oligonucleotide comprising the same nucleotide sequence as the lower strand plus a first additional 3′ base complementary to the first base and a second additional 3′ base complementary to the second base;
d) contacting the template with an extending composition comprising four extending deoxynucleotide triphosphates, at least one of the extending deoxynucleotide triphosphates containing a tagged or labeled base, under conditions effective to produce a fully extended tagged or labeled nucleic acid product with a dinucleotide sequence complementary to the first and second bases; and
e) detecting the nucleic acid product under conditions effective to determine the position of the selected dinucleotide sequence in the nucleic acid sample.
Optionally, the methods of determining the position of at least a first selected dinucleotide sequence comprising at least a first base and a second base in at least a first nucleic acid template may comprise:
a) ligating a double-stranded nucleic acid segment to the double-stranded break, the double-stranded nucleic acid segment comprising an upper strand comprising a 5′ end comprising a phosphate group and a blocked 3′ end and a lower strand comprising a blocked 5′ end and a blocked 3′ end;
b) heating the ligated double-stranded nucleic acid segment at a temperature effective to disassociate the lower strand of the adaptor;
c) annealing a first single-stranded oligonucleotide comprising a 3′ hydroxyl group to the templates, the first oligonucleotide comprising the same nucleotide sequence as the lower strand;
d) blocking the templates by contacting with a first blocking composition comprising a dideoxynucleotide triphosphate that contains the complement of the first base;
e) removing the first blocking composition from contact with the templates;
f) contacting the templates with at least a first extending composition comprising four deoxynucleotide triphosphates, one of the deoxynucleotide triphosphates comprising a uracil base, under conditions effective to completely extend the non-template strand;
g) heating the templates at a temperature effective to disassociate the first single stranded oligonucleotide;
h) annealing a second single-stranded oligonucleotide comprising a 3′ hydroxyl group to the templates, the second oligonucleotide comprising the same nucleotide sequence as the first single-stranded oligonucleotide plus a first additional 3′ base complementary to the first base;
i) blocking the templates by contacting with a second blocking composition comprising a dideoxynucleotide triphosphate that contains the complement of the second base;
j) removing the second blocking composition from contact with the templates;
k) contacting the templates with the at least a first extending composition comprising four deoxynucleotide triphosphates, one of the deoxynucleotide triphosphates comprising a uracil base, under conditions effective to completely extend the non-template strand;
l) heating the templates at a temperature effective to disassociate the second single stranded oligonucleotide;
m) annealing a third single-stranded oligonucleotide comprising a 3′ hydroxyl group to the templates, the third oligonucleotide comprising the same nucleotide sequence as the second single-stranded oligonucleotide plus a second additional 3′ base complementary to the second base;
n) contacting the templates with at least a second extending and labeling composition comprising four deoxynucleotide triphosphates, at least one of which comprises a detectable label, under conditions effective to completely extend the non-template strand;
o) contacting the templates with at least a first degrading composition under conditions effective to degrade the non-template strands containing a uracil base; and
p) detecting the nucleic acid products under conditions effective to determine the position of the selected dinucleotide sequence in the nucleic acid templates.
The methods may also be further defined as methods for determining the position of at least a first selected trinucleotide sequence of at least a first, second and third base in one or more nucleic acid templates. The methods may comprise:
a) ligating a double-stranded nucleic acid segment to the double-stranded break, the double-stranded nucleic acid segment comprising an upper strand comprising a 5′ end comprising a phosphate group and a blocked 3′ end and a lower strand comprising a blocked 5′ end and a 3′ end comprising a hydroxyl group;
b) blocking the template by contacting with a first blocking composition comprising three dideoxynucleotide triphosphates that do not contain the complement of the first base;
c) removing the first blocking composition from contact with the template;
d) extending the template by contacting with a first extending composition comprising an extending deoxynucleotide triphosphate containing the complement of the first base;
e) removing the first extending composition from contact with the template;
f) blocking the template by contacting with a second blocking composition comprising three dideoxynucleotide triphosphates that do not contain the complement of the second base;
g) removing the second blocking composition from contact with the template;
h) extending the template by contacting with a second extending composition comprising an extending deoxynucleotide triphosphate containing the complement of the second base;
i) removing the second extending composition from contact with the template;
j) blocking the template by contacting with a third blocking composition comprising three dideoxynucleotide triphosphates that do not contain the complement of the third base;
k) removing the third blocking composition from contact with the template;
l) contacting the template with at least a third extending composition comprising four extending deoxynucleotide triphosphates, at least one of the extending deoxynucleotide triphosphates containing a tagged or labeled base, under conditions effective to produce a fully extended tagged or labeled nucleic acid product with a trinucleotide sequence complementary to the first, second and third bases; and
m) detecting the nucleic acid product under conditions effective to determine the position of the selected trinucleotide sequence in the nucleic acid sample.
The methods of determining the position of at least a first selected trinucleotide sequence comprising at least a first base, a second base and a third base in at least a first nucleic acid template may optionally comprise:
a) attaching a double-stranded nucleic acid segment to the double-stranded break, the double-stranded nucleic acid segment comprising an upper strand comprising a 5′ end comprising a phosphate group and a blocked 3′ end and a lower strand comprising a blocked 5′ end and a blocked 3′ end;
b) heating the template at a temperature effective to disassociate the lower strand of the adaptor;
c) annealing a single-stranded oligonucleotide comprising a 3′ hydroxyl group to the template, the first oligonucleotide comprising the same nucleotide sequence as the lower strand plus a first additional 3′ base complementary to the first base, a second additional 3′ base complementary to the second base and a third additional 3′ base complementary to the third base;
d) contacting the template with an extending composition comprising four extending deoxynucleotide triphosphates, at least one of the extending deoxynucleotide triphosphates containing a tagged or labeled base, under conditions effective to produce a fully extended tagged or labeled nucleic acid product with a trinucleotide sequence complementary to the first, second and third bases; and
e) detecting the nucleic acid product under conditions effective to determine the position of the selected trinucleotide sequence in the nucleic acid sample.
Alternatively, the methods of determining the position of at least a first selected trinucleotide sequence comprising at least a first base, a second base and a third base in one or more nucleic acid templates may comprise:
a) ligating a double-stranded nucleic acid segment to the double-stranded break, the double-stranded nucleic acid segment comprising an upper strand comprising a 5′ end comprising a phosphate group and a blocked 3′ end and a lower strand comprising a blocked 5′ end and a blocked 3′ end;
b) heating the ligated double-stranded nucleic acid segment at a temperature effective to disassociate the lower strand of the adaptor;
c) annealing a first single-stranded oligonucleotide comprising a 3′ hydroxyl group to the templates, the first oligonucleotide comprising the same nucleotide sequence as the lower strand;
d) blocking the templates by contacting with a first blocking composition comprising a dideoxynucleotide triphosphate that contains the complement of the first base;
e) removing the first blocking composition from contact with the templates;
f) contacting the templates with at least a first extending composition comprising four deoxynucleotide triphosphates, one of the deoxynucleotide triphosphates comprising a uracil base, under conditions effective to completely extend the non-template strand;
g) heating the templates at a temperature effective to disassociate the first single stranded oligonucleotide;
h) annealing a second single-stranded oligonucleotide comprising a 3′ hydroxyl group to the templates, the second oligonucleotide comprising the same nucleotide sequence as the first single-stranded oligonucleotide plus a first additional 3′ base complementary to the first base;
i) blocking the templates by contacting with a second blocking composition comprising a dideoxynucleotide triphosphate that contains the complement of the second base;
j) removing the second blocking composition from contact with the templates;
k) contacting the templates with the at least a first extending composition comprising four deoxynucleotide triphosphates, one of the deoxynucleotide triphosphates comprising a uracil base, under conditions effective to completely extend the non-template strand;
l) heating the templates at a temperature effective to disassociate the second single stranded oligonucleotide;
m) annealing a third single-stranded oligonucleotide comprising a 3′ hydroxyl group to the templates, the third oligonucleotide comprising the same nucleotide sequence as the second single-stranded oligonucleotide plus a second additional 3′ base complementary to the second base;
n) contacting the templates with the at least a second extending composition comprising four deoxynucleotide triphosphates, one of the deoxynucleotide triphosphates comprising a uracil base, under conditions effective to completely extend the non-template strand;
o) heating the templates at a temperature effective to disassociate the third single stranded oligonucleotide;
p) annealing a fourth single-stranded oligonucleotide comprising a 3′ hydroxyl group to the templates, the fourth oligonucleotide comprising the same nucleotide sequence as the third single-stranded oligonucleotide plus a third additional 3′ base complementary to the third base;
q) contacting the templates with at least a third extending and labeling composition comprising four deoxynucleotide triphosphates, at least one of which comprises a detectable label, under conditions effective to completely extend the non-template strand;
r) contacting the templates with at least a first degrading composition under conditions effective to degrade the non-template strands containing a uracil base; and
s) detecting the nucleic acid products under conditions effective to determine the position of the selected trinucleotide sequence in the nucleic acid templates.
Further methods of the present invention are methods of sequencing a nucleic acid molecule by identifying a selected dinucleotide sequence comprising a first base and a second base, the methods comprising:
a) creating a substantially double-stranded nucleic acid template comprising a selected dinucleotide sequence on a template strand and comprising an exonuclease-resistant nucleotide in the non-template strand, wherein the base of the exonuclease-resistant nucleotide is complementary to the first base;
b) contacting the template with an amount of an exonuclease effective to degrade the non-template strand until the position of the exonuclease-resistant nucleotide;
c) removing the exonuclease from contact with the template;
d) contacting the template with at least a first terminating composition comprising a tagged or labeled terminating dideoxynucleotide triphosphate containing the complement of the second base, under conditions effective to produce a nucleic acid product terminating with a dinucleotide sequence complementary to the first and second base; and
e) detecting the nucleic acid product under conditions effective to identify the selected dinucleotide sequence in the template strand of the nucleic acid template.
Detection of a selectively-terminated nucleic acid product or products is also generally integral to the use of the invention in methods for mapping a nucleic acid, wherein the methods generally comprise detecting the nucleic acid product or products under conditions effective to determine the position of the nucleic acid relative to the nucleic acid product or products. The mapping methods may comprise:
a) creating a population of substantially double-stranded nucleic acid templates from the nucleic acid, the templates comprising at least a first random break on at least one strand or at least a first random break on only one strand;
b) contacting the population of templates with an effective polymerase and at least a first degradable extension-producing composition comprising three non-degradable extending nucleotides (deoxynucleotides) and one degradable nucleotide, under conditions and for a time effective to produce a population of degradable nucleic acid products comprising the degradable nucleotide;
c) removing the degradable extension-producing composition from contact with the templates;
d) contacting the population of degradable nucleic acid products with an effective polymerase and at least a first nondegradable extending and terminating composition comprising four non-degradable extending deoxynucleotides, at least one of the non-degradable extending deoxynucleotides comprising a detectable label or an isolation tag, under conditions and for a time effective to produce a population of terminated nucleic acid products comprising a degradable region and a nondegradable region;
e) contacting the population of terminated nucleic acid products with an effective amount of a degrading composition to degrade the degradable region, thereby producing nested nucleic acid products; and
f) detecting the nested nucleic acid products under conditions effective to determine the position of the nucleic acid relative to the nucleic acid product.
As used herein, the term “nested nucleic acid products” means a series of nucleic acid products that lie at different distances from the point at which the nucleic acid synthesis originates. In certain aspects, the products will be overlapping nucleic acid products, but this is not a requirement for most of the embodiments of the present invention.
In preferred embodiments, the degradable nucleotide will be a uracil base-containing nucleotide and the degrading composition will comprise a combined effective amount of a uracil DNA glycosylase enzyme and an endonuclease IV or an endonuclease V enzyme.
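The coordinate bookkeeping behind steps (b) through (e) of the mapping method can be illustrated with a deliberately idealized sketch. It assumes that synthesis starts at a known nick, that the degradable region has a fixed length set by the reaction time, and that the degrading composition removes the whole degradable region; real products would depend on where the degradable nucleotide is actually incorporated. The function name `nested_product` and its parameters are hypothetical.

```python
def nested_product(strand, nick, degradable_len, labeled_len):
    """Return (start, end, sequence) of the labeled, nondegradable region.

    `strand` is written 5'->3' and synthesis begins at index `nick`.
    The first `degradable_len` bases are made with the degradable
    nucleotide; the next `labeled_len` bases are made with labeled,
    nondegradable nucleotides and survive degradation.
    """
    start = min(nick + degradable_len, len(strand))
    end = min(start + labeled_len, len(strand))
    return start, end, strand[start:end]


strand = "ACGTACGTACGT"
# Longer degradable extensions place the labeled product farther from
# the nick, giving a nested series.
for degradable_len in (2, 4, 6):
    print(nested_product(strand, 2, degradable_len, 3))
```

Each tuple gives the offset of a labeled product from the origin of synthesis, which is the quantity the detection step uses to position the product relative to the nucleic acid being mapped.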
The present invention still further provides methods of sequencing through a telomeric repeat region into a subtelomeric region, comprising:
a) providing a substantially double-stranded nucleic acid that comprises, in contiguous sequence order, a terminal single-stranded telomeric overhang, a double-stranded telomeric repeat region and a double-stranded subtelomeric region;
b) contacting the nucleic acid with a composition comprising an oligonucleotide or primer that is substantially complementary to and hybridizes to the single-stranded telomeric overhang, an effective polymerase, four extending nucleotides and at least a first tagged or labeled terminating nucleotide under conditions effective to produce a nucleic acid product extended from the primer into the subtelomeric region; and
c) detecting the nucleic acid product under conditions effective to determine the nucleic acid sequence of the telomeric overhang, the telomeric repeat region and at least a portion of the subtelomeric region.
The present invention also provides a method for determining the percentage of telomeres in a population that contain 3′ overhangs, comprising:
a) contacting a telomere-containing nucleic acid sample suspected of having telomeres containing a first, 3′ overhang-containing strand and a second, non-overhang strand, with a composition comprising an oligonucleotide or primer that is substantially complementary to and hybridizes to the single-stranded telomeric overhang, an effective polymerase and four extending nucleotides under conditions effective to produce a nucleic acid product extended from the primer and a trimmed second, non-overhang strand, wherein a telomere that does not have a 3′ overhang will comprise a non-trimmed second, non-overhang strand; and
b) detecting the nucleic acid product under conditions effective to determine the amounts of the nucleic acid product, the trimmed second, non-overhang strand, the first, 3′ overhang-containing strand and the non-trimmed second, non-overhang strand.
In particular aspects, the amounts of the nucleic acid product, the trimmed second, non-overhang strand, the first, 3′ overhang-containing strand and the non-trimmed second, non-overhang strand are determined by hybridization with labeled G-rich and C-rich telomeric sequences or segments.
The term “oligonucleotide”, as used herein, defines a molecule comprised of two or more deoxyribonucleotides or ribonucleotides, usually more than three (3), and typically more than ten (10) and up to one hundred (100) or more. Preferably, “oligos” comprise between about fifteen or twenty and about thirty deoxyribonucleotides or ribonucleotides. Oligonucleotides may be generated in any effective manner, including chemical synthesis, DNA replication, reverse transcription, or a combination thereof.
A primer is said to be “substantially” complementary to a strand of specific sequence of a template when it is sufficiently complementary to hybridize to the template such that primer elongation can occur. A primer sequence need not reflect the exact sequence of a template. For example, a non-complementary nucleotide fragment may be attached to the 5′ end of a primer, with the remainder of the primer sequence being substantially complementary to a template. Non-complementary bases or longer sequences can be interspersed into a primer, provided that the primer sequence has sufficient complementarity with the sequence of the template to hybridize and thereby form a template-primer complex for synthesis of the extension product of the primer.
“Hybridization” methods involve the annealing of a complementary or sufficiently complementary sequence to a target nucleic acid sequence. The ability of two polymers of nucleic acid containing complementary sequences to anneal through base pairing interaction is a well-recognized phenomenon (Marmur and Lane, 1960; Doty et al., 1960).
The “complement” of a nucleic acid sequence as used herein refers to an oligonucleotide which, when aligned with the nucleic acid sequence such that the 5′ end of one sequence is paired with the 3′ end of the other, is in “antiparallel association.” Certain bases not commonly found in natural nucleic acids may be included in the nucleic acids of the present invention and include, for example, inosine and 7-deazaguanine. Complementarity need not be perfect; stable duplexes may contain mismatched base pairs or unmatched bases. Those skilled in the art of nucleic acid technology can determine duplex stability empirically considering a number of variables including, for example, the length of the oligonucleotide, base composition and sequence of the oligonucleotide, ionic strength and incidence of mismatched base pairs.
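By way of illustration only, the antiparallel relationship described above may be sketched in Python. The function names reverse_complement and mismatch_count are hypothetical, only the four standard DNA bases are handled, and the TTAGGG repeat (the G-rich vertebrate telomeric repeat) is used merely as example input. The mismatch count is a simple descriptive figure and is not a criterion for substantial complementarity, which, as stated above, is determined empirically.

```python
# Watson-Crick pairing for the four standard DNA bases (assumption: no
# modified bases such as inosine or 7-deazaguanine are handled here).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the complement of seq read 5' to 3' (antiparallel alignment)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def mismatch_count(primer, template_site):
    """Count mismatched positions between a primer and an equal-length
    template site when the two are aligned in antiparallel association."""
    if len(primer) != len(template_site):
        raise ValueError("primer and template site must have equal length")
    perfect = reverse_complement(template_site)
    return sum(1 for p, q in zip(primer.upper(), perfect) if p != q)


# Illustrative input: three G-rich telomeric repeats, written 5' to 3'.
overhang = "TTAGGG" * 3
print(reverse_complement(overhang))                     # CCCTAACCCTAACCCTAA
print(mismatch_count("CCCTAACCCTAACCCTAA", overhang))   # 0
print(mismatch_count("CCCTAACCCTAACCCTAT", overhang))   # 1
```

A primer with a small number of mismatches, such as the last example, may still hybridize stably depending on length, base composition and ionic strength.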
Stability of a nucleic acid duplex is measured by the melting temperature, or “Tm.” The Tm of a particular nucleic acid duplex under specified conditions is the temperature at which on average half of the base pairs have dissociated. The equation for calculating the Tm of nucleic acids is well known in the art. As indicated by standard references, an estimate of the Tm value may be calculated by the equation:
Tm = 81.5° C. + 16.6 log M + 0.41(%GC) − 0.61(%form) − 500/L
where M is the molarity of monovalent cations, %GC is the percentage of guanosine and cytosine nucleotides in the DNA, %form is the percentage of formamide in the hybridization solution, and L=length of the hybrid in base pairs (Berger and Kimmel, 1987). More sophisticated computations are also known in the art that take structural as well as sequence characteristics into account for the calculation of Tm.
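As a worked illustration, the above equation may be evaluated with a short Python function. This is a minimal sketch: the function name estimate_tm and its parameter names are hypothetical, and the logarithm is assumed to be base 10, which is the usual reading of this equation.

```python
import math


def estimate_tm(monovalent_molar, percent_gc, percent_formamide, hybrid_length_bp):
    """Estimate Tm in degrees C from the equation given above.

    monovalent_molar   -- M, molarity of monovalent cations
    percent_gc         -- %GC, percentage of G and C nucleotides (0-100)
    percent_formamide  -- %form, percentage of formamide (0-100)
    hybrid_length_bp   -- L, length of the hybrid in base pairs
    """
    if monovalent_molar <= 0:
        raise ValueError("cation molarity must be positive")
    if hybrid_length_bp <= 0:
        raise ValueError("hybrid length must be positive")
    return (
        81.5
        + 16.6 * math.log10(monovalent_molar)  # assumption: log is base 10
        + 0.41 * percent_gc
        - 0.61 * percent_formamide
        - 500.0 / hybrid_length_bp
    )


# Example values chosen only for illustration: 1 M monovalent cation,
# 50% GC, no formamide, 100 base pair hybrid.
print(round(estimate_tm(1.0, 50.0, 0.0, 100), 1))  # 97.0
```

In this form each percentage point of formamide lowers the estimate by 0.61° C., and the 500/L term becomes small for long hybrids.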
The invention yet further provides methods of determining the length of a single-stranded overhang of a telomere, comprising contacting a telomere comprising a single-stranded overhang with an excess of a primer that hybridizes to the single-stranded overhang under conditions effective to allow hybridization of substantially complementary nucleic acids, and quantitating the primers thus hybridized to the single-stranded overhang. These methods may further comprise contacting the primers hybridized to the single-stranded overhang with a ligation composition in an amount and for a time effective to ligate the primers, wherein the length of the ligated primers is quantitated.