The invention relates to nucleic acid sequencing and mapping and, more particularly, to the sequencing and mapping of double-stranded nucleic acid templates.
An aggressive research effort to sequence the entire human genome is proceeding in the laboratories of genetic researchers throughout the country. The project is called the Human Genome Project (HGP). It is a daunting task, given that it involves the complete characterization of the archetypal human genome sequence, which comprises 3×10^9 DNA nucleotide base pairs. Early estimates for completing the task within fifteen years hinged on the expectation that new technology would be developed in response to the pressing need for faster methods of DNA sequencing.
Current approaches generally incorporate the fundamentals of either the Sanger sequencing method or the Maxam and Gilbert sequencing method, two techniques that were first introduced in the 1970s [Sanger et al., (1977) “DNA sequencing with chain-terminating inhibitors,” Proc. Natl. Acad. Sci. USA 74:5463-5467; Maxam and Gilbert, (1977) “A new method for sequencing DNA,” Proc. Natl. Acad. Sci. USA 74:560-564]. In the Sanger method, a short oligonucleotide or primer is annealed to a single-stranded template containing the DNA to be sequenced. The primer provides a 3′ hydroxyl group, which allows the polymerization of a chain of DNA when a polymerase enzyme and dNTPs are provided. The Sanger method is an enzymatic reaction that utilizes chain-terminating dideoxynucleotides (ddNTPs). ddNTPs are chain-terminating because they lack the 3′-hydroxyl group needed to form a phosphodiester bond with a succeeding deoxyribonucleotide (dNTP). A small amount of one ddNTP is included with the four conventional dNTPs in a polymerization reaction. Polymerization, or DNA synthesis, is catalyzed by a DNA polymerase. There is competition between extension of the chain by incorporation of the conventional dNTPs and termination of the chain by incorporation of a ddNTP.
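The dNTP/ddNTP competition described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not a model of any particular polymerase: the template is supplied 3′→5′ starting just past the primer, the polymerase never dissociates, and at each site calling for the chosen base the ddNTP is incorporated with a fixed, hypothetical probability `p_term`. The function names and the example template are illustrative only.

```python
import random
from collections import Counter

# Watson-Crick pairing used to derive the newly synthesized strand.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def product_strand(template_3_to_5: str) -> str:
    """Strand made by the polymerase, written 5'->3'.

    The template is given 3'->5', starting at the first base past the primer,
    so product position i simply pairs with template position i."""
    return "".join(COMPLEMENT[b] for b in template_3_to_5)


def ladder(template_3_to_5: str, dd_base: str) -> list[int]:
    """Every length (nt added past the primer) at which a ddNTP of dd_base can stop synthesis."""
    product = product_strand(template_3_to_5)
    return [i + 1 for i, b in enumerate(product) if b == dd_base]


def simulate(template_3_to_5: str, dd_base: str, p_term: float,
             n_molecules: int = 10_000, seed: int = 0) -> tuple[Counter, int]:
    """Monte Carlo of dNTP/ddNTP competition (hypothetical fixed p_term).

    At each site that calls for dd_base, the ddNTP is incorporated with
    probability p_term (chain terminates); otherwise the ordinary dNTP is
    added and synthesis continues. Returns a Counter of terminated lengths
    and the number of chains that ran off the template without terminating."""
    rng = random.Random(seed)
    product = product_strand(template_3_to_5)
    terminated: Counter = Counter()
    run_off = 0
    for _ in range(n_molecules):
        for i, b in enumerate(product):
            if b == dd_base and rng.random() < p_term:
                terminated[i + 1] += 1
                break
        else:
            run_off += 1  # no termination event anywhere on this molecule
    return terminated, run_off
```

For the hypothetical template `TACGGATC`, `ladder("TACGGATC", "G")` returns `[3, 8]`, the only lengths at which a ddGTP reaction can stop. Lowering `p_term` shifts signal toward longer fragments and raises the number of chains that run off the template, which is the balance set by the amount of ddNTP added.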
The original version of the Sanger method utilized E. coli DNA polymerase I (“pol I”), which has a polymerization activity, a 3′-5′ exonuclease proofreading activity, and a 5′-3′ exonuclease activity. Later, the method was improved by using the Klenow fragment instead of pol I; Klenow lacks the 5′-3′ exonuclease activity, which is detrimental to the sequencing reaction because it leads to partial degradation of template and product DNA. The Klenow fragment has several limitations when used for enzymatic sequencing. One limitation is the low processivity of the enzyme, which generates a high background of fragments that terminate by random dissociation of the enzyme from the template rather than by the desired termination due to incorporation of a ddNTP. The low processivity also means that the enzyme cannot be used to sequence nucleotides that lie more than ~250 nucleotides from the 5′ end of the primer. A second limitation is that Klenow cannot efficiently utilize templates that have homopolymer tracts or regions of high secondary structure. The problems caused by secondary structure in the template can be reduced by running the polymerization reaction at 55°C (R. Gomer and R. Firtel, “Sequencing homopolymer regions,” Bethesda Res. Lab. Focus 7:6, 1985).
Improvements to the original Sanger method include the use of polymerases other than the Klenow fragment. Reverse transcriptase has been used to sequence templates that have homopolymeric tracts (S. Karanthanasis, “M13 DNA sequencing using reverse transcriptase,” Bethesda Res. Lab. Focus 4(3):6, 1982; Graham et al., “Direct DNA sequencing using avian myeloblastosis virus and Moloney murine leukemia virus reverse transcriptase,” Bethesda Res. Lab. Focus 8(2):4, 1986). Reverse transcriptase is somewhat better than the Klenow enzyme at utilizing templates containing homopolymer tracts.
The use of a modified T7 DNA polymerase (Sequenase(trademark)) was a significant improvement to the Sanger method. See Sambrook, J. et al., Molecular Cloning, A Laboratory Manual, 2d Ed., Cold Spring Harbor Laboratory Press, New York, 13.7-13.9, and Hunkapiller, M. W. (1991) Curr. Op. Gen. Devl. 1:88-92. T7 DNA polymerase does not have any inherent 5′-3′ exonuclease activity and has a reduced selectivity against incorporation of ddNTPs. However, its 3′-5′ exonuclease activity leads to degradation of some of the oligonucleotide primers. Sequenase(trademark) is a chemically modified T7 DNA polymerase that has reduced 3′-5′ exonuclease activity (Tabor et al. 1987, Proc. Natl. Acad. Sci. USA 84:4767). Sequenase(trademark) version 2.0 is a genetically engineered form of T7 DNA polymerase that completely lacks 3′-5′ exonuclease activity. Sequenase(trademark) has a very high processivity and a high rate of polymerization. It can efficiently incorporate nucleotide analogs such as dITP and 7-deaza-dGTP, which are used to resolve regions of compression in sequencing gels. In regions of DNA with a high G+C content, Hoogsteen bond formation can occur, which leads to compressions; these compressions result in aberrant migration patterns of oligonucleotide strands on sequencing gels. Because these base analogs pair weakly with conventional nucleotides, they alleviate intrastrand secondary structure during electrophoresis. In contrast, Klenow does not incorporate these analogs as efficiently.
The use of Taq DNA polymerase and mutants thereof is a more recent addition to the improvements of the Sanger method [see U.S. Pat. No. 5,075,216 to Innis et al. (1993), hereby incorporated by reference]. Taq polymerase is a thermostable enzyme that works efficiently at 70-75°C. The ability to catalyze DNA synthesis at elevated temperature makes Taq polymerase useful for sequencing templates that have extensive secondary structure at 37°C (the standard temperature used for Klenow and Sequenase(trademark) reactions). Taq polymerase, like Sequenase(trademark), has a high degree of processivity and, like Sequenase 2.0, it lacks 3′-5′ exonuclease activity. The thermal stability of Taq and related enzymes (such as Tth and Thermosequenase(trademark)) provides an advantage over T7 polymerase (and all mutants thereof): these thermally stable enzymes can be used for cycle sequencing, which amplifies the DNA during the sequencing reaction and thus allows sequencing to be performed on smaller amounts of DNA. Optimization of the use of Taq in the standard Sanger method has focused on modifying Taq to eliminate its intrinsic 5′-3′ exonuclease activity and to increase its ability to incorporate ddNTPs, reducing incorrect termination due to secondary structure in the single-stranded template DNA. See Tabor and Richardson, EP 0 655 506 B1, hereby incorporated by reference.
Both the Sanger and the Maxam-Gilbert methods produce populations of radiolabeled or fluorescently labeled polynucleotides of differing lengths, which are separated according to size by polyacrylamide gel electrophoresis (PAGE). The nucleotide sequence is determined by analyzing the pattern of size-separated labeled polynucleotides in the gel.
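Reading the sequence from such a gel amounts to sorting the bands of all four lanes by length. The following is a minimal sketch, assuming a four-lane gel with one ddNTP per lane and band lengths already measured; `read_gel` and the example lane data are hypothetical. It rejects a position with bands in more than one lane, and a ladder with a missing band, rather than guessing, since either condition makes the base call ambiguous.

```python
def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read a four-lane Sanger gel (hypothetical helper).

    lanes maps the ddNTP base of each lane to the lengths of its bands.
    Shorter fragments migrate farther, so reading bands from the bottom of
    the gel upward (shortest to longest) spells the synthesized strand 5'->3'."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    seen = [length for length, _ in bands]
    if len(seen) != len(set(seen)):
        # Bands of the same length in two or more lanes cannot be called.
        raise ValueError("ambiguous position: bands of equal length in more than one lane")
    expected = list(range(seen[0], seen[0] + len(seen))) if seen else []
    if seen != expected:
        raise ValueError("missing band: ladder is not contiguous")
    return "".join(base for _, base in bands)
```

With lanes `{"A": [1, 7], "T": [2, 6], "G": [3, 8], "C": [4, 5]}` the function returns `ATGCCTAG`, the synthesized strand read 5′→3′.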
The current limitations to conventional applications of the Sanger method include 1) the limited resolving power of polyacrylamide gel electrophoresis, 2) the formation of intermolecular and intramolecular secondary structure of the denatured template in the reaction mixture, which can cause any of the polymerases to prematurely terminate synthesis at specific sites or to misincorporate ddNTPs at inappropriate sites, 3) secondary structure of the DNA on the sequencing gels, which can give rise to compressions of the electrophoretic ladder at specific locations in the sequence, 4) cleavage of the template, primers and products by the 5′-3′ or 3′-5′ exonuclease activities of the polymerases, and 5) mispriming of synthesis due to hybridization of the oligonucleotide primers to multiple sites on the denatured template DNA. The formation of intermolecular and intramolecular secondary structure produces artificial terminations that are incorrectly “read” as the wrong base, gives rise to bands across four lanes (BAFLs) that produce ambiguities in base reading, and decreases the intensity, and thus the signal-to-noise ratio, of the bands. Secondary structure of the DNA on the gels can largely be overcome by incorporating dITP or 7-deaza-dGTP into the synthesized DNA; DNA containing such modified nucleotides is less likely to form urea-resistant secondary structure during electrophoresis. Cleavage of the template, primers or products reduces the intensity of bands terminating at the correct positions and increases the background. Mispriming gives rise to background in the gel lanes.
The net result is that, although the inherent resolution of polyacrylamide gel electrophoresis alone is as much as 1000 nucleotides, it is common to be able to read only 400-600 nucleotides of a sequence correctly (and sometimes much less) using the conventional Sanger method, even with optimized polymerase design and reaction conditions. Some sequences, such as repetitive DNA, strings of identical bases (especially guanines), GC-rich sequences and many unique sequences, cannot be sequenced without a high degree of error and uncertainty.
In the absence of any methods to sequence DNA longer than 400-800 bases, investigators must subclone the DNA into small fragments and sequence these small fragments. The procedures for doing this in a logical way are very labor-intensive, cannot be automated, and are therefore impractical. The most popular technique for large-scale sequencing, the “shotgun” method, involves cloning and sequencing hundreds or thousands of overlapping DNA fragments. Many shotgun methods are automated, but they require sequencing 5-10 times as many bases as minimally necessary, leave gaps in the sequence information that must be filled in manually, and have difficulty determining sequences that contain repetitive DNA.
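The oversampling and residual gaps of the shotgun approach can be estimated with the standard Lander-Waterman (Poisson) coverage model. The model is not part of this disclosure and is idealized: it assumes reads are placed uniformly at random and treats any overlap as sufficient to join two reads. The function names and the 1-Mb, 500-nt example are hypothetical.

```python
import math


def uncovered_fraction(coverage: float) -> float:
    """Expected fraction of bases hit by no read at mean coverage c (Poisson model): e^-c."""
    return math.exp(-coverage)


def expected_gaps(genome_length: int, read_length: int, coverage: float) -> float:
    """Expected number of gaps between contigs, N * e^-c, with N = c * G / L reads.

    Assumes uniform random read placement and that any overlap joins two reads."""
    n_reads = coverage * genome_length / read_length
    return n_reads * math.exp(-coverage)


for c in (1, 5, 10):
    print(f"{c:>2}x: uncovered fraction {uncovered_fraction(c):.2e}, "
          f"gaps per Mb with 500-nt reads {expected_gaps(1_000_000, 500, c):.1f}")
```

Under this model about 0.7% of bases remain unread at 5x coverage and about 0.005% at 10x, which is consistent with heavily oversampled projects still being left with gaps.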
Thus, the goal of placing rapid sequencing techniques in the hands of many researchers is yet to be achieved. New approaches are needed that eliminate the above-described limitations.
The invention relates to nucleic acid sequencing and mapping and, more particularly, to the sequencing and mapping of double-stranded nucleic acid templates. The invention employs a suitable polymerase to synthesize a new DNA strand using an undenatured, double-stranded DNA template. This strand replacement (SR) reaction involves no net synthesis of DNA: elongation requires the stepwise removal of one strand of the template at, or within a few nucleotides of, the site of synthesis, so that the DNA remains almost completely double-stranded at every moment during the reaction.
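The bookkeeping of the strand replacement reaction can be illustrated with a toy model of a nicked duplex. This is a sketch under simplifying assumptions, not a kinetic description: `strand_replacement` and its parameters are hypothetical, and the 5′-3′ exonuclease is assumed to run a fixed `lead` nucleotides ahead of the polymerase (the text says only that removal occurs at or within a few nucleotides of the site of synthesis). It checks the two properties stated above: the partner strand keeps its length (no net synthesis), and the single-stranded stretch never exceeds `lead` nucleotides.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def strand_replacement(template: str, nick: int, stop: int, lead: int = 1):
    """Toy model of strand replacement initiated at a nick (hypothetical).

    template: strand copied by the polymerase; its partner is stored index by index.
    nick: index of the first old partner nucleotide downstream of the nick.
    stop: last index filled by the polymerase (e.g. where a ddNMP is added).
    lead: how many nucleotides the 5'->3' exonuclease runs ahead of synthesis.
    Returns (partner, max_gap): partner holds (base, origin) pairs with origin
    'old' or 'new' (None where still single-stranded); max_gap is the largest
    single-stranded stretch seen during the reaction."""
    partner = [(COMPLEMENT[b], "old") for b in template]
    max_gap = 0
    removed = nick  # next old nucleotide the exonuclease will excise
    for pos in range(nick, stop + 1):
        # Exonuclease excises old nucleotides until it is `lead` positions ahead.
        while removed < min(pos + lead, len(template)):
            partner[removed] = None
            removed += 1
        max_gap = max(max_gap, sum(p is None for p in partner))
        # Polymerase adds the complementary nucleotide opposite the template.
        partner[pos] = (COMPLEMENT[template[pos]], "new")
    return partner, max_gap
```

With the default `lead=1`, replacing positions 2 through 6 of a 10-nt duplex turns those partner nucleotides from `old` to `new` while the maximum gap stays at one nucleotide. With a larger `lead`, a short gap of `lead - 1` nucleotides remains downstream of the stop site when the template extends that far.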
The unique aspects of the method of the present invention include 1) use of polymerases optimized to possess strong 5′-3′ exonuclease activity, 2) use of a double-stranded, undenatured DNA template, 3) the ability to optimize the reaction conditions using lower temperature, higher salt, and other conditions designed to stabilize native Watson-Crick secondary structure in the template, 4) initiation of a sequencing reaction at a nick or gap in a double-stranded template, including the use of novel double-stranded adapters specifically designed to create unique strand replacement initiation sites when ligated to the ends of restriction fragments, 5) elongation in a manner such that the DNA remains double-stranded, and 6) termination of synthesis at either a ddNTP or another site-specific location.
Because the sequencing method of the present invention begins and continues with double-stranded DNA, the method avoids the formation of intermolecular and intramolecular secondary structure of the template in the reaction mixture. Moreover, the present invention contemplates embodiments where no primer is necessary; in this embodiment (Primer Independent Strand Replacement), there is no concern about cleavage of the primers or mispriming, and the initiation of the sequencing reaction is highly efficient and specific.
While the SR technique of the present invention is carried out without a denaturation step to generate single-stranded template, the method can (if desired) also be used with a primer and a double-stranded template with a short single-stranded region. This Primer Dependent Strand Replacement can be used with double-stranded templates having 1) naturally-occurring single-stranded regions (such as the 3′ overhangs of double-stranded telomeric DNA), 2) synthetically- or enzymatically-introduced single-stranded regions, or 3) regions created by ligation to special oligonucleotide adapters.
The product molecules are double-stranded, allowing for long stretches of the product DNA to be subsequently cleaved (using restriction enzymes) into smaller fragments for direct sequencing and other forms of analysis using conventional acrylamide or agarose gel electrophoresis. The sequencing of these restriction fragments allows for much longer DNA fragments to be sequenced without the need for subcloning. For sequencing purposes, the newly-synthesized strands are terminated at base-specific locations using either ddNTPs or other base-specific termination nucleotides and can be subjected to automated sequencing in commercially available sequenators.
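Cutting a long double-stranded product into shorter pieces for gel analysis can be sketched as a search for restriction sites. This is a minimal, single-strand illustration: `digest` and the example sequence are hypothetical, and only the strand written out is fragmented. EcoRI, which recognizes GAATTC and cuts between the G and the first A, is used as the example enzyme.

```python
def digest(seq: str, site: str, cut_offset: int) -> list[str]:
    """Cut one strand of seq at every occurrence of site (hypothetical helper).

    cut_offset is the position of the cut inside the recognition site
    (1 for EcoRI, G^AATTC). The complementary strand of a palindromic site
    is cut symmetrically, leaving 5' overhangs that this sketch does not track."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    # Skip empty pieces that arise if a cut falls exactly at either end.
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


fragments = digest("AAGAATTCCCGAATTCTT", "GAATTC", 1)
print(fragments, [len(f) for f in fragments])
```

Here the two EcoRI sites give three top-strand fragments of 3, 8 and 7 nucleotides (`AAG`, `AATTCCCG`, `AATTCTT`).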
Although the method is contemplated to find extensive application to determining the base sequence of DNA, the same principles can be applied to the mapping of sequences and sequence variations at lower resolution over long distances.
In one embodiment, the present invention contemplates sequencing of DNA to one side (e.g., clockwise) from a restriction site in a circular molecule of DNA. This method depends upon a reliable, specific method for introducing a nick in one specific strand. In another embodiment, both sides of a single internal restriction site (clockwise and counterclockwise) are sequenced in a covalently-closed circular or linear DNA molecule.
In one embodiment, the present invention contemplates a method for sequencing nucleic acid, comprising: a) providing: i) nucleic acid template capable of being double-stranded, ii) a polymerase having a polymerase activity and a 5′-3′ exonuclease activity, iii) a nucleic acid precursor, and iv) a terminating agent; b) mixing said polymerase, said precursor, said terminating agent and said template to create a reaction under conditions where said template is substantially double-stranded; and c) detecting the product of said reaction under conditions whereby the nucleic acid sequence of at least a portion of said template is revealed. In one embodiment, said template capable of being double-stranded comprises single-stranded nucleic acid that, upon cooling, becomes substantially double-stranded.
In another embodiment, the present invention contemplates a method for sequencing nucleic acid, comprising: a) providing: i) substantially double-stranded nucleic acid template, ii) a polymerase having synthetic activity and a 5′-3′ exonuclease activity, iii) at least one nucleic acid precursor, and iv) at least one terminating agent; b) mixing said polymerase, said precursor, said terminating agent and said template under conditions such that nucleic acid synthesis takes place for a reaction period during which said template remains substantially double-stranded and nucleic acid product is created containing said terminating agent; and c) detecting said product of said reaction under conditions whereby the nucleic acid sequence of at least a portion of said template is revealed.
In yet another embodiment, the present invention contemplates a method for sequencing nucleic acid, comprising: a) providing: i) substantially double-stranded nucleic acid template, ii) a polymerase having synthetic activity and a 5′-3′ exonuclease activity, iii) one or more nucleic acid precursors, and iv) one or more terminating agents; b) mixing said polymerase, said one or more precursors, said one or more terminating agents and said template under conditions such that nucleic acid synthesis takes place for a reaction period during which said template remains substantially double-stranded and nucleic acid product is created containing said one or more terminating agents; and c) detecting said product of said reaction under conditions whereby the nucleic acid sequence of at least a portion of said template is revealed.
In one embodiment, said substantially double-stranded template comprises a single-stranded region. In this embodiment, an oligonucleotide primer can be used. For example, a primer can be added to the reaction of step (b); the primer should be capable of hybridizing to said single-stranded region of said substantially double-stranded template.
In another embodiment, an oligonucleotide primer is not used. Instead, prior to step (b) one strand of said substantially double-stranded template is nicked.
It is not intended that the present invention be limited by the nature of the nucleic acid precursors. In one embodiment, said one or more nucleic acid precursors mixed in step (b) are selected from the group consisting of dATP, dGTP, dTTP and dCTP. Similarly, it is not intended that the present invention be limited by the nature of the terminating agents. In one embodiment, said one or more terminating agents mixed in step (b) are selected from the group consisting of ddATP, ddGTP, ddTTP and ddCTP.
A variety of polymerases are suitable for the strand replacement reaction of the present invention. In one embodiment, the polymerase is Taq DNA polymerase. In another, the polymerase is E. coli DNA polymerase I.
It is not intended that the present invention be limited by the method by which the products of the reaction are detected and evaluated. In one embodiment, the detecting comprises gel electrophoresis. That is to say, the products are subjected to gel electrophoresis.
In one embodiment, the present invention contemplates a method for sequencing nucleic acid, comprising: a) providing: i) substantially double-stranded nucleic acid template, ii) an endonuclease capable of specifically nicking one of the strands of said double-stranded nucleic acid template, iii) a polymerase having synthetic activity and a 5′-3′ exonuclease activity, iv) one or more nucleic acid precursors, and v) one or more terminating agents; b) mixing said substantially double-stranded template with said endonuclease under conditions such that a substantially double-stranded template is produced containing a nick on one strand; c) adding a solution to said nicked template, said solution comprising said polymerase, said one or more precursors, and said one or more terminating agents, whereby said adding is carried out under conditions such that nucleic acid synthesis takes place for a reaction period during which said template remains substantially double-stranded and nucleic acid product is created containing said one or more terminating agents; and d) detecting said product of said reaction under conditions whereby the nucleic acid sequence of at least a portion of said template is revealed.
By “specifically nicking” it is meant that nicking takes place on only one strand and (preferably) at only one site. In one embodiment, the endonuclease capable of such specific nicking is f1 gpII.
As noted above, said one or more nucleic acid precursors mixed in step (b) may be selected from the group consisting of dATP, dGTP, dTTP and dCTP. In some cases, said one or more nucleic acid precursors are labeled. It is not intended that the present invention be limited by the nature of the label. In one embodiment, the label is selected from the group consisting of radiolabels and fluorescent labels. In a particular case, the label is ³²P. Where the label is a radiolabel, it is desirable that the detecting comprise gel electrophoresis and autoradiography.
As noted above, said one or more terminating agents mixed in step (b) may be selected from the group consisting of ddATP, ddGTP, ddTTP and ddCTP. Such agents can also be labeled.
In a preferred embodiment, the present invention contemplates a method for sequencing nucleic acid, comprising: a) providing: i) substantially double-stranded nucleic acid template, said substantially double-stranded template comprising a single-stranded region, ii) a polymerase having synthetic activity and a 5′-3′ exonuclease activity, iii) one or more nucleic acid precursors, iv) one or more terminating agents, and v) a primer capable of hybridizing to said single-stranded region of said substantially double-stranded template; b) mixing said polymerase, said one or more precursors, said one or more terminating agents, said primer and said template under conditions such that nucleic acid synthesis takes place for a reaction period during which said template remains substantially double-stranded and nucleic acid product is created containing said one or more terminating agents; and c) detecting said product of said reaction under conditions whereby the nucleic acid sequence of at least a portion of said template is revealed. In one embodiment, such template is telomeric DNA, including but not limited to human telomeric DNA having 3′ overhangs. In one embodiment, the primer used to hybridize to said telomeric DNA comprises the sequence CCCUAA, including but not limited to a primer having the sequence (CCCUAA)4 (SEQ ID NO:1).
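The complementarity between the (CCCUAA)4 primer and the telomeric overhang can be checked directly. This sketch assumes the human telomeric 3′ overhang is the G-rich repeat (TTAGGG)n written 5′→3′; `reverse_complement` is a hypothetical helper that treats U and T as equivalent and can return its result in RNA letters, matching the U in SEQ ID NO:1.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Reverse complement of a DNA or RNA sequence; returns RNA letters (U) if rna=True."""
    dna = seq.upper().replace("U", "T")
    rc = "".join(PAIR[b] for b in reversed(dna))
    return rc.replace("T", "U") if rna else rc


overhang = "TTAGGG" * 4   # G-rich telomeric 3' overhang, written 5'->3' (assumed)
primer = "CCCUAA" * 4     # (CCCUAA)4, SEQ ID NO:1
print(reverse_complement(overhang, rna=True) == primer)
```

The check prints `True`. Because the overhang is a tandem repeat, the primer can also anneal in other registers along it, shifted by whole repeats.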
The present invention also contemplates special adapters useful in conjunction with the strand replacement method of the present invention. Such adapters are ligated to create an initiation site for strand replacement.
As used herein, the term “amplifiable nucleic acid” is used in reference to nucleic acids which may be amplified by any amplification method, including but not limited to PCR.
As used herein, the term “primer” refers to an oligonucleotide, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product which is complementary to a nucleic acid strand is induced (i.e., in the presence of nucleotides and an inducing agent such as DNA polymerase and at a suitable temperature and pH). The primer is preferably single stranded for maximum efficiency in amplification, but may alternatively be double stranded. If double stranded, the primer is first treated to separate its strands before being used to prepare extension products. Preferably, the primer is an oligodeoxyribonucleotide. The primer must be sufficiently long to prime the synthesis of extension products in the presence of the inducing agent. The exact lengths of the primers will depend on many factors, including temperature, source of primer and the use of the method.
As used herein, the term “probe” refers to an oligonucleotide (i.e., a sequence of nucleotides), whether occurring naturally as in a purified restriction digest or produced synthetically, recombinantly or by PCR amplification, which is capable of hybridizing to another oligonucleotide of interest. A probe may be single-stranded or double-stranded. Probes are useful in the detection, identification and isolation of particular gene sequences. It is contemplated that any probe used in the present invention will be labeled with any “reporter molecule,” so that it is detectable in any detection system, including, but not limited to, fluorescent, enzyme (e.g., ELISA, as well as enzyme-based histochemical assays), radioactive, and luminescent systems. It is not intended that the present invention be limited to any particular detection system or label.
As used herein, the term “template” refers to nucleic acid that is to be acted upon, such as nucleic acid that is to be mixed with polymerase. In some cases, “template” is sought to be sorted out from other nucleic acid sequences. “Substantially single-stranded template” is nucleic acid that is either completely single-stranded (having no double-stranded areas) or single-stranded except for a proportionately small area of double-stranded nucleic acid (such as the area defined by a hybridized primer or the area defined by intramolecular bonding). “Substantially double-stranded template” is nucleic acid that is either completely double-stranded (having no single-stranded region) or double-stranded except for a proportionately small area of single-stranded nucleic acid (such as the area defined at the ends of telomeric DNA).
As used herein, the term “polymerase chain reaction” (“PCR”) refers to the method of K. B. Mullis, U.S. Pat. Nos. 4,683,195 and 4,683,202, hereby incorporated by reference, which describe a method for increasing the concentration of a segment of a template sequence in a mixture of genomic DNA without cloning or purification. This process for amplifying the template sequence consists of introducing a large excess of two oligonucleotide primers to the DNA mixture containing the desired template sequence, followed by a precise sequence of thermal cycling in the presence of a DNA polymerase. The two primers are complementary to their respective strands of the double-stranded template sequence. To effect amplification, the mixture is denatured and the primers are then annealed to their complementary sequences within the template molecule. Following annealing, the primers are extended with a polymerase so as to form a new pair of complementary strands. The steps of denaturation, primer annealing and polymerase extension can be repeated many times (i.e., denaturation, annealing and extension constitute one “cycle”; there can be numerous “cycles”) to obtain a high concentration of an amplified segment of the desired template sequence. The length of the amplified segment of the desired template sequence is determined by the relative positions of the primers with respect to each other, and therefore this length is a controllable parameter. By virtue of the repeating aspect of the process, the method is referred to as the “polymerase chain reaction” (hereinafter “PCR”). Because the desired amplified segments of the template sequence become the predominant sequences (in terms of concentration) in the mixture, they are said to be “PCR amplified”.
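Two quantitative points in this definition, the growth of product with cycle number and the control of product length by primer position, can be sketched as follows. The exponential formula N0(1 + E)^n is the idealized textbook model (each cycle copies a fraction E of the existing targets) and ignores the plateau seen in real reactions; the function names and the example template and primers are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(PAIR[b] for b in reversed(seq))


def pcr_copies(initial: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after n cycles: N0 * (1 + E)^n."""
    return initial * (1 + efficiency) ** cycles


def amplicon(template: str, fwd: str, rev: str) -> str:
    """Segment bounded by the primers (all sequences 5'->3'; template is the top strand).

    The forward primer matches the top strand; the reverse primer anneals to the
    top strand, so its site there is the primer's reverse complement."""
    start = template.find(fwd)
    site = reverse_complement(rev)
    end = template.find(site, start)
    if start == -1 or end == -1:
        raise ValueError("primer site not found")
    return template[start:end + len(site)]
```

`amplicon("GGACGTACCCCTTGCAGG", "ACGTA", "TGCAA")` returns the 14-nt segment `ACGTACCCCTTGCA`, bounded by the two primer sites, and `pcr_copies(1, 30)` gives 2^30, about 1.07×10^9 copies at perfect efficiency.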
With PCR, it is possible to amplify a single copy of a specific template sequence in genomic DNA to a level detectable by several different methodologies (e.g., hybridization with a labeled probe; incorporation of biotinylated primers followed by avidin-enzyme conjugate detection; incorporation of ³²P-labeled deoxynucleotide triphosphates, such as dCTP or dATP, into the amplified segment). In addition to genomic DNA, any oligonucleotide sequence can be amplified with the appropriate set of primer molecules. In particular, the amplified segments created by the PCR process itself are, themselves, efficient templates for subsequent PCR amplifications.
“Amplification” is a special case of nucleic acid replication involving template specificity. It is to be contrasted with non-specific template replication (i.e., replication that is template-dependent but not dependent on a specific template). Template specificity is here distinguished from fidelity of replication (i.e., synthesis of the proper polynucleotide sequence) and nucleotide (ribo- or deoxyribo-) specificity. Template specificity is frequently described in terms of “target” specificity. Target sequences are “targets” in the sense that they are sought to be sorted out from other nucleic acid. Amplification techniques have been designed primarily for this sorting out.
As used herein, the terms “PCR product,” “PCR fragment” and “amplification product” refer to the resultant mixture of compounds after two or more cycles of the PCR steps of denaturation, annealing and extension are complete. These terms encompass the case where there has been amplification of one or more segments of one or more target sequences.
As used herein, the term “amplification reagents” refers to those reagents (deoxyribonucleotide triphosphates, buffer, etc.) needed for amplification except for primers, nucleic acid template and the amplification enzyme. Typically, amplification reagents along with other reaction components are placed and contained in a reaction vessel (test tube, microwell, etc.).
As used herein, the terms “restriction endonucleases” and “restriction enzymes” refer to bacterial enzymes, each of which cuts double-stranded DNA at or near a specific nucleotide sequence.
As used herein, the term “recombinant DNA molecule” refers to a DNA molecule that is composed of segments of DNA joined together by means of molecular biological techniques.
DNA molecules are said to have “5′ ends” and “3′ ends” because mononucleotides are reacted to make oligonucleotides in a manner such that the 5′ phosphate of one mononucleotide pentose ring is attached to the 3′ oxygen of its neighbor in one direction via a phosphodiester linkage. Therefore, an end of an oligonucleotide is referred to as the “5′ end” if its 5′ phosphate is not linked to the 3′ oxygen of a mononucleotide pentose ring and as the “3′ end” if its 3′ oxygen is not linked to a 5′ phosphate of a subsequent mononucleotide pentose ring. As used herein, a nucleic acid sequence, even if internal to a larger oligonucleotide, also may be said to have 5′ and 3′ ends. In either a linear or circular DNA molecule, discrete elements are referred to as being “upstream” or 5′ of the “downstream” or 3′ elements. This terminology reflects the fact that transcription proceeds in a 5′ to 3′ fashion along the DNA strand. The promoter and enhancer elements that direct transcription of a linked gene are generally located 5′ or upstream of the coding region. However, enhancer elements can exert their effect even when located 3′ of the promoter element and the coding region. Transcription termination and polyadenylation signals are located 3′ or downstream of the coding region.
As used herein, the term “vector” is used in reference to nucleic acid molecules that transfer DNA segment(s) from one cell to another. Vectors are used to introduce foreign DNA into host cells, where it can be replicated (i.e., reproduced) in large quantities. The term “vehicle” is sometimes used interchangeably with “vector.” Vectors, including “cloning vectors,” allow the insertion of DNA fragments without the loss of the vector's capacity for self-replication. Cloning vectors may be derived from viruses, plasmids or genetic elements from eucaryotic and/or procaryotic organisms; vectors frequently comprise DNA segments from several sources. Examples of cloning vectors include plasmids, cosmids, lambda phage vectors, P1 vectors, yeast artificial chromosomes (YACs), and bacterial artificial chromosomes (BACs).
The term “oligonucleotide” as used herein is defined as a molecule composed of two or more deoxyribonucleotides or ribonucleotides, usually more than three (3), and typically more than ten (10) and up to one hundred (100) or more (although preferably between twenty and thirty). The exact size will depend on many factors, which in turn depend on the ultimate function or use of the oligonucleotide. The oligonucleotide may be generated in any manner, including chemical synthesis, DNA replication, reverse transcription, or a combination thereof.
A primer is selected to be “substantially” complementary to a specific sequence on a strand of the template. A primer must be sufficiently complementary to hybridize with a template strand for primer elongation to occur. A primer sequence need not reflect the exact sequence of the template. For example, a non-complementary nucleotide fragment may be attached to the 5′ end of the primer, with the remainder of the primer sequence being substantially complementary to the strand. Non-complementary bases or longer sequences can be interspersed into the primer, provided that the primer sequence has sufficient complementarity with the sequence of the template to hybridize and thereby form a template-primer complex for synthesis of the extension product of the primer.
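For illustration only, the tolerance for a non-complementary 5′ tail can be sketched in Python. This is a minimal sketch, not a method recited in this description: the function names, the `tail_len` parameter, and scoring by the fraction of paired bases are assumptions chosen for clarity. Both strands are written 5′ to 3′, the first `tail_len` bases of the primer are treated as the non-complementary 5′ tail, and only the remaining 3′ portion is scored against each position of the template.

```python
# Watson-Crick partner of each DNA base.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (both read 5'->3')."""
    return "".join(PAIR[base] for base in reversed(seq.upper()))


def best_binding_site(primer: str, template: str, tail_len: int = 0) -> tuple[int, float]:
    """Find where the primer's 3' portion pairs best with the template.

    Hypothetical helper. The first tail_len bases (the 5' tail) are skipped
    because they are not expected to pair. Returns (start index in template,
    fraction of paired bases in the 3' portion); start is -1 if no position
    pairs any base.
    """
    core = primer[tail_len:]
    if not core:
        raise ValueError("tail_len leaves no 3' portion to score")
    # A perfectly complementary core pairs with this template sequence.
    target = reverse_complement(core)
    n = len(target)
    template = template.upper()
    best_start, best_frac = -1, 0.0
    for start in range(len(template) - n + 1):
        window = template[start:start + n]
        paired = sum(a == b for a, b in zip(window, target))
        frac = paired / n
        if frac > best_frac:  # keep the earliest best-scoring position
            best_start, best_frac = start, frac
    return best_start, best_frac


template = "CCCCGATTACACCCC"
print(best_binding_site("GAATTCTGTAATC", template, tail_len=6))  # (4, 1.0)
print(best_binding_site("GAATTCTGTTATC", template, tail_len=6))  # start 4, 6 of 7 bases paired
```

Here the illustrative 5′ tail GAATTC is ignored, and a single mismatch in the 3′ portion still leaves six of seven bases paired. Whether such a score is sufficient for hybridization depends on the reaction conditions, which this sketch does not model.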
“Hybridization” methods involve the annealing of a complementary sequence to the target nucleic acid (the sequence to be detected). The ability of two polymers of nucleic acid containing complementary sequences to find each other and anneal through base pairing interaction is a well-recognized phenomenon. The initial observations of the “hybridization” process by Marmur and Lane, Proc. Natl. Acad. Sci. USA 46:453 (1960) and Doty et al., Proc. Natl. Acad. Sci. USA 46:461 (1960) have been followed by the refinement of this process into an essential tool of modern biology. Nonetheless, a number of problems have prevented the wide-scale use of hybridization as a tool in human diagnostics. Among the more formidable problems are: 1) the inefficiency of hybridization; 2) the low concentration of specific target sequences in a mixture of genomic DNA; and 3) the hybridization of only partially complementary probes and targets.
With regard to efficiency, it is experimentally observed that only a fraction of the possible probe-target complexes form in a hybridization reaction. This is particularly true of short oligonucleotide probes (less than 100 bases in length). There are three fundamental causes: a) hybridization cannot occur because of secondary and tertiary structure interactions; b) strands of DNA containing the target sequence have rehybridized (reannealed) to their complementary strand; and c) some target molecules are prevented from hybridizing in formats that immobilize the target nucleic acids on a solid surface.
Even where the sequence of a probe is completely complementary to the sequence of the target, i.e., the target's primary structure, the target sequence must be made accessible to the probe via rearrangements of higher-order structure. These higher-order structural rearrangements may concern either the secondary structure or tertiary structure of the molecule. Secondary structure is determined by intramolecular bonding. In the case of DNA or RNA targets, this consists of hybridization within a single, continuous strand of bases (as opposed to hybridization between two different strands). Depending on the extent and position of intramolecular bonding, the probe can be displaced from the target sequence, preventing hybridization.
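Intramolecular hybridization of the kind described above can be illustrated by searching a single strand for inverted repeats, i.e., two segments that are reverse complements of each other and could fold back to form a hairpin stem. The Python below is a minimal sketch under stated assumptions: the function names are hypothetical, stems are required to be exact Watson-Crick matches of a fixed length `stem_len`, and the minimum loop size `min_loop` is an illustrative default, not a value taken from this description. It ignores stem stability, mismatches, and tertiary structure.

```python
# Watson-Crick partner of each DNA base.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (both read 5'->3')."""
    return "".join(PAIR[base] for base in reversed(seq))


def find_hairpin_stems(strand: str, stem_len: int = 4, min_loop: int = 3) -> list[tuple[int, int]]:
    """List (i, j) where strand[i:i+stem_len] can pair with strand[j:j+stem_len].

    Hypothetical helper. The downstream arm must be the exact reverse
    complement of the upstream arm (so the two arms pair antiparallel) and
    must start at least min_loop bases after the upstream arm ends, leaving
    room for an unpaired loop.
    """
    strand = strand.upper()
    stems = []
    for i in range(len(strand) - stem_len + 1):
        partner = reverse_complement(strand[i:i + stem_len])
        for j in range(i + stem_len + min_loop, len(strand) - stem_len + 1):
            if strand[j:j + stem_len] == partner:
                stems.append((i, j))
    return stems


print(find_hairpin_stems("GACTTTTTAGTC"))  # [(0, 8)]: GACT pairs with AGTC across a TTTT loop
print(find_hairpin_stems("AAAAAAAAAAAA"))  # []: no self-complementary segments
```

A region flagged this way is only a candidate for secondary structure that could compete with a probe; whether it actually forms depends on its stability under the hybridization conditions.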
The “complement” of a nucleic acid sequence as used herein refers to an oligonucleotide which, when aligned with the nucleic acid sequence such that the 5′ end of one sequence is paired with the 3′ end of the other, is in “antiparallel association.” Certain bases not commonly found in natural nucleic acids may be included in the nucleic acids of the present invention and include, for example, inosine and 7-deazaguanine. Complementarity need not be perfect; stable duplexes may contain mismatched base pairs or unmatched bases. Those skilled in the art of nucleic acid technology can determine duplex stability empirically considering a number of variables including, for example, the length of the oligonucleotide, base composition and sequence of the oligonucleotide, ionic strength and incidence of mismatched base pairs.
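The antiparallel alignment in this definition, and the point that complementarity need not be perfect, can be sketched in Python. In this minimal sketch, which assumes two strands of equal length written 5′ to 3′ and only the four standard DNA bases, the hypothetical function `antiparallel_mismatches` pairs the first base of one strand with the last base of the other and reports positions that do not form a Watson-Crick pair. Modified bases such as inosine and 7-deazaguanine are not modeled.

```python
# Watson-Crick base pairs (DNA).
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def antiparallel_mismatches(top: str, bottom: str) -> list[int]:
    """Align two equal-length strands, both written 5'->3', antiparallel.

    Hypothetical helper. The 5' end of top faces the 3' end of bottom, so
    top[i] pairs with bottom[-1 - i]. Returns the indices in top that do
    not form a Watson-Crick pair.
    """
    if len(top) != len(bottom):
        raise ValueError("this sketch only aligns strands of equal length")
    top, bottom = top.upper(), bottom.upper()
    return [i for i, base in enumerate(top)
            if (base, bottom[-1 - i]) not in WATSON_CRICK]


print(antiparallel_mismatches("GATTACA", "TGTAATC"))  # []: perfect complement
print(antiparallel_mismatches("GATTACA", "TGTTATC"))  # [3]: one mismatched pair
```

A non-empty result does not by itself mean the duplex is unstable; as noted above, stability must be judged from length, composition, ionic strength and the incidence of mismatches.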
Stability of a nucleic acid duplex is measured by the melting temperature, or “Tm.” The Tm of a particular nucleic acid duplex under specified conditions is the temperature at which, on average, half of the base pairs have dissociated. Equations for calculating the Tm of nucleic acids are well known in the art. As indicated by standard references, an estimate of the Tm value may be calculated by the equation:
$$T_m = 81.5\,^{\circ}\mathrm{C} + 16.6\,\log_{10} M + 0.41\,(\%\mathrm{GC}) - 0.61\,(\%\mathrm{form}) - 500/L$$
where M is the molarity of monovalent cations, %GC is the percentage of guanine and cytosine nucleotides in the DNA, %form is the percentage of formamide in the hybridization solution, and L is the length of the hybrid in base pairs [see, e.g., Guide to Molecular Cloning Techniques, Ed. S. L. Berger and A. R. Kimmel, in Methods in Enzymology Vol. 152, 401 (1987)]. Other references describe more sophisticated computations that take structural as well as sequence characteristics into account in calculating Tm.
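The estimate can be evaluated directly. The following is a minimal Python transcription of the formula above, assuming that log denotes the base-10 logarithm and that both percentages are supplied on a 0 to 100 scale; the function and argument names are illustrative. It does not check whether the inputs fall within the range for which this empirical formula is reliable.

```python
import math


def estimate_tm(monovalent_molar: float, percent_gc: float,
                percent_formamide: float, length_bp: float) -> float:
    """Estimate duplex Tm in degrees C with the empirical formula above.

    monovalent_molar:  M, molar concentration of monovalent cations
    percent_gc:        %GC, percentage of G + C (0-100)
    percent_formamide: %form, percentage of formamide (0-100)
    length_bp:         L, length of the hybrid in base pairs
    """
    return (81.5
            + 16.6 * math.log10(monovalent_molar)
            + 0.41 * percent_gc
            - 0.61 * percent_formamide
            - 500.0 / length_bp)


# 200 bp hybrid, 50% GC, 0.1 M monovalent cation, with and without 50% formamide.
print(round(estimate_tm(0.1, 50, 0, 200), 1))   # 82.9
print(round(estimate_tm(0.1, 50, 50, 200), 1))  # 52.4
```

Working the first case by hand: 81.5 + 16.6(−1) + 0.41(50) − 0 − 500/200 = 81.5 − 16.6 + 20.5 − 2.5 = 82.9 °C. Adding 50% formamide subtracts a further 0.61 × 50 = 30.5 °C, giving 52.4 °C.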