This invention relates to the field of DNA sequence design and construction comprising a method for determining DNA sequences with selected reaction attributes, such as binding affinities for their respective ligands, and for preparing such DNA sequences for various uses including as primers for diagnostic and analytical procedures to detect the presence of viral DNA.
Reactions between duplex DNA and ligands are largely dictated and mediated by the interplay of structural, thermodynamic and dynamic characteristics of DNA, and recognition mechanisms of reacting ligands. Ligands that bind to DNA span a broad range of sizes from small cations to large proteins and assembled protein aggregates. A wide variety of experimental strategies have been employed to examine sequence specificity exhibited by ligands that interact with DNA. Sequence dependent variations in local conformation and charge configuration along DNA are thought to be the principal means by which ligands discriminate between various DNA sequences. Such discrimination can be revealed and quantitatively evaluated from sequence specific thermodynamic binding parameters determined in studies of ligand/DNA complex formation.
Double helical DNA structure is maintained by a number of forces. Among these are the strong Coulombic interactions between phosphates along and across the backbone, hydrogen bonding between base pairs (bps) across the helix axis, stacking interactions between bps along one strand and across the helix axis, and a multiplicity of interactions with charged solvent components. Inadequate understanding of these interactions precludes the construction of a realistic atomic model that correctly simulates the helix-coil or melting transition in DNA.
The most successful analytical approaches to modeling the helix-coil transition in DNA relate to the statistical thermodynamic formalism of the modified Ising model (R. M. Wartell and A. S. Benight, Physics Rpts., 126, 67-107 (1985)). In this approach the central assumption is that each bp of a DNA helix can occupy only one of two possible states. These are the "intact" and "broken" states. In the intact state a given bp is presumed to be hydrogen bonded and completely stacked with its neighboring bps on either side. Alternatively, in the broken or melted state a bp is not hydrogen bonded and is completely unstacked from its neighbors on either side.
In most models, melting stability arises from independent contributions of individual bps. More sophisticated models consider nearest-neighbor (n-n) interactions. Comparison of actual absorbance-versus-temperature measurements (melting curves) with calculations allows evaluation of the sequence-dependent energetics of DNA melting within the context of the two-state per bp model.
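The independent-bp, two-state picture can be illustrated with a minimal sketch. It is not the modified Ising calculation itself: it assumes every bp melts independently with a van 't Hoff equilibrium constant, and it ignores the cooperativity and nearest-neighbor terms that the more sophisticated models include. The function name, the parameter names, and the numerical values in the usage lines are hypothetical placeholders, with temperatures assumed to be in kelvin and enthalpies in cal/mol.

```python
import math

R_CAL = 1.987  # gas constant in cal/(K*mol)


def fraction_intact(sequence, temp_k, t_at, t_gc, delta_h_at, delta_h_gc):
    """Average fraction of intact bps at temp_k for independent two-state bps.

    t_at, t_gc: assumed melting temperatures (K) of A.T and G.C bps.
    delta_h_at, delta_h_gc: assumed enthalpies of bp formation (cal/mol, negative).
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    params = {
        "A": (t_at, delta_h_at), "T": (t_at, delta_h_at),
        "G": (t_gc, delta_h_gc), "C": (t_gc, delta_h_gc),
    }
    total = 0.0
    for base in sequence.upper():
        if base not in params:
            raise ValueError(f"unexpected base: {base!r}")
        t_bp, delta_h = params[base]
        # ln s = -dG/(R*T) with dG = dH - T*dS and dS = dH / t_bp
        ln_s = -(delta_h / R_CAL) * (1.0 / temp_k - 1.0 / t_bp)
        s = math.exp(ln_s)
        total += s / (1.0 + s)  # probability that this bp is intact
    return total / len(sequence)


# Illustrative melting curve with placeholder parameters.
for temp in range(300, 421, 20):
    theta = fraction_intact("ATGCATGC", float(temp), 350.0, 390.0, -9000.0, -9500.0)
    print(temp, round(theta, 3))
```

Because each bp is treated independently, the computed transition is much broader than the cooperative transitions seen in measured melting curves; the sketch only shows how a calculated curve can be set against absorbance-versus-temperature data.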
Over the past 30 years optical and calorimetric melting studies of duplex DNA have established that the melting temperature, $t_m$, of DNA is a linearly increasing function of the percent of the bps that are of the guanine-cytosine type (%G·C). Greater stability of DNA with increased %G·C can be most readily attributed to the fact that G·C bps, with three hydrogen bonds, are more stable than A·T bps with only two hydrogen bonds. Sequence dependent stacking interactions between neighboring bps may also contribute to this difference in a minor way. Thus to first order, DNA stability can be expressed as a number-weighted sum of the individual energies of two components, these being the "energies" of A·T and G·C bps. For a specific sequence, i, this energy (the H-bond energy) can be designated:
$$\Delta G_{\text{H-bond}}(i) = \Delta S_{AT}\, N_{AT}\, T_{AT} + \Delta S_{GC}\, N_{GC}\, T_{GC} \tag{1}$$
$N_{AT}$ and $N_{GC}$ are the numbers of A·T and G·C bps in the sequence and $T_{AT}$ and $T_{GC}$ are the average melting temperatures of A·T (T·A) and G·C (C·G) bps. Values of $T_{AT}$ or $T_{GC}$ evaluated from melting curve analysis of a variety of DNAs collected as a function of solvent environment provide the dependence of $t_m$ on solvent ionic strength. The dependence of $T_{AT}$ and $T_{GC}$ on $[\mathrm{Na}^+]$ was first reported by M.D. Frank-Kamenetski (Biopolymers, 10, 2623-24 (1971)).
$$T_{AT} = 355.55 + 7.95 \ln [\mathrm{Na}^+] \tag{2a}$$
$$T_{GC} = 391.55 + 4.89 \ln [\mathrm{Na}^+] \tag{2b}$$
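As an illustration of eqns (2a) and (2b), the following sketch evaluates both average melting temperatures at a given sodium concentration. The function name is hypothetical, and the units are assumptions not stated in the text: the concentration is taken to be in mol/L and the temperatures in kelvin.

```python
import math


def bp_melting_temps(na_molar):
    """Return (T_AT, T_GC) from eqns (2a) and (2b).

    Assumes na_molar is in mol/L and the results are in kelvin.
    """
    if na_molar <= 0:
        raise ValueError("sodium concentration must be positive")
    ln_na = math.log(na_molar)        # natural logarithm, as in the equations
    t_at = 355.55 + 7.95 * ln_na      # eqn (2a)
    t_gc = 391.55 + 4.89 * ln_na      # eqn (2b)
    return t_at, t_gc


# Example: compare a low-salt and a high-salt environment.
for conc in (0.015, 0.1, 1.0):
    print(conc, bp_melting_temps(conc))
```

Under these assumptions the two temperatures move closer together as the salt concentration rises, since the coefficient of the logarithm is larger for A·T than for G·C.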
$\Delta S_{AT}$ and $\Delta S_{GC}$ in eqn (1) are the average entropy changes associated with melting A·T or G·C bps. Calorimetric and spectrophotometric melting studies of long DNA polymers of natural and synthetic origins have revealed that the transition entropies of melting A·T and G·C bps are virtually independent of bp type (A·T or G·C) and temperature, and only weakly dependent on solvent ionic strength over reasonable limits (15 mM to 1.0 M NaCl). Assuming only three preferred conformations are available for each nucleotide residue per bp, the transition entropy in forming a helix can be written as:
$$\Delta S = -2\,(6R \ln 3) = -26.2\ \text{cal/K·mol} \tag{3}$$
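The arithmetic behind eqn (3) can be checked directly. The value of the gas constant, $R \approx 1.987$ cal/K·mol, is an assumption here, since the text does not state which value was used.

```latex
\begin{aligned}
\Delta S &= -2\,(6R \ln 3) \\
         &\approx -2 \times 6 \times 1.987 \times 1.0986\ \text{cal/K·mol} \\
         &\approx -26.2\ \text{cal/K·mol}
\end{aligned}
```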
Coincidentally, this value is almost precisely the entropy of base pair formation, $\Delta S = -24.85 \pm 1.84$ cal/K·mol, determined from the studies mentioned above. Thus, $\Delta S_{AT} = \Delta S_{GC} = \Delta S$ can be determined from the ratio:
$$\frac{\Delta H_{AT}}{T_{AT}} = \frac{\Delta H_{GC}}{T_{GC}} = \Delta S \tag{4}$$
where $\Delta H_{AT}$ and $\Delta H_{GC}$ are enthalpy changes in melting A·T or G·C bps. Calorimetric and spectrophotometric melting studies of short duplex oligomers six to eight bps in length have revealed a sequence dependence of the melting entropy (K. J. Breslauer, et al., Proc. Nat. Acad. Sci. USA, 83, 3746-50 (1986)).
The values of the bp transition enthalpies, $\Delta H_{AT}$ and $\Delta H_{GC}$, are also dependent on solvent ionic strength. Empirically derived equations for their determination in different Na$^+$ environments have also been reported (S. A. Kozyaukin, et al., J. Biomol. Struct. Dynam., 5, 119-26 (1987)):
$$\Delta H_{AT} = -9300 - 456.01 \ln [\mathrm{Na}^+] \tag{5}$$
From eqns (2b) and (4), $\Delta H_{GC}$ can be determined. Therefore, if DNA is considered to be comprised of only two energetic components, the free energy can be determined from the sequence by substitution of the appropriate values from eqns (2), (4) and (5) in eqn (1).
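The substitution described above can be sketched as follows. This is one reading of the procedure, not a definitive implementation: the entropy is taken from the ratio in eqn (4) using eqns (2a) and (5), and then inserted into eqn (1). The function name is hypothetical, and the units (mol/L for sodium, kelvin for temperatures, cal/mol for energies) are assumptions that the text does not state explicitly.

```python
import math


def h_bond_free_energy(sequence, na_molar):
    """Two-component estimate of eqn (1) for one strand of a duplex.

    Assumed units: na_molar in mol/L, temperatures in K, energy in cal/mol.
    """
    if na_molar <= 0:
        raise ValueError("sodium concentration must be positive")
    seq = sequence.upper()
    n_at = seq.count("A") + seq.count("T")   # number of A.T bps
    n_gc = seq.count("G") + seq.count("C")   # number of G.C bps
    if n_at + n_gc != len(seq):
        raise ValueError("sequence may contain only A, C, G and T")

    ln_na = math.log(na_molar)
    t_at = 355.55 + 7.95 * ln_na             # eqn (2a)
    t_gc = 391.55 + 4.89 * ln_na             # eqn (2b)
    delta_h_at = -9300.0 - 456.01 * ln_na    # eqn (5)
    delta_s = delta_h_at / t_at              # eqn (4); also gives dH_GC = delta_s * t_gc

    # eqn (1) with dS_AT = dS_GC = delta_s
    return delta_s * n_at * t_at + delta_s * n_gc * t_gc


# Example: the 16-mer strand quoted later in the text, at two salt levels.
for conc in (0.1, 1.0):
    print(conc, h_bond_free_energy("AAATATAGCTATATTT", conc))
```

Because the two-component model depends only on base composition, any two sequences with the same numbers of A·T and G·C bps receive the same value, which is the limitation the fine-structure observations discussed next bring out.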
During the mid-1970s substantial quantities of homogeneously pure DNA samples became available. In addition, spectrophotometric instrumentation allowed automated collection of melting curve data with increased resolution. These developments made possible the discovery of multi-modal melting or "fine structure" in optical melting transitions of heterogeneous-sequence DNA fragments. Such fine structure was attributed to sequential melting of large DNA domains. Failure of simple two-component melting theories to accurately predict the observed fine structure in DNA melting curves suggested a role for both sequence heterogeneity and sequence type in the transition.
The potential for bound ligands to affect the structure of unbound flanking DNA sequences has been recognized for some time (reviewed by D. M. Crothers and M. Fried, Cold Spring Harbor Symposia Quant. Biol., 47, 263-69 (1983)). Footprinting methodology has been applied to detect unbound, but structurally perturbed regions flanking a ligand binding site. The location of actinomycin D binding was monitored by the inaccessibility of DNA within the drug binding site to DNaseI. M. Lane, et al., Proc. Nat. Acad. Sci. USA, 80, 3260-64 (1983); C. M. L. Low, et al., Nucl. Acids Res., 12, 4865-79 (1984). Structural perturbations imparted to flanking DNA sequences by the bound drug were simultaneously monitored as enhanced DNaseI cleavage rates at immediately flanking sequence positions not sterically occluded from DNaseI by bound drug. Although the potential for DNA structural distortions at regions within the drug footprint exists, the footprinting approach cannot detect such distortions since these regions are protected from cleavage. When intercalated at its dinucleotide site in a linear DNA molecule, actinomycin D can affect flanking DNA structure over considerable distances, albeit with sequence dependence. Further corroboration that the drug-induced enhancements detected by DNaseI were structural in origin was independently obtained from proton NMR experiments on d[(AAATATAGCTATATTT)$_2$] (SEQ. ID NO: 1) complexed with actinomycin D. K. D. Bishop et al., Nucl. Acids Res., 19, 871-75 (1991).
Restriction enzymes cleave duplex DNA at specific nucleotide sequences. The sequences flanking a restriction enzyme recognition site can influence the rate of restriction enzyme cleavage at the site. M. C. Aloyo, et al., Biophys. J., 64, A280 (1993). Such effects occur while cleaving P4 phage DNA with the restriction enzyme EcoRI, suggesting that differences in DNA sequences flanking EcoRI sites account for observed differences in rates of cleavage. Goldstein, et al., Virology, 66, 420-427 (1975). A large body of data regarding the sequence-dependent behavior of various restriction enzymes has appeared. Armstrong and Bauer, Nucl. Acids Res., 11, 4109-4126 (1983), and Alves, et al., Eur. J. Biochem., 140, 83-92 (1984), disclosed cleavage rate variations for the enzymes EcoRI, HinfI, and PstI, finding that the activities of all three enzymes could be inhibited by long runs of GC-rich sequences placed immediately flanking the restriction sites. Concerning effects of flanking DNA sequence on cleavage by the enzymes FnuDII, HaeIII, HhaI and MspI, Drew and Travers, Nucl. Acids Res., 13, 4445-4456 (1985), observed that cleavage rates for these enzymes exhibit a dependence on flanking sequence, noting that the effect "though clearly evident, was complex and varied."
Variations in rates of restriction enzyme cleavage have also been shown to be dependent on DNA substrate length. Thus, the rate of cleavage at a specific site depends directly on the length of DNA flanking the specific site. Richter and Eigen, Biophys. Chem., 2, 255-263 (1974); Berg, et al., Biochemistry, 20, 6929-6948 (1981).
The present invention offers the potential to develop highly accurate protocols using DNA amplification strategies (such as those based on the polymerase chain reaction) for the diagnosis of disease states that are caused by viral DNA and that are difficult to determine with high certainty by any method known in the art, in part because of significant analytical difficulties in reliably identifying the related DNA sequences at ultralow levels. An important example of a DNA disease virus is human immunodeficiency virus, for which false positives can have serious psychological and social consequences.