1. Field of the Invention
Significant morbidity and mortality are associated with infectious diseases and genetically inherited disorders. More rapid and accurate diagnostic methods are required for better monitoring and treatment of these conditions. Molecular methods using DNA probes, nucleic acid hybridization and in vitro amplification techniques are promising methods offering advantages over conventional methods used for patient diagnoses.
Nucleic acid hybridization has been employed for investigating the identity and establishing the presence of nucleic acids. Hybridization is based on complementary base pairing. When complementary single-stranded nucleic acids are incubated together, the complementary base sequences pair to form double-stranded hybrid molecules. The ability of single-stranded deoxyribonucleic acid (ssDNA) or ribonucleic acid (RNA) to form a hydrogen-bonded structure with a complementary nucleic acid sequence has been employed as an analytical tool in molecular biology research. The availability of radioactive nucleoside triphosphates of high specific activity and the development of methods for their incorporation into DNA and RNA have made it possible to identify, isolate, and characterize various nucleic acid sequences of biological interest. Nucleic acid hybridization has great potential in diagnosing disease states associated with unique nucleic acid sequences. These unique nucleic acid sequences may result from genetic or environmental change in DNA by insertions, deletions, or point mutations, or from the acquisition of foreign DNA or RNA through infection by bacteria, molds, fungi, and viruses. The application of nucleic acid hybridization as a diagnostic tool in clinical medicine is limited by the cost and effort associated with the development of sufficiently sensitive and specific methods for detecting potentially low concentrations of disease-related DNA or RNA present in the complex mixture of nucleic acid sequences found in patient samples.
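The complementary base pairing described above can be illustrated with a minimal Python sketch. The function names reverse_complement and can_hybridize are hypothetical and introduced only for illustration. The sketch assumes an unambiguous DNA alphabet and treats hybridization as exact Watson-Crick complementarity, ignoring mismatches, thermodynamics, and secondary structure.

```python
# Watson-Crick pairing rules for DNA (illustrative helper, not from the source).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the strand that pairs with seq, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def can_hybridize(probe: str, target: str) -> bool:
    """True if the probe is perfectly complementary to some stretch of target.

    Both sequences are written 5'->3'. This is an idealized test that
    only checks for an exact complementary match.
    """
    return reverse_complement(probe) in target.upper()


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(can_hybridize("GCAT", "TTATGCTT"))
```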
One method for detecting specific nucleic acid sequences generally involves immobilization of the target nucleic acid on a solid support such as nitrocellulose paper, cellulose paper, diazotized paper, or a nylon membrane. After the target nucleic acid is fixed on the support, the support is contacted with a suitably labeled probe nucleic acid for about two to forty-eight hours. After this time period, the solid support is washed several times at a controlled temperature to remove unhybridized probe. The support is then dried and the hybridized material is detected by autoradiography or by spectrometric methods. When very low concentrations must be detected, this method is slow and labor intensive, and nonisotopic labels that are less readily detected than radiolabels are frequently not suitable.
A method for the enzymatic amplification of specific segments of DNA known as the polymerase chain reaction (PCR) method has been described. This in vitro amplification procedure is based on repeated cycles of denaturation, oligonucleotide primer annealing, and primer extension by a thermophilic polymerase, resulting in an exponential increase in copies of the region flanked by the primers. The PCR primers, which anneal to opposite strands of the DNA, are positioned so that the polymerase-catalyzed extension product of one primer can serve as a template strand for the other, leading to the accumulation of a discrete fragment whose length is defined by the distance between the 5' ends of the oligonucleotide primers.
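As a rough illustration of the exponential accumulation and the product length described above, the following Python sketch computes an idealized copy number and the amplicon length. The function names, the per-cycle efficiency parameter, and the 0-based coordinate convention are assumptions made for this example. Real reactions plateau and rarely achieve perfect doubling.

```python
def ideal_copy_number(initial_copies: int, cycles: int,
                      efficiency: float = 1.0) -> float:
    """Idealized copies after a number of cycles.

    efficiency is the assumed fraction of templates copied per cycle
    (1.0 means perfect doubling); it is a modeling assumption.
    """
    return initial_copies * (1.0 + efficiency) ** cycles


def amplicon_length(forward_5prime: int, reverse_5prime: int) -> int:
    """Length of the discrete product bounded by the primers' 5' ends.

    Both arguments are 0-based positions on the template's top strand:
    the base matching the 5' end of the forward primer, and the base
    paired with the 5' end of the reverse primer.
    """
    if reverse_5prime < forward_5prime:
        raise ValueError("reverse primer must lie downstream of forward primer")
    return reverse_5prime - forward_5prime + 1


if __name__ == "__main__":
    print(ideal_copy_number(1, 30))      # perfect doubling for 30 cycles
    print(amplicon_length(100, 299))     # product spans positions 100..299
```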
Other methods for amplifying nucleic acids have also been developed. These methods include single primer amplification, ligase chain reaction (LCR), transcription-mediated amplification methods including 3SR and NASBA, and the Q-beta-replicase method. Regardless of the amplification used, the amplified product must be detected.
One method for detecting nucleic acids is to employ nucleic acid probes that have sequences complementary to sequences in the target nucleic acid. A nucleic acid probe may be, or may be capable of being, labeled with a reporter group or may be, or may be capable of becoming, bound to a support. Detection of signal depends upon the nature of the label or reporter group. Usually, the probe is comprised of natural nucleotides such as ribonucleotides and deoxyribonucleotides and their derivatives although unnatural nucleotide mimetics such as peptide nucleic acids and oligomeric nucleoside phosphonates are also used. Commonly, binding of the probes to the target is detected by means of a label incorporated into the probe. Alternatively, the probe may be unlabeled and the target nucleic acid labeled. Binding can be detected by separating the bound probe or target from the free probe or target and detecting the label. In one approach, a sandwich is formed comprised of one probe, which may be labeled, the target and a probe that is or can become bound to a surface. Alternatively, binding can be detected by a change in the signal-producing properties of the label upon binding, such as a change in the emission efficiency of a fluorescent or chemiluminescent label. This permits detection to be carried out without a separation step. Finally, binding can be detected by labeling the target, allowing the target to hybridize to a surface-bound probe, washing away the unbound target and detecting the labeled target that remains.
Direct detection of labeled target hybridized to surface-bound probes is particularly advantageous if the surface contains a mosaic of different probes that are individually localized to discrete, known areas of the surface. Such ordered arrays containing a large number of oligonucleotide probes have been developed as tools for high throughput analyses of genotype and gene expression. Oligonucleotides synthesized on a solid support recognize uniquely complementary nucleic acids by hybridization, and arrays can be designed to define specific target sequences, analyze gene expression patterns or identify specific allelic variations. One difficulty in the design of oligonucleotide arrays is that oligonucleotides targeted to different regions of the same gene can show large differences in hybridization efficiency, presumably due, at least in part, to the interplay between the secondary structures of the oligonucleotides and their targets and the stability of the final probe/target hybridization product. A method for predicting which oligonucleotides will show detectable hybridization would substantially decrease the number of iterations required for optimal array design and would be particularly useful when the total number of oligonucleotide probes on the array is limited. A method to predict oligonucleotide hybridization efficiency would also streamline the empirical approaches currently used to select potential antisense therapeutics, which are designed to modulate gene expression in vivo by hybridizing to specific messenger RNA (mRNA) molecules and inhibiting their translation into proteins.
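To make the idea of candidate probes targeted to different regions of the same gene concrete, the following Python sketch enumerates overlapping probes complementary to a target. It is only an illustration: the name tile_probes, the default probe length of 20, and the step size are assumptions, and the sketch does nothing to predict which candidates will hybridize efficiently.

```python
from typing import Iterator, Tuple

# Translation table for DNA complementation (illustrative helper).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def tile_probes(target: str, probe_length: int = 20,
                step: int = 1) -> Iterator[Tuple[int, str]]:
    """Yield (start, probe) pairs covering the target.

    Each probe is the reverse complement of the window
    target[start:start + probe_length], written 5'->3'.
    """
    target = target.upper()
    for start in range(0, len(target) - probe_length + 1, step):
        window = target[start:start + probe_length]
        yield start, window.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    for start, probe in tile_probes("ATGCATGC", probe_length=4, step=2):
        print(start, probe)
```

Every window of the target yields one candidate, so the number of candidates grows linearly with target length. This is why a predictive filter would reduce the synthesis burden when the number of array positions is limited.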
While it is well known that the structure of the target nucleic acid affects the affinity of oligonucleotide hybridization, current methods for predicting target structures from the primary sequence fail to predict target regions accessible for oligonucleotide binding. Consequently, selection of oligonucleotides for antisense reagents or oligonucleotide probe arrays has been largely empirical. As most of the target sequence is sequestered by intramolecular base pairing and not accessible for oligonucleotide binding, the process of identifying good oligonucleotides has required large numbers of low efficiency experiments.
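The notion of intramolecular base pairing can be illustrated with a deliberately simplified folding calculation. The Python sketch below uses the classic Nussinov base-pair maximization recurrence, which is substituted here for illustration and is not one of the energy-based methods discussed in this section. The function name max_base_pairs and the minimum loop size of three unpaired bases are assumptions. Counting pairs ignores stacking energies, and the sketch makes no claim to identify accessible target regions.

```python
def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in a single strand (Nussinov).

    Allows A-U, G-C and G-U pairs; T is treated as U. At least
    min_loop unpaired bases must separate the two bases of a pair.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    s = seq.upper().replace("T", "U")
    n = len(s)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the subsequence s[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]          # case: base j is unpaired
            for k in range(i, j - min_loop):
                if (s[k], s[j]) in allowed:  # case: base j pairs with k
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAACCC"))  # a small hairpin-forming sequence
```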
The design and implementation of algorithms that effectively predict the ability of oligonucleotides to rapidly and avidly bind to complementary nucleotide sequences has been an important problem in molecular biology since the invention of facile methods for chemical DNA synthesis. The subsequent inventions of the polymerase chain reaction (PCR), antisense inhibition of gene expression and oligonucleotide array methods for performing massively parallel hybridization experiments have made the need for effective predictive algorithms even more critical.
Previous attempts to solve the nucleic acid probe design problem include PCR primer design software applications (e.g., OLIGO®), neural networks, PCR primer design applications that search for sequences that possess minimal ability to cross-hybridize with other targets present in a sample (e.g., HYBsimulator™), and approaches that attempt to predict the efficiency of antisense sequence suppression of mRNA translation from a combination of predicted nucleic acid duplex melting temperature and predicted target strand structure. The methods that predict effective oligonucleotide primers for performing PCR from DNA templates work well for that application where relatively stringent conditions are employed. This is because PCR experimental design greatly simplifies the prediction problem: hybridization is performed at high temperature, at relatively low ionic strength and in the presence of a large molar excess of oligonucleotide. Under these conditions, the oligonucleotide and target secondary structures are relatively unimportant.
Unfortunately, these conditions do not apply to oligonucleotide arrays, which are usually hybridized under relatively non-denaturing conditions, or to antisense suppression of gene expression, which takes place in vivo. Oligonucleotide arrays can contain hundreds of thousands of different sequences and conditions are chosen to allow the oligonucleotide with the lowest melting temperature to hybridize efficiently. These "lowest common denominator" conditions are usually relatively non-denaturing and secondary structure constraints become significant. Accordingly, the above applications require new predictive methods that are capable of estimating the effects of oligonucleotide and target structure on hybridization efficiency. For these reasons, current algorithms for designing PCR primer oligonucleotides fail badly when applied to the problems of oligonucleotide array or antisense oligonucleotide design.
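The "lowest common denominator" constraint can be sketched in Python as follows. The melting temperature estimate uses the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C) in degrees Celsius, which is a crude approximation for short oligonucleotides and is substituted here purely for illustration. It is not the method used by any software named in this section. The function names wallace_tm and limiting_probe are hypothetical.

```python
from typing import Iterable


def wallace_tm(oligo: str) -> int:
    """Rule-of-thumb melting temperature in degrees Celsius.

    Tm = 2 * (A + T) + 4 * (G + C). Ignores salt, concentration,
    nearest-neighbor effects and secondary structure.
    """
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def limiting_probe(probes: Iterable[str]) -> str:
    """Return the probe with the lowest estimated melting temperature.

    Hybridization conditions for the whole array would have to be
    mild enough for this probe to bind.
    """
    return min(probes, key=wallace_tm)


if __name__ == "__main__":
    array = ["GCGCGCGC", "ATATATAT", "ATGCATGC"]
    weakest = limiting_probe(array)
    print(weakest, wallace_tm(weakest))
```

Because conditions are set by the weakest probe, every probe with a higher melting temperature is hybridized well below its own optimum stringency. This is the regime in which secondary structure effects become significant.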
To date, the most effective approach for identifying oligonucleotides with good hybridization efficiency has been an empirical one. Such an approach involves the synthesis of large numbers of oligonucleotide probes for a given target nucleotide sequence. Arrays are formed that include these oligonucleotide probes. Hybridization experiments are carried out to determine which of the oligonucleotide probes exhibit good hybridization efficiencies. Examples of such an approach are found in D. Lockhart, et al., Nature Biotech., infra; L. Wodicka, et al., Nature Biotechnology, infra; and N. Milner, et al., Nature Biotechnology, infra. One major drawback to this approach is the vast number of oligonucleotides that must be synthesized in order to achieve a satisfactory result. Typically, only about 2%-5% of the test probes synthesized yield acceptable signal levels.
The use of neural networks for oligonucleotide design has also been investigated. Neural networks are easily taught with real data; they therefore afford a general approach to many problems. However, their performance is limited by the "senses" that they are given. An analogy works best here: the human brain is an astoundingly capable neural network, but a blind person cannot be taught to reliably distinguish colors by smell. In addition, a large amount of data is required to adequately teach a neural network to perform its job well. A comprehensive database for either oligonucleotide array design or antisense suppression of gene expression has not been made available. For these reasons, the performance reported to date of neural network solutions to the probe design problem is mediocre.
Finally, approaches that have attempted to use target nucleic acid folding calculations to predict experimental results inferred to depend upon hybridization efficiency (e.g. antisense suppression of mRNA translation) have so far only demonstrated that the predictions of current nucleic acid folding calculations correlate poorly with observed behavior. The probable reason for this is that the structures predicted by such programs for long sequences are poor predictors of chemical reality; the results of experiments that attempt to confirm the predictions of such calculations support this assessment. Recent improvements to this approach which use predicted RNA structure topology as a predictor of relative RNA/RNA association kinetics have been more successful at forecasting the results of antisense experiments. However, these methods are not computationally efficient, and have so far only been shown to work for targets less than 100 bases long. Such methods are therefore not yet capable of predicting the behavior of full-length mRNA targets, which are typically between 1,000 and 2,000 bases in length.
2. Description of the Related Art
U.S. Pat. No. 5,512,438 (Ecker) discloses the inhibition of RNA expression by forming a pseudo-half knot RNA at the target's RNA secondary structure using antisense oligonucleotides.
Cook, et al., in U.S. Pat. No. 5,670,633 discuss sugar-modified oligonucleotides that detect and modulate gene expression.
Antisense oligonucleotide inhibition of the RAS gene is disclosed in U.S. Pat. No. 5,582,986 (Monia, et al.).
U.S. Pat. No. 5,593,834 (Lane, et al.) discusses a method of preparing DNA sequences with known ligand binding characteristics.
Mitsuhashi, et al., in U.S. Pat. No. 5,556,749 discuss a computerized method for designing optimal DNA probes and an oligonucleotide probe design station.
U.S. Pat. No. 5,081,584 (Omichinski, et al.) discloses a computer-assisted design of anti-peptides based on the amino acid sequence of a target peptide.
A PCR primer design application that searches for sequences that possess minimal ability to cross-hybridize with other targets present in a sample is available as HYBsimulator™, version 2.0, AGCT, Inc., 2102 Business Center Drive, Suite 170, Irvine, Calif. 92715 (714) 833-9983.
A PCR primer design software application is available as OLIGO®, version 5.0, National Biosciences, Inc., 3650 Annapolis Lane North, #140, Plymouth, Minn. 55447 (800) 747-4362.
D. J. Lockhart, et al., Nature Biotech. 14:1675-1684 (1996) describe a neural network approach to the selection of efficient surface-bound oligonucleotide probes.
M. Mitsuhashi, et al., Nature, 367:759-761 (1994) disclose a method for designing specific oligonucleotide probes and primers by modeling the potential cross-hybridization of candidate probes to non-target sequences known to be present in samples.
R. A. Stull, et al., Nuc. Acids Res., 20:3501-3508 (1992) describe a method of predicting the efficacy of antisense oligonucleotides, using predicted target secondary structure and predicted oligonucleotide/target binding free energy as input parameters.
N. Milner, et al., Nature Biotechnology, 15:537-541 (1997) compare observed patterns of probe hybridization to those expected from the predicted secondary structure of the nucleic acid target.
L. Wodicka, et al., Nature Biotechnology, 15:1359-1367 (1997) describe simple rules for avoiding inefficient and non-specific probes during design and synthesis of oligonucleotide arrays.
J. SantaLucia Jr., et al., Biochemistry, 35:3555 (1996) disclose parameters and methods for the calculation of thermodynamic properties of DNA/DNA homoduplexes.
N. Sugimoto, et al., Biochemistry, 34:11211 (1995) disclose parameters and methods for the calculation of thermodynamic properties of DNA/RNA heteroduplexes.
J. A. Jaeger, et al., Proc. Natl. Acad. Sci. USA, 86:7706 (1989) disclose methods for estimation of the free energy of the most stable intramolecular structure of a single-stranded polynucleotide, by means of a dynamic programming algorithm.
S. F. Altschul, et al., Nature Genetics, 6:119-129 (1994) disclose methods for calculating the complexity and information content of amino acid and nucleic acid sequences.
T. A. Weber and E. Helfand, J. Chem. Phys., 71, 4760 (1979) describe approaches for the modeling of polymer structures by molecular dynamics simulations.
V. Patzel and G. Sczakiel, Nature Biotech., 16, 64-68 (1998) disclose methods for estimating rate constants for association of antisense RNA molecules with mRNA targets by examination of predicted antisense RNA secondary structures.
Light-generated oligonucleotide arrays for rapid DNA sequence analysis are described by A. C. Pease, et al., Proc. Nat. Acad. Sci. USA (1994) 91:5022-5026.
Mitsuhashi discusses basic requirements for designing optimal oligonucleotide probe sequences in J. Clinical Laboratory Analysis (1996) 10:277-284.
Rychlik, et al., disclose a computer program for choosing optimal oligonucleotides for filter hybridization, sequencing and in vitro amplification of DNA in Nucleic Acids Research (1989) 17(21):8543-8551.
A strategy for designing specific antisense oligonucleotide sequences is described by Mitsuhashi in J. Gastroenterol. (1997) 32:282-287.
Mitsuhashi discusses basic requirements for designing optimal PCR primers in J. Clinical Laboratory Analysis (1996) 10:285-293.
Hyndman, et al., disclose software to determine optimal oligonucleotide sequences based on hybridization simulation data in BioTechniques (1996) 20(6):1090-1094.
Eberhardt discloses a shell program for the design of PCR primers using Genetics Computer Group (GCG) software (7.1) on VAX/VMS™ systems in BioTechniques (1992) 13(6):914-917.
Chen, et al., disclose a computer program for calculating the melting temperature of degenerate oligonucleotides used in PCR or hybridization in BioTechniques (1997) 22(6):1158-1160.
Partial thermodynamic parameters for prediction stability and washing behavior of DNA duplexes immobilized on gel matrix are described by Kunitsyn, et al., in J. Biomolecular Structure & Dynamics, ISSN 0739-1102 (1996) 14(1):239-244.