1. Field of the Invention
The present invention relates generally to the fields of molecular biology and nucleic acid analysis. More specifically, the present invention provides a novel computer program for the design of optimized sets of oligonucleotide probes for microarrays.
2. Description of the Related Art
Genosensors, also called oligonucleotide microarrays or “DNA chips,” are miniature devices containing arrays of oligonucleotide probes tethered to a surface (Beattie, 1997a). By hybridizing target nucleic acid molecules to these arrays and analyzing the resulting hybridization patterns, comparative sequence analysis can be conducted (Beattie, 1997b), including detection of specific mutations, identification of microorganisms (Beattie, 1997a), profiling of gene expression (Duggan et al., 1999), and verification of sequencing data (Hacia, 1999). For any given DNA or RNA sequence a large number of potential probes could be derived; however, only a small subset is needed to reveal the desired characteristics of the analyte nucleic acid molecules. To design probes for successful use in genosensors, it is necessary to minimize the probability of nonspecific (mismatched) hybridization between the probe and any nucleic acid sequence other than the intended target site (Doktycz and Beattie, 1997). A computer program is needed to design probes that are useful in microarray analysis.
A computer program called Genosensor Probe Designer (GPD) is disclosed herein, which can be used for selecting the most suitable probes for a genosensor chip based upon several factors that could affect the hybridization process. These factors include thermal stability, secondary structure, and alternative binding sites within the nucleic acid analyte.
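As an illustration of the "alternative binding sites" factor, the following minimal sketch scans a target strand for sites that could pair with a probe while tolerating a limited number of mismatches. It is not the Genosensor Probe Designer code; the function names and the simple mismatch count (used here in place of any thermodynamic scoring) are assumptions made for illustration only.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_binding_sites(probe, target, max_mismatches=1):
    """List (position, mismatches) for every potential probe site on target.

    Hypothetical helper: a site is a window of the target that differs from
    the probe's reverse complement at no more than max_mismatches positions.
    """
    site = reverse_complement(probe)  # target-strand sequence that pairs with the probe
    target = target.upper()
    hits = []
    for i in range(len(target) - len(site) + 1):
        window = target[i:i + len(site)]
        mismatches = sum(1 for a, b in zip(site, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


# Example: one perfect site and one single-mismatch (alternative) site.
sites = find_binding_sites("TGGTC", "TTGACCATTGACGATT", max_mismatches=1)
```

A probe that returns more than one hit under the chosen mismatch tolerance would, under these assumptions, be a candidate for rejection or for closer thermodynamic evaluation.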
It is well known that thermal stability of duplex nucleic acids depends on nucleotide sequence, chain length and nucleic acid concentration, as well as the identity and concentration of counterions. It is possible to find optimal hybridization conditions for specific binding of any given probe with its target molecule, but when the hybridization reaction is carried out with numerous probes and target molecules (as with genosensor chips), a loss in specificity can occur. The loss occurs particularly if the thermal stabilities of arrayed probes paired with their target sequences vary widely, or if the complexity of the analyte nucleic acid is sufficiently high to present alternative, mismatch-containing hybrids. Thus, the hybridization of multiple probes with a nucleic acid analyte can produce signals that are partially or completely due to imperfectly matched hybrids (Doktycz and Beattie, 1997). This kind of ambiguous hybridization signal depends on the sequence and the identity of the non-paired bases. A complete understanding of the thermal stability of hybrids formed between probes and nucleic acid molecules requires information about the energetic contributions for all the possible interactions that can take part in the hybridization process (Doktycz and Beattie, 1997).
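The effect of widely varying thermal stabilities across an array can be illustrated with a rough sketch. The example below uses the Wallace rule (2 °C per A or T, 4 °C per G or C), a coarse approximation for short oligonucleotides that is substituted here for the full energetic treatment described above; it ignores concentration, counterions, and sequence context. The function names and the tolerance-based filter are assumptions for illustration and do not represent the method of the disclosed program.

```python
def wallace_tm(probe):
    """Approximate melting temperature (deg C) of a short probe by the Wallace rule."""
    p = probe.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def tm_spread(probes):
    """Return per-probe Tm estimates and the max-min spread across the set."""
    tms = {p: wallace_tm(p) for p in probes}
    return tms, max(tms.values()) - min(tms.values())


def filter_by_tm(probes, center, tolerance):
    """Keep probes whose estimated Tm lies within tolerance of center (hypothetical filter)."""
    return [p for p in probes if abs(wallace_tm(p) - center) <= tolerance]


candidates = ["ATATATAT", "ACGTACGT", "GCGCGCGC"]
tms, spread = tm_spread(candidates)
narrow_set = filter_by_tm(candidates, center=24, tolerance=4)
```

Under a single set of hybridization conditions, a large spread suggests that some probes will bind too weakly while others tolerate mismatches, which is the loss of specificity described above.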
Furthermore, the target DNA or RNA molecules are capable of forming stable secondary structures that can make some target sequences inaccessible to hybridization with the complementary oligonucleotide probes. Moreover, large targets are also likely inhibited sterically from approaching the surface of the array (Southern et al., 1999). In order to avoid these problems, several approaches can be followed. If a reasonable prediction of the secondary structure of the target could be made, probes could be selected from regions that are not tied up in secondary structure. Effects of secondary structure could be reduced by fragmenting the nucleic acid preferably to a size close to that of the oligonucleotides on the array (Southern et al., 1999). Also, strategies of annealing with auxiliary oligonucleotides (tandem hybridization) have been proposed to eliminate interfering secondary or higher-order structures or to cover up unwanted (redundant) hybridization sites within the target DNA (Maldonado-Rodriguez et al., 1999a, 1999b; Maldonado-Rodriguez and Beattie, 2001). Finally, when genosensor chips are used to reveal differences between closely related nucleic acid sequences, the probes must be selected to specifically identify a particular sequence. In this case probes must be selected from regions with sufficient sequence variability to minimize nonspecific hybridization with related molecules. On the other hand, when probes are required for identifying a group of similar sequences, probes must be selected from conserved regions.
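The secondary-structure concern discussed above can be illustrated with a deliberately naive sketch that flags inverted repeats in a target, that is, pairs of segments able to form the stem of a hairpin. This is not a folding prediction and is not taken from the disclosed program; the function names, the fixed stem length, and the minimum loop size are all assumptions chosen for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_hairpin_stems(target, stem_len=4, min_loop=3):
    """Return (i, j) pairs where target[i:i+stem_len] can pair with target[j:j+stem_len].

    Hypothetical screen: the downstream arm must start at least min_loop
    bases after the upstream arm ends, leaving room for a loop.
    """
    target = target.upper()
    stems = []
    for i in range(len(target) - stem_len + 1):
        arm = target[i:i + stem_len]
        partner = reverse_complement(arm)
        j = target.find(partner, i + stem_len + min_loop)
        while j != -1:
            stems.append((i, j))
            j = target.find(partner, j + 1)
    return stems


# Example: GGGC ... GCCC can fold back on itself around an AAAA loop.
stems = find_hairpin_stems("GGGCAAAAGCCC", stem_len=4, min_loop=3)
```

Probe sites overlapping a flagged stem could, under these assumptions, be given lower priority than sites in regions with no detected self-complementarity.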
Several works dealing with nucleic acid sequence analysis and oligonucleotide probe design have been published previously (Bushnell et al., 1999; Galper et al., 1993; Schütz and von Ahsen, 1999; Vahrson et al., 1996; Li and Stormo, 2001; Pozhitkov and Tautz, 2002). One interesting work is Vahrson's library, called SCL-a, which is a C++ object-oriented library similar in some respects to that disclosed herein. Vahrson's library specializes in dynamic memory management for manipulating long DNA sequences, whereas the library disclosed herein specializes in the calculation of thermodynamic stability and the search for potential hybridization sites.
The object-oriented support included in the Object Pascal library of Delphi is similar to that provided in C++. Classes are similar in the Delphi and C++ programming languages; however, Object Pascal has a clearer syntax than C++, and Delphi code can be translated to C++ with minimal complexity if required. Moreover, natively compiled Delphi programs can run faster than those produced using C++ compilers.
A spreadsheet software program for thermodynamic melting point prediction of oligonucleotide hybridization based on the nearest-neighbor (NN) model has recently been developed (Schütz and von Ahsen, 1999). However, this program does not predict the specific hybridization patterns that could be expected with a given set of probes, and does not design sets of probes for the variety of genosensor applications that are described herein.
A program for selection of optimal DNA probes for gene expression arrays has also been published recently (Li and Stormo, 2001). Although this program uses criteria for selection of probes similar to those implemented in the software described in the present work, it is intended to select relatively long probes (more than 20 bases long), which are less convenient for single mutation discrimination. In addition, an algorithm and program for selecting specific probes for species identification with microarrays has been published recently (Pozhitkov and Tautz, 2002). This algorithm considers the position of mismatches, which influences probe selection; however, information is lacking about the experimental performance of the probes selected with this program.
Thus, to identify conserved or variable regions, a complete alignment of the sequences under study must be performed before selecting the most appropriate target regions for the subsequently designed probes. Consequently, in the design of optimized sets of oligonucleotide probes for nucleic acid analysis on genosensor arrays, numerous factors must be carefully considered, including the characteristics of the nucleic acid analyte, the type of analysis being performed, thermal stability of probe-target duplexes, secondary structure within the target sequence, and alternative probe binding sites within the target nucleic acid. The Genosensor Probe Designer software disclosed herein takes all of these factors into consideration.
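The use of an alignment to locate conserved or variable regions can be sketched as follows. The example assumes the sequences have already been aligned to equal length (with "-" marking gaps) and simply classifies fixed-width windows by how many alignment columns differ. The function names, the window-based approach, and the gap handling are assumptions for illustration; they are not taken from the disclosed program.

```python
def column_is_conserved(column):
    """True if every sequence has the same base at this column and none has a gap."""
    return len(set(column)) == 1 and "-" not in column


def conservation_flags(aligned):
    """Return one boolean per alignment column for pre-aligned, equal-length sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return [column_is_conserved([seq[i] for seq in aligned]) for i in range(length)]


def conserved_windows(aligned, window):
    """Start positions of windows in which every column is conserved (group-level probes)."""
    flags = conservation_flags(aligned)
    return [i for i in range(len(flags) - window + 1) if all(flags[i:i + window])]


def variable_windows(aligned, window, min_differences=1):
    """Start positions of windows with at least min_differences variable columns."""
    flags = conservation_flags(aligned)
    return [
        i for i in range(len(flags) - window + 1)
        if flags[i:i + window].count(False) >= min_differences
    ]


alignment = ["ACGTACGTAA", "ACGTTCGTAA", "ACGTACGTCA"]
group_sites = conserved_windows(alignment, window=3)
specific_sites = variable_windows(alignment, window=3)
```

Under these assumptions, conserved windows would be candidate sites for probes that identify a group of similar sequences, and variable windows would be candidate sites for probes that discriminate between closely related sequences.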