The present invention relates to computational methodologies for designing hybridization assays, polymerase chain reaction amplifications, and anti-sense drugs and, in particular, to a computational method and system for predicting the hybridization potential of a probe molecule/target molecule pair.
The present invention relates to computationally predicting the stability of non-covalent binding between a probe molecule and a target molecule. The current application will specifically address hybridization of deoxyribonucleic acid (“DNA”) polymers, although the techniques and methodologies described in the current application may be applied to DNA and ribonucleic acid (“RNA”) hybridization, RNA/RNA hybridization, hybridization of various synthetic polymers, and hybridization of other types of polymer molecules.
DNA molecules are linear polymers, synthesized from only four different types of subunit molecules: (1) deoxy-adenosine, abbreviated “A,” a purine nucleoside; (2) deoxy-thymidine, abbreviated “T,” a pyrimidine nucleoside; (3) deoxy-cytidine, abbreviated “C,” a pyrimidine nucleoside; and (4) deoxy-guanosine, abbreviated “G,” a purine nucleoside. FIG. 1 illustrates a short DNA polymer 100, called an oligomer, composed of the following subunits: (1) deoxy-adenosine 102; (2) deoxy-thymidine 104; (3) deoxy-cytidine 106; and (4) deoxy-guanosine 108. When phosphorylated, subunits of the DNA molecule are called nucleotides, and are linked together through phosphodiester bonds 110-115 to form the DNA polymer. A linear DNA molecule, such as the oligomer shown in FIG. 1, has a 5′ end 118 and a 3′ end 120. A DNA polymer can be chemically characterized by writing, in sequence from the 5′ end to the 3′ end, the single letter abbreviations for the nucleotide subunits that together compose the DNA polymer. For example, the oligomer 100 shown in FIG. 1 can be chemically represented as “ATCG.” A nucleotide comprises a purine or pyrimidine base (e.g. adenine 122 of the deoxy-adenylate nucleotide 102), a deoxy-ribose sugar (e.g. ribose 124 of the deoxy-adenylate nucleotide 102), and a phosphate group (e.g. phosphate 126) that links the nucleotide to the next nucleotide in the DNA polymer. RNA polymers are similar to DNA polymers, except that 2′-hydrogens, such as 2′-hydrogens 126, are replaced with hydroxyl groups and the pyrimidine base uracil replaces the pyrimidine base thymine, where the 5-methyl group of thymine is replaced by a hydrogen in uracil.
The DNA polymers that contain the organizational information for living organisms occur in the nuclei of cells in pairs, forming double-stranded DNA helices. One polymer of the pair is laid out in a 5′ to 3′ direction, and the other polymer of the pair is laid out in a 3′ to 5′ direction. The two DNA polymers in a double-stranded DNA helix are therefore described as being anti-parallel. The two DNA polymers, or strands, within a double-stranded DNA helix are bound to each other through attractive forces including hydrophobic interactions between stacked purine and pyrimidine bases and hydrogen bonding between purine and pyrimidine bases, the attractive forces emphasized by conformational constraints of DNA polymers. Because of a number of chemical and topographic constraints, double-stranded DNA helices are most stable when deoxy-adenylate subunits of one strand hydrogen bond to deoxy-thymidylate subunits of the other strand, and deoxy-guanylate subunits of one strand hydrogen bond to deoxy-cytidylate subunits of the other strand.
FIGS. 2A-B illustrate the hydrogen bonding between purine/pyrimidine base pairs of two anti-parallel DNA strands. FIG. 2A shows hydrogen bonding between an adenine and a thymine, and FIG. 2B shows hydrogen bonding between a guanine and a cytosine. Note that there are two hydrogen bonds 202 and 203 in the adenine/thymine base pair, and three hydrogen bonds 204-206 in the guanine/cytosine base pair, as a result of which GC base pairs contribute greater thermodynamic stability to DNA duplexes than AT base pairs. AT and GC base pairs, illustrated in FIGS. 2A-B, are known as Watson-Crick (“WC”) base pairs.
Two DNA strands linked together by hydrogen bonds form the familiar helix structure of a double-stranded DNA helix. FIG. 3 illustrates a short section of a DNA double helix 300 comprising a first strand 302 and a second, anti-parallel strand 304. The ribbon-like strands in FIG. 3 represent the deoxyribose and phosphate backbones of the two anti-parallel strands, with hydrogen-bonded purine and pyrimidine base pairs, such as base pair 306, interconnecting the two strands. Deoxy-guanylate subunits in one strand are generally paired with deoxy-cytidylate subunits in the other strand, and deoxy-thymidylate subunits in one strand are generally paired with deoxy-adenylate subunits in the other strand. However, non-WC base pairings may occur within double-stranded DNA. Generally, purine/pyrimidine non-WC base pairings contribute little to the thermodynamic stability of a DNA duplex, but generally do not destabilize a duplex otherwise stabilized by WC base pairs. Such base pairs are referred to below as “non-WC” base pairs. However, purine/purine base pairs may destabilize DNA duplexes, as may, to a lesser extent, pyrimidine/pyrimidine base pairs. Such base pairings are referred to below as “anti-WC” base pairs.
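The three categories of base pairing described above can be expressed as a small lookup. The following Python sketch is illustrative only and is not part of the described method; the function name `classify_pair` and the returned labels are assumptions chosen to mirror the terminology of the preceding paragraph.

```python
# Illustrative sketch: classify a pairing of two DNA bases using the
# terminology of the text. Names and labels are assumptions.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
WC_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def classify_pair(base1: str, base2: str) -> str:
    """Return 'WC', 'non-WC', or 'anti-WC' for a pair of DNA bases."""
    for base in (base1, base2):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a DNA base: {base!r}")
    if (base1, base2) in WC_PAIRS:
        return "WC"
    # Purine/purine and pyrimidine/pyrimidine pairings may destabilize a duplex.
    same_class = {base1, base2} <= PURINES or {base1, base2} <= PYRIMIDINES
    # Remaining purine/pyrimidine pairings (A/C, G/T) are the "non-WC" case.
    return "anti-WC" if same_class else "non-WC"


if __name__ == "__main__":
    for pair in [("A", "T"), ("G", "T"), ("A", "G"), ("C", "T")]:
        print(pair, classify_pair(*pair))
```

A table of this kind is one plausible way to fill a matrix of subunit interaction stabilities, with each category mapped to a numeric weight chosen by the implementer.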
Double-stranded DNA may be denatured, or converted into single-stranded DNA, by changing the ionic strength of the solution containing the double-stranded DNA or by raising the temperature of the solution. Single-stranded DNA polymers may be renatured, or converted back into DNA duplexes, by reversing the denaturing conditions, for example by lowering the temperature of the solution containing the single-stranded DNA polymers. During the renaturing process, complementary bases of anti-parallel strands form WC base pairs in a cooperative fashion, leading to regions of DNA duplex. However, many different types of associations between and within DNA polymers may occur that may lead to many different types of mismatching between single strands of DNA. In general, the longer the regions of consecutive WC base pairing between two single strands of DNA, the greater the stability of hybridization of the two polymers under renaturing conditions.
The ability to denature and renature double-stranded DNA has led to development of many extremely powerful and discriminating assay technologies for identifying the presence of single-stranded or double-stranded DNA of particular base sequences or containing particular sub-sequences within complex mixtures of different DNA polymers and other bio-polymers and chemical substances. These methodologies include the polymerase chain reaction (“PCR”), molecular-array-based hybridization assays, fluorescent in situ hybridization (“FISH”), and anti-sense nucleic acid bio-polymers that may be used as therapeutic agents or research tools to block expression of particular genes within an organism. FIG. 4 illustrates probe/target hybridization that underlies hybridization assays. A probe 402 is synthesized to contain a short single-stranded DNA of a particular sequence 404 attached to a second chemical entity 406 represented in FIG. 4 as an unfilled disk. The nature of the second chemical entity 406 varies depending on the technique in which the probe is employed. The probe is brought into contact with a solution of single-stranded DNA polymers 408-412 having different sequences. The solution is then modified to become a renaturing environment, for example by changing the ionic strength of the solution or lowering the temperature of the solution, to allow for hybridization of the single-stranded-DNA portion of the probe molecules 404 with single-stranded DNA molecules having complementary sequences 410. Thus, a probe molecule, in figurative terms, fishes out a single-stranded DNA polymer having a complementary or near-complementary sequence from a complex solution of DNA molecules having non-complementary sequences and perhaps other chemical entities, such as other biopolymers and organic compounds.
In molecular-array-based hybridization assays, the DNA polymer portion 404 of a probe is covalently attached to a solid substrate 406. Probes having different DNA sequences are synthesized on different regions of the surface of the substrate. The single-stranded DNAs that hybridize to the substrate-bound probe molecules are chemically modified to contain fluorophores, chemophores, or radioisotopes. Following hybridization of the modified single-stranded DNA polymers to probes, the unhybridized single-stranded DNA polymers are washed from the surface of the molecular array, and bound sample DNA polymers are detected by spectroscopy or radiography. Using molecular-array-based assays, many hundreds or thousands of different types of probes having different, known DNA sequences can be concurrently employed to hybridize to sample DNA molecules to detect and quantify the presence of DNA polymers containing sequences complementary to the probe sequences in fantastically complex solutions of DNA polymers generated, for example, by extracting active messenger RNA (“mRNA”) from cells of living organisms.
In the FISH technique, a probe contains an oligonucleotide having a specific sequence 404 bound to a fluorophore 406. The probes are introduced into a biological sample under renaturing conditions and then detected via fluoroscopy, allowing the locations of DNA polymers having subsequences complementary to the probe sequence to be visualized. In the PCR technique, oligonucleotides with specific sequences are hybridized to single-stranded DNA and the oligonucleotides are synthetically extended via the DNA polymerase reaction. Many cycles of the PCR technique are used to amplify DNA sequences flanked by contiguous sequences complementary to the oligonucleotide primer. Anti-sense drugs are DNA or RNA polymers with specific sequences designed to bind to complementary sequences within an organism’s DNA or mRNA in order to block expression of genes containing the sequences or controlled by control regions containing the sequences.
While hybridization assays and techniques such as PCR and anti-sense drugs have the potential for exquisite discrimination and selectivity in hybridizing probe DNA sequences to target complementary subsequences of single-stranded DNAs, there is also a large potential for unwanted and less discriminating hybridization that can greatly decrease the selectivity and discrimination of a particular assay. FIGS. 5A-C illustrate various types of undesirable cross-hybridization reactions between probe molecules and sample molecules that can decrease the selectivity and signal-to-noise ratio of hybridization assays and decrease the selectivity of PCR and anti-sense methodologies. Hybridization of a subsequence of the probe oligonucleotide sequence with a complementary subsequence of a sample DNA polymer is illustrated in FIG. 5A. The probe 502 in FIG. 5A is synthesized to hybridize with, and select sample molecules containing, the subsequence TCGCTACGGAT. However, as shown in FIG. 5A, a sample molecule 504 containing the subsequence CGCTA 506 hybridizes to the complementary subsequence TAGCG 508 of the probe molecule. Thus, the probe has hybridized with a sample molecule that does not contain a full complementary sequence to the oligonucleotide sequence of the probe molecule. Such subsequence hybridization greatly diminishes the selectivity of the probe for particular sequences. In the current example, the 11-subunit sequence of the probe is one out of 4^11 (4,194,304) possible 11-subunit oligonucleotide sequences, whereas the subsequence TAGCG is merely one out of 4^5 (1,024) possible 5-subunit oligonucleotide sequences. Obviously, 1 out of 4^11 represents far greater selectivity than 1 out of 4^5.
Thus, in addition to target single-stranded DNAs containing the full complementary sequence TCGCTACGGAT, many unwanted sample DNA polymers containing the complementary subsequence CGCTA, and many other subsequences of the probe oligonucleotide sequence, may hybridize to probe molecules having the sequence ATCCGTAGCGA. In the following, a single stretch of consecutive WC base pairings between probe and target molecules that does not encompass the entire probe sequence is referred to as a “fragment.”
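The sequence relationships and the selectivity arithmetic in the example of FIG. 5A can be checked with a few lines of Python. This is a minimal sketch for illustration; the helper names `reverse_complement` and `sequence_count` are assumptions and do not appear in the described method.

```python
# Illustrative check of the FIG. 5A example. Helper names are assumptions.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the anti-parallel WC complement of seq, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sequence_count(length: int) -> int:
    """Number of distinct DNA sequences of the given length (4 bases per position)."""
    return 4 ** length


probe = "ATCCGTAGCGA"

if __name__ == "__main__":
    # Full-length target that the probe is designed to select.
    print(reverse_complement(probe))
    # The sample subsequence CGCTA pairs with a 5-subunit stretch of the probe.
    print(reverse_complement("CGCTA"), reverse_complement("CGCTA") in probe)
    # Selectivity of the full probe versus the 5-subunit fragment.
    print(sequence_count(11), sequence_count(5))
```

The ratio of the two counts is 4^6, so under the simplifying assumption of random, equally likely sequences, a given 5-subunit fragment is expected to occur 4,096 times more often than a given 11-subunit sequence.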
In addition to subsequence cross-hybridization illustrated in FIG. 5A, various types of fragmented cross-hybridizations are possible. FIG. 5B illustrates an omega two-fragment cross-hybridization. In FIG. 5B, a first subsequence of an unwanted sample molecule 510 having the sequence TCGC is hybridized to the terminal subsequence 512 GCGA of the probe molecule, and a second subsequence 514 GGAT of the unwanted sample DNA polymer has hybridized to the initial subsequence 516 ATCC of the probe molecule. The two hybridizing subsequences 510 and 514 of the unwanted sample DNA molecule are separated by a loop 518 of DNA.
FIG. 5C illustrates a two-fragment delta cross-hybridization. The unwanted sample DNA 520 in FIG. 5C has the same two complementary subsequences 522 and 524 as subsequences 510 and 514 of unwanted sample DNA polymer 518 in FIG. 5B, hybridized to the same probe subsequences 526 and 528 as unwanted sample DNA molecule 518 hybridized to in FIG. 5B. However, the orientations of the unwanted sample DNA subsequences 522 and 524 are opposite to the orientations of the subsequences 510 and 514, respectively, in unwanted sample DNA molecule 518 of FIG. 5B. Hybrid cross-hybridizations and cross-hybridizations comprising more than two fragments are also possible.
In order to prevent cross-hybridizations in hybridization-based assays, in the PCR technique, and in designing anti-sense polymers, researchers and manufacturers strive to create probe molecules containing oligonucleotide sequences with high specificity for particular target molecules and with low cross-hybridization potential for other non-target polymers that may be present in the samples to which the probe molecules are exposed. The problem of identifying desirable probe molecules is not trivial. Evaluating the potential for cross-hybridization of a given probe molecule with hundreds, thousands, or tens of thousands of potential non-target DNA polymers is an extremely computationally intensive task. To date, because of the combinatorial explosion involved in identifying cross-hybridization potentials, computational methodologies for evaluating cross-hybridization potentials of probe molecules have focused primarily on cross-hybridization between an entire probe molecule and complementary and closely mismatched regions of target molecules. In this approach, a certain degree of consideration is implicitly given to subsequence hybridization, illustrated in FIG. 5A. However, because they fail to take into account fragmented cross-hybridization, illustrated in FIGS. 5B-C, current and prior art computational techniques may select probe molecule candidates with undesirable cross-hybridization potentials for unwanted non-target sequences. Therefore, researchers, developers, and manufacturers of hybridization-based methodologies and techniques have recognized the need for computational techniques for evaluating cross-hybridization potentials of probe molecules that take into account fragment cross-hybridization, as illustrated in FIGS. 5B and 5C, as well as hybrid and multi-fragment hybridization.
The present invention provides a computational method for determining the hybridization potential of a probe molecule with a target molecule. The hybridization potential includes the potential of the probe molecule to completely hybridize with a complementary subsequence within the target molecule, as well as the potential for single-fragment and multi-fragment subsequence cross-hybridization. First, a probe/target interaction matrix is prepared to contain indications of all possible probe/target subunit interaction stabilities. Next, the probe/target interaction matrix is analyzed to create a list of all possible single-fragment hybridizations as well as hybridization of the entire probe sequence with one or more complementary subsequences of the target molecule, when possible. Next, a graph is generated with vertices representing individual fragments found in the previous step, and edges representing possible loops in one or both of the probe and target sequences that allow the pair of fragments interconnected by the edge to coexist within a multi-fragment cross-hybridization. Finally, the graph is analyzed to construct a list of all possible single-fragment and multi-fragment cross-hybridizations possible between the probe molecule and the target molecule. A score representing the overall stability of hybridization, based on the total length of complementary fragments or based on a thermodynamic calculation, is calculated for each single-fragment and multi-fragment cross-hybridization in the list of cross-hybridizations, and the list may be sorted based on the calculated scores to produce a sorted sublist representing the most stable potential single-fragment and multi-fragment cross-hybridizations that may occur between the probe molecule and the target molecule. 
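The four steps summarized above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the method as claimed: the matrix records only WC complementarity rather than graded stabilities, two fragments are treated as able to coexist whenever they share no probe positions and no target positions, enumeration is capped at `max_fragments`, and the score is the total fragment length. All function and parameter names are hypothetical.

```python
from itertools import combinations

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def interaction_matrix(probe, target):
    """matrix[i][k] is True when probe[i] WC-pairs with base k of the reversed
    target, so anti-parallel duplex stretches appear as diagonal runs."""
    reversed_target = target[::-1]
    return [[COMPLEMENT[p] == t for t in reversed_target] for p in probe]


def find_fragments(matrix, min_len=3):
    """Return maximal diagonal runs as (probe_start, rev_target_start, length)."""
    n = len(matrix)
    m = len(matrix[0]) if matrix else 0
    fragments = []
    for i in range(n):
        for k in range(m):
            if not matrix[i][k]:
                continue
            if i > 0 and k > 0 and matrix[i - 1][k - 1]:
                continue  # interior of a run that started earlier
            length = 0
            while i + length < n and k + length < m and matrix[i + length][k + length]:
                length += 1
            if length >= min_len:
                fragments.append((i, k, length))
    return fragments


def compatible(a, b):
    """Assumed rule: fragments may coexist if they overlap on neither strand."""
    (ai, ak, al), (bi, bk, bl) = a, b
    probe_disjoint = ai + al <= bi or bi + bl <= ai
    target_disjoint = ak + al <= bk or bk + bl <= ak
    return probe_disjoint and target_disjoint


def build_graph(fragments):
    """Vertices are fragments; an edge joins two fragments that may coexist."""
    graph = {f: set() for f in fragments}
    for a, b in combinations(fragments, 2):
        if compatible(a, b):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def cross_hybridizations(graph, max_fragments=2):
    """List (score, fragments) for mutually compatible fragment sets, best first."""
    results = []
    for size in range(1, max_fragments + 1):
        for combo in combinations(sorted(graph), size):
            if all(b in graph[a] for a, b in combinations(combo, 2)):
                results.append((sum(f[2] for f in combo), combo))
    results.sort(key=lambda item: item[0], reverse=True)
    return results


# Example modeled loosely on FIG. 5B: TCGC and GGAT separated by a loop.
probe = "ATCCGTAGCGA"
sample = "TCGC" + "AAAAA" + "GGAT"  # hypothetical loop sequence
fragments = find_fragments(interaction_matrix(probe, sample), min_len=4)
ranked = cross_hybridizations(build_graph(fragments))

if __name__ == "__main__":
    for score, combo in ranked:
        print(score, combo)
```

Fragment coordinates in this sketch refer to the reversed target, so a target start position in the original orientation is `len(target) - rev_target_start - length`. A thermodynamic score could be substituted for the length-based score without changing the overall structure.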
The method may be iteratively applied to a large number of probe/sample molecule pairs in order to select probe molecules that have high hybridization potentials for desired target molecules but low hybridization potentials for any other sample molecules to which the probe may be exposed during an assay, when used as a primer, or initiator sequence, in PCR, or when used as an anti-sense compound.