1. Field of the Invention
Embodiments of the invention relate to probes for detecting nucleic acid sequences, and particularly to the selection of probes from among a set of candidate probes.
2. Related Technology
The basic principles of molecular biology are well known. The DNA molecule is composed of complementary strings of nucleotide bases. Sequences of bases within the DNA molecule referred to as genes represent the amino acid sequences of individual proteins. To form a protein, the DNA molecule is transcribed to create an RNA molecule having a nucleic acid sequence that is complementary to the sequence of the gene for that protein. The RNA molecule is transported to a ribosome where the protein is constructed based on the information represented by the nucleic acid sequence of the RNA.
A current goal of genetic research is to identify relationships between biological conditions and specific genes in the genome. One method for identifying these relationships is to detect the RNA molecules that are present in specific tissues and to search for correlations between the presence of a particular RNA molecule and known conditions of the tissues in which it is found. The detection of nucleic acid sequences such as genes and RNA molecules may be performed using nucleic acid probes. A probe is typically a molecule that includes a nucleic acid sequence that is complementary to a nucleic acid sequence within a target molecule of interest, such as a gene or an RNA molecule. The probe can also include a marker that produces a signal which can be detected to determine whether the probe has hybridized to another nucleic acid sequence. Alternatively, the target nucleic acid can be labeled for detection of a hybrid between probe and target.
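The complementarity between a probe and its target can be illustrated with a minimal sketch. It assumes a DNA probe directed at an RNA target, with both sequences written 5' to 3'; the function name `probe_for_rna` is hypothetical and is not taken from any described embodiment.

```python
# Map each RNA base of the target to the DNA base that pairs with it.
COMPLEMENT = str.maketrans("ACGU", "TGCA")

def probe_for_rna(target_rna: str) -> str:
    """Return a DNA probe sequence (5'->3') complementary to an RNA target (5'->3').

    Hypothetical helper for illustration only. The strands pair in an
    antiparallel orientation, so the complemented sequence is reversed.
    """
    return target_rna.upper().translate(COMPLEMENT)[::-1]

probe = probe_for_rna("AUGGC")
```

A probe generated this way is a perfect complement of the target segment. Whether it also hybridizes to other sequences is a separate question.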
Nucleic acid probes are widely used in nucleic acid array detection systems. In these systems, an array of discrete locations is formed on a substrate. Each discrete location contains a large number of identical probe molecules (e.g., 100,000 probes). An exemplary array is a bead array in which discrete locations include attached beads each bearing a unique probe type. A bead array typically includes multiple beads having the same probe, and may also include other beads containing other probes. The array is exposed to a sample (e.g., a labeled RNA derived from a tissue sample) in a hybridization chamber for a period of time to allow hybridization to occur between the probes and target nucleic acid sequences in the sample. A scanner is then used to create data representing the signal detected from probes that have hybridized to targets in the sample, and image processing is performed to determine a signal value for the probe as a whole. Typically the signal is a fluorescent signal that is detected by an optical scanner. High fluorescence indicates that the probe underwent significant hybridization to nucleic acid sequences in the sample, suggesting that the target sequence is abundant in the tissue. Low fluorescence indicates that very little hybridization occurred, suggesting that the target sequence is scarce or absent in the tissue.
A problem with nucleic acid probes is that a probe can hybridize to nucleic acid sequences that are not a perfect complement of the probe sequence but that are sufficiently similar to enable hybridization to occur, resulting in the generation of a signal even though the target sequence is not present. Furthermore, hybridization can be confounded by the fact that a gene can give rise to multiple splice isoforms. If a probe is designed for a sequence that is found in multiple splice isoforms, it will not specifically detect the targeted isoform. Because the public sequence databases do not document all splice isoforms of all genes, a probe designed based on the current state of information can still be non-specific due to hybridization to undocumented variants.
Bioinformatic techniques may be used to improve probe selection by simulating the hybridization of candidate probes to nucleic acid sequences other than that of their target. For example, consider a human gene consisting of a sequence of 10,000 bases. Using informatic techniques, every unique nucleic acid sequence of a given length (e.g., 70 bases) within the gene may be defined as a candidate probe, and the hybridization potential of each candidate probe may then be simulated at every known unique location along the entire length of the human genome. The results of these simulations allow the candidates having the highest theoretical selectivity for the target gene to be identified.
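The candidate enumeration and screening described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the simulation used in practice: hybridization potential is approximated by a plain mismatch count against a single non-target sequence, whereas an actual simulation would likely rely on a more realistic hybridization model and would scan the whole genome. All function names are hypothetical, and sequences are compared in the same sense orientation for brevity.

```python
def candidate_probes(gene: str, length: int) -> list[str]:
    """Return every unique window of the given length within the gene, in order."""
    seen, candidates = set(), []
    for start in range(len(gene) - length + 1):
        window = gene[start:start + length]
        if window not in seen:
            seen.add(window)
            candidates.append(window)
    return candidates

def min_mismatches(probe: str, background: str) -> int:
    """Fewest mismatches between the probe and any same-length site in background.

    Assumption: mismatch count is a crude stand-in for hybridization potential;
    a low value means a close off-target site exists.
    """
    n = len(probe)
    best = n
    for start in range(len(background) - n + 1):
        site = background[start:start + n]
        best = min(best, sum(a != b for a, b in zip(probe, site)))
    return best

def rank_candidates(gene: str, background: str, length: int) -> list[tuple[int, str]]:
    """Rank candidates so those least similar to the background come first."""
    scored = [(min_mismatches(p, background), p) for p in candidate_probes(gene, length)]
    return sorted(scored, key=lambda item: -item[0])

# Toy example with 4-base windows; real probes would be far longer (e.g., 70 bases).
ranking = rank_candidates("ACGTACGTTT", "ACGTAAAA", 4)
```

In this toy ranking, candidates that also occur verbatim in the non-target sequence score zero and fall to the bottom of the list, mirroring the selection of candidates with the highest theoretical selectivity.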
While informatic methods improve probe selection, experience has shown that probes selected in this manner often do not perform as expected: they do not hybridize to the target as efficiently as expected, or they show greater-than-expected hybridization to non-target sequences. Thus, there continues to be a need for techniques that can improve the probe selection process.