Molecular methods using DNA probes, nucleic acid hybridizations and in vitro amplification techniques are promising methods offering advantages over conventional methods used for patient diagnoses, biomedical research or basic biology research. Recent advances in such methods often include the introduction of parallelism, i.e., performing many experiments with the same effort previously used to perform a single experiment. However, the introduction of parallelism often forces changes in the methods used to design such experiments.
Nucleic acid hybridization has been employed for investigating the identity and establishing the presence of nucleic acids. Hybridization is based on complementary base pairing. When complementary single-stranded nucleic acids are incubated together, the complementary base sequences pair to form double-stranded hybrid molecules. The ability of single-stranded deoxyribonucleic acid (ssDNA) or ribonucleic acid (RNA) to form a hydrogen-bonded structure with a complementary nucleic acid sequence has been employed as an analytical tool in molecular biology research. The availability of radioactively, chemically and fluorescently labeled nucleoside triphosphates of high specific activity has made it possible to identify, isolate, and characterize various nucleic acid sequences of biological interest. Nucleic acid hybridization has great potential in diagnosing or characterizing diseased or altered tissue function associated with unique nucleic acid sequences or gene expression states. Unique nucleic acid sequences may result from genetic or environmental change in DNA by insertions, deletions, point mutations, or by acquiring foreign DNA or RNA by means of infection by bacteria, molds, fungi, and viruses. Altered gene expression states may arise from neoplastic transformation, viral infection, environmental insult or drug treatment. It is desirable to perform such experiments in parallel; earlier methods for introducing modest parallelism include Southern blots, Northern blots and slot blots.
Such blot techniques are examples of methods for detecting nucleic acids that employ nucleic acid probes that have sequences complementary to sequences in the target nucleic acid. A nucleic acid probe may be, or may be capable of being, labeled with a reporter group or may be, or may be capable of becoming, bound to a support. Detection of signal depends upon the nature of the label or reporter group. Usually, the probe is composed of natural nucleotides such as ribonucleotides and deoxyribonucleotides and their derivatives, although unnatural nucleotide mimetics such as peptide nucleic acids and oligomeric nucleoside phosphonates are also used. Commonly, binding of the probes to the target is detected by means of a label incorporated into the probe. Alternatively, the probe may be unlabeled and the target nucleic acid labeled. Binding can be detected by separating the bound probe or target from the free probe or target and detecting the label. In one approach, a sandwich is formed from one probe, which may be labeled, the target, and a second probe that is or can become bound to a surface. Alternatively, binding can be detected by a change in the signal-producing properties of the label upon binding, such as a change in the emission efficiency of a fluorescent or chemiluminescent label. This permits detection to be carried out without a separation step. Finally, binding can be detected by labeling the target, allowing the target to hybridize to a surface-bound probe, washing away the unbound target and detecting the labeled target that remains.
Direct detection of labeled target hybridized to surface-bound probes is particularly advantageous if the surface contains a mosaic of different probes that are individually localized to discrete, known areas of the surface. Such ordered arrays containing a large number of oligonucleotide probes have been developed as tools for high throughput analyses of genotype and gene expression. Oligonucleotides synthesized on a solid support recognize uniquely complementary nucleic acids by hybridization, and arrays can be designed to define specific target sequences, analyze gene expression patterns or identify specific allelic variations. One difficulty in the design of oligonucleotide arrays is that oligonucleotides targeted to different regions of the same gene can show large differences in hybridization efficiency, presumably due to the interplay between the secondary structures of the oligonucleotides and their targets and the stability of the final probe/target hybridization product.
Recently, a method or algorithm was described for predicting oligonucleotides specific for a target nucleic acid where the oligonucleotides exhibit a high potential for hybridization (Shannon, et al., Method for evaluating oligonucleotide probe sequences, U.S. Pat. No. 6,251,588 (2001)). The algorithm uses parameters of the oligonucleotide and the oligonucleotide:target nucleotide sequence duplex, which can be readily predicted from the primary sequences of the target polynucleotide and candidate oligonucleotides. In the method, oligonucleotides are filtered based on one or more of these parameters, then further filtered based on the sizes of clusters of oligonucleotides. The basic steps of the disclosed method are parsing a sequence that is complementary to a target nucleotide sequence into a set of overlapping oligonucleotide sequences, calculating one or more parameters for each of the oligonucleotide sequences with respect to its hybridization to the target nucleotide sequence, filtering the oligonucleotide sequences based on the values for each parameter, filtering the oligonucleotide sequences based on the length of contiguous sequence elements, and ranking the contiguous sequence elements based on their length. Certain oligonucleotides within the longest contiguous sequence elements generally showed the highest hybridization efficiencies.
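The parse, filter, cluster and rank steps just described can be illustrated with a minimal sketch. It is not the implementation disclosed in U.S. Pat. No. 6,251,588. GC fraction is used here only as a hypothetical stand-in for the predicted oligonucleotide and duplex parameters, and the function names, oligonucleotide length and thresholds are assumptions made for illustration.

```python
from itertools import groupby

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the strand complementary to seq, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(oligo):
    """Hypothetical stand-in parameter: fraction of G and C bases."""
    return (oligo.count("G") + oligo.count("C")) / len(oligo)


def candidate_oligos(target, length):
    """Parse the complement of the target into overlapping oligos."""
    probe_strand = reverse_complement(target)
    return [(i, probe_strand[i:i + length])
            for i in range(len(probe_strand) - length + 1)]


def rank_clusters(target, length=25, gc_min=0.4, gc_max=0.6):
    """Filter oligos on the parameter, group surviving start positions
    into contiguous runs, and rank the runs by length (longest first).

    Returns a list of (first_start_position, run_length) tuples.
    """
    passing = [i for i, oligo in candidate_oligos(target, length)
               if gc_min <= gc_fraction(oligo) <= gc_max]
    clusters = []
    # Consecutive start positions share the same (position - index) key.
    for _, group in groupby(enumerate(passing), key=lambda p: p[1] - p[0]):
        starts = [start for _, start in group]
        clusters.append((starts[0], len(starts)))
    return sorted(clusters, key=lambda c: c[1], reverse=True)
```

Under these assumptions, candidate probes would be drawn from the first cluster returned by rank_clusters, since the text above reports that oligonucleotides within the longest contiguous sequence elements generally showed the highest hybridization efficiencies.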
In many assays there may be one or more target or non-target nucleic acids present that have nucleotide sequences that are closely related to one another, differing by only a few nucleotides (e.g., one to five) at one or more sites within the nucleotide sequence. One such instance of related sequences is a family of genes that are phylogenetically related and that share stretches of conserved and/or hypervariable domains.
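As a small illustration of what "closely related" means here, the sketch below lists the positions at which two aligned, equal-length sequences differ. The function names and the default limit of five differences are assumptions taken from the "one to five nucleotides" example. Real gene-family members may also differ by insertions or deletions, which this sketch does not handle.

```python
def mismatch_positions(seq_a, seq_b):
    """Return the 0-based positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def closely_related(seq_a, seq_b, max_differences=5):
    """Assumed criterion: at most max_differences mismatched positions."""
    return len(mismatch_positions(seq_a, seq_b)) <= max_differences
```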
Recently, methods, reagents and kits were disclosed for selecting target-specific oligonucleotide probes, which may be used in analyzing a target nucleic acid sequence (see, for example, U.S. Pat. No. 6,461,816 B1 and Agilent Technologies Inc. (Palo Alto, Calif.) brochure dated Nov. 1, 2001, entitled “Development of an in situ synthesized oligonucleotide microarray for gene expression monitoring of the budding yeast Saccharomyces cerevisiae,” by Stephanie Fulmer-Smentek, et al.). In the method, a cross-hybridization oligonucleotide probe is identified based on a candidate target-specific oligonucleotide probe for the target nucleic acid sequence. The cross-hybridization oligonucleotide probe measures the extent of occurrence of a cross-hybridization event having a predetermined probability. Cross-hybridization results are determined employing the cross-hybridization oligonucleotide probe and the target-specific oligonucleotide probe. The target-specific oligonucleotide probe is selected or rejected for the set based on the cross-hybridization results. The process for identifying and selecting the minimum number of cross-hybridization oligonucleotide probes may be carried out using different approaches such as mismatch probe design by homology, mismatch probes that incorporate base combinations, mismatch probes that delete bases, mismatch probes that insert bases, and combinations thereof.
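Three of the mismatch approaches listed above (substituting, deleting and inserting bases) can be sketched as simple string operations on a candidate probe. This is an illustrative sketch only. The function names are hypothetical, the position is chosen by the caller, and neither design by homology nor the selection of a minimum probe set is attempted.

```python
BASES = "ACGT"


def substitution_probes(probe, pos):
    """Probes that replace the base at pos with each other base."""
    return [probe[:pos] + base + probe[pos + 1:]
            for base in BASES if base != probe[pos]]


def deletion_probe(probe, pos):
    """Probe with the base at pos removed."""
    return probe[:pos] + probe[pos + 1:]


def insertion_probes(probe, pos):
    """Probes with each base inserted immediately before pos."""
    return [probe[:pos] + base + probe[pos:] for base in BASES]
```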
There remains, however, a need to prepare arrays that efficiently and effectively detect and estimate subgroups of gene families by the relative abundance of nucleic acid sequences among pools of phylogenetically related sequences that share stretches of conserved and/or hypervariable domains. Ideally, the methods should be able to employ current manufacturing techniques for the preparation of arrays with few or no modifications beyond those needed to carry out the present methods.