Arrays of binding agents or probes, such as polypeptides and nucleic acids, have become increasingly important tools in the biotechnology industry and related fields. These binding agent arrays, in which a plurality of probes are positioned on a solid support surface in the form of an array or pattern, find use in a variety of different fields, e.g., genomics (in sequencing by hybridization, SNP detection, differential gene expression analysis, identification of novel genes, gene mapping, fingerprinting, etc.) and proteomics.
In using such arrays, the surface bound probes are contacted with molecules or analytes of interest, i.e., targets, in a sample. Targets in the sample bind to the complementary probes on the substrate to form a binding complex. The pattern of binding of the targets to the probe features or spots on the substrate produces a pattern on the surface of the substrate and provides desired information about the sample. In most instances, the targets are labeled with a detectable label or reporter such as a fluorescent label, chemiluminescent label or radioactive label. The resultant binding interaction or complexes of binding pairs are then detected and read or interrogated, for example by optical means, although other methods may also be used depending on the detectable label employed. For example, laser light may be used to excite fluorescent labels bound to a target, generating a signal only in those spots on the substrate that have a target, and thus a fluorescent label, bound to a probe molecule. This pattern may then be digitally scanned for computer analysis.
Generally, in discovering or designing probes to be used in an array, a nucleic acid sequence is selected based on the particular gene of interest, where the nucleic acid sequence may be as long as about 60 or more nucleotides or as short as about 25 nucleotides or fewer. From the nucleic acid sequence, probes are synthesized according to various nucleic acid sequence regions, i.e., subsequences, of the nucleic acid sequence and are associated with a substrate to produce a nucleic acid array. As described above, a detectably labeled sample is contacted with the array, where targets in the sample bind to complementary probe sequences of the array.
As is apparent, a key step in designing arrays is the selection of a specific probe or mixture of probes that may be used in the array and that maximizes the chances of binding with a target in a sample, while at the same time minimizing the time and expense involved in probe discovery and design. In practice, designing an optimized array typically involves iterating the array design one or more times to replace probes that are found to be undesirable for detecting targets of interest, due to poor signal quality, cross-hybridization with sequences other than the targets of interest, or both. Such iterations are costly and time consuming.
For example, conventional probe design may be performed experimentally or computationally, and in many instances it is performed computationally. Computational probe design usually involves taking subsequences of a nucleic acid and filtering them based on certain computationally determined values such as melting temperature, self structure, homology, etc., in an attempt to predict which subsequences will generate probes that will provide good signal and/or will not cross-hybridize. The subsequences that remain after the filtering process are selected to generate probes to be used in nucleic acid arrays.
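By way of illustration only, the conventional filtering approach described above might be sketched as follows. This is a minimal sketch under stated assumptions and does not represent any particular prior method: the function names, the 25-nucleotide probe length, the temperature window, the stem length, and the shared k-mer length are all hypothetical. The melting temperature is a crude GC-content approximation, self structure is approximated by searching for a reverse-complementary stem within the subsequence, and homology is approximated by an exact shared k-mer with a background sequence.

```python
# Illustrative sketch only; all names and thresholds are assumptions.

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(seq))

def approx_tm(seq):
    """Crude GC-content estimate of melting temperature in degrees C."""
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)

def has_self_structure(seq, stem=6):
    """True if the reverse complement of any stem-length window also
    occurs in seq (a rough proxy for hairpin or self-dimer potential)."""
    for i in range(len(seq) - stem + 1):
        if reverse_complement(seq[i:i + stem]) in seq:
            return True
    return False

def shares_kmer(seq, background, k=12):
    """True if seq shares an exact k-mer with any background sequence
    (a rough proxy for homology and cross-hybridization risk)."""
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    for other in background:
        for i in range(len(other) - k + 1):
            if other[i:i + k] in kmers:
                return True
    return False

def candidate_probes(target, length=25, tm_range=(60.0, 80.0), background=()):
    """Slide a window over target and keep subsequences passing all filters.
    Returns a list of (start_index, subsequence) pairs."""
    kept = []
    for start in range(len(target) - length + 1):
        sub = target[start:start + length]
        if not tm_range[0] <= approx_tm(sub) <= tm_range[1]:
            continue  # melting temperature outside the accepted window
        if has_self_structure(sub):
            continue  # likely to fold on itself
        if shares_kmer(sub, background):
            continue  # likely to cross-hybridize
        kept.append((start, sub))
    return kept
```

Filters of this kind are only approximate predictors, so subsequences that pass them may still yield probes that perform poorly in an actual array assay.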
While attempts have been made to predict which probes will provide the best results in an array assay, such attempts are not completely satisfactory, as probes selected using these methods are often still found to be undesirable for one or both of the above-described reasons. In other words, some probes will still fail or give false results because the computational techniques used to filter and select the probes are not precise predictors. Accordingly, as mentioned above, an array design typically must be iterated a number of times in order to filter out all the undesirable probes from the array. Furthermore, such attempts often characterize probes after they have been synthesized, that is, after time and expense have already been invested.
As such, there is continued interest in the development of new methods and devices for producing arrays of nucleic acid probes that provide strong signal and do not cross-hybridize with sequences other than targets of interest. Of particular interest are methods of probe selection that are easy to use and cost effective, that identify undesirable regions of the nucleic acid sequence before they are used to generate probes, and that mark the undesirable regions of the nucleic acid sequence so that the undesirable regions are permanently prevented from being used to generate nucleic acid probes.
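As a purely illustrative sketch of the marking concept mentioned above, undesirable regions might be recorded as a per-position mask over the nucleic acid sequence, with any candidate probe window that overlaps a marked position excluded. The function names and the half-open (start, end) region convention are hypothetical and are not drawn from any described method.

```python
# Illustrative sketch only; names and conventions are assumptions.

def mark_regions(length, regions):
    """Return a per-position mask; True marks an undesirable position.
    Each region is a half-open (start, end) index pair."""
    mask = [False] * length
    for start, end in regions:
        for i in range(max(0, start), min(length, end)):
            mask[i] = True
    return mask

def allowed_probe_starts(mask, probe_length):
    """Start indices of windows that overlap no marked position."""
    return [
        s for s in range(len(mask) - probe_length + 1)
        if not any(mask[s:s + probe_length])
    ]

# Example: positions 3 and 4 of a 10-nucleotide sequence are marked.
mask = mark_regions(10, [(3, 5)])
starts = allowed_probe_starts(mask, 3)  # [0, 5, 6, 7]
```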