1. Field of Invention
The present invention relates generally to techniques for manipulating hybridizations of gene expression microarrays.
2. Background of the Invention
Hybridization is a powerful and versatile technique for sequencing, detecting and localizing nucleic acids. In the general area of molecular biology, hybridization is used to map genes, detect gene expression and over-expression, diagnose diseases, identify pre-disposition to diseases, and the like.
In general, labeled nucleic acid probes are hybridized to target samples and the hybridization is then detected. The target samples can be in solution or they can be immobilized on a solid surface, such as in arrays and microarrays. More specifically, a gene expression microarray generally comprises a number of gene sequences distributed in an array on a substrate. Each array element is a DNA sequence, and allows the measurement of the expression of a gene in one or more samples. A typical method of using microarrays involves contacting nucleotide sequences contained in a fluid with the sequences immobilized on the microarray under hybridization conditions, and then detecting the hybridization complex. The resulting hybridized microarray is commonly referred to as a hybridization, or simply a ‘hyb.’ The resultant pattern of hybridized nucleic acids provides information regarding the genetic profile of the test array.
A widely used method for detecting the hybridization complex in microarrays is by fluorescence. In one method, probes derived from a biological sample are amplified in the presence of nucleotides that have been coupled to a fluorescent label (reporter) molecule so as to create labeled probes. The labeled probes are then incubated with the microarray so that the probe sequences hybridize to the complementary sequences immobilized on the microarray. A laser scanner is then used to determine the levels and patterns of fluorescence.
The use of fluorescence detection in microarray analysis is disclosed in U.S. Pat. No. 5,888,742 to Lal et al. for the detection of altered expression of human phospholipid binding protein (PLBP) and in U.S. Pat. No. 5,891,674 to Hillman et al. for the monitoring of the expression level of insulin receptor tyrosine kinase substrate (IRS-p53h), and to identify its genetic variants, mutations and polymorphisms for determining gene function, and in developing and monitoring the activity of therapeutic agents.
The above-described hybridization detection method is known as single channel hybridization. This approach generally provides a single measure of the hybridization for each sequence, but does not provide any differential information about relative amounts of hybridization between different samples. To obtain relative hybridization rates, a more complex process known as competitive hybridization is used. In this process, two samples of nucleotides from a particular tissue or other specimen are bound to fluorescent labels, each label having distinctive emission/absorption spectra. Typically one sample has a fluorescent dye of one color (e.g., green), the other sample having a different color dye (e.g., red). Typically one of the samples is a control sample, and the other the experimental sample. The labeled samples are contacted with the microarray under hybridization conditions so the labeled sequences bind with various ones of the sequences on the array. A laser scanner is then used to measure the degree to which the two differently labeled samples have hybridized to the microarray. More particularly, a measure of the transcript abundance values for each of the red and green samples is obtained for each array element. The ratio of the red and green transcript abundance values is called the fold difference, and it provides a measure of the relative abundance of the mRNA in the two hybs, with respect to each array element (gene sequence). This can inform the researcher, for example, of the change in mRNA abundance in the experimental sample relative to the control.
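By way of illustration only, the fold difference described above can be sketched as a per-element ratio. The function name, the list-based data layout, and the choice of the red channel as the numerator (i.e., as the experimental sample) are assumptions made for this sketch and are not taken from the specification.

```python
def fold_difference(red, green):
    """Return the red/green ratio for each array element.

    `red` and `green` are equal-length sequences of transcript abundance
    values, one entry per array element (hypothetical layout).
    """
    if len(red) != len(green):
        raise ValueError("red and green must cover the same array elements")
    ratios = []
    for r, g in zip(red, green):
        if g <= 0:
            # A ratio is undefined without a positive reference signal.
            raise ValueError("green abundance must be positive")
        ratios.append(r / g)
    return ratios


# Illustrative values only; these are not measured data.
red_values = [200.0, 50.0, 100.0]
green_values = [100.0, 100.0, 100.0]
ratios = fold_difference(red_values, green_values)
print(ratios)
```

In this sketch a value above 1 indicates greater abundance in the red-labeled sample for that array element, and a value below 1 indicates lesser abundance.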
The number of gene sequences (array elements) that can be analyzed in this way is limited by the size of the substrate and manufacturability limitations, but is typically less than all of the gene sequences of interest to a researcher. For example, one commercially available type of microarray from Incyte Pharmaceuticals, Inc. contains 10,000 gene sequences. However, over 100,000 gene sequences have been identified. Currently, a researcher wanting to analyze a particular sample against the entire database of gene sequences must perform at least 6 different hybridizations, one on each microarray of 10,000 sequences. Each of the resulting hybs must be separately analyzed and searched during subsequent research. The handling of multiple separate hybs is cumbersome and inefficient. Thus, it is desirable to provide a way for the researcher to combine hybs from different microarrays in a manner that allows them to be queried and otherwise processed as a single hyb.
In performing genetic analysis, it is also desirable to obtain a sense of the variability of the hybs derived from the same sample. More specifically, in some instances it is desirable to be able to average the relative transcript abundance values from two or more hybs. However, because the relative transcript abundance values that describe the hybs are ratios, conventional arithmetic averaging gives incorrect averaged values. Accordingly, it is desirable to provide a way to correctly average the relative abundance values from multiple hybs.
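The difficulty with arithmetic averaging of ratios can be seen in a small example. Suppose one hyb reports a two-fold increase (2.0) and another reports a two-fold decrease (0.5) for the same array element; these cancel, so a sensible average is 1.0, yet the arithmetic mean is 1.25. The sketch below contrasts the arithmetic mean with a mean taken in log space (a geometric mean). The geometric mean is used here only as one common way of averaging ratios; it is an assumption of this sketch and is not stated in this section to be the method of the invention. The function names are likewise hypothetical.

```python
import math


def arithmetic_mean(ratios):
    """Plain arithmetic mean of the ratio values."""
    return sum(ratios) / len(ratios)


def log_space_mean(ratios):
    """Average the base-2 logarithms of the ratios, then convert back.

    This equals the geometric mean and treats a 2-fold increase and a
    2-fold decrease symmetrically. All ratios must be positive.
    """
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    mean_log = sum(math.log2(r) for r in ratios) / len(ratios)
    return 2.0 ** mean_log


# Illustrative values only: one 2-fold increase and one 2-fold decrease.
example = [2.0, 0.5]
print(arithmetic_mean(example))  # biased toward the increase
print(log_space_mean(example))   # the increase and decrease cancel
```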
The present invention overcomes the limitations of conventional hyb manipulation tools and techniques by providing for the creation and manipulation of composite hybs and averaged hybs. A composite hyb is formed from a user selected number of different hybs that have a same technology type, and a same technology specific data source. The composite hyb can be treated as a single large hyb over the entirety of the multiple arrays. The researcher can interact with a composite hyb in the same manner as with regular hybs, including searching, visualization, or other types of data processing. A given individual hyb may be made a part of any number of different composite hybs. Beneficially, the underlying data from the selected hybs is preserved and always available to the researcher. In one embodiment, to avoid explosive proliferation of the hyb data, particularly where a hyb is a member of many composite hybs, the hyb data is not replicated in each composite hyb. Instead, each composite hyb utilizes the original data of its underlying hybs. Alternatively, where data storage limitations are not as significant, duplication of the underlying data may be implemented. A composite hyb may be created from other composite hybs or from averaged hybs.
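The non-replicating embodiment described above can be illustrated with a minimal sketch in which a composite hyb holds references to its member hybs rather than copies of their data. The class names, attribute names, and the dictionary layout (array element identifier mapped to relative abundance value) are hypothetical and chosen only for this illustration; the sketch also assumes that element identifiers are distinct across the member arrays.

```python
class Hyb:
    """Hypothetical single hyb: element id -> relative abundance value."""

    def __init__(self, name, technology, source, values):
        self.name = name
        self.technology = technology  # technology type
        self.source = source          # technology specific data source
        self.values = values

    def element_ids(self):
        return iter(self.values)

    def get(self, element_id):
        return self.values[element_id]


class CompositeHyb:
    """Hypothetical composite that references, and never copies, its members."""

    def __init__(self, name, members):
        members = list(members)
        if not members:
            raise ValueError("a composite hyb needs at least one member")
        self.technology = members[0].technology
        self.source = members[0].source
        for m in members:
            if (m.technology, m.source) != (self.technology, self.source):
                raise ValueError("members must share technology type and data source")
        self.name = name
        self.members = members  # references to the original hybs

    def element_ids(self):
        for m in self.members:
            yield from m.element_ids()

    def get(self, element_id):
        # Delegate to whichever member holds the element.
        for m in self.members:
            try:
                return m.get(element_id)
            except KeyError:
                continue
        raise KeyError(element_id)


# Illustrative values only.
hyb_a = Hyb("array1", "two-color", "scannerX", {"g1": 2.0, "g2": 0.5})
hyb_b = Hyb("array2", "two-color", "scannerX", {"g3": 1.5})
combined = CompositeHyb("all", [hyb_a, hyb_b])
up = [e for e in combined.element_ids() if combined.get(e) >= 1.5]
print(up)
```

Because `CompositeHyb` exposes the same `element_ids` and `get` operations as `Hyb`, a composite can itself be a member of another composite, and a query written against a single hyb runs unchanged against the composite.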
The present invention also provides for the construction of averaged hybs. A number of hybs of a given sample are selected by the user, and a correct determination of the average relative transcript abundance value for each array element is computed and stored. The researcher can then treat the averaged hyb in the same manner as an individual hyb, and obtain the additional benefit of the robustness of the averaged values. A further beneficial feature is the ability to form a composite hyb from multiple averaged hybs.
The features and advantages described in this summary and the following detailed description are not all-inclusive, and particularly, many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims hereof. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter, resort to the claims being necessary to determine such inventive subject matter.