The present invention relates to computer systems and, more particularly, to computer systems for mining information about gene expression levels.
Devices and computer systems have been developed for collecting information about gene expression or expressed sequence tags (EST) in large numbers of samples. For example, PCT application WO92/10588, incorporated herein by reference for all purposes, describes techniques for sequence checking nucleic acids and other materials. Probes for performing these operations may be formed in arrays according to the pioneering techniques disclosed in U.S. Pat. No. 5,143,854 and U.S. Pat. No. 5,571,639, for example. Both of these U.S. Patents are incorporated herein by reference for all purposes.
According to one aspect of the techniques described in these patents, an array of nucleic acid probes is fabricated at known locations on a chip or substrate. A fluorescently labeled nucleic acid is then brought into contact with the chip, and a scanner generates an image file indicating the locations where the labeled nucleic acids bound to the chip. Based upon the identities of the probes at these locations, information such as the monomer sequence of DNA or RNA can be extracted.
Computer-aided techniques for gene expression monitoring using such arrays of probes have been developed as disclosed in EP Pub. No. 0848067 and PCT publication No. WO 97/10365, the contents of which are herein incorporated by reference. Many diseases are characterized by differences in the degree to which various genes are expressed, either through changes in the copy number of the genetic DNA or through changes in levels of transcription (e.g., through control of initiation, provision of RNA precursors, RNA processing, etc.) of particular genes. For example, losses and gains of genetic material play an important role in malignant transformation and progression. Furthermore, changes in the expression (transcription) levels of particular genes (e.g., oncogenes or tumor suppressors) serve as signposts for the presence and progression of various cancers.
Information on expression of genes or expressed sequence tags may be collected on a large scale in many ways, including the probe array techniques described above. One of the objectives in collecting this information is the identification of genes or ESTs whose expression is of particular importance. Researchers use such techniques to answer questions such as: 1) Which genes are expressed in cells of a malignant tumor but not expressed in either healthy tissue or tissue treated according to a particular regimen? 2) Which genes or ESTs are expressed in particular organs but not in others? 3) Which genes or ESTs are expressed in particular species but not in others?
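By way of illustration only, the first question above may be viewed as a set-difference query over per-sample expression calls. The following minimal sketch assumes that expression levels have already been reduced to one number per gene per sample type, and that a gene is considered "expressed" when its level meets a fixed cutoff. The gene names, the numeric values, the cutoff, and the function names are all hypothetical and are not taken from the referenced patents or publications.

```python
from typing import Dict, Iterable, Set

# Hypothetical data: sample type -> gene -> expression level (arbitrary units).
expression: Dict[str, Dict[str, float]] = {
    "tumor":   {"GENE_A": 120.0, "GENE_B": 5.0, "GENE_C": 300.0},
    "healthy": {"GENE_A": 110.0, "GENE_B": 4.0, "GENE_C": 8.0},
    "treated": {"GENE_A": 95.0,  "GENE_B": 6.0, "GENE_C": 12.0},
}

# Assumed cutoff at or above which a gene is called "expressed".
THRESHOLD = 50.0


def expressed_genes(levels: Dict[str, float], threshold: float = THRESHOLD) -> Set[str]:
    """Return the genes whose level meets or exceeds the threshold."""
    return {gene for gene, level in levels.items() if level >= threshold}


def expressed_only_in(
    data: Dict[str, Dict[str, float]],
    target: str,
    excluded: Iterable[str],
    threshold: float = THRESHOLD,
) -> Set[str]:
    """Return genes expressed in `target` but in none of the `excluded` sample types."""
    result = expressed_genes(data[target], threshold)
    for name in excluded:
        # Remove any gene that is also expressed in an excluded sample type.
        result -= expressed_genes(data[name], threshold)
    return result


# Question 1: expressed in tumor, but not in healthy or treated tissue.
candidates = expressed_only_in(expression, "tumor", ["healthy", "treated"])
print(sorted(candidates))
```

The same query shape may be applied to the second and third questions by substituting organs or species for the sample types. A fixed cutoff is used here only for simplicity; other criteria for calling a gene expressed could be substituted without changing the structure of the query.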
Collecting vast amounts of expression data from large numbers of samples including many tissue types is useful in answering these questions. However, in order to derive full benefit from the investment made in collecting and storing expression data, techniques enabling one to efficiently mine the data to find items of particular relevance are highly desirable.