1. Technical Field
The present invention relates generally to discovering gene regulatory models, and more specifically relates to a system and method for employing relational fuzzy modeling to evolve gene regulatory models based on gene expression data.
2. Related Art
Currently, tremendous efforts are being put forth in the field of genomics and, more particularly, in systems biology. Two important steps are involved in such work. The first step is gene expression analysis, which tries to determine which genes are active in the production of proteins. The second step is gene regulatory modeling, which tries to determine the interdependence of active genes in the production of proteins. A significant challenge exists in developing algorithms to interpret the interdependence of different genes under different conditions.
Gene regulation can be useful both for assaying drugs and as a source of new molecular targets, provided that the regulatory models are well understood. Changes in gene expression patterns can be used to assay drug efficacy and to determine the onset of a disease. One assay that takes advantage of the existing level of sequence information, and that is complementary to sequence and genetic analysis, is gene expression profiling. Expression profiling technologies such as GENECHIP™ measure the expression levels of thousands of genes simultaneously using an array of oligonucleotides bound to a silicon surface. These arrays are hybridized under stringent conditions with a complex sample representing the mRNAs expressed in the test cell or tissue.
The results from these expression-profiling technologies are quantitative and highly parallel. The technologies generate huge datasets that are not amenable to simple analysis. The greatest challenge in making full use of this data is to develop algorithms that interpret and interconnect the results for different genes under different conditions. Currently, most expression data is analyzed using clustering, data mining techniques, or linear methods.
Examples include: 1. Cho et al., “A Genome-wide transcriptional analysis of the mitotic cell cycle,” Mol. Cell, 2:65-73, 1998; 2. Tavazoie et al., “Systematic determination of genetic network architecture,” Nat Genet, 22:281-285, 1999; 3. D'haeseleer, P., “Reconstructing Gene Networks from Large Scale Gene Expression Data,” Ph.D. dissertation, University of New Mexico, 2000; 4. D'haeseleer, P., Liang, S., and Somogyi, R., “Genetic Network Inference: From Co-Expression Clustering to Reverse Engineering,” Bioinformatics, 16(8):707-26, 2000; 5. D'haeseleer, P., and Fuhrman, S., “Gene network inference using a linear, additive regulation model,” submitted to Bioinformatics; 6. D'haeseleer, P., Wen, X., Fuhrman, S., and Somogyi, R., “Linear Modeling of mRNA expression levels during CNS development and injury,” Pacific Symposium on Biocomputing '99, pp. 41-52, World Scientific Publishing Co., 1999; 7. D'haeseleer, P., Liang, S., and Somogyi, R., “Gene Expression Analysis and Genetic Network Modeling,” Pacific Symposium on Biocomputing '99, Tutorial session on Gene Expression and Genetic Networks; 8. D'haeseleer, P., “Data Requirements for Inferring Genetic Networks from Expression Data,” Pacific Symposium on Biocomputing '99, Poster session; and 9. D'haeseleer, P., Wen, X., Fuhrman, S., and Somogyi, R., “Mining the Gene Expression Matrix: Inferring gene relationships from large scale gene expression data,” Information Processing in Cells and Tissues, Paton, R. C., and Holcombe, M., Eds., pp. 203-212, Plenum Publishing, 1998.
Other methods use fuzzy logic to enhance algorithms based on hard Boolean logic and thereby improve the clustering of gene expression data. Examples include: Tomida et al., “Gene Expression Analysis Using Fuzzy ART,” Gen Infor, 12:245-246, 2001; Eisen et al., “Exploring the conditional co-regulation of yeast gene expression using fuzzy K-means clustering,” Genome Biology, Vol 3:11, 1-6, 2002; and Delalin et al., “A fuzzy Algorithm for Gene Expression Analysis,” which can be found on the world wide web at lri.fr/~sebag/gafo/puces.pdf.
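The difference between hard and fuzzy cluster assignment can be illustrated with a minimal sketch. It assumes the standard fuzzy c-means membership formula with fuzzifier m = 2; it is not taken from any of the cited works, and the function names, cluster centers, and two-point expression profile are hypothetical values chosen only for illustration.

```python
import math


def fuzzy_memberships(profile, centers, m=2.0):
    """Return the fuzzy c-means membership of one profile in each cluster.

    Assumes the textbook formula u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)),
    where d_i is the Euclidean distance to cluster center i.
    """
    dists = [math.dist(profile, c) for c in centers]
    # A profile lying exactly on a center belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
            for d_i in dists]


def hard_assignment(profile, centers):
    """Return the index of the single nearest center (hard clustering)."""
    dists = [math.dist(profile, c) for c in centers]
    return dists.index(min(dists))


# Hypothetical cluster centers and one expression profile (two conditions).
centers = [(0.0, 0.0), (4.0, 0.0)]
profile = (1.0, 0.0)

memberships = fuzzy_memberships(profile, centers)
nearest = hard_assignment(profile, centers)
print(memberships, nearest)
```

Hard clustering reports only the nearest cluster, whereas the fuzzy memberships are graded and sum to one, so a gene can be associated with more than one expression pattern at once.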
Unfortunately, significant drawbacks exist in linear and heuristic methods and in regular or k-means clustering. Clustering, although powerful, can only group genes that are expressed in a similar fashion. It reveals whether genes are expressed in similar or different ways, but the structure of the underlying genetic networks is not apparent from the clusters. This drawback is addressed in Woolf et al., “A fuzzy logic approach to analyzing gene expression data,” Physiol Genomics, 3:9-15, 2000, which uses fuzzy logic beyond mere clustering to evolve gene regulatory models from gene expression data. However, in that case the models are built on heuristics, and the rule-based fuzzy logic does not scale easily.
Other techniques, such as neural networks, can also address these issues, and neural networks have been used to explore gene expression data and evolve genetic networks; see, e.g., Keedwell E. et al., “Modeling gene regulatory data using artificial neural networks,” Proc. of IJCNN '02, Honolulu, Hi., 183-188. However, a trained neural network is difficult for a scientist to interpret, which makes this method poorly suited to the task.
Accordingly, a need exists for an improved system and method of modeling gene regulatory data.