With the advent of DNA microarray technology, researchers can now measure the expression levels of all genes of an organism in a single assay. Such measurements have since been carried out to observe the state of cells undergoing a developmental program or subjected to experimental or environmental stimuli. Analysis of microarray data by clustering methods has become popular. In such analyses, gene expression patterns across time points or different treatments are grouped into clusters. The function of an unknown gene can then be inferred from the functions of the known genes in the same cluster.
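To illustrate the clustering-based inference described above, the following Python sketch links genes whose expression profiles have a Pearson correlation at or above a fixed threshold, takes the connected components as clusters (single linkage at a fixed cut), and assigns each unannotated gene the most common known function in its cluster. This is a minimal sketch under stated assumptions, not the method of any work cited here: the correlation measure, the 0.9 threshold, the function names, and the gene names and annotations in the usage section are all hypothetical. Note that the threshold is itself an ad hoc parameter of the kind criticized later in this section.

```python
from collections import Counter
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no pattern to compare
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def cluster_profiles(profiles, threshold=0.9):
    """Group genes linked by correlation >= threshold (hypothetical cut-off).

    Uses union-find, so clusters are connected components of the
    correlation graph, i.e. single-linkage clustering at a fixed cut.
    """
    genes = list(profiles)
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return list(clusters.values())


def infer_functions(clusters, annotations):
    """Give each unannotated gene the majority known function of its cluster."""
    inferred = {}
    for members in clusters:
        known = [annotations[g] for g in members if g in annotations]
        if not known:
            continue  # no annotated gene in this cluster: nothing to infer
        label = Counter(known).most_common(1)[0][0]
        for g in members:
            if g not in annotations:
                inferred[g] = label
    return inferred


# Usage with hypothetical time-series profiles (four time points each).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneX": [1.1, 2.0, 2.9, 4.2],  # unannotated, rises like geneA/geneB
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneY": [8.0, 6.1, 4.0, 2.0],  # unannotated, falls like geneC
}
annotations = {"geneA": "cell cycle", "geneB": "cell cycle",
               "geneC": "stress response"}

clusters = cluster_profiles(profiles)
inferred = infer_functions(clusters, annotations)
print(clusters)
print(inferred)
```

In this toy input the rising profiles form one cluster and the falling profiles another, so `geneX` inherits "cell cycle" and `geneY` inherits "stress response". Changing the threshold can merge or split clusters and thus change the inferred functions, which is why the choice of such parameters matters.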
Although cells of the same species carry the same genetic blueprint in their DNA, not all genes are expressed at any given time, and those that are expressed are present at differing levels. Gene products, mainly proteins, carry out or catalyze the biochemical reactions in living organisms. Knowing when, and to what extent, a gene is regulated by other genes is key to understanding living systems. A step beyond clustering is therefore to reconstruct gene regulation networks, and several methods for modeling and reconstructing gene networks have accordingly been published.
Simon et al. demonstrated, using genome-wide analysis of transcription factor binding sites, the serial regulation of transcription factors during the yeast cell cycle. See Simon et al., "Serial regulation of transcription regulators in the yeast cell cycle," Cell, Vol. 106 (2001), pp. 697-708. Serial regulation means that the transcriptional activators that function during one stage of the cell cycle regulate the transcriptional activators and cyclin genes that function during the next stage. Such a serial regulatory network is shown in FIG. 4B of their paper. Qian et al. disclosed a support vector machine (SVM) learning approach to reconstructing the cell-cycle gene network. See Qian et al., "Prediction of regulatory networks: genome-wide identification of transcription factor targets from gene expression data," Bioinformatics, Vol. 19, No. 15 (2003), pp. 1917-1926. Jansen et al. disclosed a technique that predicts protein-protein interactions by integrating data from genomic databases. See Jansen et al., "A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data," Science, Vol. 302, 17 Oct. 2003, pp. 449-453.

These approaches rely heavily on databases, although Bayesian networks are also employed. Both require extensive training before they can be applied. In computational science generally, effective learning depends on informative databases in which the answers are already embedded; machine learning cannot discover new features that are not represented in the database. For gene or protein networks, large databases do not exist, and the quality of the regulatory relations recorded in existing databases is poor. These facts make it difficult to apply machine learning to the reconstruction of gene networks. Furthermore, analyzing the data requires a plurality of ad hoc parameters, such as thresholds and likelihood cutoffs, and the interpretation of the data depends strongly on the values selected for those parameters.
It is thus necessary to provide an innovative method for reconstruction of gene networks from existing noisy data.
It is also necessary to provide a method for reconstruction of gene networks in which the level of detail of the reconstructed gene networks is inversely proportional to the noisiness of the data.
It is also necessary to provide a method for reconstruction of gene networks using a computation-oriented approach.
It is also necessary to provide a method for reconstruction of gene networks with a low false-positive rate.
It is also necessary to provide a method for reconstruction of gene networks that requires no ad hoc parameters.
It is also necessary to provide a method for reconstruction of gene networks from time-series microarray data.
It is also necessary to provide a device for reconstruction of gene networks that implements all of the above methods.