The behavior of a large number of interacting elements in a system is difficult to display, analyze and interpret. Many techniques have been tried in an effort to derive an ordered model from the elements of such a system. One example of a system that produces such a large amount of data is the genome. Not only may a genome consist of a large number of genes (numbering in the tens of thousands in humans), but many genes of an organism also interact. For example, many genes exert control over other genes; that is, they either induce or raise the expression level of other genes, or inhibit or decrease it. Merely displaying the genes of a genome and their interactions, let alone analyzing such a large amount of data, is daunting. Because genes exert control on each other, they do not change their expression levels independently but instead form a genome-wide network of interactions. Similarly, proteins, metabolites and other cell constituents are part of a network of interactions. The consequence of this mutual control between different genes or molecules is that the dynamics of the molecular profiles are constrained to certain coherent, recurring patterns.
Self-organizing maps (SOMs) have been used in an attempt to group genes according to their profile of expression activity versus time. Under this technique, genes having similar expression behavior are grouped together into clusters on a matrix of behaviors. The output of this algorithm is essentially just the assignment of each individual gene to one of these clusters. However, displaying the genes as a matrix of expression behaviors yields a complicated array of graphs that does not help significantly with the analysis of the interactive gene behavior or with the interpretation of coherent patterns that emerge in the displayed expression profiles.
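By way of illustration only, the SOM grouping described above can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the implementation of any particular prior-art tool: the function names `train_som` and `assign_clusters`, the grid size, the linearly decaying learning rate and the Gaussian neighborhood are all hypothetical choices made for the example.

```python
import numpy as np

def train_som(profiles, grid_shape=(3, 3), n_iter=500, lr0=0.5, sigma0=1.0, seed=0):
    """Fit a small self-organizing map to expression-versus-time profiles.

    profiles: array of shape (n_genes, n_timepoints).
    Returns node weights of shape (rows, cols, n_timepoints).
    All parameter defaults are illustrative assumptions.
    """
    profiles = np.asarray(profiles, dtype=float)
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    n_genes, n_time = profiles.shape
    # Initialise each node with a randomly chosen gene profile.
    init = rng.integers(0, n_genes, size=rows * cols)
    weights = profiles[init].reshape(rows, cols, n_time).copy()
    # Grid coordinates of every node, used by the neighborhood function.
    coords = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 1e-3     # neighborhood radius shrinks
        x = profiles[rng.integers(0, n_genes)]   # one randomly drawn gene profile
        # Best-matching unit: the node whose weight vector is closest to x.
        dist = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dist), dist.shape)
        # Gaussian neighborhood around the best-matching unit on the grid.
        grid_d2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        # Pull each node toward x in proportion to its neighborhood weight.
        weights += lr * h[..., None] * (x - weights)
    return weights

def assign_clusters(profiles, weights):
    """Return, for each gene, the flat index of its nearest SOM node."""
    profiles = np.asarray(profiles, dtype=float)
    rows, cols, n_time = weights.shape
    flat = weights.reshape(rows * cols, n_time)
    d = np.linalg.norm(profiles[:, None, :] - flat[None, :, :], axis=-1)
    return np.argmin(d, axis=1)
```

Note that the only result returned per gene is a cluster index, which reflects the limitation noted above: the output is an assignment of genes to clusters rather than a view of the overall profile.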
Similarly, other analysis techniques currently used for gene profiling, such as hierarchical clustering, k-means clustering or principal component analysis, group genes into a number of clusters that is small relative to the total number of genes, and also fail to visualize patterns within the overall gene profile.
In order to study the recurring patterns within genome-wide expression or molecular profiles, it is necessary to monitor the change of entire profiles at different times during a sequential process, or in response to multiple variables, such as during the longitudinal monitoring of multiple patients or of the biological responses of cells or tissues following treatment with various drugs. Such comparative time course analysis generates data volumes having three dimensions: (i) the elements of the molecular profile (e.g., the genes in gene expression profiles); (ii) the time points at which the profile is measured; and (iii) the time course for each of the various processes studied. Existing gene clustering techniques are generally not capable of simultaneously monitoring multiple dimensions, and hence a new method is required to visualize these global time-dependent changes in gene expression profiles.
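For illustration, the three-dimensional data volume described above can be held in a single array indexed by gene, time point and process. The following is a minimal sketch; the gene names, time points, process labels and the random values are hypothetical placeholders, not measured data.

```python
import numpy as np

# Hypothetical labels for the three dimensions (assumptions for illustration).
genes = ["gene_1", "gene_2", "gene_3", "gene_4"]   # (i) elements of the profile
time_points = [0, 2, 4]                            # (ii) measurement times, e.g. hours
processes = ["drug_A", "drug_B"]                   # (iii) processes studied

# Placeholder expression values with shape (genes, time points, processes).
rng = np.random.default_rng(0)
data = rng.normal(size=(len(genes), len(time_points), len(processes)))

# An entire profile: every gene at the second time point under the first process.
profile = data[:, 1, 0]

# The time course of the third gene under the second process.
course = data[2, :, 1]

# Change of the entire profile between consecutive time points, per process.
delta = np.diff(data, axis=1)
```

Each slice above fixes one or two dimensions; a conventional clustering of `profile` or `course` alone therefore discards the remaining dimensions, which is the shortcoming the paragraph above identifies.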
The present invention provides a method and apparatus that not only group genes according to activity but also display the activity in a way that reveals characteristic patterns in the monitored profiles, thereby visualizing the underlying relationships between the genes that constitute the regulatory network. The invention also provides a method to display the information of all three dimensions (gene, time and process) simultaneously.