1. Field of the Invention
The present invention relates to a technology which supports the analysis of data obtained by experiments or research.
2. Description of the Related Art
Known technologies for supporting the analysis of data obtained by experiments or research statistically analyze the data, rearrange and systematize similar data based on the analytical results, and present the outcome. Typical examples are a technology in which factor analysis, a form of multivariate analysis, is performed and the relations between data are represented as a scatter diagram, and a technology in which data is clustered according to the similarities between data and represented as a tree diagram. These technologies enable users to easily analyze and interpret experimental data, since the characteristics of the experimental data can be recognized as patterns.
For example, with regard to gene expression data, a technology is known in which gene expression patterns and the clustering results of the genes are represented in a form such as that shown in FIG. 1. In FIG. 1, gene expression data 1 is a representative example of a gene expression pattern, in which the expression level of a gene (vertical axis 1y) under an experimental condition (horizontal axis 1x) is expressed by the color of the corresponding cell (in FIG. 1, the color is indicated instead by the darkness of the shading). The tree diagram 2 is a representative example of the results of hierarchical clustering of the genes based on the similarities in their expression patterns.
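As a minimal sketch of the kind of hierarchical clustering that a tree diagram such as tree diagram 2 represents, the following Python example groups genes by the similarity of their expression patterns. The gene names, the expression values, the use of Pearson correlation distance, and the use of average linkage are all illustrative assumptions; the related art does not specify them.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two expression profiles of equal length."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def cluster_genes(profiles):
    """Average-linkage agglomerative clustering.

    profiles maps a gene name to its expression levels across conditions.
    Returns the tree as nested 2-tuples whose leaves are gene names.
    """
    names = list(profiles)
    # Distance between two genes: 1 - correlation (an assumed choice).
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

    def linkage(members_1, members_2):
        # Average pairwise distance between the members of two clusters.
        total = sum(dist[frozenset((a, b))] for a in members_1 for b in members_2)
        return total / (len(members_1) * len(members_2))

    # Each cluster is (subtree, list of member gene names).
    clusters = [(name, [name]) for name in names]
    while len(clusters) > 1:
        pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]][1], clusters[ij[1]][1]))
        (tree_1, members_1), (tree_2, members_2) = clusters[i], clusters[j]
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [((tree_1, tree_2), members_1 + members_2)]
    return clusters[0][0]


# Hypothetical expression levels of four genes under four conditions.
example_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [3.9, 3.2, 1.8, 1.1],
}
print(cluster_genes(example_profiles))
```

In this sketch, the two genes whose levels rise across the conditions are merged first, the two whose levels fall are merged next, and the two groups are joined last. The tree shows only which genes are similar; what each group means is not part of the output.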
For example, Japanese Patent Laid-Open Publication No. 2001-281244 discloses a technology for extracting typical classifications, where broad classifications and classification grading vary significantly, by analyzing the results of the clustering while taking into consideration the “identification error scope” of users. A technology for presenting the information expressing these typical classifications in the tree diagram is also disclosed.
In addition, Japanese Patent Laid-Open Publication No. 2000-99746 discloses a technology, relating to the analysis of data with multiple attributes, for detecting attributes suitable for the categorization and visualization of the data characteristics based on the correlation coefficient between the attributes according to the distribution of the attribute values or the like, and for presenting information suitable for user analysis.
However, since these technologies produce the analytical results provided to the user by analyzing only the intrinsic nature (or correlation) of the target data, there is a problem in that the presented results are not necessarily understandable to the user.
In other words, since typical data analysis technologies, including factor analysis and cluster analysis, can only present possible classifications of the data items according to the mutual similarities across the data, the interpretation of the analytical results is left to the user.
For example, in factor analysis, the result can be easily interpreted if it yields a good factor, such as one for which most of the high-scoring genes belong to a gene family that produces a kind of enzyme related to a certain function. However, it is more likely that the obtained results are hard for the user to interpret.
Furthermore, even in cluster analysis, although the data items can be hierarchically classified (see FIG. 1, for example), the meaning of the group of data belonging to each level of the hierarchy is left to the judgment of the user.
Some methods for addressing this problem are known, such as factor rotation (the varimax method) in factor analysis, which rotates factors in a direction that is easy to interpret. However, the basic purpose of those methods is to transform the analytical results into as simple a structure as possible, and the knowledge of the user is not considered.
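To illustrate what "as simple a structure as possible" means for factor rotation, the following Python sketch applies a raw varimax rotation (without Kaiser normalization) to the two-factor case, using the closed-form rotation angle for a pair of factors. The function names and the loading values are hypothetical and are chosen only for illustration.

```python
from math import atan2, cos, sin, radians


def varimax_criterion(loadings):
    """Sum, over the factors, of the variance of the squared loadings."""
    p = len(loadings)
    total = 0.0
    for j in range(len(loadings[0])):
        squares = [row[j] ** 2 for row in loadings]
        mean = sum(squares) / p
        total += sum((s - mean) ** 2 for s in squares) / p
    return total


def varimax_two_factors(loadings):
    """Rotate a p x 2 loading matrix by the angle that maximizes the criterion."""
    p = len(loadings)
    u = [x * x - y * y for x, y in loadings]
    v = [2.0 * x * y for x, y in loadings]
    a, b = sum(u), sum(v)
    c = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
    d = 2.0 * sum(ui * vi for ui, vi in zip(u, v))
    # The criterion is sinusoidal in 4 * phi; atan2 selects its maximum.
    phi = atan2(d - 2.0 * a * b / p, c - (a * a - b * b) / p) / 4.0
    return [
        (x * cos(phi) + y * sin(phi), -x * sin(phi) + y * cos(phi))
        for x, y in loadings
    ]


# Hypothetical loadings: a simple structure deliberately rotated by 30 degrees,
# so that every variable loads on both factors.
theta = radians(30.0)
simple = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, 1.0)]
mixed = [
    (x * cos(theta) + y * sin(theta), -x * sin(theta) + y * cos(theta))
    for x, y in simple
]
rotated = varimax_two_factors(mixed)
print(varimax_criterion(mixed), varimax_criterion(rotated))
```

The rotation uses only the loadings themselves. Nothing in the computation refers to what the user knows about the variables, which is the limitation noted above.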
Although the technology disclosed in the aforementioned Japanese Patent Laid-Open Publication No. 2001-281244 enables the user to find an appropriate classification result easily by taking into consideration the “identification error scope” specified by the user and combining similar classification results which fall within the identification error scope, the knowledge of the user is not considered.
Furthermore, although the technology disclosed in the aforementioned Japanese Patent Laid-Open Publication No. 2000-99746 provides a mechanism for reflecting specifications by the user, such as the specification of the attribute targeted by the analysis, in the classification result, its configuration cannot flexibly reflect the background knowledge of the user, because it is difficult, and may be impossible, for the user to list all possible specifications relating to that background knowledge beforehand.
In view of the foregoing circumstances, the purpose of the present invention is to enable users to efficiently analyze experimental and research data.