The present technology relates to an information processing apparatus, an information processing method, and a program. In particular, the present technology relates to an information processing apparatus, an information processing method, and a program capable of estimating a cause-effect relationship between multiple variables.
In the related art, methods of estimating a statistical cause-effect relationship from observed data on multivariate random variables are roughly classified into two types. The first maximizes, as a score, a result of estimation based on an information criterion, the maximum penalized likelihood method, or the Bayes method (hereinafter referred to as a first estimation method). The second performs estimation through a statistical test for conditional independence between variables (hereinafter referred to as a second estimation method). The cause-effect relationship between variables is usually expressed as a graphical model (acyclic model) for the sake of readability of the result.
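As an illustration of the kind of statistical test on which the second estimation method rests, the following is a minimal sketch of a likelihood-ratio (G-squared) statistic for conditional independence between two categorical variables given a third. The function name `g_squared` and its arguments are hypothetical and do not appear in the cited documents; the sketch only computes the statistic and its degrees of freedom, and leaves the comparison against a chi-squared threshold to the caller.

```python
import math
from collections import Counter


def g_squared(xs, ys, zs):
    """Return (G2 statistic, degrees of freedom) for testing whether
    X is independent of Y given Z.

    xs, ys, zs are equal-length sequences of categorical values.
    Pass a constant sequence for zs to test marginal independence.
    All names here are illustrative assumptions.
    """
    n_xyz = Counter(zip(xs, ys, zs))  # joint counts
    n_xz = Counter(zip(xs, zs))       # counts of (x, z)
    n_yz = Counter(zip(ys, zs))       # counts of (y, z)
    n_z = Counter(zs)                 # counts of z

    g2 = 0.0
    for (x, y, z), observed in n_xyz.items():
        # Expected count under conditional independence given Z = z.
        expected = n_xz[(x, z)] * n_yz[(y, z)] / n_z[z]
        g2 += 2.0 * observed * math.log(observed / expected)

    # Simple degrees-of-freedom formula based on the observed levels.
    dof = (len(set(xs)) - 1) * (len(set(ys)) - 1) * len(n_z)
    return g2, dof
```

A large statistic relative to the chi-squared distribution with the returned degrees of freedom suggests that the two variables are not conditionally independent, in which case an edge between them would be retained in the graphical model.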
FIG. 1 shows examples of three graphical models indicating the cause-effect relationships between a variable X and a variable Y.
In the graphical model shown in the upper part of FIG. 1, the cause-effect relationship between the variable X and the variable Y is unclear, and the variable X and the variable Y serve as vertexes connected to each other through an edge (undirected edge) having no direction. In the graphical model shown in the middle part of FIG. 1, the cause-effect relationship between the variable X and the variable Y is that the variable X corresponds to the cause and the variable Y corresponds to the effect, and the variable X and the variable Y serve as vertexes connected to each other through an edge (directed edge) indicating a direction from the cause to the effect. In the graphical model shown in the lower part of FIG. 1, the variable X and the variable Y serve as vertexes connected to each other through three variables and edges that connect the variables. In the graphical model shown in the lower part of FIG. 1, the three variables and the edges that connect the variables form a path between the variable X and the variable Y, and the path may partially include directed edges indicating directions.
The second estimation method is also capable of estimating the presence of a latent common cause variable, and algorithms for doing so are disclosed in, for example, the following documents: P. Spirtes, C. Meek, and T. Richardson, “Causal Inference in the Presence of Latent Variables and Selection Bias”, Proceedings of Conference on Uncertainty in Artificial Intelligence, pp. 499-506, 1995; P. Spirtes, T. Richardson, and C. Meek, “Heuristic Greedy Search Algorithms for Latent Variable Models”, Proceedings of International Workshop on Artificial Intelligence and Statistics, pp. 481-488, 1996; P. Spirtes, C. Glymour, and R. Scheines, “Causation, Prediction, and Search”, MIT Press, second edition, 2000; and the like. The model expressed thereby is called a mixed ancestral graph or the like (refer to P. Spirtes, T. Richardson, and C. Meek, “Heuristic Greedy Search Algorithms for Latent Variable Models”, Proceedings of International Workshop on Artificial Intelligence and Statistics, pp. 481-488, 1996).
In the second estimation method, the random variables used are normally either all categorical data (categorical variables), which take discrete values, or all numerical data (numerical variables), which take continuous values. For example, when the random variables are categorical variables, the cause-effect relationship is modeled as a Bayesian network model. Alternatively, when the random variables are numerical variables, the cause-effect relationship is modeled as a structural equation model (refer to P. Spirtes, C. Glymour, and R. Scheines, “Causation, Prediction, and Search”, MIT Press, second edition, 2000).
On the other hand, for the first estimation method, ways of estimating and modeling the cause-effect relationship from multivariate random variables in which categorical variables and numerical variables are mixed are disclosed in the following documents: N. Friedman and M. Goldszmidt, “Discretizing Continuous Attributes while Learning Bayesian Networks”, Proceedings of International Conference on Machine Learning, pp. 157-165, 1996; S. Monti and G. Cooper, “A Multivariate Discretization Method for Learning Bayesian Networks from Mixed Data”, Proceedings of Conference on Uncertainty in Artificial Intelligence, pp. 404-413, 1998; H. Steck and T. S. Jaakkola, “Predictive Discretization during Model Selection”, JMLR workshop and conference proceedings, volume 2: Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, pp. 532-539, 2007; and the like. However, it is difficult to apply these modeling and estimation approaches to the second estimation method. Consequently, when categorical variables and numerical variables are mixed among the multiple variables, it is difficult to build a model called a partial ancestral graph or a mixed ancestral graph through an analysis that takes the presence of latent variables into account, which is particularly important in practical applications.
Meanwhile, a technique of categorizing (discretizing) a numerical variable on the basis of a certain categorical variable and the data is disclosed in, for example, the following document: U. M. Fayyad and K. B. Irani, “Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning”, Proceedings of International Joint Conference on Artificial Intelligence, pp. 1022-1029, 1993.
According to this technique, in a classification learner in which a categorical variable is set as the output variable, when a variable having an effect on the output variable (called an attribute) is a numerical variable, it is possible to discretize that attribute variable on the basis of previously obtained data and the output variable, which is the categorical variable.
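The idea of discretizing a numerical attribute with reference to a categorical output variable can be sketched as follows. This is a simplified illustration, not the full technique of the cited document: it finds only a single cut point that minimizes the class entropy of the output variable, whereas the cited technique applies such cuts recursively and decides when to stop by a minimum description length criterion, which is omitted here. The function names `entropy` and `best_cut` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of categorical labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (cut_point, weighted_entropy) for the single binary split of
    the numerical `values` that minimizes the class entropy of `labels`.

    Returns (None, entropy of all labels) when no split improves on
    leaving the data unsplit. Illustrative sketch only.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best = (None, entropy([label for _, label in pairs]))
    for i in range(1, n):
        # A cut can only be placed between two distinct values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if h < best[1]:
            # Place the cut midway between the two neighboring values.
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, h)
    return best
```

For instance, with attribute values 1.0, 2.0, 3.0, 10.0, 11.0, 12.0 and output labels a, a, a, b, b, b, the sketch places the cut midway between 3.0 and 10.0, so that the numerical attribute is replaced by a two-level categorical variable aligned with the output variable.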