A mixed distribution is a model that represents a data distribution by a plurality of distributions, and it is an important model for industrial data modeling. Such models come in various kinds, including the mixed normal distribution and the mixed hidden Markov model.
In general, when the number of mixtures and the kind of each component are specified, the parameters of the distribution can be estimated by a known technique such as the EM algorithm (e.g. Non-Patent Literature 1).
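As an illustration of parameter estimation with the model form fixed in advance, the following is a minimal sketch of the EM algorithm for a one-dimensional mixed normal distribution. The function names, the quantile-based initialisation and the fixed iteration count are assumptions made for this example and are not taken from the cited literature.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a one-dimensional normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gaussian_mixture(data, n_components, n_iter=100):
    """Fit a 1-D normal mixture with a FIXED number of components by EM.

    Hypothetical helper: the initialisation and stopping rule are
    illustrative choices, not part of any cited method.
    """
    n = len(data)
    sorted_data = sorted(data)
    # Deterministic initialisation: place the means at evenly spaced quantiles.
    means = [sorted_data[(2 * k + 1) * n // (2 * n_components)]
             for k in range(n_components)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * n_components
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = max(sum(joint), 1e-300)  # guard against underflow
            resp.append([j / total for j in joint])

        # M-step: re-estimate weights, means and variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # keep the variance positive

    return weights, means, variances
```

Note that `n_components` has to be supplied by the caller: the sketch estimates parameters only and does not decide how many components the mixture should have.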
To estimate the parameters, the number of mixtures and the kind of each component must first be determined. This problem of specifying the form of a model is generally referred to as the “model selection problem” or the “system identification problem”. It is a crucial problem for building a reliable model, and a plurality of techniques have been proposed for it as related art.
Leading techniques for model selection are methods that use an information criterion such as MDL (Minimum Description Length) (e.g. Non-Patent Literature 2) or AIC (Akaike's Information Criterion) (e.g. Non-Patent Literature 3).
A model selection method using an information criterion selects, from among the model candidates, the model that optimizes the value of the information criterion for the data. Models that optimize the value of an information criterion are known to have superior statistical properties, for example, consistency with the true distribution in the case of MDL and minimum prediction error in the case of AIC.
In principle, a model selection method using an information criterion can select among arbitrary model candidates by calculating the value of the information criterion for every candidate. When the number of model candidates is large, however, this calculation is practically impossible.
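The exhaustive procedure can be sketched as follows, assuming three hypothetical candidates for one-dimensional data (a standard normal, a normal with free mean, and a normal with free mean and variance). AIC is computed as 2k − 2 ln L, and MDL is represented by the common asymptotic approximation −ln L + (k/2) ln n, where k is the number of free parameters. The candidate set and all function names are assumptions made for this example.

```python
import math


def gaussian_log_likelihood(data, mean, var):
    """Log-likelihood of the data under a normal distribution."""
    n = len(data)
    sq = sum((x - mean) ** 2 for x in data)
    return -0.5 * n * math.log(2.0 * math.pi * var) - sq / (2.0 * var)


def candidate_fits(data):
    """Return {name: (maximised log-likelihood, number of free parameters)}.

    The three candidates are illustrative assumptions only.
    """
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    return {
        "standard normal": (gaussian_log_likelihood(data, 0.0, 1.0), 0),
        "free mean, unit variance": (gaussian_log_likelihood(data, mean, 1.0), 1),
        "free mean and variance": (gaussian_log_likelihood(data, mean, var), 2),
    }


def aic(log_lik, k, n):
    """Akaike's Information Criterion (smaller is better)."""
    return 2.0 * k - 2.0 * log_lik


def mdl(log_lik, k, n):
    """Asymptotic two-part approximation of the description length."""
    return -log_lik + 0.5 * k * math.log(n)


def select_model(data, criterion):
    """Score EVERY candidate and return the best name with all scores."""
    n = len(data)
    scores = {name: criterion(log_lik, k, n)
              for name, (log_lik, k) in candidate_fits(data).items()}
    return min(scores, key=scores.get), scores
```

Every candidate is fitted and scored once, so the cost of this approach grows in direct proportion to the number of candidates.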
As an example, the problem of selecting a mixed polynomial curve will be described below. A polynomial curve can have any of a plurality of degrees, including a straight line (linear curve), a quadratic curve and a cubic curve.
When an optimum model is selected by searching the number of mixtures from 1 to Cmax and the degree of each curve from 1 to Dmax, the related art requires calculation of an information criterion for every model candidate, such as one straight line and two quadratic curves (the number of mixtures is 3), or three cubic curves and two quartic curves (the number of mixtures is 5). The number of model candidates will be about a hundred thousand when, for example, Cmax is 10 and Dmax is 10, and tens of billions when Cmax is 20 and Dmax is 20; it increases exponentially as the models to be searched become more complicated.
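The growth in the number of candidates can be illustrated with a short sketch. It assumes that a model candidate is an unordered collection of curve degrees, one per mixture component, with repetition allowed; this counting convention and the function names are assumptions made for this example, so the totals indicate the order of magnitude of the growth and need not match the figures quoted above exactly.

```python
from itertools import combinations_with_replacement
from math import comb


def enumerate_candidates(c_max, d_max):
    """Yield every candidate as a sorted tuple of degrees.

    For example, (1, 2, 2) stands for one straight line and two
    quadratic curves. Only practical for small c_max and d_max.
    """
    for n_mix in range(1, c_max + 1):
        yield from combinations_with_replacement(range(1, d_max + 1), n_mix)


def count_candidates(c_max, d_max):
    """Count the candidates without enumerating them.

    The number of multisets of size n_mix drawn from d_max degrees
    is C(n_mix + d_max - 1, n_mix).
    """
    return sum(comb(n_mix + d_max - 1, n_mix) for n_mix in range(1, c_max + 1))
```

Under this convention `count_candidates(10, 10)` returns 184755 and `count_candidates(20, 20)` returns 137846528819, that is, on the order of 10^5 and 10^11 respectively.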
For this problem, Patent Literature 1 discloses a technique of executing high-speed model selection based on an information criterion by repeatedly optimizing an expected information criterion for complete data including a hidden variable with respect to various mixed distribution models.
Patent Literature 1: Japanese Patent Application No. 2009-013503.
Non-Patent Literature 1: Christopher M. Bishop, Pattern Recognition and Machine Learning, New Edition, Springer-Verlag, Aug. 17, 2006, pp. 438-441.
Non-Patent Literature 2: Kenji Yamanishi, Te Sun Han, “Introduction to MDL from Viewpoints of Information Theory”, Japanese Society for Artificial Intelligence, May 1992, vol. 7, No. 3, pp. 427-434.
Non-Patent Literature 3: Hidetoshi Shimodaira et al., “Model Selection, Frontier of Statistical Science of Cross-Points of Prediction, Test and Presumption (3)”, Iwanami Shoten, Publishers, December 2004, pp. 24-25.
Non-Patent Literature 4: Yue Wang, Lan Lou, Matthew T. Freedman, and Sun-Yuan Kung, “Probabilistic Principal Component Subspaces: A Hierarchical Finite Mixture Model for Data Visualization”, IEEE Transactions on Neural Networks, May 2000, Vol. 11, No. 2, pp. 625-636.
The method recited in Patent Literature 1, which repeatedly optimizes an expected information criterion for complete data including a hidden variable, is premised on the parameters of the components of the mixed distribution being independent of one another. It therefore has the problem of being inapplicable to a model that fails to satisfy this premise.
Another problem is that high-speed model selection cannot be realized under a condition where the number of component candidates increases exponentially, for example, where the independence of attributes must be selected in each component. One such condition is a mixed distribution of D-dimensional normal distributions having different independence structures. In this case, the number of component candidates increases drastically with the dimension, because the number of dimension-independence candidates is $\sum_{d=0}^{D(D-1)/2} \binom{D(D-1)/2}{d} = 2^{D(D-1)/2}$.
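The size of this candidate set can be checked numerically with a short sketch. It assumes that each of the D(D−1)/2 attribute pairs is marked either dependent or independent, so that the sum over d counts the patterns with exactly d pairs selected; the function name is hypothetical.

```python
from math import comb


def count_independence_patterns(dim):
    """Number of independence candidates for one dim-dimensional component.

    Assumption: each unordered pair of attributes is either dependent
    or independent, giving dim * (dim - 1) / 2 binary choices.
    """
    n_pairs = dim * (dim - 1) // 2
    # Sum over d of the number of ways to select exactly d of the pairs.
    return sum(comb(n_pairs, d) for d in range(n_pairs + 1))
```

By the binomial theorem the sum equals 2 raised to the power D(D−1)/2, so a single 10-dimensional component already has 2^45 candidates, which is more than 3 × 10^13.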