In ordinary documents, for example, those whose topics belong to the same field resemble one another in their descriptions more closely than those whose topics belong to different fields. Likewise, retail stores with the same locational condition have similar sales of specific products. The locational condition is information indicating, for instance, whether the store is located in a business district, near a station, or in a suburb. Thus, in many types of data, an actually observed variable (the above-mentioned “description” or “sales”) changes according to the value of another factor that is unobservable (the above-mentioned “topic” or “store location”).
Estimating unobservable factors in such data is useful in industrially important situations. As an example, by estimating which documents share the same topic, a user can separate retrieved documents into a group that meets the search intention and a group that does not, and thus obtain the desired documents more quickly. As another example, by estimating which stores share the same sales factor, a user deciding the product range of each store can identify the stores in which a product that sold well in a specific store should be introduced to achieve good sales.
Mixture distribution models are typically employed to estimate such unobservable factors. A mixture distribution model represents an observed variable as being generated from a distribution obtained by superposing a plurality of distributions (components). In other words, a component is selected according to an unobservable factor, and the observed variable is generated from the selected component.
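The two-step generation described above (select a component, then generate the observed variable from it) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the components are assumed to be one-dimensional Gaussians, and the function name `sample_mixture` and its parameters are hypothetical names introduced here, not taken from the cited literature.

```python
import random

def sample_mixture(weights, means, stddevs, n_samples, seed=0):
    """Draw (latent component, observed value) pairs from a Gaussian mixture.

    Assumption: each component is a 1-D Gaussian; real data may call for
    other component distributions.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        # Unobservable factor: choose a component according to the mixing weights.
        z = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        # Observed variable: generate it from the selected component.
        x = rng.gauss(means[z], stddevs[z])
        samples.append((z, x))
    return samples

# Hypothetical example: two "store location" types with different typical sales.
data = sample_mixture(weights=[0.3, 0.7], means=[100.0, 20.0],
                      stddevs=[5.0, 2.0], n_samples=5)
```

In an estimation setting only the second element of each pair would be available, and the first element is what the mixture distribution model is used to infer.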
In addition, models that use mixture distribution parameters hierarchically are employed to represent situations where the factor generating the observed variables is similar within each set of specific samples (e.g. a similar factor within each document, a similar factor within each store). For instance, a model called “Latent Dirichlet Allocation (LDA)”, described in Non Patent Literature (NPL) 1, is used in the case of natural language text.
In NPL 1, each document is represented by its words (observed variables), each word in a document has a latent state, and a parameter set for each document defines the occurrence probabilities of the latent states. Further, in NPL 1, how these per-document parameters are likely to occur is expressed using a parameter common to all documents. By constructing such a model, NPL 1 expresses the tendency of topics to depend on the document.
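The hierarchical generation summarized above can be illustrated by the following sketch, which follows the commonly described generative process of LDA rather than any specific implementation in NPL 1. The names `sample_dirichlet`, `generate_corpus`, `alpha`, and `topic_word` are hypothetical, a symmetric Dirichlet distribution is assumed for the parameter common to all documents, and the per-topic word distributions are assumed to be given.

```python
import random

def sample_dirichlet(rng, alphas):
    # A Dirichlet draw obtained by normalizing independent Gamma draws.
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(n_docs, doc_len, alpha, topic_word, seed=0):
    """Generate documents as lists of (latent topic, word id) pairs.

    alpha      : concentration of the symmetric Dirichlet shared by all documents
    topic_word : topic_word[k][w] is the probability of word w under topic k
    """
    rng = random.Random(seed)
    n_topics = len(topic_word)
    vocab = range(len(topic_word[0]))
    corpus = []
    for _ in range(n_docs):
        # Per-document parameter: occurrence probabilities of the latent states.
        theta = sample_dirichlet(rng, [alpha] * n_topics)
        doc = []
        for _ in range(doc_len):
            # Latent state of this word, drawn from the document's parameter.
            z = rng.choices(range(n_topics), weights=theta, k=1)[0]
            # Observed word, drawn from the distribution of the chosen topic.
            w = rng.choices(vocab, weights=topic_word[z], k=1)[0]
            doc.append((z, w))
        corpus.append((theta, doc))
    return corpus
```

Note that the number of topics is fixed in advance by the size of `topic_word`; the sketch has no mechanism for determining it from the data.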
The method of NPL 1 has a problem in that the parameters and the latent states cannot be estimated unless the number of latent states is set beforehand. To solve this problem, NPL 2 performs the estimation by assuming a model in which the number of latent states and the parameters are generated by a Dirichlet process. A nonparametric Bayesian method using a Dirichlet process, however, has the problem of extremely high computational complexity.
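As a rough illustration of how a Dirichlet process lets the number of latent states arise from the model instead of being fixed beforehand, the following sketch uses the Chinese restaurant process, one standard constructive view of a Dirichlet process. It is not asserted to be the procedure of NPL 2; the function name `chinese_restaurant_process` and the parameter `concentration` are hypothetical names introduced for this sketch.

```python
import random

def chinese_restaurant_process(n_samples, concentration, seed=0):
    """Assign samples to latent states without fixing the number of states.

    Each sample joins an existing state with probability proportional to the
    number of samples already in it, or opens a new state with probability
    proportional to `concentration` (assumed to be positive).
    """
    rng = random.Random(seed)
    counts = []        # counts[k] = number of samples assigned to state k
    assignments = []   # assignments[i] = state of sample i
    for _ in range(n_samples):
        weights = counts + [concentration]  # last entry stands for a new state
        k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if k == len(counts):
            counts.append(1)   # open a new latent state
        else:
            counts[k] += 1     # join an existing latent state
        assignments.append(k)
    return assignments, counts

assignments, counts = chinese_restaurant_process(n_samples=100, concentration=1.0)
n_states = len(counts)  # determined by the draw, not specified in advance
```

The sketch only draws from the prior. Estimation with such a model would require inferring these assignments from observed data, typically by iterative sampling or optimization over all samples, which is where the computational cost noted above arises.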