The present disclosure relates to a method for accurately predicting data such as the electric power consumed in a manufacturing process, a system using this method, and a computer-readable program for realizing this method.
Automotive and electrical parts are manufactured by melting and molding many different materials such as metals and plastics and then assembling the resulting components. In the case of the iron used in these parts, the ore is ground and melted, carbon and other metals are added, and the mixture is molded into a predetermined shape. The electric power consumed in this manufacturing process can be estimated, for example, from the physical properties and amounts used of raw materials such as iron, copper, polyethylene, and polypropylene. One such technique is known as regression analysis.
Regression analysis is a statistical method that predicts an output variable from a multivariate function of a plurality of explanatory variables; such a function is called a regression equation. Regression analysis can also determine a distribution of the output variable, based on the estimated variance of the residuals. By collecting datasets that relate actual power consumption to the physical properties and amounts of raw materials, one can numerically determine the regression equation used to predict the electric power consumed in the manufacturing process. The predicted electric power is then obtained by substituting the values representing the physical properties and amounts of raw materials into the fitted regression equation.
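As a minimal sketch of this regression approach, the following fits a linear regression equation by ordinary least squares and substitutes new raw-material values into it. The feature columns and power values are hypothetical placeholders, not data from the disclosure:

```python
import numpy as np

# Hypothetical training data: each row of X holds explanatory variables
# (e.g., amounts of iron, copper, and polyethylene used in one batch);
# y holds the measured power consumption for that batch.
X = np.array([[10.0, 2.0, 1.0],
              [12.0, 1.5, 0.5],
              [ 8.0, 3.0, 2.0],
              [11.0, 2.5, 1.5],
              [ 9.0, 2.0, 2.5]])
y = np.array([105.0, 112.0, 96.0, 108.0, 101.0])

# Augment with an intercept column and fit the regression equation
# y ~ b0 + b1*x1 + b2*x2 + b3*x3 by ordinary least squares.
A = np.column_stack([np.ones(len(X)), X])
coef, _, _, _ = np.linalg.lstsq(A, y, rcond=None)

# Predict power consumption for a new batch by substituting its
# raw-material values into the fitted regression equation.
x_new = np.array([1.0, 9.5, 2.0, 1.0])  # leading 1.0 is the intercept term
y_pred = float(x_new @ coef)
```

The estimated residual variance from such a fit is what yields the distribution of the output variable mentioned above.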
Conditional density estimation is another known technique that can be applied to predicting power consumption. As in regression analysis, a relationship between an output variable and explanatory variables is derived; however, instead of the average of the output variable, a conditional probability density function representing the probability distribution of the output variable is derived. In regression analysis, a parametric assumption such as Gaussianity is placed on the error distribution. In conditional density estimation, by contrast, the error distribution is not limited to a Gaussian or other parametric form, so complex distributions can be handled.
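One nonparametric way to realize this, sketched below under hypothetical data, is a kernel-based conditional density estimator: each training pair is weighted by its similarity to the query point, and kernels centered on the observed outputs are mixed, so no Gaussian form is imposed on the resulting density:

```python
import numpy as np

def gaussian_kernel(u, h):
    # One-dimensional Gaussian kernel with bandwidth h.
    return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2.0 * np.pi))

def conditional_density(y_grid, x_query, X, Y, hx=1.0, hy=2.0):
    # Weight each training pair by its similarity to the query x, then
    # mix Gaussian kernels centered on the observed outputs y_i.  The
    # resulting p(y | x) can be multimodal or skewed.
    w = gaussian_kernel(np.linalg.norm(X - x_query, axis=1), hx)
    w = w / w.sum()
    return np.array([(w * gaussian_kernel(y - Y, hy)).sum() for y in y_grid])

# Hypothetical data: raw-material features and measured power consumption.
X = np.array([[10.0, 2.0], [12.0, 1.5], [8.0, 3.0], [11.0, 2.5]])
Y = np.array([105.0, 112.0, 96.0, 108.0])

# Evaluate the conditional density of power consumption at a query point.
y_grid = np.arange(80.0, 130.0, 0.1)
density = conditional_density(y_grid, np.array([10.5, 2.0]), X, Y)
```

The density evaluated over the grid is non-negative and integrates to approximately one, as expected of a probability density function.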
For example, a technique has been disclosed that can accurately predict the probability of a predicted state while incorporating interaction between the explanatory variables and non-linearity of the regression equation (see JP 2007-122418). In this technique, training data is first read from a database. This training data is a sample set including a plurality of explanatory variables and output variables representing the presence or absence of a certain state. The regression coefficients of a regression equation, modeled with a kernel function defined as the sum of element kernel functions prepared for each explanatory variable, are determined by optimizing a target function given in advance using the training data.
Afterwards, in this technique, a plurality of explanatory variables serving as input parameters are plugged into the regression equation to obtain an output variable. The obtained output variable is plugged into a probability prediction function to predict the probability of a certain state occurring or not occurring. Here, the kernel function is represented by the inner product of feature vectors; for example, the kernel function between the ith data and the jth data is k(x(i), x(j)) = <φ(x(i)), φ(x(j))>, where φ(x(i)) is the vector expression of data x(i) in the d-dimensional feature space.
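The pattern of fitting a kernel regression equation whose kernel is a sum of per-variable element kernels, then mapping the output through a probability prediction function, can be sketched as follows. This uses kernel ridge regression with a logistic link as one illustrative choice of target function and probability function; the specific kernels, regularization, and data are assumptions, not the disclosed implementation:

```python
import numpy as np

def element_kernel(a, b, gamma=0.5):
    # RBF element kernel acting on a single explanatory variable.
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

def sum_kernel(Xa, Xb, gamma=0.5):
    # Kernel defined as the sum of element kernels, one per variable.
    return sum(element_kernel(Xa[:, d], Xb[:, d], gamma)
               for d in range(Xa.shape[1]))

# Hypothetical training data: explanatory variables and a 0/1 label
# indicating whether a certain state occurred.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
t = np.array([1.0, 1.0, 0.0, 0.0])

# Determine the regression coefficients alpha by optimizing a
# regularized squared-error target function (closed-form solution).
lam = 1e-2
K = sum_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), t)

# Plug new explanatory variables into the regression equation, then map
# the output through a logistic probability prediction function.
x_new = np.array([[0.2, 0.9]])
f = float(sum_kernel(x_new, X) @ alpha)
p = 1.0 / (1.0 + np.exp(-f))
```

Because each element kernel matrix is positive semi-definite, their sum plus the ridge term remains invertible, so the coefficients are well defined.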
Based on the kernel trick, another regression method has also been disclosed in which the relationship between a plurality of qualitatively different types of datasets and a response value is modeled, and a multiple kernel learning algorithm is applied to optimize the weight of each of a plurality of kernel functions through computer processing of training data (see JP 2011-198191). In this method, a similarity matrix and its corresponding graph Laplacian matrix are computed for each type of dataset. Then, the variances of a coupling constant and an observation model are calculated by a variational Bayesian method using the graph Laplacian matrices, under the assumptions that the graph Laplacian matrices are combined linearly with the coupling constants, that the observation model for the observed data is a normal distribution, that the latent variables used to explain the observed data also follow a normal distribution, and that the coupling constants follow a gamma prior distribution. Afterwards, a prediction distribution for given input data is determined by a Laplace approximation using the variances of the coupling constant and the observation model.
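The first steps of this method, computing a similarity matrix and graph Laplacian per data type and combining the Laplacians linearly with coupling constants, can be sketched as follows. The Gaussian similarity, the sample data, and the fixed coupling constants are illustrative assumptions; in the disclosed method the coupling constants follow a gamma prior and are inferred by the variational Bayesian procedure:

```python
import numpy as np

def similarity_matrix(X, gamma=0.5):
    # Gaussian similarity between every pair of samples of one data type.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def graph_laplacian(W):
    # Unnormalized graph Laplacian L = D - W, D being the degree matrix.
    return np.diag(W.sum(axis=1)) - W

# Two hypothetical, qualitatively different data types describing the
# same four samples (e.g., material properties vs. process settings).
X1 = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
X2 = np.array([[2.0], [1.0], [0.0], [3.0]])

laplacians = [graph_laplacian(similarity_matrix(Xk)) for Xk in (X1, X2)]

# Linear combination of the graph Laplacians with coupling constants c_k
# (fixed here purely for illustration).
c = np.array([0.7, 0.3])
L_combined = sum(ck * Lk for ck, Lk in zip(c, laplacians))
```

Each Laplacian, and hence the combination, is symmetric with rows summing to zero, which is the property the variational Bayesian step relies on when treating the combination as a precision-like matrix.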
Thus, a multiple kernel learning process can be executed at reasonable computational cost by assuming a probabilistic model comprising observed variables, latent variables associated with them, and coupling constants, and by optimizing this model on the basis of a variational Bayesian method.