The present invention relates to a machine learning method, and particularly to a kernel regression method, and more particularly to a multiple kernel regression method.
In recent years, kernel regression methods have been employed to predict, for example, the cost (e.g., time) required to travel a certain route. As described in the specification of Japanese Patent Application No. 2008-240351, the present inventors conceived a method of predicting the time required to travel a certain route on a map through Gaussian process regression, using a string kernel as the kernel function that describes the similarity between given routes. The string kernel takes the length of a substring as a parameter. By studying the correlation between the string kernel and the travel time, the present inventors realized that the travel time can be regarded as the combined contribution of substrings of various lengths, and therefore that it is more desirable to build the prediction model on a multiple kernel function.
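The following Python sketch illustrates the idea under several assumptions that the source does not state: routes are represented as sequences of hypothetical road-link IDs, the string kernel is the p-spectrum kernel (the number of shared contiguous substrings of length p), the multiple kernel is a weighted sum over substring lengths with fixed, hand-chosen weights rather than learned ones, and the Gaussian process has a zero prior mean. Only the posterior mean of the travel time is computed.

```python
from collections import Counter


def spectrum_kernel(a, b, p):
    """p-spectrum string kernel (assumed choice): inner product of the counts
    of all length-p substrings (contiguous link sequences) of routes a and b."""
    ca = Counter(tuple(a[i:i + p]) for i in range(len(a) - p + 1))
    cb = Counter(tuple(b[i:i + p]) for i in range(len(b) - p + 1))
    return sum(n * cb[s] for s, n in ca.items())


def multiple_kernel(a, b, weights):
    """Multiple kernel: weighted sum of spectrum kernels, one per substring
    length p. `weights` maps p -> non-negative weight (hypothetical values)."""
    return sum(w * spectrum_kernel(a, b, p) for p, w in weights.items())


def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def gp_predict(train_routes, train_times, query, weights, noise=0.1):
    """Zero-mean GP posterior mean: k(query, X) (K + noise * I)^-1 y."""
    n = len(train_routes)
    K = [[multiple_kernel(train_routes[i], train_routes[j], weights)
          + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(K, train_times)
    return sum(multiple_kernel(query, r, weights) * a
               for r, a in zip(train_routes, alpha))


if __name__ == "__main__":
    # Hypothetical routes (link IDs) and observed travel times.
    routes = [["a", "b", "c"], ["a", "b", "d"], ["e", "f"]]
    times = [10.0, 11.0, 5.0]
    weights = {1: 0.5, 2: 1.0}  # substring lengths 1 and 2
    print(gp_predict(routes, times, ["a", "b", "c", "d"], weights))
```

Because a sum of positive semidefinite kernels is itself positive semidefinite, the weighted sum over substring lengths remains a valid kernel; in a full multiple kernel method the weights would be estimated from data rather than fixed by hand.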
Another field of application is the estimation of optimum design parameters of a vehicle. At the design and development stage of a vehicle, it is necessary to find design parameters that satisfy a required strength while minimizing the cost. Such design parameters are generally searched for mainly through simulations, supplemented by actual vehicle tests. In this case, it is important to determine, from a limited number of experimental points, how an objective function (e.g., the maximum impact applied to the human body) depends on the design parameters. Knowing this dependence indicates how the current parameters should be modified to improve them further. In general, the dependence is non-linear, and thus kernel regression is preferable in this case as well. However, the design parameters comprise physical quantities of various kinds, so accurate modeling is difficult with a single kernel function. Accordingly, it is desirable to divide the parameters into groups by physical quantity and to use multiple kernel functions, each appropriate to its group.
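A minimal sketch of such a grouped kernel is shown below, assuming that each group receives a Gaussian (RBF) kernel with its own length scale and that the group kernels are combined as a weighted sum. The design vector layout (two plate thicknesses in millimetres and one yield strength in megapascals), the length scales, and the weights are all hypothetical.

```python
import math


def rbf(x, y, length_scale):
    """Gaussian (RBF) kernel on the parameters of one group."""
    sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq / (2.0 * length_scale ** 2))


def grouped_kernel(x, y, groups):
    """Multiple kernel over parameter groups (assumed form).

    groups: list of (indices, length_scale, weight). Each group selects the
    parameters that share one physical quantity and gets its own RBF kernel;
    the group kernels are added with non-negative weights.
    """
    total = 0.0
    for idx, ls, w in groups:
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        total += w * rbf(xs, ys, ls)
    return total


if __name__ == "__main__":
    # Hypothetical design vector: [thickness_1 (mm), thickness_2 (mm), yield strength (MPa)]
    groups = [([0, 1], 0.5, 1.0),   # thickness group, length scale in mm
              ([2], 50.0, 1.0)]     # strength group, length scale in MPa
    x = [1.0, 1.2, 300.0]
    y = [1.1, 1.2, 320.0]
    print(grouped_kernel(x, y, groups))
```

Giving each group a length scale in its own units keeps one quantity from dominating the distance; with a single RBF kernel over the raw vector, a 20 MPa difference would swamp a 0.1 mm one.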
Another example is the calculation of sales-potential scores of corporate clients. Specifically, each company that is a potential client is represented by a set of numerical values of certain attributes. For example, each company is expressed as an attribute vector composed of managerial indices such as the profit margin, the number of employees, the degree of penetration by competitors, and indices indicating the past purchase history and the type of business. Then, using data on companies on which sales actions have already been taken as training data, the sales-action priority is calculated as a real value in consideration of the past sales record and the salesperson's impressions. Based on the information thus obtained, it is desirable to estimate how much potential an unknown company has as a client. In this case too, the relation between the attributes and the potential is expected to be non-linear, and therefore the use of kernel regression is desirable. However, the attributes describing the companies include quantities of different natures, so here as well it is desirable to use multiple kernel functions, each appropriate to one kind of quantity.
As the above examples show, the multiple kernel function learning method can be regarded as one of the most general methods for handling the diversity and variety of real-world data.
Japanese Patent Application Publication No. 2007-122418 discloses a prediction method including: a read step of reading, from a database, learning data that is a set of samples, each having multiple explanatory variables and a dependent variable indicating whether or not the sample is in a certain state; a coefficient calculation step of calculating the regression coefficients of a regression formula by optimizing a predetermined objective function using the learning data, the regression formula being defined using a kernel function expressed as the sum of component kernel functions prepared for the respective explanatory variables; a dependent-variable calculation step of calculating the dependent variable by inputting the multiple explanatory variables into the regression formula as input parameters; and a probability prediction step of predicting either the probability that the sample is in the certain state or the probability that it is not, by inputting the calculated dependent variable into a probability prediction function. Because the kernel function used in this method is the sum of the component kernel functions prepared for the explanatory variables, it is also a type of multiple kernel function.
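The four steps can be sketched as follows. This is not the publication's implementation: the sketch assumes a Gaussian component kernel per explanatory variable, a regularized squared-error objective (so the coefficients follow in closed form, as in kernel ridge regression), labels encoded as +1 (in the state) and -1 (not in the state), and a logistic function as the probability prediction function. All names and data are hypothetical.

```python
import math


def component_kernel(u, v, gamma=1.0):
    """Component kernel for one explanatory variable (assumed Gaussian)."""
    return math.exp(-gamma * (u - v) ** 2)


def additive_kernel(x, y):
    """Kernel = sum of component kernels, one per explanatory variable."""
    return sum(component_kernel(u, v) for u, v in zip(x, y))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit(samples, labels, lam=0.1):
    """Coefficient calculation step under the assumed objective
    ||y - K c||^2 + lam * c^T K c, minimised by c = (K + lam * I)^-1 y."""
    n = len(samples)
    K = [[additive_kernel(samples[i], samples[j]) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, labels)


def dependent_variable(x, samples, coef):
    """Dependent-variable calculation step: f(x) = sum_i c_i k(x_i, x)."""
    return sum(c * additive_kernel(x, s) for c, s in zip(coef, samples))


def probability(f):
    """Probability prediction step (assumed logistic function, overflow-safe)."""
    if f >= 0:
        return 1.0 / (1.0 + math.exp(-f))
    e = math.exp(f)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Read step stand-in: hypothetical learning data instead of a database.
    samples = [[0.0, 0.0], [0.1, 0.2], [3.0, 3.0], [3.1, 2.9]]
    labels = [1.0, 1.0, -1.0, -1.0]
    coef = fit(samples, labels)
    print(probability(dependent_variable([0.05, 0.1], samples, coef)))
```

Because the kernel is additive over explanatory variables, each component kernel could in principle be given its own parameters or weight, which is what makes it a multiple kernel function.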
However, conventional multiple kernel function learning methods have difficulty solving problems of practical size because of their large computational cost.