Among the methods for classifying data having a plurality of properties, there is a kind of method which treats the data as vectors in a given space and uses a Kernel function that defines an inner product (similarity) between those vectors.
As the Kernel function, the Gaussian Kernel, the polynomial Kernel, and the like are generally known. However, each of those existing Kernel functions has its own strengths and weaknesses, so that it may not be suited to the data to be classified.
Therefore, Patent Document 1 generates a Kernel function which is suited for classifying prescribed data by linearly coupling a plurality of existing Kernel functions.
Hereinafter, the method for generating the Kernel function depicted in Patent Document 1 will be described. In order to discriminate each of the Kernel functions which are linearly coupled with each other and the Kernel function generated by linearly coupling those, the former is called an element Kernel function and the latter is called an integrated Kernel function in this Description.
In Patent Document 1, provided that the i-th teacher data is z_i, a Gaussian-type Kernel function shown in Expression 1 is generated first.
$$K(z_i, z_j) \equiv \exp\left(-\beta \lvert z_i - z_j \rvert^2\right)$$  [Expression 1]
Note here that “β = 1/(2σ²)”, and “σ²” is the maximum eigenvalue of the covariance matrix of the input vectors (teacher data).
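As an illustration only, the initial Gaussian-type Kernel of Expression 1 and its parameter β can be sketched as follows. The function names `beta_from_data` and `gaussian_gram` are hypothetical, and the use of the sample covariance (NumPy's default normalization) is an assumption, since the normalization is not stated here.

```python
import numpy as np

def beta_from_data(Z):
    """Return beta = 1 / (2 * sigma^2), where sigma^2 is the largest
    eigenvalue of the covariance matrix of the rows of Z.
    Assumption: sample covariance (np.cov default, ddof=1)."""
    cov = np.atleast_2d(np.cov(Z, rowvar=False))
    sigma2 = np.linalg.eigvalsh(cov).max()
    return 1.0 / (2.0 * sigma2)

def gaussian_gram(Z, beta):
    """Return the matrix of K(z_i, z_j) = exp(-beta * |z_i - z_j|^2)."""
    sq_norms = np.sum(Z * Z, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Z @ Z.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against rounding below zero
    return np.exp(-beta * sq_dists)

# Hypothetical teacher data: four points at the corners of a square.
Z = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
beta = beta_from_data(Z)
K = gaussian_gram(Z, beta)
```

Each row of `Z` is one teacher vector, so `K[i, j]` holds the similarity between the i-th and j-th teacher data.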
Then, it is checked whether or not the Kernel function “K(z_i, z_j)” satisfies a prescribed evaluation standard based on the teacher data. When it is judged unsatisfactory, the integrated Kernel function is updated by adding another Gaussian-type Kernel function (in which “β” is 1.5 times the current parameter “β”) to the current Kernel function “K(z_i, z_j)”, as shown in Expression 2.
$$K(z_i, z_j) \equiv K(z_i, z_j) + \exp\left(-\beta \lvert z_i - z_j \rvert^2\right)$$  [Expression 2]
The above processing is repeated until an integrated Kernel function which satisfies the prescribed evaluation standard is obtained.
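The repeated update described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the procedure of Patent Document 1 itself: the evaluation standard is not specified here, so it is passed in as a hypothetical callable `is_satisfactory`, and the cap `max_terms` is an added safeguard that the described method does not have.

```python
import numpy as np

def pairwise_sq_dists(Z):
    """Return the matrix of squared distances |z_i - z_j|^2."""
    sq_norms = np.sum(Z * Z, axis=1)
    D = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Z @ Z.T)
    return np.maximum(D, 0.0)  # guard against rounding below zero

def build_integrated_gram(Z, beta0, is_satisfactory, max_terms=20):
    """Add Gaussian-type element Kernels until the evaluation passes.

    is_satisfactory: hypothetical callable taking the Kernel matrix and
        returning True when the evaluation standard is met.
    max_terms: assumed safety cap, not part of the described method.
    Returns the integrated Kernel matrix and the list of beta values used.
    """
    D = pairwise_sq_dists(Z)
    beta = beta0
    betas = [beta]
    K = np.exp(-beta * D)              # Expression 1
    while not is_satisfactory(K) and len(betas) < max_terms:
        beta *= 1.5                    # next beta is 1.5 times the current one
        K = K + np.exp(-beta * D)      # Expression 2
        betas.append(beta)
    return K, betas
```

The length of the returned `betas` list is the number of element Kernel functions that ended up being coupled, which is only known after the loop finishes.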
In the meantime, as other documents regarding the Kernel, there are Patent Document 2 which describes a method for classifying data of {0, 1} bit strings by defining a Kernel called a logic Kernel, and Patent Document 3 which describes a technique related to a way to set a separating plane when designating a Kernel. However, neither talks about optimization of the Kernel itself.
Patent Document 1: Japanese Unexamined Patent Publication 2006-085426
Patent Document 2: Japanese Unexamined Patent Publication 2003-256801
Patent Document 3: Japanese Unexamined Patent Publication 2004-341959
Normally, the use of an integrated Kernel function expressed as a linearly coupled form of element Kernel functions can achieve a higher classification performance than the use of any one of those element Kernel functions alone. This is because the reproducing Kernel Hilbert space corresponding to the sum of a plurality of element Kernel functions includes all of the reproducing Kernel Hilbert spaces corresponding to the individual element Kernel functions, so that the integrated Kernel function, being the sum of the element Kernel functions, has a higher expressive capacity than each of the element Kernel functions. Thus, the Kernel function optimizing method proposed in Patent Document 1 can be considered a reasonably effective method for generating a Kernel function suited for classifying prescribed data. However, it has the following issues.
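The inclusion of the reproducing Kernel Hilbert spaces mentioned above can be written out for the case of two element Kernel functions. The symbols K_1, K_2, and H_K are notation introduced here for illustration; the statement itself is the standard result for sums of positive-definite Kernels.

```latex
% Notation (assumed): K = K_1 + K_2, and H_K is the RKHS of K.
\[
  H_{K} = \{\, f_1 + f_2 : f_1 \in H_{K_1},\ f_2 \in H_{K_2} \,\},
  \qquad
  \|f\|_{H_K}^2 = \min_{f = f_1 + f_2}
    \left( \|f_1\|_{H_{K_1}}^2 + \|f_2\|_{H_{K_2}}^2 \right).
\]
% Choosing f_2 = 0 (or f_1 = 0) shows that each element space is contained in H_K:
\[
  H_{K_1} \subseteq H_{K}, \qquad H_{K_2} \subseteq H_{K}.
\]
```

The same argument extends to any finite sum of element Kernel functions by induction.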
A first issue is that the number of element Kernel functions to be linearly coupled is not known until the optimum Kernel function has been obtained. As the number of element Kernel functions to be linearly coupled increases, the final integrated Kernel function becomes more complicated, which increases the calculation cost. Even when it is desired to keep the coupling number at or below a certain value in order to suppress the calculation cost, it is not possible to optimize the Kernel function within the range of such a coupling number.
A second issue is that the contribution of each of the element Kernel functions configuring the optimum integrated Kernel function is unknown. For example, consider a case where a plurality of distance scales, each of which considers a different kind of property, are used to define distances between data of a plurality of properties, and an integrated Kernel function is obtained by linearly coupling the element Kernels corresponding to each of the distance scales. In such a case, if the extent of the contribution of each element Kernel function becomes clear, it becomes possible to discriminate the properties that contribute to the classification from the properties that do not. Such a technique could then be utilized for achieving dimension compression or the like. However, this is difficult to achieve with Patent Document 1.
An object of the present invention is to generate an integrated Kernel function that is optimum for classifying data, through linearly coupling a preset number of element Kernel functions.
Another object of the present invention is to clarify the contributions of each of the element Kernel functions which configure the optimum integrated Kernel function.
Still another object of the present invention is to perform data classification, data compression, or factor estimation by using the generated integrated Kernel function.