There are a number of known techniques for automating the classification of data based on an analysis of a set of training data. Of particular interest herein are kernel-based techniques such as support vector machines. The development of support vector machines has a history that dates back to 1965, when Chervonenkis and Vapnik developed an algorithm referred to as the generalized portrait method for constructing an optimal separating hyperplane. A learning machine using the generalized portrait method optimizes the margin between the training data and a decision boundary by solving a quadratic optimization problem whose solution can be obtained by maximizing the functional:
$$
W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{l} \alpha_i\,\alpha_j\,y_i\,y_j\,(x_i, x_j)
$$
subject to the constraints $\sum_{i=1}^{l} y_i\alpha_i = 0$ and $\alpha_i \ge 0$. The Lagrange multipliers $\alpha_i$ define the separating hyperplane used by the learning machine. Supposing the optimal values for the multipliers are $\alpha_i^o$ and the corresponding value for the threshold is $b_o$, the equation for this hyperplane is $\sum_{i=1}^{l} \alpha_i^o\,y_i\,(x_i, x) + b_o = 0$.
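As a minimal sketch of this optimization, the dual functional $W(\alpha)$ can be maximized numerically with a general-purpose quadratic-programming routine. The tiny two-dimensional data set and the use of `scipy.optimize.minimize` below are illustrative assumptions, not part of the generalized portrait method as described here.

```python
# Sketch: maximizing W(alpha) for a linearly separable toy problem.
# The data, solver, and tolerances are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

# Tiny separable training set with labels y in {-1, +1}
X = np.array([[2.0, 2.0], [2.5, 1.5], [-2.0, -2.0], [-1.5, -2.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
l = len(y)
G = (y[:, None] * y[None, :]) * (X @ X.T)  # entries y_i y_j (x_i, x_j)

def neg_W(a):
    # Maximizing W(alpha) == minimizing -W(alpha)
    return -(a.sum() - 0.5 * a @ G @ a)

res = minimize(neg_W, np.zeros(l),
               bounds=[(0.0, None)] * l,                      # alpha_i >= 0
               constraints={'type': 'eq', 'fun': lambda a: a @ y})  # sum y_i alpha_i = 0
alpha = res.x

w = (alpha * y) @ X          # normal vector of the separating hyperplane
sv = alpha > 1e-6            # support vectors carry nonzero multipliers
b = np.mean(y[sv] - X[sv] @ w)  # threshold from points on the margin
```

Classifying a point then reduces to taking the sign of `x @ w + b`; for this separable toy set the recovered hyperplane reproduces the training labels.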
In 1992, Boser, Guyon, and Vapnik devised an effective means of constructing the separating hyperplane in a Hilbert space which avoids having to explicitly map the input vectors into the Hilbert space. See Bernhard E. Boser, Isabelle M. Guyon, and Vladimir N. Vapnik, “A Training Algorithm for Optimal Margin Classifiers,” Proceedings of the Fifth Annual Workshop on Computational Learning Theory (July 1992). Instead, the separating hyperplane is represented in terms of kernel functions which define an inner product in the Hilbert space. The quadratic optimization problem can then be solved by maximizing the functional:
$$
W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{l} \alpha_i\,\alpha_j\,y_i\,y_j\,K(x_i, x_j)
$$
subject to the constraints
$\sum_{i=1}^{l} y_i\alpha_i = 0$ and $\alpha_i \ge 0$. In this case, the corresponding equation of the separating hyperplane is
$$
\sum_{i=1}^{l} \alpha_i^o\,y_i\,K(x_i, x) + b_o = 0 \qquad (1)
$$
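Equation (1) is straightforward to evaluate once the multipliers and threshold are known. The sketch below assumes a Gaussian (RBF) kernel and hand-picked values for the support vectors, multipliers, and threshold purely for illustration; none of these values come from the text.

```python
# Sketch: evaluating the separating-hyperplane expansion of equation (1).
# The RBF kernel and the "trained" values below are illustrative assumptions.
import numpy as np

def rbf_kernel(xi, xj, gamma=0.5):
    # K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2), a positive definite kernel
    return np.exp(-gamma * np.sum((np.asarray(xi) - np.asarray(xj)) ** 2))

def decision(x, support_vectors, alpha, y, b, kernel=rbf_kernel):
    # f(x) = sum_i alpha_i^o y_i K(x_i, x) + b_o; the predicted class is sign(f(x))
    return sum(a * yi * kernel(xi, x)
               for a, yi, xi in zip(alpha, y, support_vectors)) + b

# Two hypothetical support vectors of opposite class:
f_pos = decision([0.0], [[0.0], [2.0]], [1.0, 1.0], [1.0, -1.0], 0.0)
f_neg = decision([2.0], [[0.0], [2.0]], [1.0, 1.0], [1.0, -1.0], 0.0)
```

Note that the input vectors enter only through the kernel, so the Hilbert-space mapping never needs to be computed explicitly.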
In 1995, Cortes and Vapnik generalized the maximal margin idea for constructing the separating hyperplane in the image space when the training data is non-separable. See Corinna Cortes and Vladimir N. Vapnik, “Support Vector Networks,” Machine Learning, Vol. 20, pp. 273-97 (September 1995). The quadratic form of the optimization problem is expressed in terms of what is referred to as a “slack variable” which is non-zero for those points that lie on the wrong side of the margin, thereby allowing for an imperfect separation of the training data. By converting to the dual form, the quadratic optimization problem can again be expressed in terms of maximizing the following objective functional
$$
W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{l} \alpha_i\,\alpha_j\,y_i\,y_j\,K(x_i, x_j)
$$
subject to the constraint $\sum_{i=1}^{l} y_i\alpha_i = 0$, but with the new constraint that $0 \le \alpha_i \le C$. Again, the corresponding equation of the separating hyperplane is given by equation (1) above. The equation is an expansion over those vectors for which $\alpha_i \neq 0$, these vectors being referred to in the art as “support vectors.” To construct a support vector machine, one can use any positive definite function $K(x_i, x_j)$, each choice of kernel creating a different type of support vector machine. Support vector machines have proven useful for a wide range of applications, including problems in the areas of bioinformatics and text classification.
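The effect of the box constraint $0 \le \alpha_i \le C$ can be seen in a small sketch: when one training point lies on the wrong side, its multiplier is capped at $C$ rather than growing without bound. The one-dimensional data set, the choice $C = 1$, and the use of `scipy.optimize.minimize` are illustrative assumptions.

```python
# Sketch: the soft-margin dual with box constraint 0 <= alpha_i <= C.
# One mislabeled point makes the data non-separable; C caps its multiplier.
# Data, C, and solver are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0], [1.5], [-2.0], [-1.5], [-0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])  # last point lies on the wrong side
C = 1.0
G = (y[:, None] * y[None, :]) * (X @ X.T)  # linear kernel K(x_i, x_j) = x_i . x_j

res = minimize(lambda a: -(a.sum() - 0.5 * a @ G @ a),
               np.zeros(len(y)),
               bounds=[(0.0, C)] * len(y),                        # 0 <= alpha_i <= C
               constraints={'type': 'eq', 'fun': lambda a: a @ y})  # sum y_i alpha_i = 0
alpha = res.x
```

At the optimum, the multiplier of the mislabeled point sits at the upper bound $C$ (it is a margin violator), while well-separated points receive $\alpha_i = 0$ and drop out of the expansion in equation (1).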