Support vector machines (SVMs) are a type of machine learning method that can be used for classification, regression analysis, and ranking. For example, based on a set of training data samples that are each associated with one category or another, SVMs may be used to predict which category a new data sample will be associated with. The data samples may be expressed as an ordered pair including a vector that indicates features of a particular data sample and a classifier that indicates the category of the particular data sample. In a particular example, the set of training data $\Psi$ may be given by:
$$\Psi = \{(x_i, y_i) \mid x_i \in \mathbb{R}^n,\ y_i \in \{-1, 1\}\}_{i=1}^{m} \tag{1}$$
where $x_i$ is a feature vector for a particular sample, $y_i$ is the classifier of the particular sample, $m$ is the number of samples in a set of training data, and $\mathbb{R}$ is the set of real numbers.
Continuing with the classification example, SVMs construct a hyperplane having one or more dimensions that may separate data samples into two categories. An optimal solution given by the SVM is the hyperplane that provides the largest separation between vectors of the two categories. The vectors that limit the amount of separation between the two categories are often referred to as the “support vectors.”
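The notion of separation and support vectors can be made concrete with a small numeric sketch. The code below is illustrative only: the toy samples, the hyperplane parameters `w` and `b`, and the helper name `signed_distances` are assumptions chosen for the example and do not come from the text. It measures how far each sample lies from a given linear hyperplane and picks out the samples that sit closest to it, which are the ones that limit the separation.

```python
import math

def signed_distances(w, b, samples):
    """Return y * (<w, x> + b) / ||w|| for each (x, y) sample.

    A positive value means the sample lies on the side of the hyperplane
    that matches its category; the magnitude is its distance from it.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in samples
    ]

# Hypothetical 2-D training set: two samples per category.
samples = [
    ((2.0, 2.0), 1),
    ((3.0, 3.0), 1),
    ((0.0, 0.0), -1),
    ((-1.0, 0.0), -1),
]

# Hypothetical hyperplane x1 + x2 - 2 = 0 (assumed, not learned here).
w, b = (1.0, 1.0), -2.0

distances = signed_distances(w, b, samples)
margin = min(distances)  # smallest distance from any sample to the hyperplane

# Samples at exactly the margin are the ones that limit the separation.
support_vectors = [
    x for (x, _), d in zip(samples, distances) if math.isclose(d, margin)
]
```

In this toy case one sample from each category lies at the minimum distance, so moving the hyperplane toward either category would shrink the separation.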
In some instances, linear hyperplanes separate data samples in the two categories. In other instances, non-linear hyperplanes separate the data samples in the two categories. When non-linear hyperplanes separate the data samples, SVMs may utilize a kernel function to map the data into a different space having higher dimensions, such as the Reproducing Kernel Hilbert Space for a Mercer kernel. In this way, a linear hyperplane can be used to separate data that would otherwise be separated by a non-linear curve with complex boundaries.
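As a rough illustration of the mapping idea, the sketch below uses a homogeneous degree-2 polynomial kernel, which is one example of a Mercer kernel. The kernel choice, the explicit feature map `phi`, the XOR-style toy data, and the weight vector `w` are all assumptions made for this example. It checks that the kernel value equals an inner product in the mapped space, and that data which no straight line can separate in two dimensions is split by a linear hyperplane after mapping.

```python
import math

def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit feature map for the 2-D homogeneous degree-2 polynomial kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def poly2_kernel(x, z):
    """Kernel k(x, z) = <x, z>^2, computed without forming phi explicitly."""
    return dot(x, z) ** 2

# The kernel equals the inner product of the mapped vectors.
a, c = (1.0, 2.0), (3.0, -1.0)
kernel_value = poly2_kernel(a, c)
mapped_value = dot(phi(a), phi(c))

# XOR-style samples: not linearly separable in the original 2-D space.
samples = [
    ((1.0, 1.0), 1),
    ((-1.0, -1.0), 1),
    ((1.0, -1.0), -1),
    ((-1.0, 1.0), -1),
]

# In the mapped 3-D space, a hyperplane through the origin with this
# (hand-picked) normal vector puts each category on its own side.
w = (0.0, 1.0, 0.0)
separated = all(y * dot(w, phi(x)) > 0 for x, y in samples)
```

The second feature, proportional to the product of the two coordinates, carries the sign that distinguishes the categories, which is why a linear separator suffices after mapping.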
The primal form of the objective function to be solved by non-linear SVMs is given by:
$$f(w) = \frac{\sigma}{2}\|w\|_2^2 + \frac{1}{m}\sum_{i=1}^{m}\max\{0,\ 1 - y_i\langle w, \phi(x_i)\rangle\} \tag{2}$$
where $w$ is a predictor vector that is normal to the hyperplane that provides maximum separation between two classes. In addition, $\sigma$ is a regularizer weight of the regularization function $\frac{\sigma}{2}\|w\|_2^2$ that is used to make the objective function more regular or smooth. Further, the term $\frac{1}{m}\sum_{i=1}^{m}\max\{0,\ 1 - y_i\langle w, \phi(x_i)\rangle\}$ may be referred to herein as the loss function for the SVM primal objective function.
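To show how the two terms of the primal objective in equation (2) combine, the following sketch evaluates the regularization term and the loss term for a small, made-up example. It assumes the samples have already been mapped into the feature space, so each entry holds the mapped vector directly; the function name `primal_objective`, the sample values, and the chosen `w` and `sigma` are hypothetical.

```python
def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def primal_objective(w, mapped_samples, sigma):
    """Evaluate (sigma / 2) * ||w||^2 + (1 / m) * sum of hinge losses.

    mapped_samples holds (phi_x, y) pairs, where phi_x is the already
    mapped feature vector and y is -1 or 1.
    """
    m = len(mapped_samples)
    regularizer = 0.5 * sigma * dot(w, w)
    loss = sum(
        max(0.0, 1.0 - y * dot(w, phi_x)) for phi_x, y in mapped_samples
    ) / m
    return regularizer + loss

# Hypothetical mapped samples and predictor vector.
mapped_samples = [
    ((2.0, 0.0), 1),    # beyond the margin: contributes 0
    ((0.5, 0.0), 1),    # inside the margin: contributes 0.5
    ((-1.0, 0.0), -1),  # exactly on the margin: contributes 0
    ((0.5, 1.0), -1),   # on the wrong side: contributes 1.5
]
w = (1.0, 0.0)
sigma = 0.5

value = primal_objective(w, mapped_samples, sigma)
```

Here the regularization term is 0.25 and the averaged loss is 0.5, so the objective evaluates to 0.75. Only samples inside the margin or on the wrong side of the hyperplane add to the loss.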
Training non-linear support vector machines can be resource intensive and time consuming. Many SVM training algorithms optimize a dual form of the objective function using Lagrangian multipliers. However, in some cases, these algorithms may sacrifice accuracy for speed. In addition, attempts to reduce the amount of time to train non-linear SVMs by parallelizing computations among a number of processors to optimize the dual form of the objective function have provided marginal results.
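For orientation, one standard dual corresponding to the primal objective in equation (2) can be written as follows. This is a textbook-style derivation under stated assumptions, not a formula taken from the text: it assumes no bias term, introduces the Lagrange multipliers $\alpha_i$, and writes $k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$ for the kernel function.

```latex
% Dual of min_w (sigma/2)||w||_2^2 + (1/m) sum_i max{0, 1 - y_i <w, phi(x_i)>},
% obtained by introducing slack variables and Lagrange multipliers alpha_i.
\max_{\alpha \in \mathbb{R}^m} \;
  \sum_{i=1}^{m} \alpha_i
  - \frac{1}{2\sigma} \sum_{i=1}^{m} \sum_{j=1}^{m}
      \alpha_i \alpha_j \, y_i y_j \, k(x_i, x_j)
\quad \text{subject to} \quad
  0 \le \alpha_i \le \frac{1}{m}, \quad i = 1, \dots, m

% The primal predictor is recovered from the dual solution as
w = \frac{1}{\sigma} \sum_{i=1}^{m} \alpha_i \, y_i \, \phi(x_i)
```

The dual involves one variable per training sample and depends on the data only through the kernel values $k(x_i, x_j)$, which is part of why its cost grows with the size of the training set.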