Machine identification or classification of imaged body features is difficult with currently available techniques. Security applications and advanced computer user interface systems supporting even rudimentary imaging capability can be augmented if the system is capable of determining the presence (and number) of bodies, or allows for accurate identification of particular areas of the body (including face or palm recognition). To be generally useful, such systems should be able to robustly identify target classes in the presence of other objects, under conditions of varying illumination, when subjected to various rotations, when partially occluded, or when altered by color changes.
Identifying and classifying faces in two-dimensional images is particularly useful for low-impact security applications. Three approaches have been widely used to identify variable form objects such as faces. The first approach uses a predefined model, and the machine system attempts to find a match using various geometric criteria. Unfortunately, such systems require substantial effort to build models, and are prone to errors in uncontrolled situations that permit rotation or occlusion of the target object. Another approach uses brightness or color level matching to identify an object. While not as susceptible to rotation or occlusion errors, a searchable model must still be constructed, and illumination errors (e.g., failure under conditions of changing illumination, when backlit, etc.) can occur.
Because of such problems with model-based systems, development of example-based machine vision systems is an active research area. An example-based system automatically finds useful identification features of a class as a result of training on a set of positively and negatively labelled examples. The set of labelled examples can vary in size, quality of images, and types of images, and does not require potentially biased human modelling that can result in inefficient or redundant classification criteria.
Any machine learning algorithm for classification or regression depends heavily on the type and quality of the feature set. A feature set should ideally reduce intra-class variance while remaining highly discriminative. Generally, it is desirable to use a rather small set of features to avoid dimensionality-related problems and to speed up training and classification. Due to their simplicity, linear features are commonly used as the input to a classifier. A variety of powerful analysis methods derive linear features from raw input data, including principal component analysis, Fisher discriminant analysis, Fourier transforms, Sobel gradients, wavelets, and Haar-like features.
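As a concrete illustration of a linear feature, a Haar-like feature is simply the sum of pixel values in one image region minus the sum in an adjacent region, which is a dot product between the flattened image and a +1/−1 template. The toy image and template below are illustrative, not taken from the text:

```python
import numpy as np

# A Haar-like feature as a linear functional of the image: the sum of
# pixels in the left half minus the sum in the right half. On a flattened
# 4x4 image this is a dot product with a +1/-1 template (one row of the
# feature matrix A from the discussion that follows).
img = np.arange(16, dtype=float).reshape(4, 4)   # toy "image"

template = np.zeros((4, 4))
template[:, :2] = 1.0       # left half counts positively
template[:, 2:] = -1.0      # right half counts negatively

a = template.ravel()        # one linear feature (a row of A)
x = img.ravel()             # raw input data
z = a @ x                   # the derived linear feature value
```

Any of the other listed methods (PCA projections, Fourier coefficients, Sobel gradients) can likewise be written as rows of a single matrix applied to the raw input.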
Support vector machines (SVMs) are a class of learning algorithms for classification and regression that are particularly useful for high-dimensional input data with either large or small training sets. Support vector machines suitable for class identification problems work by mapping the input features into a high-dimensional feature space and computing linear functions on the mapped features in that space. The optimization problem that must be solved during training of a support vector machine has a global minimum and can generally be solved with standard quadratic programming tools. In operation, a support vector machine learns a function from a set of labelled training data. The function can either be a classification function, whose output is a binary decision assigning the input to one of two categories, or a general regression function. For classification, the support vector machine finds a hypersurface in its feature space that attempts to split the positive examples from the negative examples. The split is chosen to maximize the distance from the hypersurface to the nearest positive and negative examples, which generally makes the classification correct for test data that is near, but not identical to, the training data.
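The resulting decision rule can be sketched in a few lines. The support vectors, coefficients, labels, and threshold below are illustrative placeholder values, not the output of an actual training run:

```python
import numpy as np

# Minimal sketch of SVM classification given a (hypothetical) trained
# model: two support vectors with dual coefficients alpha, labels y,
# and threshold b. These values are illustrative only.
support_vectors = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])       # labels of the support vectors
alpha = np.array([0.5, 0.5])    # dual coefficients from the QP solution
b = 0.0                         # associated threshold

def linear_kernel(u, v):
    """K(u, v) = <u, v>, the simplest SVM kernel."""
    return np.dot(u, v)

def classify(x):
    """Sign of sum_i y_i * alpha_i * K(x, x_i) + b."""
    s = sum(a * yi * linear_kernel(x, sv)
            for a, yi, sv in zip(alpha, y, support_vectors))
    return 1 if s + b >= 0 else -1
```

Points on either side of the separating hypersurface (here a line through the origin) receive opposite labels, e.g. `classify([2.0, 2.0])` returns +1 while `classify([-1.0, -3.0])` returns −1.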
There are two simple conventional methods to train and evaluate a support vector machine using linear input features. The first method caches all linear feature vectors zi = Axi (i.e., it pre-computes the linear features zi; here xi denotes the raw input data of training sample i, and A is a matrix specifying all the linear features to be derived from the input data xi) and then uses these cached vectors to calculate the kernel elements K(zi, zj). Evaluation of a classifier then simply transforms an input pattern x to z = Ax and uses K(z, zi) in
class(x) = sign[(Σi=1..n yi αi K(z, zi)) + b] = sign[(Σi=1..n yi αi K(Ax, zi)) + b]
where αi is the optimal solution of the maximization problem, b the associated threshold, yi ∈ {−1, +1} the pattern label of support vector i, and n the size of the support vector set (i ∈ {1, 2, . . . , n}).
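The caching method above can be sketched as follows; the matrix sizes are deliberately tiny toy values, and the random data stands in for real training samples:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 4))   # 8 linear features from 4-dim raw input (toy sizes)
X = rng.standard_normal((5, 4))   # 5 raw training samples x_i, one per row

# Method 1: pre-compute and cache all feature vectors z_i = A x_i ...
Z = X @ A.T                       # row i of Z is z_i

# ... then build kernel elements K(z_i, z_j) from the cached vectors
# (linear kernel shown; any kernel on the z's works the same way).
K_train = Z @ Z.T                 # K_train[i, j] = K(z_i, z_j)

# Evaluation: transform the input pattern x once to z = A x,
# then use K(z, z_i) against the cached support vectors.
x = rng.standard_normal(4)
z = A @ x
k_row = Z @ z                     # K(z, z_i) for every cached vector z_i
```

The cost of this method is entirely in memory: every zi must stay resident, which is exactly the limitation discussed next.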
For a large number of linear features (e.g., more than a few thousand) it is usually not possible to store all vectors zi in memory, either for training or for evaluation. For instance, assuming 250,000 linear features are derived from each input datum of dimension k << 250,000, a single feature vector may require more than one megabyte of memory storage, making training sets with n > 1,000 prohibitively expensive with present-day computational resources.
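The arithmetic behind that estimate, assuming 4-byte single-precision floats (the precision is an assumption; the feature count and training set size are from the text):

```python
# Back-of-the-envelope memory cost of caching the feature vectors z_i.
features_per_vector = 250_000
bytes_per_float = 4                                   # assuming 32-bit floats

bytes_per_vector = features_per_vector * bytes_per_float
# 1,000,000 bytes, i.e. roughly one megabyte per feature vector

n = 1_000                                             # training set size
total_bytes = n * bytes_per_vector                    # ~1 GB just for the z_i
```

At n = 1,000 the cache alone approaches a gigabyte, before counting the kernel matrix or the raw training data.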
Alternatively, to conserve memory, the vector zi = Axi can be computed each time a kernel element K(zi, zj) is accessed. This requires storage of only the original training examples xi. Evaluating a classifier then computes z = Ax and zi = Axi for each support vector i. However, this method is computationally very expensive because training a support vector machine requires many evaluations of the kernel function. Even with a kernel cache, far more than 10^6 kernel evaluations may be required to train a classifier on a training set.