Shape Representation
The shape of an object represents the geometrical information that is independent of transformational effects (scaling, rotation, articulation, etc.). Understanding shape is essential in major computer vision applications, ranging from design and inspection in industrial manufacturing, to content-based retrieval in visual data search engines, to surgical planning and monitoring of changes in medical systems, to recognition of people and their actions in video surveillance, to analysis of compositions in architecture and archeology.
Recent psychophysical findings suggest that the perceptual representation of shape is primarily based on qualitative properties whose topological structures remain relatively stable over transformational (viewpoint, articulation) conditions. Other empirical studies have shown that the neural processing of shape in the brain is broadly distributed throughout the ventral (what) pathway that is involved in object recognition, and the dorsal (where) pathway that is involved in spatial localization. In other words, an adequate mathematical representation of shape needs to be invariant to viewpoint changes and articulated object motion, and discriminative enough to enable detection and classification.
Two main approaches dominate previous work on shape representation: global approaches model an object as a whole segment, while part-based approaches advocate segmentation of shape into constituent regions. The drawback of a purely global approach is the exclusion of articulation and the sensitivity to occlusion. The drawback of a purely part-based approach is that a consistent partitioning is generally not possible in the face of numerous combinations of possibilities and object shape variations. Besides, segmentation itself is ill-posed, except under controlled environments or for restricted application domains.
Global models cover a wide range of methods, including statistical moments, eigenshapes, curvature scale space, elastic matching, SIFT, parametric curves (polylines), image signatures, etc. Zernike moments are a class of orthogonal moments that can be defined to be invariant to rotation and translation. Eigenshapes decompose a distance matrix of boundary points into an ordered set of eigenvectors and find the modes of these eigenvectors. Elastic matching evaluates similarity as the sum of the local deformations needed to change one shape into another. The scale space representation successively smooths the contour, decreasing the number of curvature zero crossings. In general, global models need supplementary mechanisms to normalize for affine transformations.
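The scale space behavior described above can be illustrated with a minimal sketch. It assumes a synthetic five-lobed closed contour, a circular Gaussian filter with its width given in samples, and a discrete turning-direction test as a stand-in for the sign of the curvature; the function names and parameter values are illustrative and are not taken from any particular curvature scale space implementation.

```python
import math

def make_contour(n=200, amplitude=0.3, lobes=5):
    """Closed contour r(t) = 1 + amplitude*cos(lobes*t), sampled at n points."""
    pts = []
    for i in range(n):
        t = 2.0 * math.pi * i / n
        r = 1.0 + amplitude * math.cos(lobes * t)
        pts.append((r * math.cos(t), r * math.sin(t)))
    return pts

def smooth_closed(pts, sigma):
    """Circular Gaussian smoothing of a closed contour (sigma in samples)."""
    radius = int(3 * sigma) + 1
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    total = sum(weights)
    n = len(pts)
    smoothed = []
    for i in range(n):
        x = sum(w * pts[(i + k) % n][0] for k, w in zip(offsets, weights))
        y = sum(w * pts[(i + k) % n][1] for k, w in zip(offsets, weights))
        smoothed.append((x / total, y / total))
    return smoothed

def curvature_zero_crossings(pts):
    """Count sign changes of the turning direction along a closed contour."""
    n = len(pts)
    signs = []
    for i in range(n):
        ax, ay = pts[i][0] - pts[i - 1][0], pts[i][1] - pts[i - 1][1]
        bx, by = pts[(i + 1) % n][0] - pts[i][0], pts[(i + 1) % n][1] - pts[i][1]
        cross = ax * by - ay * bx          # sign matches the discrete curvature sign
        if cross != 0.0:
            signs.append(cross > 0)
    # signs[i - 1] wraps around at i = 0, so the contour is treated as closed
    return sum(1 for i in range(len(signs)) if signs[i] != signs[i - 1])

raw = make_contour()
before = curvature_zero_crossings(raw)
after = curvature_zero_crossings(smooth_closed(raw, sigma=20.0))
```

Under these assumptions, the heavily smoothed contour has fewer curvature zero crossings than the raw one, which is the property a curvature scale space descriptor tracks across scales.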
In comparison, part-based approaches describe shapes in terms of their part structure. Parts are defined to be nearly convex shapes separated from the rest of the object at concavity extrema. It is possible to build a discriminative classifier from a collection of parts or features to solve correspondence. These methods often require a multitude of training samples, prior knowledge of the number of parts, and a precise formulation of articulation.
Other part-based methods try to learn the part structure by considering the shape interior. For instance, shock graphs are defined as the acyclic tree of singularities of a curve evolution. The inner-distance, geodesic distance and random walk also consider the interior of the shape to build descriptors. Given a pair of points, the inner-distance is determined by finding their closest points on the shape skeleton, then measuring the distance along the skeleton. The geodesic distance is the length of the shortest path on the surface. While shock graphs benefit from the skeleton's robustness to articulation, they suffer from boundary noise. The inner and geodesic distances are robust to disturbances along boundaries, yet they are highly sensitive to occlusions and fragmentations.
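As a rough illustration of interior-based distances and their sensitivity to fragmentation, the following sketch measures the shortest 4-connected path that stays inside a binary shape mask. It is a simplification under stated assumptions: it uses a breadth-first search on pixels rather than a skeleton, and the masks, the `#` pixel convention, and the function name are hypothetical.

```python
import math
from collections import deque

def interior_path_length(mask, start, goal):
    """Number of 4-connected steps on the shortest path that stays inside the shape.

    mask is a list of strings in which '#' marks a shape pixel.
    Returns None when the goal cannot be reached through the interior.
    """
    rows, cols = len(mask), len(mask[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] == '#'
            if inside and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return None

U_SHAPE = [
    "#...#",
    "#...#",
    "#...#",
    "#####",
]
# Same shape with one pixel missing from the base, as under occlusion.
FRAGMENTED = [
    "#...#",
    "#...#",
    "#...#",
    "##.##",
]
tip_a, tip_b = (0, 0), (0, 4)
inner = interior_path_length(U_SHAPE, tip_a, tip_b)      # must travel around the U
straight = math.dist(tip_a, tip_b)                       # Euclidean distance
broken = interior_path_length(FRAGMENTED, tip_a, tip_b)  # no interior path remains
```

The interior path between the two tips is much longer than the straight-line distance, and a single missing pixel makes the interior distance undefined, which mirrors the sensitivity to occlusion and fragmentation noted above.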
The pioneering work on the spin image describes the relative spatial distribution of shape points around a set of feature points. It considers a cylindrical support region and accumulates a histogram of the points that fall inside it. The shape context is similar to the spin image except that the support region is a sphere. Because both generate sparse matrices, the distance computation becomes sensitive to the shape structure.
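A minimal two-dimensional sketch of a shape-context-style histogram is given below. It assumes a circular support region of radius `r_max` (the 2D analogue of the spherical support mentioned above), log-spaced rings, and uniform angular sectors; the bin counts and the function name are illustrative assumptions.

```python
import math

def shape_context(points, index, n_radial=3, n_angular=8, r_max=2.0):
    """Log-polar histogram of all other points relative to points[index]."""
    px, py = points[index]
    hist = [[0] * n_angular for _ in range(n_radial)]
    # Outer radii of the rings, doubling each time and ending at r_max.
    edges = [r_max / 2 ** (n_radial - 1 - k) for k in range(n_radial)]
    for j, (qx, qy) in enumerate(points):
        if j == index:
            continue
        dx, dy = qx - px, qy - py
        r = math.hypot(dx, dy)
        if r > r_max:
            continue  # outside the circular support region
        ring = next(k for k, edge in enumerate(edges) if r <= edge)
        angle = math.atan2(dy, dx) % (2.0 * math.pi)
        sector = min(int(angle / (2.0 * math.pi) * n_angular), n_angular - 1)
        hist[ring][sector] += 1
    return hist

SQUARE = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
hist = shape_context(SQUARE, index=0)
total = sum(sum(row) for row in hist)
nonzero_bins = sum(1 for row in hist for count in row if count > 0)
```

With only a few points inside the support region, most of the 24 bins stay empty, which illustrates why such histograms are sparse.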
The above methods provide satisfactory results under ideal conditions with strong priors and clean segmentation masks. Their representation capacity degrades substantially when the shape boundary is noisy (part-based methods, shock graphs), when the shape has internal crevices, branching offshoots and excessive discontinuities (inner-distance, spin images, shape context), and when articulations are non-conforming (global models). Besides, they do not necessarily extend to higher dimensions or generalize over most shape classes.
It is desirable to have a representation that applies competently to real-world examples, provides robustness against noise and perturbations, and models shapes successfully in 2D, 3D and even higher dimensions. Such a representation should also infer all types of shapes while requiring no priors about their parts, underlying geometrical transformations or possible articulations.
Support Vector Machine Classification
A support vector machine (SVM) constructs a separating hyperplane in a high (or infinite) dimensional feature space between sets of labeled empirical input vectors x, where each label is either +1 or −1 by definition for binary SVMs. The decision boundary is defined in terms of a typically small subset of training examples, called support vectors, that result in a maximum margin separation between the two classes. The decision function of support vector machines is given as
$$f(x) = \sum_{i=1}^{m} \alpha_i \left[ \phi(x) \cdot \phi(x_i^*) \right],$$
where $x_i^*$ are the support vectors, $\alpha_i$ are the corresponding weights of the support vectors, and $\phi$ is a mapping function into some dot product space $H$. By defining a similarity measure $k$ in $H$ as
$$k(x, x_i^*) = \phi(x) \cdot \phi(x_i^*),$$
every dot product in the decision function is replaced by a kernel function. This allows the algorithm to fit the hyperplane in a transformed feature space $H$. The transformation may be non-linear and the transformed space high dimensional; thus, though the classifier is a hyperplane in the high-dimensional feature space, it may be non-linear in the original input space. If the kernel used is a Gaussian, the corresponding feature space is a Hilbert space of infinite dimension:
$$\phi(x) \cdot \phi(x_i^*) = \exp\left(-\gamma \|x - x_i^*\|^2\right).$$
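The replacement of a feature-space dot product by a kernel evaluation can be checked numerically. Because the Gaussian feature map is infinite dimensional, the sketch below uses a homogeneous quadratic kernel in two dimensions, for which the explicit map is short enough to write down; this kernel is an assumption chosen purely for illustration and is not used elsewhere in this description. The Gaussian kernel is included only in its implicit form.

```python
import math

def phi(x):
    """Explicit feature map of the 2D homogeneous quadratic kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(a, b):
    """Ordinary dot product of two equal-length tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))

def quadratic_kernel(x, y):
    """k(x, y) = (x . y)^2, evaluated without forming phi."""
    return dot(x, y) ** 2

def gaussian_kernel(x, y, gamma=0.5):
    """k(x, y) = exp(-gamma * ||x - y||^2); no finite explicit map exists."""
    return math.exp(-gamma * sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

x, y = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(y))      # dot product taken in the feature space H
implicit = quadratic_kernel(x, y)   # same quantity computed in the input space
```

Both values agree up to floating-point rounding, so a classifier can work in H while only ever evaluating the kernel on input vectors.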
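By using a Gaussian radial basis function (RBF) kernel, it is always possible to find a solution that correctly classifies the training data, provided that no two identical training vectors carry different labels. Such a decision function has the form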
$$f(x) = \sum_{i=1}^{m} \alpha_i \exp\left(-\gamma \|x - x_i^*\|^2\right),$$
and the final decision is made by
$$l(x) = \operatorname{sign}[f(x)].$$
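The decision rule above can be sketched on the XOR pattern, which no hyperplane separates in the input space. To stay self-contained, the weights are obtained with a kernel perceptron, which is a stand-in trainer and not an SVM solver; it yields a function of the same form but without the maximum margin property. The weights are assumed to be signed so that they absorb the class labels, and ties in the sign are mapped to +1 by convention.

```python
import math

def rbf(x, z, gamma):
    """Gaussian kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def decision(x, vectors, alphas, gamma):
    """f(x) = sum_i alpha_i * exp(-gamma * ||x - x_i||^2)."""
    return sum(a * rbf(x, v, gamma) for a, v in zip(alphas, vectors))

def label(x, vectors, alphas, gamma):
    """l(x) = sign[f(x)], with f(x) = 0 mapped to +1."""
    return 1 if decision(x, vectors, alphas, gamma) >= 0 else -1

def kernel_perceptron(points, labels, gamma, max_epochs=100):
    """Stand-in trainer (not an SVM solver) that returns signed weights."""
    alphas = [0.0] * len(points)
    for _ in range(max_epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(points, labels)):
            if y * decision(x, points, alphas, gamma) <= 0:
                alphas[i] += y          # signed weight absorbs the label
                mistakes += 1
        if mistakes == 0:
            break                       # every training point is classified correctly
    return alphas

XOR_POINTS = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
XOR_LABELS = [1, 1, -1, -1]
alphas = kernel_perceptron(XOR_POINTS, XOR_LABELS, gamma=1.0)
predictions = [label(p, XOR_POINTS, alphas, gamma=1.0) for p in XOR_POINTS]
```

In this tiny example every training point ends up with a nonzero weight; on larger data sets an SVM solver would typically keep only a subset of the points as support vectors.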
It is worth noting that the set of support vectors is usually small compared with the entire training set.
ν-SVM
The above binary SVM algorithm learns a classifier from a set of positive and negative samples. We employ ν-SVM for training the decision function because its parameters have a natural interpretation for shapes. The learning problem is formulated as the minimization of
$$\tau(w, \xi, \rho) = \frac{1}{2}\|w\|^2 - \nu\rho + \frac{1}{l}\sum_{i} \xi_i$$
subject to:
$$y_i \cdot ((x_i \cdot w) + b) \ge \rho - \xi_i, \quad \xi_i \ge 0, \quad \rho \ge 0.$$
The above optimization tries to classify as many samples as possible correctly by penalizing misclassified samples through the slack variables $\xi_i$. At the same time, the minimization of $\|w\|$ keeps the model as simple as possible, and the margin is made as large as possible through the maximization of the variable $\rho$. The trade-off between the model complexity and the training error is controlled by the parameter $\nu \in [0, 1]$. This parameter is also a lower bound on the fraction of training examples that are support vectors and an upper bound on the fraction of training examples that lie on the wrong side of the hyperplane.
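To make the roles of the three terms concrete, the sketch below evaluates the ν-SVM objective for a given candidate (w, b, ρ), setting each slack variable to the smallest value allowed by the constraints. It only evaluates the objective and does not solve the optimization; the sample points and the candidate parameters are hypothetical.

```python
def nu_svm_objective(w, b, rho, points, labels, nu):
    """Return (tau, slacks) with tau = 0.5*||w||^2 - nu*rho + (1/l) * sum_i xi_i.

    Each slack xi_i is the smallest value satisfying
    y_i * ((x_i . w) + b) >= rho - xi_i and xi_i >= 0.
    """
    n_samples = len(points)
    slacks = []
    for x, y in zip(points, labels):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        slacks.append(max(0.0, rho - margin))
    norm_sq = sum(wi * wi for wi in w)
    tau = 0.5 * norm_sq - nu * rho + sum(slacks) / n_samples
    return tau, slacks

# Hypothetical 2D samples; the last negative sample sits among the positives.
POINTS = [(2.0, 0.0), (3.0, 0.0), (-2.0, 0.0), (-3.0, 0.0), (1.0, 0.0)]
LABELS = [1, 1, -1, -1, -1]
tau, slacks = nu_svm_objective((1.0, 0.0), 0.0, 2.0, POINTS, LABELS, nu=0.5)
```

Only the outlying sample receives a nonzero slack, so it alone raises the objective above the value obtained from the first two terms.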
One-Class SVM
Other variants of the binary SVM can also be used for training. For example, the one-class SVM formulation allows learning from data with only positively labeled samples. This is extremely useful for dealing with missing data due to occlusion and camera sampling error. The formulation is as follows:
$$\arg\min_{w, \xi, \rho} \; \frac{1}{2}\|w\|^2 + \frac{1}{\nu l}\sum_{i} \xi_i - \rho$$
subject to:
$$w \cdot \phi(x_i) \ge \rho - \xi_i, \quad \xi_i \ge 0, \quad \rho \ge 0.$$
Instead of separating positive from negative classes as in the binary case, this algorithm separates the data from the origin in feature space. Much like the ν-SVM formulation, the above optimization tries to keep the model simple by reducing $\|w\|$ as much as possible. Points that lie on the wrong side of the hyperplane or within the margin are penalized through the slack variables $\xi_i$. Instead of acting on $\rho$ as in the previous formulation, the inverse of $\nu$ acts as a weight on the error regularization term $\sum_i \xi_i$. However, the effect of $\nu$ does not change: it still controls the trade-off between accuracy and model complexity, and it is an upper bound on the fraction of errors and a lower bound on the fraction of support vectors. One can also think of the one-class SVM as a density estimator that outputs +1 for a small region capturing most of the data and −1 for its complement. As a consequence, it produces smearing effects around sharp corners, where non-shape pixels also have high estimated densities.
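The bounding role of ν can be illustrated with a simplified sketch in which w is held fixed, so that only ρ remains to be chosen. This is an assumption made for illustration and not the full optimization: the scores stand for hypothetical values of w·φ(x_i), and the points at or below the chosen threshold merely play the role of support vectors.

```python
def one_class_objective(rho, scores, nu, w_norm_sq=0.0):
    """0.5*||w||^2 + 1/(nu*l) * sum_i xi_i - rho, with xi_i = max(0, rho - score_i)."""
    n_samples = len(scores)
    slack_sum = sum(max(0.0, rho - s) for s in scores)
    return 0.5 * w_norm_sq + slack_sum / (nu * n_samples) - rho

def best_rho(scores, nu):
    """For fixed w the objective is convex and piecewise linear in rho,
    so a minimizer can be found among the scores themselves."""
    return min(sorted(scores), key=lambda r: one_class_objective(r, scores, nu))

# Hypothetical positive scores w . phi(x_i) for ten training samples.
SCORES = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00]
NU = 0.25
rho = best_rho(SCORES, NU)
outlier_fraction = sum(s < rho for s in SCORES) / len(SCORES)
on_or_below_fraction = sum(s <= rho for s in SCORES) / len(SCORES)
```

With these values the fraction of samples strictly below the threshold does not exceed ν, while the fraction at or below it is at least ν, matching the upper and lower bounds described above.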
As far as parameter selection is concerned, there are two feasible strategies. The straightforward way is to partition the data into two subsets, one for training and one for testing. Cross-validation can then be used to select the parameter set that gives the best classification result on the testing data. While this method is simple and effective, it requires additional computation to classify the testing data. Alternatively, one can improve the computational efficiency of the training process by looking only at the upper bound on the testing error. This bound is a by-product of the optimization and can be evaluated efficiently.
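The first strategy can be sketched as a search over a small grid of kernel widths scored on a held-out subset. To keep the example self-contained, the classifier is a label-weighted kernel sum, which is a stand-in for a trained SVM and not the method described above; the data, the grid values, and the function names are hypothetical.

```python
import math

def rbf(x, z, gamma):
    """Gaussian kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def predict(x, train_points, train_labels, gamma):
    """Stand-in classifier (not a trained SVM): sign of the label-weighted kernel sum."""
    f = sum(y * rbf(x, p, gamma) for p, y in zip(train_points, train_labels))
    return 1 if f >= 0 else -1

def holdout_accuracy(gamma, train, test):
    """Fraction of held-out samples classified correctly for a given gamma."""
    train_points, train_labels = train
    test_points, test_labels = test
    hits = sum(predict(x, train_points, train_labels, gamma) == y
               for x, y in zip(test_points, test_labels))
    return hits / len(test_points)

def select_gamma(grid, train, test):
    """Return the grid value with the best held-out accuracy (first one wins ties)."""
    return max(grid, key=lambda g: holdout_accuracy(g, train, test))

# Hypothetical imbalanced data: two positives near the origin, five negatives far away.
TRAIN = ([(0, 0), (0, 1), (3, 3), (3, 4), (4, 3), (4, 4), (5, 5)],
         [1, 1, -1, -1, -1, -1, -1])
TEST = ([(0.5, 0.5), (3.5, 3.5)], [1, -1])
best = select_gamma([0.001, 1.0], TRAIN, TEST)
```

With the very small width the kernel is nearly constant and the majority class dominates everywhere, so the held-out positive is misclassified; the larger width classifies both held-out samples correctly and is therefore selected. The extra cost mentioned above is visible here: every grid value requires classifying the whole held-out subset.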