Human Detection
Detecting humans in images is considered among the hardest examples of object detection problems. The articulated structure and variable appearance of the human body and clothing, combined with illumination and pose variations, contribute to the complexity of the problem.
Human detection methods can be separated into two groups based on the search method. The first group is based on sequentially applying a classifier at all the possible detection subwindows or regions in an image. A polynomial support vector machine (SVM) can be trained using Haar wavelets as human descriptors, C. Papageorgiou and T. Poggio, “A trainable system for object detection,” Intl. J. of Computer Vision, 38(1):15-33, 2000. That work is extended to multiple classifiers trained to detect human parts, and the responses inside the detection window are combined to give the final decision, A. Mohan, C. Papageorgiou, and T. Poggio, “Example-based object detection in images by components,” IEEE Trans. Pattern Anal. Machine Intell., 23(4):349-360, 2001.
In a sequence of images, a real-time moving human detection method uses Haar wavelet descriptors extracted from space-time differences, P. Viola, M. Jones, and D. Snow, “Detecting pedestrians using patterns of motion and appearance,” IEEE Conf. on Computer Vision and Pattern Recognition, New York, N.Y., volume 1, pages 734-741, 2003. Using AdaBoost, the most discriminative features are selected, and multiple classifiers are combined to form a rejection cascade, such that if any classifier rejects a hypothesis, then it is considered a negative example.
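The rejection cascade described above can be illustrated with a minimal sketch. The stage functions, thresholds, and the name `cascade_accepts` are hypothetical and chosen only for illustration; they are not taken from the cited work, and a real cascade would use stages learned with AdaBoost.

```python
from typing import Callable, Sequence, Tuple

# A stage is a (score function, threshold) pair. A hypothesis survives a
# stage only if its score reaches the threshold. Both are illustrative.
Stage = Tuple[Callable[[Sequence[float]], float], float]


def cascade_accepts(features: Sequence[float], stages: Sequence[Stage]) -> bool:
    """Return True only if every stage accepts; stop at the first rejection."""
    for score_fn, threshold in stages:
        if score_fn(features) < threshold:
            return False  # early rejection: later stages are never evaluated
    return True


# Hypothetical two-stage cascade over a 3-element feature vector.
stages = [
    (lambda f: f[0], 0.2),                      # cheap first stage
    (lambda f: 0.5 * f[1] + 0.5 * f[2], 0.6),   # costlier second stage
]

print(cascade_accepts([0.9, 0.8, 0.7], stages))  # passes both stages
print(cascade_accepts([0.1, 0.9, 0.9], stages))  # rejected by the first stage
```

Because most subwindows in an image are negatives, rejecting them in the early, inexpensive stages is what makes this arrangement fast.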
Another human detector trains an SVM classifier using a densely sampled histogram of oriented gradients, N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” Proc. IEEE Conf. on Computer Vision and Pattern Recognition, volume 1, pages 886-893, 2005.
In a similar approach, near real-time detection performance is achieved by training a cascade model using histogram of oriented gradients features, Q. Zhu, S. Avidan, M. C. Yeh, and K. T. Cheng, “Fast human detection using a cascade of histograms of oriented gradients,” Proc. IEEE Conf. on Computer Vision and Pattern Recognition, New York, N.Y., volume 2, pages 1491-1498, 2006.
The second group of methods is based on detecting common parts, and assembling local features of the parts according to geometric constraints to form the final human model. The parts can be represented by co-occurrences of local orientation features and separate detectors can be trained for each part using AdaBoost. Human location can be determined by maximizing the joint likelihood of part occurrences combined according to the geometric relations.
A human detection method for crowded scenes is described by B. Leibe, E. Seemann, and B. Schiele, “Pedestrian detection in crowded scenes,” Proc. IEEE Conf. on Computer Vision and Pattern Recognition, volume 1, pages 878-885, 2005. That method combines local appearance features and their geometric relations with global cues by top-down segmentation based on per-pixel likelihoods.
Covariance features are described by O. Tuzel, F. Porikli, and P. Meer, “Region covariance: A fast descriptor for detection and classification,” Proc. European Conf. on Computer Vision, volume 2, pages 589-600, 2006, and U.S. patent application Ser. No. 11/305,427, “Method for Constructing Covariance Matrices from Data Features” filed by Porikli et al. on Dec. 14, 2005, incorporated herein by reference. The features can be used for matching and texture classification problems, and were extended to object tracking, F. Porikli, O. Tuzel, and P. Meer, “Covariance tracking using model update based on Lie algebra,” Proc. IEEE Conf. on Computer Vision and Pattern Recognition, New York, N.Y., volume 1, pages 728-735, 2006, and U.S. patent application Ser. No. 11/352,145, “Method for Tracking Objects in Videos Using Covariance Matrices” filed by Porikli et al. on Feb. 9, 2006, incorporated herein by reference. A region is represented by the covariance matrix of image features, such as spatial location, intensity, higher order derivatives, etc. It is not adequate to use conventional machine learning techniques to train the classifiers because the covariance matrices do not lie on a vector space.
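As a rough illustration of representing a region by the covariance matrix of its image features, the following sketch computes such a matrix with NumPy. The particular feature set (pixel coordinates, intensity, and absolute first derivatives) and the function name `region_covariance` are assumptions made for this example and are not necessarily the features used in the cited references.

```python
import numpy as np


def region_covariance(image, top, left, height, width):
    """Covariance matrix of per-pixel features inside a rectangular region."""
    image = np.asarray(image, dtype=float)
    # First-order intensity derivatives over the whole image
    # (axis 0 is the y direction, axis 1 is the x direction).
    d_y, d_x = np.gradient(image)
    ys, xs = np.mgrid[top:top + height, left:left + width]
    window = (slice(top, top + height), slice(left, left + width))
    # One row per pixel: [x, y, intensity, |dI/dx|, |dI/dy|].
    # This feature choice is illustrative only.
    features = np.stack(
        [
            xs.ravel(),
            ys.ravel(),
            image[window].ravel(),
            np.abs(d_x[window]).ravel(),
            np.abs(d_y[window]).ravel(),
        ],
        axis=1,
    )
    # rowvar=False: each column is a variable, each row an observation.
    return np.cov(features, rowvar=False)


rng = np.random.default_rng(0)
image = rng.random((32, 32))  # synthetic stand-in for a grayscale image
C = region_covariance(image, top=4, left=8, height=16, width=12)
print(C.shape)  # (5, 5)
```

The result is a symmetric matrix whose size depends only on the number of features, not on the size of the region, so regions of different sizes can be compared through their descriptors.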
Symmetric positive definite matrices (nonsingular covariance matrices) can be formulated as a connected Riemannian manifold. Methods for clustering data points lying on differentiable manifolds are described by E. Begelfor and M. Werman, “Affine invariance revisited,” Proc. IEEE Conf. on Computer Vision and Pattern Recognition, New York, N.Y., volume 2, pages 2087-2094, 2006, R. Subbarao and P. Meer, “Nonlinear mean shift for clustering over analytic manifolds,” Proc. IEEE Conf. on Computer Vision and Pattern Recognition, New York, N.Y., volume 1, pages 1168-1175, 2006, and O. Tuzel, R. Subbarao, and P. Meer, “Simultaneous multiple 3D motion estimation via mode finding on Lie groups,” Proc. 10th Intl. Conf. on Computer Vision, Beijing, China, volume 1, pages 18-25, 2005, incorporated herein by reference.
Classifiers
Data classifiers have many practical applications in the sciences, research, engineering, medicine, economics, and sociology fields. Classifiers can be used for medical diagnosis, portfolio analysis, signal decoding, OCR, speech and face recognition, data mining, search engines, consumer preference selection, fingerprint identification, and the like.
Classifiers can be trained using either supervised or unsupervised learning techniques. In the latter case, a model is fit to data without any a priori knowledge of the data, i.e., the input data are essentially a set of random variables with a normal distribution. The invention is concerned with supervised learning, where features are extracted from labeled training data in order to learn a function that maps observations to output.
Generally, a classifier is a mapping from discrete or continuous features X to a discrete set of labels Y. For example, in a face recognition system, features are extracted from images of faces. The classifier then labels each image as being, e.g., either male or female.
A linear classifier uses a linear function to discriminate the features. Formally, if an input to the classifier is a feature vector $\vec{x}$, then an estimated label $y$ is

$$y = f(\vec{w} \cdot \vec{x}) = f\Big(\sum_j w_j x_j\Big),$$

where $\vec{w}$ is a real vector of weights, and $f$ is a function that converts the dot product of the two vectors to the desired output. Often, $f$ is a simple function that maps all values above a certain threshold to “yes” and all other values to “no”.
In such a two-class (binary) classification, the operation of the linear classifier “splits” a high-dimensional input space with a hyperplane. All points on one side of the hyperplane are classified as “yes”, while the others are classified as “no”.
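A minimal sketch of such a thresholded linear classifier follows. The weights, the threshold of zero, and the function name `linear_classify` are hypothetical values chosen for illustration.

```python
from typing import Sequence


def linear_classify(w: Sequence[float], x: Sequence[float],
                    threshold: float = 0.0) -> str:
    """Label x by thresholding the dot product of weights w and features x."""
    score = sum(w_j * x_j for w_j, x_j in zip(w, x))
    return "yes" if score > threshold else "no"


# Hypothetical weights; the separating hyperplane is 2*x1 - x2 = 0.
w = [2.0, -1.0]
print(linear_classify(w, [1.0, 0.5]))  # score 1.5 -> "yes"
print(linear_classify(w, [0.5, 2.0]))  # score -1.0 -> "no"
```

The two sample points lie on opposite sides of the hyperplane, so they receive opposite labels.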
The linear classifier is often used in situations where the speed of classification is an issue, because the linear classifier is often the fastest classifier, especially when the feature vector $\vec{x}$ is sparse.
Riemannian Geometry
Riemannian geometry focuses on the space of symmetric positive definite matrices, see W. M. Boothby, “An Introduction to Differentiable Manifolds and Riemannian Geometry,” Academic Press, 2002, incorporated herein by reference. We refer to points lying on a vector space with small bold letters $\mathbf{x} \in \mathbb{R}^m$, whereas points lying on the manifold are denoted with capital bold letters $\mathbf{X} \in \mathcal{M}$.
Riemannian Manifolds
A manifold is a topological space that is locally similar to a Euclidean space. Every point on the manifold has a neighborhood for which there exists a homeomorphism, i.e., a one-to-one mapping that is continuous in both directions, mapping the neighborhood to $\mathbb{R}^m$. For differentiable manifolds, it is possible to define the derivatives of the curves on the manifold.
In Riemannian geometry, the Riemannian manifold (M, g) is a real differentiable manifold M in which each tangent space is equipped with an inner product g in a manner that varies smoothly from point to point. This allows one to define various notions such as the length of curves, angles, areas or volumes, curvature, gradients of functions and divergence of vector fields.
A Riemannian manifold can be defined as a metric space that is isometric to a smooth submanifold of $\mathbb{R}^n$ with the induced intrinsic metric, where isometry here is meant in the sense of preserving the length of curves. A metric space is a set where distances between elements of the set are defined, e.g., a three-dimensional Euclidean space. A Riemannian manifold is an example of an analytic manifold, which is a topological manifold with analytic transition maps.
The inner product structure of the Riemannian manifold is given in the form of a symmetric 2-tensor called the Riemannian metric. The Riemannian metric can be used to interconvert vectors and covectors, and to define a rank-4 Riemannian curvature tensor. Any differentiable manifold can be given a Riemannian structure.
Turning the Riemannian manifold into a metric space is nontrivial. Even though a Riemannian manifold is usually “curved,” there is still a notion of “straight line” on the manifold, i.e., the geodesics that locally join points along a shortest path on a curved surface.
At a fixed point, the tangent bundle of a smooth manifold M, or indeed any vector bundle over a manifold, is a vector space, and each such space can carry an inner product. If such a collection of inner products on the tangent bundle of a manifold varies smoothly as one traverses the manifold, then concepts that were defined only point-wise at each tangent space can be extended to yield analogous notions over finite regions of the manifold.
For example, a smooth curve $\alpha(t): [0, 1] \to M$ has tangent vector $\alpha'(t_0)$ in the tangent space $TM(t_0)$ at any point $t_0 \in (0, 1)$, and each such vector has length $\|\alpha'(t_0)\|$, where $\|\cdot\|$ denotes the norm induced by the inner product on $TM(t_0)$. The integral of these lengths gives the length of the curve $\alpha$: $L(\alpha) = \int_0^1 \|\alpha'(t)\| \, dt$.
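The curve-length integral can be checked numerically in a simple case. The sketch below assumes the ordinary Euclidean inner product on the plane and approximates the integral by summing the lengths of short chords along the curve; the quarter circle and the helper name `curve_length` are illustrative choices only.

```python
import math


def curve_length(alpha, steps=10_000):
    """Approximate the length of alpha over [0, 1] by summing chord lengths."""
    total = 0.0
    prev = alpha(0.0)
    for i in range(1, steps + 1):
        point = alpha(i / steps)
        total += math.dist(prev, point)  # Euclidean norm of the increment
        prev = point
    return total


def quarter_circle(t):
    """Unit-radius quarter circle, parameterized over t in [0, 1]."""
    angle = 0.5 * math.pi * t
    return (math.cos(angle), math.sin(angle))


length = curve_length(quarter_circle)
# The exact length of a unit quarter circle is pi / 2.
print(abs(length - math.pi / 2) < 1e-6)
```

On a general Riemannian manifold, the Euclidean norm would be replaced by the norm induced by the inner product on each tangent space.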
In many instances, in order to pass from a linear-algebraic concept to a differential-geometric concept, the smoothness requirement is very important. Every smooth submanifold of $\mathbb{R}^n$ has an induced Riemannian metric g. The inner product on each tangent space is the restriction of the inner product on $\mathbb{R}^n$. In fact, it follows from the Nash embedding theorem, which states that every Riemannian manifold can be embedded isometrically in the Euclidean space $\mathbb{R}^n$, that all Riemannian manifolds can be realized this way.
The derivatives at a point $\mathbf{X}$ on the manifold lie in a vector space $T_{\mathbf{X}}$, which is the tangent space at that point. The Riemannian manifold $\mathcal{M}$ is a differentiable manifold in which each tangent space has an inner product $\langle \cdot, \cdot \rangle_{\mathbf{X}}$, which varies smoothly from point to point. The inner product induces a norm for the tangent vectors on the tangent space, such that $\|\mathbf{y}\|_{\mathbf{X}}^2 = \langle \mathbf{y}, \mathbf{y} \rangle_{\mathbf{X}}$.
The minimum length curve connecting two points on the manifold is called a geodesic, and the distance between the points $d(\mathbf{X}, \mathbf{Y})$ is given by the length of this curve. Let $\mathbf{y} \in T_{\mathbf{X}}$ and $\mathbf{X} \in \mathcal{M}$. From point $\mathbf{X}$, there exists a unique geodesic starting with the tangent vector $\mathbf{y}$. The exponential map, $\exp_{\mathbf{X}}: T_{\mathbf{X}} \to \mathcal{M}$, maps the vector $\mathbf{y}$ to the point reached by this geodesic, and the distance of the geodesic is given by $d(\mathbf{X}, \exp_{\mathbf{X}}(\mathbf{y})) = \|\mathbf{y}\|_{\mathbf{X}}$.
In general, the exponential mapping $\exp_{\mathbf{X}}$ is only one-to-one in a neighborhood of $\mathbf{X}$. Therefore, the inverse mapping $\log_{\mathbf{X}}: \mathcal{M} \to T_{\mathbf{X}}$ is uniquely defined only in a neighborhood of the point $\mathbf{X}$. If for any $\mathbf{Y} \in \mathcal{M}$, there exist several $\mathbf{y} \in T_{\mathbf{X}}$, such that $\mathbf{Y} = \exp_{\mathbf{X}}(\mathbf{y})$, then $\log_{\mathbf{X}}(\mathbf{Y})$ is given by the tangent vector with the smallest norm. Notice that both operators are point dependent, where the dependence is made explicit with the subscript.
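The exponential and logarithm maps can be made concrete on the space of symmetric positive definite matrices. The sketch below assumes the affine-invariant metric, under which the inner product at X is tr(X^-1 y X^-1 z); this section does not state a specific metric, so that choice, the closed-form expressions, and all function names are assumptions made for illustration.

```python
import numpy as np


def _sym_fun(S, fun):
    """Apply a scalar function to a symmetric matrix through its eigenvalues."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * fun(vals)) @ vecs.T


def _roots(X):
    """Return X^(1/2) and X^(-1/2) for a symmetric positive definite X."""
    return _sym_fun(X, np.sqrt), _sym_fun(X, lambda v: 1.0 / np.sqrt(v))


def exp_map(X, y):
    """Point reached from X along the geodesic with initial tangent vector y."""
    X_half, X_inv_half = _roots(X)
    return X_half @ _sym_fun(X_inv_half @ y @ X_inv_half, np.exp) @ X_half


def log_map(X, Y):
    """Tangent vector at X whose geodesic reaches Y."""
    X_half, X_inv_half = _roots(X)
    return X_half @ _sym_fun(X_inv_half @ Y @ X_inv_half, np.log) @ X_half


def tangent_norm(X, y):
    """Norm induced by the assumed inner product tr(X^-1 y X^-1 z)."""
    X_inv = np.linalg.inv(X)
    return float(np.sqrt(np.trace(X_inv @ y @ X_inv @ y)))


def distance(X, Y):
    """Geodesic distance computed from the eigenvalues of X^(-1/2) Y X^(-1/2)."""
    _, X_inv_half = _roots(X)
    vals = np.linalg.eigvalsh(X_inv_half @ Y @ X_inv_half)
    return float(np.sqrt(np.sum(np.log(vals) ** 2)))


X = np.array([[2.0, 0.5], [0.5, 1.0]])    # a point on the manifold
y = np.array([[0.3, -0.1], [-0.1, 0.2]])  # a symmetric tangent vector at X
Y = exp_map(X, y)

print(np.allclose(log_map(X, Y), y))                  # log inverts exp
print(np.isclose(distance(X, Y), tangent_norm(X, y)))  # d(X, exp_X(y)) = ||y||_X
```

Under this particular metric the exponential map is one-to-one over the whole tangent space, so the logarithm is defined everywhere on the manifold; that is a property of this example and not of Riemannian manifolds in general.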