1. Field of the Invention
The present invention generally relates to image detection. In particular, the present invention relates to a method and apparatus for generating an object classifier, and further relates to a method and apparatus for detecting an object in an image.
2. Description of the Related Art
Object detection in an image, such as human detection, has important applications in video surveillance, content-based image/video retrieval, video annotation, and assisted living. There is a vast literature on techniques for human detection. Most of it focuses on the generation of a classifier, which is critical for object detection. Generally, a classifier relates to one kind of object and is used for detecting whether that kind of object exists in an image to be detected.
One thread among the successful approaches has been to build on the pioneering work of Viola and Jones on face detection [Document 1]. In Document 1, Haar-like features are calculated via the integral image, and a novel cascade structure of classifiers is learned by Adaboost. Adaboost is well known in the art and provides an effective learning process and strong bounds on generalization performance [Document 6]. Such learning-based methods have become dominant; the key issues are the features and the learning algorithms that are used.
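As an illustration of why the integral image makes Haar-like features cheap, the following is a minimal sketch in Python and is not code from Document 1. The function names, the list-of-lists image layout, and the particular two-rectangle feature are assumptions chosen for illustration. Once the table is built, the sum over any rectangle costs four lookups regardless of the rectangle size.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) summed-area table with a zero top row and left column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row up to column x
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_haar(ii, x, y, w, h):
    """Hypothetical two-rectangle Haar-like feature: left half minus right half.

    Assumes w is even so that the two halves have equal area.
    """
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

For example, `two_rect_haar(integral_image(img), 0, 0, 4, 3)` evaluates one such feature over a 4*3 region using eight table lookups in total.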
In 2005, Dalal and Triggs proposed the normalized histogram of oriented gradients (HOG) descriptor for human detection [Document 2], as shown in FIG. 1A, which illustrates the histogram of oriented gradients in the prior art as described in Document 2. Each detection window is divided into cells having a size of 8*8 pixels, and each group of 2*2 cells is integrated into a block, with adjacent blocks overlapping each other. Each cell is described by a 9-bin HOG, and each block is described by the concatenation of the histograms of all its cells. Each block is thus represented by a 36-D feature vector that is normalized to unit L2 length. Each 64*128 sample image is represented by 7*15 blocks, giving a total of 3780 features per detection window, which is usually expressed as a feature vector f=[ . . . , . . . , . . . ]. These features are then used to train a linear SVM classifier. An overview of the method disclosed in Document 2 is shown in FIG. 1B. The HOG features give very good performance for human detection.
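The feature count above can be checked with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, the block stride of one cell (8 pixels) is an assumption that is consistent with the 7*15 block layout, and the small constant `eps` in the normalization is an assumption added to avoid division by zero.

```python
import math


def hog_layout(win_w=64, win_h=128, cell=8, block_cells=2, bins=9):
    """Return (blocks_x, blocks_y, block_dim, total_features) for one window.

    Assumes blocks slide with a stride of one cell, so adjacent blocks overlap.
    """
    cells_x, cells_y = win_w // cell, win_h // cell
    blocks_x = cells_x - block_cells + 1
    blocks_y = cells_y - block_cells + 1
    block_dim = block_cells * block_cells * bins
    return blocks_x, blocks_y, block_dim, blocks_x * blocks_y * block_dim


def l2_normalize(block, eps=1e-6):
    """Scale a block vector to (approximately) unit L2 length.

    eps is an assumed regularizer; note one division per vector element.
    """
    norm = math.sqrt(sum(v * v for v in block) + eps * eps)
    return [v / norm for v in block]
```

With the default arguments, `hog_layout()` gives 7 blocks horizontally, 15 blocks vertically, 36 values per block, and 3780 features in total. Normalizing every block therefore requires one division per feature value in each detection window.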
In 2006, Zhu calculated the HOG features via integral images and integrated the cascade structure classifier to speed up the method disclosed in Document 2 [Document 3]. The 36-D block feature vector is normalized to unit L1 length and then used to build an SVM weak classifier. The framework of the method disclosed in Document 3 is shown in FIG. 1C. The method disclosed in Document 3 greatly improves the detection speed while maintaining an accuracy level similar to that of the method disclosed in Document 2.
However, in both methods disclosed in Document 2 and Document 3, a local contrast normalization step within each block of HOG is critical for good performance. The many division operations in this normalization step significantly increase the computation overhead, especially for embedded systems.
Recently, some comparison features have been proposed for human detection, such as Associated Pairing Comparison Features (APCF) [Document 4] and Joint Ranking of Granules Features (JROG) [Document 5].
In Document 4, APCF is based on simple pairing comparisons of color and of gradient orientation in granular space, which are called PCC and PCG respectively, and several PCC or PCG features are associated to form an APCF feature, as shown in FIG. 1D, in which the left portion illustrates pairing comparison of color and the right portion illustrates pairing comparison of gradient. APCF features are then used to build a cascade structure classifier. Owing to the rich pairs of granules in the granular space, such simple APCF features achieve more accurate detection results than the method disclosed in Document 3. The detection speed is similar to that of the method disclosed in Document 3.
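The idea of associating several binary pairing comparisons can be sketched as follows. This is a simplified Python illustration and not the definition used in Document 4: it covers only an intensity comparison in the spirit of PCC, and the granule parameterization (top-left corner and a scale s giving side 2**s), the threshold, the bit-packing order, and all function names are assumptions.

```python
def granule_mean(img, x, y, s):
    """Mean intensity of a square granule with top-left (x, y) and side 2**s."""
    side = 1 << s
    total = sum(img[yy][xx]
                for yy in range(y, y + side)
                for xx in range(x, x + side))
    return total / (side * side)


def pairing_comparison(img, g1, g2, threshold=0):
    """Return 1 if granule g1 exceeds granule g2 by at least threshold, else 0."""
    diff = granule_mean(img, *g1) - granule_mean(img, *g2)
    return 1 if diff >= threshold else 0


def associated_feature(img, pairs, threshold=0):
    """Pack several binary comparisons into one integer code.

    The first pair in the list ends up in the highest bit (assumed ordering).
    """
    code = 0
    for g1, g2 in pairs:
        code = (code << 1) | pairing_comparison(img, g1, g2, threshold)
    return code
```

Each comparison needs only a subtraction and a threshold test, so no normalization or division by a block norm is involved in forming the code.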
JROG is a simplified form of APCF features. In Document 5, JROG features are used to build a full body detector and several part detectors in order to maintain high detection accuracy, which might otherwise decrease due to the simplification of JROG. As a result, the method disclosed in Document 5 achieves detection accuracy comparable to, and efficiency higher than, that of the method disclosed in Document 4. An overview of both methods disclosed in Documents 4 and 5 is illustrated in FIG. 1E.
One advantage of the above assembled binary comparison features is their simplicity in form. No normalization step is required during calculation. Another advantage is that abundant granules encode richer information than other features such as Haar-like features. However, their comparisons are between granular intensities or granular gradient orientations, instead of the statistics of gradients within image patches (for example, in the form of HOG). The relatively successful performance of HOG shows that the statistics of gradients within image patches are very discriminative for human detection. At the same time, since a granule is usually a pixel or has a square shape, the width and height of a granule are the same, which limits the ability to find useful patterns.
In view of the above, there is still a need for a method and apparatus capable of obtaining a more discriminative feature with higher computation speed.
Furthermore, there is still a need for a method and apparatus capable of efficiently and accurately detecting an object in an image.
[Cited Documents]
1. P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. IEEE CVPR, 2001.
2. N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. IEEE CVPR, 2005.
3. Q. Zhu, S. Avidan, M. Yeh, and K. Cheng. Fast human detection using a cascade of histograms of oriented gradients. IEEE CVPR, 2006.
4. G. Duan, C. Huang, H. Ai, and S. Lao. Boosting associated pairing comparison features for pedestrian detection. Ninth IEEE International Workshop on Visual Surveillance, 2009.
5. C. Huang and R. Nevatia. High performance object detection by collaborative learning of joint ranking of granules features. IEEE CVPR, 2010.
6. Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Second European Conference on Computational Learning Theory, 1995.