1. Field of the Invention
The present invention relates to the field of image processing and pattern recognition, and more specifically, to a method and apparatus for training a classifier to perform object detection.
2. Description of the Related Art
With the development of computer image processing technology and the wide application of computer vision principles, locating an object in real time in images and videos through object detection technology has become increasingly common. Object detection technology has wide practical value in applications such as smart terminal devices, smart traffic systems, smart monitoring systems, and even military object detection.
In the field of object detection, classifiers trained by one-class methods are widely employed. As described in “Network constraints and multi-objective optimization for one-class classification,” Moya, M and Hush, D. (Neural Networks, 9(3):463-474. doi: 10.1016/0893-6080(95)00120-4, 1996), a one-class classifier learns from a training set containing merely one class of objects, so that this class of objects can be distinguished from all other possible objects. For example, classifiers targeting faces, cats, or dogs can be embedded in cameras.
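By way of illustration only, the one-class principle described above can be sketched as follows. This is a minimal sketch and not the method of the cited reference: it assumes that each object is already represented by a small numeric feature vector, and it swaps in a simple centroid-plus-radius rule for the classifier. The names fit_one_class, is_target, and target_samples are hypothetical.

```python
import math

def fit_one_class(samples, quantile=1.0):
    """Fit a centroid-plus-radius model using samples of the target class only.

    Assumption: each sample is a fixed-length tuple of numeric features.
    """
    dim = len(samples[0])
    centroid = [sum(s[i] for s in samples) / len(samples) for i in range(dim)]
    dists = sorted(math.dist(s, centroid) for s in samples)
    # Radius is the distance that covers the chosen fraction of training samples.
    idx = max(0, math.ceil(quantile * len(dists)) - 1)
    return centroid, dists[idx]

def is_target(model, x):
    """Accept x as the target class if it lies within the learned radius."""
    centroid, radius = model
    return math.dist(x, centroid) <= radius

# Hypothetical 2-D feature vectors, all taken from the single target class.
target_samples = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2)]
model = fit_one_class(target_samples)

near = is_target(model, (1.0, 1.05))  # close to the training samples
far = is_target(model, (5.0, 5.0))    # far from anything seen in training
```

No negative samples are used during fitting; everything outside the learned region is rejected, which is the behavior the one-class formulation relies on.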
Unfortunately, such existing one-class classifiers increasingly fail to meet the requirements of consumers. Taking cameras as an example, a user tends to regularly take pictures of a certain object, such as his pet. This means that, instead of a conventional classifier aimed merely at a certain class of objects such as faces, cats, or dogs, such a user desires a classifier capable of learning the appearance features of an object specified by the user himself (such as his pet). For example, a user may want the camera to focus automatically on his pet when he raises it, or may want to find the photos of his pet among all photos taken by his camera.
Currently, most existing object detection products rely on the collection of sufficient samples to obtain an appropriately trained classifier, which is then provided in those products to achieve object location. However, in some practical applications, it may be difficult to collect enough samples to train a classifier. For example, when tracing a specific vehicle through a traffic monitoring system, there may be very few prior samples of the specific vehicle, or even only one sample available. Further, in consumer products, it is impractical to simply rely on users to collect plenty of samples, as doing so may lead to a poor user experience.
Thus, an object detection method is desired which: (1) does not rely on any prior knowledge, because the number of possible object categories is so large, and their distribution may be so long-tailed, that it is virtually impossible to prepare previously learned dictionaries covering all of those categories; (2) is capable of performing detection using only one or several samples, while still handling variations in the appearance of an object, such as those caused by lighting, viewpoint, deformation, blurring, rotation, etc.; and (3) is distinctive enough to separate an object from all other objects of the same category, for example, capable of distinguishing a user's dog from other users' dogs.
Object detection methods in the prior art cannot meet the above requirements. For example, a concept of “attribute” is disclosed in V. Ferrari and A. Zisserman, “Learning Visual Attributes” (In NIPS, 2008), but it requires end users to identify object attributes.
In L. Fei-Fei, R. Fergus and P. Perona, “A Bayesian approach to unsupervised one-shot learning of object categories” (In ICCV, pages 1134-1141, 2003), a one-shot learning method is disclosed. In M. Lew, “Content-based Multimedia Information Retrieval: State of the Art and Challenges” (ACM Trans. MCCA, 2006), and J. Eakins and M. Graham, “Content-based Image Retrieval” (University of Northumbria at Newcastle), a content-based image retrieval method is disclosed. Neither the one-shot learning method nor the content-based image retrieval method is accurate enough to distinguish an object from other objects of the same category.
In Hae Jong Seo and Peyman Milanfar, “Training-Free Generic Object Detection Using Locally Adaptive Regression Kernels” (IEEE Trans. PAMI, vol. 32, no. 9, pp. 1688-1704, 2010), a training-free LARK-based detection method is disclosed, which, however, lacks rotation invariance and has poor intra-class discrimination.
SIFT/SURF-based local point matching methods are disclosed in Lowe, David G, “Object recognition from local scale-invariant features” (ICCV. pp. 1150-1157, 1999), and H. Bay, A. Ess, T. Tuytelaars and L. V. Gool, “SURF: Speeded Up Robust Features” (CVIU, pp. 346-359, 2008). In E. Nowak, F. Jurie and B. Triggs, “Sampling Strategies for Bag-of-Features Image Classification” (ECCV, 2006), a BOW/Part-based model is disclosed. Those methods perform poorly on very small targets and on non-rigid object distortions.
The various prior art methods described above cannot provide satisfactory detection performance when only a few samples are available. Thus, a method and apparatus capable of realizing object detection with high robustness and high discrimination using only a few samples is highly desirable.