Field of the Invention
The present invention relates to an object detection method, an object detection device for detecting a specific object, and an image pickup device including the object detection device.
Description of the Related Art
Analysis of objects in images is important in fields such as image processing, computer vision and pattern recognition, and the detection of objects has attracted increasing attention. Object detection techniques generally involve two steps: a training step and a detection step. During the training step, a classifier is obtained by training with several samples of an object. Then, during the detection step, the classifier thus obtained is used to detect the object.
Detection of specific objects such as faces, people and cars has made great progress in recent years. However, if a generic classifier or object detector, trained offline with a large quantity of samples by using the above object detection technique, is used to detect the specific object in arbitrary images or video sequences, it is likely to fail and often suffers from a high rate of false alarms.
In this case, scene information is very important for improving the generic detector's discrimination and reducing false alarms. Recently, to overcome these problems, some scene modelling methods have been proposed in which a scene model is created using specific scene information such as the object instances, background and context. More accurate detection results can thus be obtained with said scene model, which can adapt to a changing environment and is widely used for surveillance and tracking.
The main purpose of said scene model is to obtain more accurate detection results, so that said scene model is a more effective classifier in the corresponding specific scene. The existing scene models are characterized by:
Binary classifier: for distinguishing a specific object from a non-specific object;
Repeated collection of both positive samples (the object used for training) and negative samples (the specific scene without the object): for repeatedly training and updating the binary classifier.
FIG. 1 shows a flowchart of an object detection method in the prior art, with the main steps as follows:
1) Collecting positive samples S101: a user draws a window around an object in the preceding frame or frames of a video as a positive sample, or a current object detector is used to detect a window containing the object as a positive sample;
2) Collecting negative samples S102: windows which are neither user-drawn windows nor windows detected by the current object detector are collected as negative samples;
3) Learning a new classifier S103: a new binary classifier which can distinguish the object from the specific scene more effectively is learned using the collected positive and negative samples;
4) Object detection S104: the object is detected in subsequent frames by said new binary classifier, and said classifier is updated by repeating the above steps according to the detection results until the resultant classifier has a false alarm rate lower than a certain threshold, as shown by the dotted line in FIG. 1. Such a method can be used for tracking by object detection, and is applicable only to videos or sequences of frames.
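By way of illustration only, the loop of steps S101 to S104 may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any cited method: each window is reduced to a single hypothetical feature value, the binary classifier is a toy nearest-mean classifier, each window carries a ground-truth label standing in for the user-drawn window, and the names `CentroidClassifier`, `learn_classifier`, `false_alarm_rate` and `adapt_to_scene` are introduced here purely for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CentroidClassifier:
    """Toy binary classifier (assumption): a window is the object if its
    feature is closer to the mean positive sample than to the mean negative."""
    pos_mean: float
    neg_mean: float

    def is_object(self, feature: float) -> bool:
        return abs(feature - self.pos_mean) < abs(feature - self.neg_mean)


def learn_classifier(positives, negatives):
    # S103: learn a new binary classifier from all samples collected so far.
    return CentroidClassifier(mean(positives), mean(negatives))


def false_alarm_rate(clf, background_features):
    # Fraction of background windows that the classifier reports as the object.
    if not background_features:
        return 0.0
    hits = sum(clf.is_object(f) for f in background_features)
    return hits / len(background_features)


def adapt_to_scene(frames, max_false_alarm=0.05):
    """Each frame is a list of (feature, is_object) windows; the first frame
    must contain at least one window of each kind."""
    positives, negatives, clf = [], [], None
    for frame in frames:
        background = [f for f, is_obj in frame if not is_obj]
        # S104: run the current classifier on the new frame and stop once
        # its false alarm rate is low enough.
        if clf is not None and false_alarm_rate(clf, background) <= max_false_alarm:
            break
        # S101 / S102: collect more positive and negative samples.
        positives += [f for f, is_obj in frame if is_obj]
        negatives += background
        # S103: retrain on everything collected so far.
        clf = learn_classifier(positives, negatives)
    return clf
```

The sketch makes the cost of this approach visible: every pass through the loop requires fresh positive and negative samples before the classifier can be relearned.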
For example, U.S. Pat. No. 8,385,632 proposes a method in which a trained generic classifier is adapted to detect an object from a specific scene. Since the specific scene is unknown when the generic classifier is trained using generic training data, directly using the generic classifier to detect the object in an image of the specific scene is likely to result in a high rate of false alarms, as shown in FIG. 1B of that document. Therefore, it is necessary on the one hand to keep the information of the previous training examples, and on the other hand to repeatedly collect positive or negative samples related to the classification task for the specific scene, so as to repeatedly create a classifier specific to the specific scene based on the generic classifier, as shown in FIG. 2 of that document. However, such a method must keep the generic training data for the generic classifier while collecting new positive and negative samples, so that the generic classifier is repeatedly updated with both the generic training data and the collected positive and negative samples.
U.S. Pat. No. 7,526,101 proposes a method for tracking an object in a video which treats object tracking as a binary classification problem. First, a set of weak classifiers for distinguishing the object from the background is trained in real time based on the acquired video. Second, the set of weak classifiers is combined into a strong classifier which can generate a confidence map for a frame so as to distinguish the object from the background. However, in this method each weak classifier is trained on the positive and negative samples of an individual frame, and when the frames vary over time it is necessary to repeatedly train new weak classifiers to replace the old ones in the set. Accordingly, the strong classifier is updated so as to adapt to the variation of frames with time.
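The general idea of combining weak classifiers into a strong classifier that yields a confidence map may be sketched as follows. The sketch is illustrative only and is not taken from the cited patent: it assumes that each weak classifier is a simple threshold on a single pixel value, that the strong classifier is a weighted vote, and that the oldest weak classifier is the one replaced; the names `make_stump`, `confidence_map` and `replace_oldest` are hypothetical.

```python
def make_stump(threshold):
    """Weak classifier (assumption): votes +1 (object) if a pixel value
    exceeds the threshold, otherwise -1 (background)."""
    def stump(value):
        return 1 if value > threshold else -1
    return stump


def confidence_map(frame, ensemble):
    """Strong classifier: weighted vote of all weak classifiers per pixel.
    `ensemble` is a list of (weight, weak_classifier) pairs; positive
    confidence suggests object, negative suggests background."""
    return [[sum(w * h(v) for w, h in ensemble) for v in row] for row in frame]


def replace_oldest(ensemble, weight, weak_classifier):
    """Drop the oldest weak classifier and append one trained on a newer
    frame, returning a new ensemble."""
    return ensemble[1:] + [(weight, weak_classifier)]
```

Because each replacement weak classifier has to be trained on samples from the newest frames, the strong classifier remains dependent on continuous sample collection.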
CN patent publication No. 101216942A provides a background modelling method which enables online updates. However, on the one hand this method has to be updated online, and on the other hand its background model is based not on a classifier but on a template or mask image, and is used to separate the foreground from the background by frame subtraction.
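Frame subtraction against a template image, with an online update of that template, may be sketched as follows. This is a generic illustration and not the method of the cited publication: the running-average update rule, the parameter `alpha` and the function names `foreground_mask` and `update_background` are assumptions made for the example, and images are represented as nested lists of grey levels.

```python
def foreground_mask(frame, background, threshold):
    """Mark a pixel as foreground when it differs from the background
    template by more than `threshold` (frame subtraction)."""
    return [
        [abs(p - b) > threshold for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def update_background(frame, background, alpha):
    """Online update (assumed running average): blend the new frame into
    the template with weight `alpha` between 0 and 1."""
    return [
        [(1 - alpha) * b + alpha * p for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]
```

Such a model only separates changed pixels from unchanged ones; it does not classify what the foreground object is.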
Although the above methods can improve detection accuracy in the case of a specific scene, they also have the following problems:
1) A new binary classifier is always created from positive samples (for example, samples given by a user or samples of the results detected by a current object detector) and negative samples;
2) The positive and negative samples need to be in large quantity. Negative samples are easy to collect from scene frames or videos, while positive samples are very difficult to collect, because good positive samples need to satisfy many criteria, such as quantity, size, clarity, integrity, uniqueness and orientation, and thus it is not possible to accurately and efficiently provide the required multiple positive samples in conventional ways. The scene models used in the existing detection methods are therefore first learned from only a few positive and negative samples, and then wait to be updated with more positive and negative samples in the future;
3) A scene model learned from a few positive and negative samples is always too weak to be used for object detection directly. Such a scene model is therefore suitable only for object tracking, i.e., detecting the target near its position in the last frame and updating the scene model with the detected target.
It can be seen that the prior art has the problem that the positive samples for training the classifiers have to be collected repeatedly in order to improve the accuracy of specific object detection.