1. Field of Invention
The present patent document is directed towards systems and methods for object detection. More particularly, the present patent document is directed towards systems and methods for generating and using object detection models for recognizing objects in an image (video or still image).
2. Description of the Related Art
Detecting objects in static images is an important yet highly challenging task that has attracted considerable interest from computer vision researchers in recent decades. The difficulties of object detection arise from several sources, including large intra-class appearance variation, object deformation, perspective distortion and alignment issues caused by viewpoint changes, and the categorical inconsistency between visual similarity and functionality.
According to recent results of a standards-making PASCAL grand challenge, detection approaches based on sliding window classifiers are presently the predominant method. Such methods extract image features in each scan window and classify the features to determine the confidence that the target object is present. These methods have been further enriched to incorporate sub-part models of the target objects, with the confidences on the sub-parts assembled to improve detection of the whole object.
One key disadvantage of these approaches is that only the information inside each local scanning window is used: joint information between scanning windows and information outside the scanning window are either thrown away or heuristically exploited through post-processing procedures such as non-maximum suppression. Naturally, context in the neighborhood of each scan window can provide rich information and should be explored to improve detection accuracy. For example, a scanning window in a pathway region is more likely to be a true detection of a human than one inside a water region. There have been some efforts to utilize contextual information for object detection, and a variety of valuable approaches have been proposed. High level image contexts, such as semantic context, image statistics, and three-dimensional (3D) geometric context, are used, as well as low level image contexts, including local pixel context and shape context.
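As an illustration of the non-maximum suppression post-processing mentioned above, the following is a minimal sketch of one common greedy variant based on intersection-over-union (IoU). The box format (x, y, width, height), the function names, and the default overlap threshold of 0.5 are assumptions made for this example and are not taken from any specific prior approach.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)
Detection = Tuple[Box, float]            # (box, confidence)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(
    detections: List[Detection], iou_threshold: float = 0.5
) -> List[Detection]:
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it.

    Overlapping windows interact here only through a fixed overlap rule,
    which is the kind of heuristic use of joint information noted above.
    """
    kept: List[Detection] = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Usage: two heavily overlapping detections and one separate detection.
candidates = [
    ((0, 0, 10, 10), 0.9),
    ((1, 1, 10, 10), 0.8),
    ((20, 20, 10, 10), 0.7),
]
survivors = non_maximum_suppression(candidates)
```

In this usage example the second candidate overlaps the first with an IoU of 81/119, which exceeds the assumed threshold, so it is suppressed while the other two candidates are kept.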
Besides utilizing context information from the original image directly, other lines of work, including Spatial Boost, Auto-Context, and their extensions, integrate the classifier responses from nearby background pixels to help determine the target pixels of interest. These works have been applied successfully to problems such as image segmentation and body pose estimation. Contextual information taken directly from the responses of multiple object detectors has also been explored. In other approaches, co-occurrence information among different object categories is extracted to improve performance in various classification tasks. Such methods require multiple base object classifiers and generally necessitate a fusion classifier to incorporate the co-occurrence information, making them expensive and sensitive to the performance of the individual base classifiers.
Thus, prior context-related approaches either required multiple models for different objects of interest or did not consider higher order information when using models for an object of interest.
Accordingly, systems and methods are needed that better perform object detection using contextual information.