1. Technical Field
The present invention relates to image classification, and more particularly to object-centric spatial pooling for image classification.
2. Description of the Related Art
Image object recognition has been a major research direction in computer vision. Its goal is two-fold: deciding what objects are in an image (classification) and where these objects are in the image (localization). In practice, however, classification and localization are often treated separately. Object localization is generally regarded as a harder problem than image classification, even when precise object location annotations are available during training. In the purely image classification setting, attempting to localize objects may therefore be seen as an unnecessary detour. As a result, current state-of-the-art image classification systems do not attempt to infer object location information.
Classification systems can be based on spatial pyramid matching (SPM), which pools low-level image features over pre-defined coarse spatial bins. However, because these bins are fixed in advance regardless of where objects appear in an image, current SPM-based pooling implementations leave room for improvement in the accuracy of the resulting classification.
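For illustration only, SPM-style pooling over pre-defined bins can be sketched as follows. This is a minimal sketch, not the prior-art implementation: it assumes each local descriptor has already been encoded into a K-dimensional code, uses max pooling (one common choice; histogram or average pooling are also used), and assumes pyramid levels of 1 x 1, 2 x 2, and 4 x 4 grids. The function name `spm_pool` is hypothetical.

```python
import numpy as np

def spm_pool(codes, xy, width, height, levels=(1, 2, 4)):
    """Max-pool coded local features over a fixed spatial pyramid (hypothetical helper).

    codes: (N, K) array of per-descriptor codes.
    xy: (N, 2) array of (x, y) descriptor locations in pixels.
    Returns a vector of length K * sum(l * l for l in levels).
    """
    codes = np.asarray(codes, dtype=float)
    xy = np.asarray(xy, dtype=float)
    k = codes.shape[1]
    pooled = []
    for l in levels:
        # Assign each descriptor to a cell of an l x l grid; the grid ignores image content.
        bx = np.minimum((xy[:, 0] * l / width).astype(int), l - 1)
        by = np.minimum((xy[:, 1] * l / height).astype(int), l - 1)
        for j in range(l):
            for i in range(l):
                mask = (bx == i) & (by == j)
                # Empty cells contribute a zero vector so the output length is fixed.
                pooled.append(codes[mask].max(axis=0) if mask.any() else np.zeros(k))
    return np.concatenate(pooled)

# Usage with random toy data: 50 descriptors with 8-dimensional codes in a 640 x 480 image.
rng = np.random.default_rng(0)
codes = rng.random((50, 8))
xy = rng.random((50, 2)) * [640, 480]
feat = spm_pool(codes, xy, 640, 480)
print(feat.shape)  # (168,) since 8 * (1 + 4 + 16) = 168
```

Because the cell boundaries depend only on the image size, the same object contributes to different pooled entries whenever it moves to a different part of the image.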
FIG. 1 shows an example 100 of spatial pyramid matching (SPM) based pooling for image classification, in accordance with the prior art. For the sake of illustration, circles denote object-related local features, triangles denote background-related local features, and the numbers indicate the fraction of the respective local features in each pooling region. In the example 100, we show a first image 110 and a second image 120, both having a car as the object of interest. When the object of interest (i.e., the car) appears in different locations within the images, SPM-based pooling produces inconsistent image features (as indicated by the fractions 111 and 121 corresponding to the first image 110 and the second image 120, respectively), making it more difficult to learn an appearance model of the object.
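The effect depicted in FIG. 1 can be reproduced numerically with a toy example. The following is a minimal sketch under assumed, hypothetical data: two 100 x 100 images share the same 16 background features on a regular lattice and the same 4-feature object, placed in the top-left region of one image and the bottom-right region of the other. The helper `region_fractions` is hypothetical and computes, for a fixed 2 x 2 grid, the fraction of object-related features in each pooling region.

```python
def region_fractions(points, width, height, grid=2):
    """Return the fraction of object-related features in each cell of a fixed grid.

    points: list of (x, y, is_object) tuples; cells are listed row by row.
    """
    obj = [0] * (grid * grid)
    total = [0] * (grid * grid)
    for x, y, is_object in points:
        i = min(int(x * grid / width), grid - 1)   # column index of the fixed cell
        j = min(int(y * grid / height), grid - 1)  # row index of the fixed cell
        cell = j * grid + i
        total[cell] += 1
        obj[cell] += int(is_object)
    return [o / t if t else 0.0 for o, t in zip(obj, total)]

# Hypothetical background: 16 features on a regular 4 x 4 lattice, identical in both images.
background = [(x, y, False)
              for x in (12.5, 37.5, 62.5, 87.5)
              for y in (12.5, 37.5, 62.5, 87.5)]

def object_at(cx, cy):
    # Four object-related features clustered around (cx, cy).
    return [(cx + dx, cy + dy, True) for dx in (-5, 5) for dy in (-5, 5)]

image_1 = background + object_at(25, 25)  # object in the top-left region
image_2 = background + object_at(75, 75)  # same object, bottom-right region

f1 = region_fractions(image_1, 100, 100)
f2 = region_fractions(image_2, 100, 100)
print(f1)  # [0.5, 0.0, 0.0, 0.0]
print(f2)  # [0.0, 0.0, 0.0, 0.5]
```

Although the object and background are identical in both toy images, the per-region fractions differ, which mirrors the inconsistency between the fractions 111 and 121.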