Field of the Invention
The present invention relates to the field of image processing, and more specifically, to a method and apparatus for classifying pixels in an input image, and to an image processing system.
Description of the Related Art
Segmenting an entire image into distinct recognizable regions is a central challenge in computer vision, and it has received increasing attention in recent years. Unlike object recognition methods, which find a particular object, multi-class image segmentation methods classify all pixels in an image and then concurrently recognize multi-class objects based on the classification. For an image to be segmented accurately, each pixel in the image needs to be correctly classified into the one of several predetermined classes to which it actually belongs.
Usually, a multi-class segmentation method is based on either pixels or “super-pixels”. In a pixel based method, local features within the neighbourhood of each pixel are extracted, and the pixel is classified mainly according to the extracted features. In a super-pixel based method, the processing procedure is similar to that of the pixel based method, except that each super-pixel obtained by over-segmenting the input image is treated in the same way as a pixel in the pixel based method. That is, in the super-pixel based method, the input image is first over-segmented into a number of super-pixels, and then local features within each super-pixel are extracted and used to classify the corresponding super-pixel.
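The two steps of the super-pixel based method described above (over-segmentation, then feature extraction within each super-pixel) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the regular grid tiling is a stand-in for a real over-segmentation method, the mean and variance of intensity are chosen here as example local features, and the function names grid_over_segment and super_pixel_features are hypothetical.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]  # grayscale image stored as rows of intensities


def grid_over_segment(height: int, width: int, block: int) -> List[List[int]]:
    """Stand-in over-segmentation (assumption): label each pixel with the
    index of the block x block tile that contains it."""
    tiles_per_row = (width + block - 1) // block
    return [
        [(y // block) * tiles_per_row + (x // block) for x in range(width)]
        for y in range(height)
    ]


def super_pixel_features(
    image: Image, labels: List[List[int]]
) -> Dict[int, Tuple[float, float]]:
    """Compute example local features, (mean, variance) of intensity,
    over the pixels of each super-pixel."""
    sums: Dict[int, Tuple[int, float, float]] = {}
    for row_values, row_labels in zip(image, labels):
        for value, sp in zip(row_values, row_labels):
            n, s, s2 = sums.get(sp, (0, 0.0, 0.0))
            sums[sp] = (n + 1, s + value, s2 + value * value)
    features = {}
    for sp, (n, s, s2) in sums.items():
        mean = s / n
        features[sp] = (mean, s2 / n - mean * mean)
    return features


# Hypothetical 2 x 4 image: dark left half, bright right half.
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
labels = grid_over_segment(2, 4, 2)
features = super_pixel_features(image, labels)
```

Here each super-pixel ends up with one feature vector, which is what a classifier would then consume in place of per-pixel features.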
Take the super-pixel based method as an example. In such a method, a multi-class classifier can be used to classify each super-pixel into one of the predetermined classes according to the extracted features of the super-pixel. For each super-pixel, the multi-class classifier calculates a confidence of the super-pixel belonging to each predetermined class. Then, the super-pixel is classified into the class for which its confidence is the maximum among all the predetermined classes.
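The decision rule described above, in which each super-pixel is assigned the class with the maximum confidence, can be sketched as follows. The confidence values are made up for illustration, and the function name classify_super_pixels is hypothetical; how the confidences are computed depends on the classifier and is not shown.

```python
from typing import Dict, Hashable


def classify_super_pixels(
    confidences: Dict[Hashable, Dict[str, float]]
) -> Dict[Hashable, str]:
    """Assign each super-pixel the class whose confidence is the maximum
    among all predetermined classes."""
    return {
        sp_id: max(class_conf, key=class_conf.get)
        for sp_id, class_conf in confidences.items()
    }


# Hypothetical per-class confidences for two super-pixels.
example_confidences = {
    "sp_a": {"grass": 0.10, "human": 0.75, "tree": 0.15},
    "sp_b": {"grass": 0.80, "human": 0.05, "tree": 0.15},
}
predicted = classify_super_pixels(example_confidences)
```

In this sketch, sp_a is assigned to “human” and sp_b to “grass”, since those are the classes with the highest confidence for each.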
For example, when an image containing grass, a human, trees, sky and a mountain is segmented in order to recognize multi-class objects, the expected output is that each pixel in the image is classified into its real class, which is one of a “grass” class, a “human” class, a “tree” class, a “sky” class and a “mountain” class. That is, in the ideal resulting image, the true class label is assigned to each pixel.
In the above image, all the objects can be divided into two sets. One set is “things”, which includes the “human” class, and the other set is “stuff”, which includes the “grass” class, the “tree” class, the “sky” class and the “mountain” class. A “thing” usually has a distinct size and shape; other examples of “things” are prominent objects such as cars, pedestrians, bicycles, and houses. “Stuff” is a homogeneous or recurring pattern of fine-scale properties with no specific spatial extent or shape; another example of “stuff” is a road. The distinction between the two sets can also be interpreted in terms of localization. Specifically, a “thing” can easily be localized by a bounding box that delimits the region where the “thing” appears, but “stuff” cannot.
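The localization distinction can be illustrated with a small sketch that computes the bounding box of all pixels carrying a given label. The label map and the function name bounding_box are hypothetical, chosen only to illustrate the point.

```python
from typing import List, Optional, Tuple


def bounding_box(
    label_map: List[List[str]], target: str
) -> Optional[Tuple[int, int, int, int]]:
    """Return the inclusive (top, left, bottom, right) box enclosing every
    pixel labelled `target`, or None if the label does not occur."""
    rows = [y for y, row in enumerate(label_map) if target in row]
    if not rows:
        return None
    cols = [
        x for row in label_map for x, label in enumerate(row) if label == target
    ]
    return (min(rows), min(cols), max(rows), max(cols))


# Hypothetical 3 x 4 map of true class labels.
label_map = [
    ["sky", "sky", "sky", "sky"],
    ["grass", "human", "human", "grass"],
    ["grass", "human", "grass", "grass"],
]
human_box = bounding_box(label_map, "human")
grass_box = bounding_box(label_map, "grass")
```

In this example the box for “human” tightly encloses the object, whereas the box for “grass” spans the full image width and also contains the “human” pixels, so it says little about where the grass actually is.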
Since a “stuff” object has a fine-scale pattern, the pattern can usually be captured by local features, and the object can be recognized correctly. A “thing” object, however, requires larger-scale information to be recognized correctly; local features within a limited field of view are not sufficient. For example, suppose that in an image, one of the super-pixels obtained by an over-segmentation method (denoted super-pixel 1) is a skirt region belonging to the human object, which should be classified into the “human” class, and that the skirt region has a uniform color and a textured appearance. Suppose further that another super-pixel (denoted super-pixel 2) lies within a grassland and should be classified into the “grass” class. When the multi-class classifier uses only the local features extracted within each super-pixel, it may be hard for the classifier to differentiate super-pixel 1 from super-pixel 2, because the two super-pixels may have similar local features.
Therefore, with conventional multi-class segmentation methods, it is often difficult to differentiate some “thing” objects from “stuff” objects, and classification performance is poor. Some pixels in an image may be classified into the wrong class, which degrades the multi-class segmentation.