Semantic information about object components is useful for many visual tasks, for example, image classification when the differences between categories are subtle, and fine-grained motion detection. Component semantic information is also essential for many robotic tasks that involve interaction. For example, when a robot needs to open the hood of a car for repair, the hood must be identified correctly; when a robot needs to sit on a chair, the surface of the chair must be identified correctly. In summary, component semantic information can be used to accomplish many vision-related tasks.
At present, methods for component-level semantic segmentation of image objects generally include the following three steps: 1) extracting features for each pixel; 2) obtaining, via a classifier or another model based on the extracted features, an initial probability distribution over the semantic category labels for each pixel; and 3) constructing a conditional random field or a Markov random field to optimize the initial probability distribution, thereby obtaining the final semantic category of each pixel.
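The three steps above can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the implementation of any particular existing method: the features are simply color plus pixel position, the classifier is a linear model followed by a softmax, and the random-field step is approximated by a few mean-field updates of a 4-connected Potts model. All function names and parameters (`extract_features`, `initial_probabilities`, `refine_mean_field`, `pairwise_weight`, and so on) are hypothetical.

```python
import numpy as np

def extract_features(image):
    """Step 1: per-pixel features (assumed here: RGB plus normalized row/column)."""
    h, w, _ = image.shape
    rows, cols = np.mgrid[0:h, 0:w]
    pos = np.stack([rows / max(h - 1, 1), cols / max(w - 1, 1)], axis=-1)
    return np.concatenate([image.astype(float), pos], axis=-1)  # shape (h, w, 5)

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def initial_probabilities(features, weights, bias):
    """Step 2: a linear classifier (an assumption) gives per-pixel label probabilities."""
    return softmax(features @ weights + bias)  # shape (h, w, num_labels)

def refine_mean_field(unary, pairwise_weight=1.0, iterations=5):
    """Step 3: simplified mean-field inference for a 4-connected Potts model.

    Each update adds the neighbors' current label beliefs to the log of the
    initial probabilities, which favors labels that agree with the neighbors.
    """
    log_unary = np.log(unary + 1e-12)
    q = unary.copy()
    for _ in range(iterations):
        # Zero padding: pixels outside the image contribute no belief.
        padded = np.pad(q, ((1, 1), (1, 1), (0, 0)))
        neighbor_sum = (padded[:-2, 1:-1] + padded[2:, 1:-1]
                        + padded[1:-1, :-2] + padded[1:-1, 2:])
        q = softmax(log_unary + pairwise_weight * neighbor_sum)
    return q

def segment(image, weights, bias):
    """Run all three steps and return one label index per pixel."""
    features = extract_features(image)
    unary = initial_probabilities(features, weights, bias)
    refined = refine_mean_field(unary)
    return refined.argmax(axis=-1)
```

In this sketch, a pixel whose initial distribution weakly favors one label while all of its neighbors strongly favor another is relabeled to agree with its neighbors, which is the smoothing role the random field plays in step 3.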
However, when the lighting conditions or the relative pose or angle between the object and the camera vary during image capture, or when objects of the same kind differ in surface material and texture, the existing methods for component-level semantic segmentation of image objects cannot segment the object components in the image correctly.