The present invention relates to image parsing.
A typical natural image contains multiple regions, each image region being a set of pixels grouped based on homogeneity in terms of location, appearance and a smoothness constraint. The image parsing process assigns object labels to image regions so that the most probable interpretation of the input image can be achieved. It also provides information such as shape (where the region's boundary lies), semantics (the probability of the region belonging to each object class) and context (which regions are its neighbors). Image parsing is one of the most important functions of the human visual system (HVS) because it provides necessary support for higher-level understanding of the physical world by the human brain. Nevertheless, automatic image parsing using computer vision techniques remains difficult due to computational issues.
Traditional computer vision techniques regard image classification, detection and segmentation as separate tasks and have developed a different approach for each. However, good results on any one of these tasks can benefit the others, and cognitive studies have shown that the human visual system performs these tasks simultaneously. A joint approach to the three tasks is therefore more appealing and effective.
In parallel, dynamic programming (DP) is a well-studied tool for solving sequential decision problems efficiently. It performs global optimization by locally optimizing sub-problems. One typical scenario in which DP has been used extensively is finding the optimal sequence of a fixed number of moves, starting from point i and ending at point j, with associated cost φ(i, j). However, applying a DP algorithm to the search problem in image parsing poses two challenges. First, due to the nature of images, the topology of image units is more complicated than a sequential connection. Second, due to the existence of multiple modes, simply taking the top-N DP solutions as hypotheses is not feasible: as observed in our experiment, these hypotheses are usually too similar to one another, which means that they fall into the same mode.
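The typical DP scenario described above can be illustrated with a minimal sketch. It is not the method of the invention; it only shows the classical sequential case. The function name `min_cost_path`, the representation of φ as a square list of lists `phi`, and the convention that a move may stay at the same point are assumptions made for illustration.

```python
import math


def min_cost_path(phi, start, end, num_moves):
    """Illustrative only: cheapest sequence of exactly num_moves moves.

    phi[i][j] is the assumed cost of moving from point i to point j.
    Returns (total_cost, path); path is a list of point indices, or an
    empty list when no sequence of that length reaches `end`.
    """
    n = len(phi)
    # best[k][j]: minimum cost of reaching point j from start in k moves.
    best = [[math.inf] * n for _ in range(num_moves + 1)]
    # back[k][j]: predecessor of j on that cheapest k-move sequence.
    back = [[None] * n for _ in range(num_moves + 1)]
    best[0][start] = 0.0

    for k in range(1, num_moves + 1):
        for j in range(n):
            for i in range(n):
                # Local sub-problem: extend the best (k-1)-move sequence
                # ending at i by one move from i to j.
                candidate = best[k - 1][i] + phi[i][j]
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    back[k][j] = i

    if math.isinf(best[num_moves][end]):
        return math.inf, []

    # Recover the optimal sequence by following the stored predecessors.
    path = [end]
    for k in range(num_moves, 0, -1):
        path.append(back[k][path[-1]])
    path.reverse()
    return best[num_moves][end], path
```

For example, with the hypothetical cost table `[[0, 1, 5], [1, 0, 1], [5, 1, 0]]`, calling `min_cost_path(phi, 0, 2, 2)` selects the two-move route through point 1 rather than the direct but more expensive move. The table is filled along a single chain of moves, which is exactly the sequential topology that image units do not share.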