Semantic labeling is the task of generating pixel-wise labels for an acquired image in terms of semantic concepts such as tree, road, sky, water, and foreground objects. Semantic labeling can be formulated as mapping a set of nodes arranged on a 2D pixel grid, representing the acquired image, to their corresponding semantic concepts.
Semantic labeling can be achieved via a two-step process: 1) feature extraction and 2) inference. Feature extraction retrieves descriptive information useful for semantic labeling under varying illumination and viewpoints. Typically, the features are colors, textures, or gradients, extracted from a local patch around each pixel. Inference then predicts the labels of the pixels using the extracted features. The rich diversity in the appearance of even simple semantic concepts, such as sky, water, trees, or grass, makes automatic semantic labeling difficult.
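The two-step process can be sketched as follows. This is a minimal illustration, not any particular patented method: patch-based mean-color features stand in for step 1, and a nearest-centroid classifier stands in for step 2; the image, centroids, and function names are all hypothetical.

```python
import numpy as np

def extract_patch_features(image, patch_size=3):
    """Step 1: for each pixel, extract a descriptive feature from the
    local patch around it (here, simply the mean color of the patch)."""
    h, w, c = image.shape
    pad = patch_size // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    feats = np.empty((h, w, c))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + patch_size, j:j + patch_size]
            feats[i, j] = patch.mean(axis=(0, 1))
    return feats

def infer_labels(features, class_centroids):
    """Step 2: predict each pixel's label from its features
    (here, by nearest class centroid in feature space)."""
    h, w, c = features.shape
    flat = features.reshape(-1, c)                           # (h*w, c)
    dists = np.linalg.norm(flat[:, None] - class_centroids[None], axis=2)
    return dists.argmin(axis=1).reshape(h, w)

# Toy image: left half "sky" (blue-ish), right half "grass" (green-ish).
img = np.zeros((4, 8, 3))
img[:, :4] = [0.2, 0.4, 0.9]                  # sky color
img[:, 4:] = [0.1, 0.8, 0.2]                  # grass color
centroids = np.array([[0.2, 0.4, 0.9],        # class 0: sky
                      [0.1, 0.8, 0.2]])       # class 1: grass
labels = infer_labels(extract_patch_features(img), centroids)
```

A real system would use richer features (texture filter banks, gradients) and a trained classifier, but the division of labor between the two steps is the same.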
Semantic labeling can use model-based methods or non-parametric methods for inference. Model-based methods learn the appearance of semantic categories, and the relations among the categories, using a parametric model. Conditional random fields (CRFs) can be used to combine unary potentials, devised from visual features extracted from superpixels, with neighborhood constraints. The differences among various CRF models are mainly in terms of the visual features, the unary potentials, and the structure of the CRF.
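The structure of such a CRF can be illustrated by its energy function, which is minimized over labelings. The sketch below is a generic grid CRF with a Potts smoothness term, assumed for illustration only; the unary costs, weight, and function name are hypothetical, and no inference algorithm is shown.

```python
import numpy as np

def crf_energy(labels, unary, pairwise_weight=1.0):
    """Energy of a pixel labeling under a simple grid CRF:
    per-pixel unary potentials plus a Potts pairwise term that
    penalizes differing labels between 4-connected neighbors."""
    h, w = labels.shape
    # unary[i, j, k] is the cost of assigning label k to pixel (i, j)
    e = unary[np.arange(h)[:, None], np.arange(w)[None, :], labels].sum()
    # Potts smoothness over horizontal and vertical neighbor pairs
    e += pairwise_weight * (labels[:, 1:] != labels[:, :-1]).sum()
    e += pairwise_weight * (labels[1:, :] != labels[:-1, :]).sum()
    return float(e)

# Toy example: label 1 costs 0.5 everywhere, label 0 is free.
unary = np.zeros((2, 2, 2))
unary[..., 1] = 0.5
smooth = np.zeros((2, 2), dtype=int)           # all pixels labeled 0
noisy = np.array([[0, 1], [1, 0]])             # checkerboard labeling
e_smooth = crf_energy(smooth, unary)
e_noisy = crf_energy(noisy, unary)
```

The smooth labeling has lower energy than the checkerboard one, which is exactly how the pairwise term encodes the neighborhood constraint.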
Non-parametric methods find images that are similar to the acquired image in a pre-labeled image database. The pixel labels of the retrieved images are then copied to the acquired image, according to the similarity of the pixels. Because the variations in images of natural scenes are large, it is difficult to cover the entire space of conceptual variation with a database of reasonable size, which limits the accuracy. At the other extreme, a large database requires a long retrieval time, which limits the scalability of these methods.
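The label-transfer idea can be sketched as a nearest-neighbor search in feature space. This is a hypothetical minimal version, assuming per-pixel features for both the query image and the database; real systems first retrieve similar whole images and use far more elaborate matching.

```python
import numpy as np

def transfer_labels(query_feats, db_feats, db_labels):
    """Non-parametric inference sketch: for each query pixel, find the
    most similar database pixel (by feature distance) and copy its label."""
    h, w, c = query_feats.shape
    q = query_feats.reshape(-1, c)                 # (h*w, c) query pixels
    db = db_feats.reshape(-1, c)                   # (n, c) database pixels
    # Brute-force distances: cost grows linearly with database size,
    # which is exactly the scalability limit noted above.
    dists = np.linalg.norm(q[:, None] - db[None], axis=2)
    nearest = dists.argmin(axis=1)
    return db_labels.reshape(-1)[nearest].reshape(h, w)

# Toy database of two labeled pixels: sky (0) and grass (1).
db_feats = np.array([[[0.2, 0.4, 0.9], [0.1, 0.8, 0.2]]])
db_labels = np.array([[0, 1]])
query = np.array([[[0.25, 0.45, 0.85], [0.15, 0.75, 0.25]]])
out = transfer_labels(query, db_feats, db_labels)
```

The brute-force distance matrix makes the retrieval-time limitation concrete: doubling the database doubles the per-query cost unless approximate search structures are used.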
In U.S. 20130343641, a neural network is trained to predict the probability that a pixel belongs to an object class by minimizing an objective function that penalizes the difference between the predicted posterior probabilities for pixels in labeled aerial images and the true labels of those pixels. That network only performs per-pixel classification without propagating any information.
U.S. Pat. No. 7,460,709 describes a multi-label image segmentation method where edge weights and pixel-based color potentials are used to label an image. That method labels an image by solving an optimization problem and selecting a maximum of a potential function, which makes the method slow.
U.S. Pat. No. 8,170,330 describes a learning-based segmentation and labeling framework for processing tissue images. A classifier function for each label is evaluated for every pixel in the image. The functions operate separately on features originating from different pixels and therefore do not use context information.