In computer image analysis applications such as intelligent transportation systems, a common task is to classify street scenes in a captured image. This task often involves detecting roads, other vehicles, and pedestrians in order to alert a user of the intelligent transportation system to potentially dangerous situations. Detection of objects of interest in a captured image generally requires segmenting the image into regions of interest and/or further segmenting the regions of interest into objects of interest.
Scene segmentation has been an active area of research and has a wide range of applications to real-world problems, such as applications in robotics and automotive systems. One conventional scene segmentation method employs discretized representations, such as codebooks of features or texton images, which model a whole image or specific regions of the image with or without spatial context of the image. Textons of an input image are discretized texture words, which are learned by applying a filter bank to the input image and clustering the output of the filter bank. The problem with this method is that it can only address scene segmentation at the image level. Thus, it faces challenges in detecting and localizing objects, especially small objects, in an image, where image-level features and statistics are often insufficient.
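The texton construction described above (filter bank, then clustering of the filter responses) can be sketched as follows. This is only an illustrative sketch, not the method of any particular prior-art system: the four-filter bank, the plain k-means clustering, and all function names (`make_filter_bank`, `filter2d`, `kmeans`, `texton_map`, `texton_histogram`) are assumptions introduced here for illustration.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    # Normalized 2D Gaussian kernel (assumed smoothing filter).
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def make_filter_bank():
    # Hypothetical minimal bank: smoothing, x/y derivatives, Laplacian.
    dx = np.array([[-1.0, 0.0, 1.0]])
    lap = np.array([[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]])
    return [gaussian_kernel(), dx, dx.T, lap]

def filter2d(image, kernel):
    # Same-size cross-correlation with edge padding, NumPy only.
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out

def assign(data, centers):
    # Index of the nearest center (squared Euclidean distance) per row.
    d = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(data, k, iters=20, seed=0):
    # Plain Lloyd's k-means; empty clusters keep their previous center.
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(data, centers)
        for c in range(k):
            members = data[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return centers, assign(data, centers)

def texton_map(image, k=4):
    # One response vector per pixel; each cluster index is a "texton".
    image = np.asarray(image, dtype=float)
    bank = make_filter_bank()
    responses = np.stack([filter2d(image, f) for f in bank], axis=-1)
    flat = responses.reshape(-1, len(bank))
    centers, labels = kmeans(flat, k)
    return labels.reshape(image.shape), centers

def texton_histogram(tmap, k):
    # Image-level descriptor: normalized count of each texton.
    counts = np.bincount(tmap.ravel(), minlength=k).astype(float)
    return counts / counts.sum()
```

Calling `texton_map(image, k)` on a 2D grayscale array returns a per-pixel texton index map, and `texton_histogram` collapses that map into a single vector for the whole image. Such a histogram discards where each texton occurs, which is one way to see why image-level statistics alone may be insufficient for localizing small objects.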
Another conventional scene segmentation method uses texture-layout features of an input image to boost feature selections that act on textons. An example of this conventional scene segmentation method uses a semantic texton forest for both texton creation and texton classification. Because the number of such features is very large, training a scene segmentation engine used in this method is very slow, and the performance of such scene segmentation deteriorates as the size of the training dataset and the variation in object classes in the training dataset increase.