Automatically labeling image elements of a digital image is carried out in many applications. For example, to segment an image into a foreground region and a background region, each image element is labeled as being part of either the foreground or the background. Once this segmentation is achieved, the regions may be used in a variety of applications. For example, the foreground region may be used in an object recognition process to identify the object(s) in the image, which may be an image of a visual scene. The foreground region may, for instance, depict a person standing in front of a landscape or sitting in an office environment (such as during a video conference). The background region may be replaced, for example, in an image editing application. Many other examples exist in which segmented images are used, including in the fields of image editing, medical image processing and satellite image processing.
Other applications in which image elements of a digital image are automatically labeled include three-dimensional image segmentation. Here, 3D image elements (voxels) are labeled as being part of (or not being part of) an object. Other applications include image stitching, whereby images are automatically joined together at a seam and image elements are labeled as being on a particular side of the seam so as to choose a seam which is least visible. Many other such applications exist in which image elements are labeled as having one of two possible labels (a binary labeling) or one of more than two possible labels. An example in which image elements are labeled as having one of tens or hundreds of possible labels is object classification, whereby image elements are labeled as being from an object of a particular class (such as sky, building, person, animal, water).
The digital image may be a 2D image or a 3D image. For example, the 3D image may be obtained using a depth camera or z-camera. In the case of a 2D image the image elements may be pixels or groups of pixels. In the case of a 3D image the image elements may be voxels or groups of voxels.
The task of automatically labeling image elements of a digital image is complex and time consuming, and yet many applications which use the results of the image labeling process require high quality results in real time; examples include video conferencing applications and image editing applications.
Previous automated image labeling systems have specified an energy function describing the quality of potential labelings of an image. An energy minimization process is then applied to find an optimal image labeling. However, this energy minimization process is typically time consuming and complex, and it may often become stuck in local optima, which in many cases correspond to poor solutions.
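By way of illustration only, the following minimal sketch shows one way an energy function over labelings might look and how a simple minimizer can stall in a local optimum. It is not taken from any particular previous system. The per-element cost table `unary`, the Potts-style smoothness penalty, the choice of a greedy one-element-at-a-time minimizer (iterated conditional modes), and the function names `labeling_energy` and `icm` are all assumptions introduced for this example.

```python
def labeling_energy(unary, labels, smoothness=1.0):
    """Energy of a labeling on a 2D grid (assumed form, for illustration).

    unary[y][x][k] is the cost of giving element (y, x) label k; a fixed
    penalty is added for each pair of 4-connected neighbours whose labels
    differ (a Potts-style pairwise term).
    """
    height, width = len(labels), len(labels[0])
    energy = 0.0
    for y in range(height):
        for x in range(width):
            energy += unary[y][x][labels[y][x]]
            # Count each neighbouring pair once (right and down only).
            if x + 1 < width and labels[y][x] != labels[y][x + 1]:
                energy += smoothness
            if y + 1 < height and labels[y][x] != labels[y + 1][x]:
                energy += smoothness
    return energy


def icm(unary, labels, smoothness=1.0, max_sweeps=10):
    """Greedy minimizer: relabel one element at a time while that lowers
    the energy. Stops when a full sweep changes nothing."""
    labels = [row[:] for row in labels]  # work on a copy
    height, width = len(labels), len(labels[0])
    num_labels = len(unary[0][0])
    for _ in range(max_sweeps):
        changed = False
        for y in range(height):
            for x in range(width):
                def local_cost(label):
                    # Cost terms that depend on the label of (y, x) only.
                    cost = unary[y][x][label]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < height and 0 <= nx < width:
                            if labels[ny][nx] != label:
                                cost += smoothness
                    return cost

                best = min(range(num_labels), key=local_cost)
                if local_cost(best) < local_cost(labels[y][x]):
                    labels[y][x] = best
                    changed = True
        if not changed:
            break
    return labels


# Hypothetical 1x2 image: label 1 is cheaper for both elements, but changing
# a single element on its own costs more than it saves.
unary = [[[1.0, 0.0], [1.0, 0.0]]]
start = [[0, 0]]
stuck = icm(unary, start, smoothness=2.0)
stuck_energy = labeling_energy(unary, stuck, smoothness=2.0)
best_energy = labeling_energy(unary, [[1, 1]], smoothness=2.0)
```

In this toy case the greedy minimizer returns the starting labeling unchanged, because relabeling either element alone would raise the energy, even though relabeling both elements together gives a lower energy. This is the kind of local optimum referred to above.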
The embodiments described below are not limited to implementations which solve any or all of the disadvantages of known image labeling systems.