Modern imaging devices are capable of generating vast amounts of image data in the form of two-dimensional arrays of samples (known as pixels) of some measurable quantity. Examples of directly measurable image data include luminance and chrominance of reflected light (from optical cameras), range or distance from some reference point to the imaged points (from active range sensors), or density (from tomographic scanners). Moreover, many quantities can be derived from the raw image data. Such quantities may be referred to as metadata, this being data that is used to describe other data. Examples of such “metadata” quantities include range (from passive, i.e. optical, range sensors) and motion (from multiple images of dynamic scenes).
The sheer volume of image data necessitates some kind of automatic analysis of content in most applications. An important step in analysing the content of an imaged scene is to partition the image into disjoint segments corresponding to semantically meaningful objects. Because human expectation is that real world objects are in some sense compact and coherent, each segment of the partitioned image consists of a region of adjacent pixels over which some property of the data (image data, metadata, or both) is uniform. Many approaches to this task of segmentation have been tried. One that has met with some success is region merging. In this paradigm, each pixel is initially labelled as its own unique region. Adjacent regions are then compared using some similarity criterion and merged if they are sufficiently similar. In this way small regions take shape and are gradually built into larger ones. It may be shown that region merging is a practical approximate solution to a variational formulation of the image segmentation problem. In this formulation, the “best” segmentation is expressed as the global minimum of some cost functional defined over the space of all possible segmentations of an image. An advantage of region merging methods (as compared with, for example, edge-based methods) is the adaptability of region merging to handle multichannel image data, i.e. data which is vector-valued at each pixel. For example, in colour images the vector components might be the red, green, and blue intensities. This facility makes region merging techniques suitable for fusing multiple sources of data and metadata to produce a single segmentation. In this way range and motion information may be integrated with colour to provide an analysis that colour data alone cannot. This is of particular interest when the images are of complex, dynamic scenes.
An example of such an approach is disclosed in the paper “Region-based Representation of Image and Video: Segmentation Tools for Multimedia Services”, P. Salembier, F. Marques; IEEE Transactions on Circuits and Systems for Video Technology Vol. 9, No. 8, December 1999, pages 1147-1169.
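The region merging paradigm described above can be illustrated with a minimal sketch. It is not taken from the cited paper or from any particular implementation. It assumes scalar pixel values, 4-connected adjacency, a similarity criterion equal to the absolute difference of region means, and a fixed stopping threshold. The function name `region_merge` and its parameters are hypothetical.

```python
from itertools import product


def region_merge(image, threshold):
    """Greedy region merging on a 2-D list of scalar pixel values.

    Every pixel starts as its own region. The most similar pair of
    adjacent regions (smallest absolute difference of means) is merged
    repeatedly until no adjacent pair differs by less than `threshold`.
    Returns a 2-D list of region labels.
    """
    rows, cols = len(image), len(image[0])
    regions = {}  # region id -> [sum of values, pixel count, member pixels]
    label = {}    # pixel (row, col) -> region id
    for r, c in product(range(rows), range(cols)):
        rid = r * cols + c
        regions[rid] = [image[r][c], 1, {(r, c)}]
        label[(r, c)] = rid

    def mean(rid):
        total, count, _ = regions[rid]
        return total / count

    def adjacent_pairs():
        # Collect every pair of distinct regions sharing a 4-connected edge.
        pairs = set()
        for r, c in product(range(rows), range(cols)):
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    a, b = label[(r, c)], label[(rr, cc)]
                    if a != b:
                        pairs.add((min(a, b), max(a, b)))
        return pairs

    while True:
        pairs = adjacent_pairs()
        if not pairs:
            break  # the whole image is a single region
        # Most similar pair first; ties are broken by region id.
        a, b = min(pairs, key=lambda p: (abs(mean(p[0]) - mean(p[1])), p))
        if abs(mean(a) - mean(b)) >= threshold:
            break  # nothing left that is similar enough to merge
        # Merge region b into region a.
        regions[a][0] += regions[b][0]
        regions[a][1] += regions[b][1]
        regions[a][2] |= regions[b][2]
        for pixel in regions[b][2]:
            label[pixel] = a
        del regions[b]

    return [[label[(r, c)] for c in range(cols)] for r in range(rows)]


if __name__ == "__main__":
    image = [[1, 1, 9, 9],
             [1, 2, 9, 8],
             [1, 1, 9, 9]]
    for row in region_merge(image, threshold=3):
        print(row)
```

For this small image the dark left half and the bright right half each collapse into a single region, because differences within each half are at most 1 while differences across the boundary are at least 6. The adjacency list is recomputed on every pass for clarity; a practical implementation would maintain it incrementally.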
Traditional region merging has dealt with the definition of segmentation functionals and/or similarity criteria. Most successful cost functionals have two components: a model fitting cost and a model complexity cost. The model fitting cost encourages a proliferation of regions, while the complexity cost encourages few regions. The functional must therefore balance the two components to achieve a reasonable result. The most soundly based model fitting costs use statistically valid definitions such as residuals. This provides optimal handling of data or metadata which is subject to spatially varying uncertainty. This situation often arises from metadata such as range obtained by passive optical means, where the certainty of the range estimate depends strongly on the underlying image texture.
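The balance between the two components can be made concrete with a small sketch. It assumes a piecewise-constant model, a fitting cost equal to the sum of squared residuals about each region mean, and a complexity cost equal to a weight `lam` times the number of regions. These particular choices, and the name `segmentation_cost`, are illustrative assumptions rather than the functional of any specific method.

```python
def segmentation_cost(values, labels, lam):
    """Return (fitting cost, complexity cost, total cost) for a labelling.

    Fitting cost: sum of squared residuals about each region's mean.
    Complexity cost: lam times the number of regions (an assumed form).
    """
    groups = {}
    for value, region in zip(values, labels):
        groups.setdefault(region, []).append(value)

    fit = 0.0
    for members in groups.values():
        region_mean = sum(members) / len(members)
        fit += sum((v - region_mean) ** 2 for v in members)

    complexity = lam * len(groups)
    return fit, complexity, fit + complexity


if __name__ == "__main__":
    values = [1, 1, 2, 9, 9, 8]
    candidates = {
        "one region": [0, 0, 0, 0, 0, 0],
        "two regions": [0, 0, 0, 1, 1, 1],
        "six regions": [0, 1, 2, 3, 4, 5],
    }
    for name, labels in candidates.items():
        print(name, segmentation_cost(values, labels, lam=1.0))
```

With one region the fitting cost dominates, with six regions the fitting cost is zero but the complexity cost is largest, and the two-region labelling gives the lowest total for this weighting. Changing `lam` shifts where that minimum falls.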
Traditional statistical region merging has assumed all channels have independent, identically distributed uncertainties. Instances where the uncertainties of each channel are unequal and/or correlated between channels have not been addressed. However, this will be the case when fusing pixel data and derived metadata such as range. A similar situation also occurs when segmenting on estimated motion vector images, in which the uncertainties not only vary over the image, but are correlated between horizontal and vertical components.
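To illustrate why correlated uncertainties matter, the sketch below compares two 2-component estimates (for example, motion vectors) using a squared Mahalanobis distance, a standard statistical measure that weights the difference by its covariance. It is a generic illustration and not a description of any particular segmentation method. It assumes the two estimates are independent of each other, so the covariance of their difference is the sum of their covariances. The function name `mahalanobis_sq` is hypothetical.

```python
def mahalanobis_sq(a, b, cov_a, cov_b):
    """Squared Mahalanobis distance between 2-vectors a and b.

    cov_a and cov_b are 2x2 symmetric covariance matrices given as nested
    lists. The estimates are assumed independent of each other, so the
    covariance of (a - b) is cov_a + cov_b.
    """
    dx, dy = a[0] - b[0], a[1] - b[1]
    sxx = cov_a[0][0] + cov_b[0][0]
    sxy = cov_a[0][1] + cov_b[0][1]
    syy = cov_a[1][1] + cov_b[1][1]
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("combined covariance must be positive definite")
    # Closed-form inverse of [[sxx, sxy], [sxy, syy]] applied to (dx, dy).
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


if __name__ == "__main__":
    iid = [[0.5, 0.0], [0.0, 0.5]]
    correlated = [[0.5, 0.45], [0.45, 0.5]]
    # Two differences of equal Euclidean length but different direction.
    print(mahalanobis_sq((1, 1), (0, 0), iid, iid))
    print(mahalanobis_sq((1, -1), (0, 0), iid, iid))
    print(mahalanobis_sq((1, 1), (0, 0), correlated, correlated))
    print(mahalanobis_sq((1, -1), (0, 0), correlated, correlated))
```

Under the independent, identically distributed assumption both differences score the same. With strongly correlated horizontal and vertical uncertainties, a difference along the correlation direction is far less significant than one of equal length across it, so a criterion that ignores the correlation would treat the two cases alike.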
Another difficulty with automatic segmentation by region merging is deciding when to halt the merging process. Some implementations have required a predetermined “schedule” of thresholds to govern the merging process and converge to the segmentation which minimises the cost functional. Others have removed the need for a schedule, but still require an arbitrary threshold. This threshold is related to the weighting of fitting error and model complexity in the final cost functional. The use of a predetermined arbitrary threshold means the segmentation algorithm is unable to adapt to different types of image without substantial operator effort.