1. Field of the Invention
The present invention relates generally to image and video processing and, more particularly, to determining the complexity of an image.
2. Description of the Related Art
In processing digital images or video for a variety of purposes, the notion of image complexity arises. While an encompassing definition of image complexity is elusive, it is conventionally described as a measure of the minimum description needed to capture the content of an image. As such, the concept is related to, yet more general than, the information content of an image in the sense of the information theory introduced by Claude Shannon.
The need to measure image complexity arises in a variety of contexts. For instance, an accurate image complexity metric can serve an important role in efficient image segmentation, allocation of bits during video compression, object tracking and computer vision, and automatic target recognition in military applications. Metrics proposed in the background art have included entropy, composite statistics such as standard deviation, L2 error relative to the same image passed through a smoothing filter, edge counts, and gradient measures, among others. No metric has achieved acceptance as a ubiquitous standard, and most present a trade-off between computational ease and accuracy.
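By way of illustration only, the background metrics listed above can be sketched for a single-channel, 8-bit image as follows. The function names, the 3×3 mean filter used as the smoothing filter, and the edge threshold of 32 are assumptions chosen for this sketch and are not drawn from any particular reference.

```python
import numpy as np

def shannon_entropy(img, bins=256):
    """Entropy, in bits per pixel, of the gray-level histogram."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist[hist > 0] / img.size
    return float(-np.sum(p * np.log2(p)))

def box_blur(img):
    """3x3 mean filter with edge replication (an assumed smoothing filter)."""
    padded = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape
    acc = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / 9.0

def smoothing_l2_error(img):
    """Root-mean-square difference between the image and its smoothed copy."""
    return float(np.sqrt(np.mean((img.astype(float) - box_blur(img)) ** 2)))

def gradient_magnitude(img):
    """Per-pixel gradient magnitude from finite differences."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

def mean_gradient_magnitude(img):
    """Average gradient magnitude over the image."""
    return float(np.mean(gradient_magnitude(img)))

def edge_count(img, threshold=32.0):
    """Number of pixels whose gradient magnitude exceeds an assumed threshold."""
    return int(np.count_nonzero(gradient_magnitude(img) > threshold))

def standard_deviation(img):
    """Standard deviation of the pixel intensities."""
    return float(np.std(img.astype(float)))
```

A uniformly gray image scores zero on every one of these measures, while an image split into a black half and a white half has an entropy of exactly one bit per pixel regardless of where the boundary lies, which hints at why no single such statistic fully captures complexity.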
Image segmentation, in particular, presents a significant demand for an accurate complexity metric. Image segmentation refers to the process of subdividing an image into smaller regions or segments, preferably such that these segments correspond to individual objects or parts of objects depicted in the image. Segmentation can serve a variety of purposes, for example in identifying objects, in extracting image features from a scene, or in eliminating temporal redundancy to compress the data in a video sequence. The last of these purposes is of particular importance, as the rapid growth of digital media in the marketplace and the enormous size of typical raw video data have prompted a need to develop more efficient and more accurate methods for compressing these large video files. Background on the importance of video compression and the development of more efficient techniques can be found in the commonly assigned application referenced above as ‘Prakash I’.
Temporal redundancy in video data is typically reduced by encoding a subset of frames as reference frames and by attempting to describe interspersed frames using predictions based on one or more of the reference frames. Since within a scene many of the same objects appear across multiple frames, the interspersed predicted frames can to a great extent be “built up” from constituent objects of one or more reference frames. Because motion may occur between frames, it becomes necessary to determine how much various objects are displaced between a reference frame and the predicted frame. The most common existing technologies for video compression, including the MPEG-1, MPEG-2, and MPEG-4 standards, break each predicted frame into a grid of square blocks (generally 16×16 pixels or 8×8 pixels) and search for square blocks in a reference frame that provide the best match for each of these blocks. In general, these blocks do not correspond to actual objects that move within the scene. As a result, block matches tend to be imprecise and motion is crudely approximated, requiring block-based algorithms to expend many additional bits to correct their inaccurate predictions. Compression strategies that subdivide images into segments representing actual objects, of arbitrary shape, allow for more faithful matching between frames and thus more accurate predictions. Higher compression ratios are thus possible. In fact, when accurate object-based segmentation is performed, the average number of segments needed to describe each frame for most video sequences is smaller than the number of small square blocks used in block-based algorithms, reducing the amount of motion information needed to encode the video. However, achieving accurate segmentation is a non-trivial task. A successful segmentation strategy is discussed in the commonly assigned application referenced above as ‘Prakash II’.
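The block-based search described above can be illustrated by the following minimal sketch, which performs an exhaustive search over a small window and scores candidates by the sum of absolute differences (SAD). The function name, the SAD criterion, and the search radius of 7 pixels are assumptions made for illustration; the standards named above do not mandate any particular search procedure.

```python
import numpy as np

def best_block_match(ref, cur, top, left, block=16, search=7):
    """Find the displacement (dy, dx) into `ref` that best matches the
    block of `cur` whose top-left corner is (top, left).

    Returns ((dy, dx), sad), or None if no candidate lies inside `ref`.
    """
    h, w = ref.shape
    target = cur[top:top + block, left:left + block].astype(np.int64)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall partly outside the reference frame.
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue
            cand = ref[y:y + block, x:x + block].astype(np.int64)
            sad = int(np.abs(target - cand).sum())
            if best is None or sad < best[1]:
                best = ((dy, dx), sad)
    return best
```

Because the block boundaries are fixed by the grid rather than by scene content, a block that straddles two objects moving differently cannot be matched exactly by any single displacement, and the residual SAD must be paid for with correction bits.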
A variety of other segmentation techniques have been contemplated in the academic literature. For example, S. L. Horowitz and T. Pavlidis present a split-and-merge method in “Picture Segmentation by a Directed Split-and-Merge Procedure,” in Proc. 2nd Int. Joint Conf. on Pattern Recognition, Copenhagen, pp. 424-433, 1974. An image is subdivided via a quadtree structure when areas are not sufficiently homogeneous, and a merging step is alternately introduced to correct for over-splitting. K. Haris, S. N. Efstratiadis, N. Maglaveras, and A. K. Katsaggelos propose a hybrid technique using watershed subdivision followed by a merging step in “Hybrid Image Segmentation Using Watersheds and Fast Region Merging,” IEEE Trans. on Image Proc., Vol. 7, No. 12, pp. 1684-1698. For further information, more complete overviews of the main strategies for segmentation, including histogram techniques, edge-based techniques, region-based techniques, and hybrid methods, may be found in both ‘Prakash II’ and the K. Haris et al. paper.
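The quadtree subdivision mentioned above can be sketched as follows. This sketch covers only a splitting phase and omits merging entirely; the homogeneity test (intensity range within a tolerance `tol`), the minimum block size, and the function name are assumptions made for illustration and are not taken from the cited paper.

```python
import numpy as np

def quadtree_split(img, tol=10, min_size=2):
    """Recursively split a non-empty image into quadrants until each block is
    homogeneous (max - min <= tol) or too small to split further.

    Returns a list of (top, left, height, width) leaf blocks.
    """
    leaves = []
    stack = [(0, 0, img.shape[0], img.shape[1])]
    while stack:
        top, left, h, w = stack.pop()
        region = img[top:top + h, left:left + w]
        homogeneous = int(region.max()) - int(region.min()) <= tol
        if homogeneous or h <= min_size or w <= min_size:
            leaves.append((top, left, h, w))
            continue
        hh, hw = h // 2, w // 2
        # Four quadrants; odd dimensions give the extra row or column
        # to the lower and right quadrants.
        stack.extend([
            (top, left, hh, hw),
            (top, left + hw, hh, w - hw),
            (top + hh, left, h - hh, hw),
            (top + hh, left + hw, h - hh, w - hw),
        ])
    return leaves
```

The number of leaves returned depends directly on `tol`, and a splitting phase alone will typically cut a single object into several blocks wherever the object crosses a quadrant boundary, which is the over-splitting that a merging step is intended to correct.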
A fundamental issue that arises in image segmentation is how to determine how finely an image should be subdivided. For instance, the image may consist of a garden, and within the garden a plurality of plants, and within each plant a variety of flowers and leaves, and within each flower a plurality of petals, and within each petal and leaf a texture consisting of color variation, and so on. The objects contained in this image can be described at a number of levels. A successful segmentation strategy should identify distinct objects but should not subdivide the image so finely that no color or texture variations within segments are tolerated (otherwise the goal of efficient video compression, for example, may be undermined). Aside from the problem of scaling, further difficulties are presented by the fact that different images have different lighting levels, different color ranges, different contrast levels, and so on. Subtle color changes in one image sequence may demarcate distinct objects that move differently, while another sequence may consist of a few large objects, each textured with broad color fluctuations. Training a segmentation algorithm to automatically determine the threshold for subdivision given these variations in image characteristics presents a dilemma: it is hard to determine a threshold without knowing the number of objects in an image, and it is hard to determine the number of objects in an image without an accurate threshold. The application referenced above as ‘Ratner I’ discusses the importance of thresholding in the case of an edge-based segmentation strategy.