Content-based coding and manipulation of video is one of the core functionalities supported by the emerging international Moving Picture Experts Group (MPEG) standard for coding audiovisual signals, specifically the MPEG4 standard. This functionality requires representation and coding of the contour and texture of arbitrarily shaped image objects.
Known methods of shape recognition include that used in the WebSEEk program, which searches a collection of computer files containing image and video data by file type, i.e., GIF, MPEG, and text references. The system then decompresses the files and analyzes their contents for colors and/or texture. An icon is formed which includes a miniature version of the image; such icons may be examined for specific types of images.
Another shape recognition program is known as Query by Image Content (QBIC), which is an IBM® product. QBIC examines an image file for color, contrast, coarseness and directionality. Only limited shape information is made available during image analysis, and the image is analyzed as a whole. The known methods are quite slow, are resource intensive, and do not readily enable searching by shape criteria. The methods of the invention are intended to overcome these limitations.
In the MPEG4 document, in effect as of the filing date of this application, block-based, spatial-resolution scalable shape coding was implemented using a shape pyramid in the MPEG4-Shape Coding Core Experiments (SCCE). Given a binary bit map representation of an image object shape, a three-layer shape pyramid is formed for each macroblock. This technique is explained in connection with FIG. 1 herein. The shape pyramid 10 includes, in this representation, three layers. The base (coarsest) layer (Layer 0) 12 is formed by averaging a window of 4×4 pixels, and thresholding the result in order to clip it to either 0 or 1. The resulting block is 4×4. The next layer (Layer 1) 14 is formed by repeating this process using a window of 2×2 pixels, resulting in a finer resolution layer of 8×8 blocks. The finest layer (Layer 2) 16 has the original macroblock resolution. Macroblocks in the base layer are coded by themselves. Macroblocks in Layer 1 are coded differentially in reference to macroblocks in Layer 0. That is, macroblocks in the base layer are upsampled and their difference (residual) from co-located macroblocks of Layer 1 is coded. Similarly, Layer 2 is coded differentially in reference to Layer 1, and so on. Residually coded layers (Layer 1, 2, . . . ) are referred to as "enhancement layers." Coding the macroblock and residual macroblock data may be performed in various efficient ways. Two methods were proposed in MPEG4.
The known methods, however, sample a layer or an image at the same resolution over the entire layer or image. Some parts of an image are more important than others, and warrant a higher resolution. A hierarchical shape pyramid has been discussed in MPEG4 for use with spatial scalability, but has not been discussed in connection with content scalability.
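By way of illustration only, the three-layer shape pyramid and the differential coding of enhancement layers described above may be sketched in Python as follows. The sketch assumes a 16×16 binary macroblock. The threshold of 0.5, upsampling by pixel replication, and a residual formed by exclusive-OR are assumptions made for the example and are not specified in the text. The function names are hypothetical, and no entropy coding of the layers is shown.

```python
def downsample(block, win, threshold=0.5):
    """Average each win x win window of a binary block and clip to 0 or 1.

    The threshold value (0.5, ties rounded up) is an assumption.
    """
    n = len(block) // win
    out = []
    for r in range(n):
        row = []
        for c in range(n):
            total = sum(block[r * win + i][c * win + j]
                        for i in range(win) for j in range(win))
            row.append(1 if total / (win * win) >= threshold else 0)
        out.append(row)
    return out


def upsample(block, factor=2):
    """Upsample by pixel replication (an assumed interpolation method)."""
    out = []
    for row in block:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def residual(fine, coarse):
    """Difference between a layer and the upsampled coarser layer.

    For binary data the difference is taken here as exclusive-OR (assumed).
    """
    up = upsample(coarse)
    return [[a ^ b for a, b in zip(f_row, u_row)]
            for f_row, u_row in zip(fine, up)]


def reconstruct(coarse, res):
    """Recover a layer from the coarser layer and its residual."""
    up = upsample(coarse)
    return [[a ^ b for a, b in zip(u_row, r_row)]
            for u_row, r_row in zip(up, res)]


def build_pyramid(macroblock):
    """Return (Layer 0, Layer 1, Layer 2) for a 16x16 binary macroblock."""
    layer0 = downsample(macroblock, 4)        # 4x4 base layer
    layer1 = downsample(macroblock, 2)        # 8x8 enhancement layer
    layer2 = [row[:] for row in macroblock]   # original resolution
    return layer0, layer1, layer2


# Example shape: a filled lower-left triangle in a 16x16 macroblock.
macroblock = [[1 if c <= r else 0 for c in range(16)] for r in range(16)]
layer0, layer1, layer2 = build_pyramid(macroblock)

# Layer 0 is coded by itself; Layers 1 and 2 are coded as residuals.
res1 = residual(layer1, layer0)
res2 = residual(layer2, layer1)
```

A decoder holding only `layer0` obtains the coarsest shape. Applying `reconstruct(layer0, res1)` and then `reconstruct(layer1, res2)` recovers the finer layers in turn. Every layer in this sketch is sampled at a single resolution over the whole macroblock, which is the property of the known methods noted above.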