Large image and video sequence databases are used in a number of multimedia applications in fields such as entertainment, business, art, engineering, and science. Retrieving images or parts of sequences based on their content has become an important operation.
Shape analysis methods play an important role in systems for object recognition, matching, registration, and analysis. However, retrieval by shape is still considered to be one of the most difficult aspects of content-based search.
The key questions in multimedia data retrieval are which types of features of the multimedia data should be considered, how these features should be expressed, and how features should be compared with one another.
A common problem in shape analysis research is how to judge the quality of a shape description/matching method. Not all methods are appropriate for all kinds of shapes and every type of application. Generally, a useful shape analysis scheme should satisfy the following conditions:
Robustness to transformations: the result of analysis must be invariant to translation, rotation, and scaling, as well as to the starting point used in defining the boundary sequence; this is required because these transformations, by definition, do not change the shape of the object.
Feature extraction efficiency: feature vectors (descriptors) should be computed efficiently.
Feature matching efficiency: since matching is typically performed on-line, the distance metric must require a very small computational cost.
Robustness to deformations: the result of analysis must be robust to spatial noise, whether introduced by a segmentation process or due to small shape deformations.
Correspondence to human judgement: a shape similarity/dissimilarity measure should correspond as much as possible to a human's judgement.
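The first condition can be illustrated with a small sketch. It is not one of the methods discussed in this section: it uses a centroid-distance signature followed by Fourier magnitudes, chosen only because each invariance is easy to see. The function name `invariant_signature` and the parameter `n_coeffs` are hypothetical, and the boundary is assumed to be given as a closed sequence of (x, y) samples.

```python
import cmath
import math


def invariant_signature(points, n_coeffs=8):
    """Illustrative boundary signature (assumed technique, not from the source text).

    points: list of (x, y) samples along a closed boundary.
    """
    n = len(points)
    # Distances are measured from the centroid, which removes translation;
    # distances do not change under rotation about the centroid.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    dist = [math.hypot(x - cx, y - cy) for x, y in points]
    # DFT magnitudes are unchanged by a circular shift of the sequence,
    # which removes the dependence on the starting point.
    mags = [
        abs(sum(d * cmath.exp(-2j * math.pi * k * i / n) for i, d in enumerate(dist)))
        for k in range(n_coeffs + 1)
    ]
    # Dividing by the zero-frequency term removes uniform scaling.
    return [m / mags[0] for m in mags[1:]]
```

Applying the function to a boundary and to a translated, rotated, scaled, and re-indexed copy of the same boundary should give the same values up to floating-point error. The sketch says nothing about the remaining conditions, such as robustness to deformations.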
Known methods of representing shapes include the descriptors adopted by MPEG-7: Zernike moments [A. Khotanzad and Y. H. Hong. Invariant image recognition by Zernike moments. IEEE Trans. PAMI, 12:489-497, 1990] and curvature scale space (CSS) [Farzin Mokhtarian, Sadegh Abbasi and Josef Kittler. Robust and Efficient Shape Indexing through Curvature Scale Space. British Machine Vision Conference, 1996]. For the Zernike moment shape descriptor, Zernike basis functions are defined for a variety of shapes in order to investigate the shape of an object within an image. An image of fixed size is then projected onto the basis functions, and the resulting values are used as the shape descriptor. For the curvature scale space descriptor, the contour of an object is extracted and the changes of curvature along the contour are expressed in scale space. The locations of the peak values are then expressed as a z-dimensional vector.
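The projection step described above for Zernike moments can be sketched as follows. This is a minimal illustration using the standard textbook definition of the Zernike radial polynomial, not the MPEG-7 extraction procedure; the function names, the pixel-centre mapping onto the unit disk, and the use of moment magnitudes as the descriptor are assumptions made for the example.

```python
import cmath
import math


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_nm(rho), valid for |m| <= n and n - |m| even."""
    m = abs(m)
    total = 0.0
    for s in range((n - m) // 2 + 1):
        num = (-1) ** s * math.factorial(n - s)
        den = (math.factorial(s)
               * math.factorial((n + m) // 2 - s)
               * math.factorial((n - m) // 2 - s))
        total += num / den * rho ** (n - 2 * s)
    return total


def zernike_moment(image, n, m):
    """Project a square image (list of rows) onto the basis function V_nm."""
    size = len(image)
    acc = 0j
    for row in range(size):
        for col in range(size):
            # Assumed mapping: pixel centres are placed in [-1, 1] x [-1, 1].
            x = (2 * col + 1) / size - 1
            y = (2 * row + 1) / size - 1
            rho = math.hypot(x, y)
            if rho > 1 or not image[row][col]:
                continue  # outside the unit disk, or background pixel
            theta = math.atan2(y, x)
            # Multiply by the complex conjugate of V_nm = R_nm(rho) * exp(i*m*theta).
            acc += image[row][col] * radial_poly(n, m, rho) * cmath.exp(-1j * m * theta)
    pixel_area = (2 / size) ** 2
    return (n + 1) / math.pi * acc * pixel_area


def zernike_descriptor(image, max_order):
    """Moment magnitudes for all valid (n, m) with m >= 0 up to max_order."""
    return [abs(zernike_moment(image, n, m))
            for n in range(max_order + 1)
            for m in range(n % 2, n + 1, 2)]
```

Rotating the image multiplies each moment by a unit-magnitude phase factor, so the magnitudes returned by `zernike_descriptor` stay the same under rotation. The double loop over all pixels, repeated for every basis function, also shows why extraction is much more expensive than comparing two short descriptor vectors.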
The Zernike moment and CSS descriptors have obvious advantages such as very fast feature matching and compact representation. Unfortunately, the majority of the compact shape descriptors (e.g. Zernike moments) are not robust to shape deformations. Others, like CSS, are robust, but matching of such descriptors results in many false positives. The retrieval accuracy of the CSS method can sometimes be poor, especially for curves that have a small number of concavities or convexities. In particular, this representation cannot distinguish between various convex curves. Another disadvantage of the compact descriptors is that their extraction is usually computationally expensive.
Although this cost is not a problem when creating databases (feature extraction is performed off-line), it makes it difficult (or even impossible) to use these descriptors for fast on-line comparison of two shapes provided as binary masks.
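The remark that CSS cannot distinguish between convex curves can be made concrete with a short sketch. CSS is built from the points where the curvature of the smoothed contour changes sign, and a convex curve has no such points at any scale. The code below counts curvature sign changes at a single scale only; it is not a full CSS implementation, and the function names, the circular Gaussian smoothing, and the finite-difference curvature estimate are assumptions chosen for illustration.

```python
import math


def smooth_closed(values, sigma):
    """Circular Gaussian smoothing of one coordinate of a closed contour.

    sigma is measured in samples (an assumption of this sketch).
    """
    n = len(values)
    radius = min(int(3 * sigma) + 1, n // 2)
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    norm = sum(weights)
    return [
        sum(w * values[(i + k) % n] for w, k in zip(weights, offsets)) / norm
        for i in range(n)
    ]


def curvature_zero_crossings(xs, ys, sigma):
    """Count sign changes of curvature along a smoothed closed contour."""
    n = len(xs)
    sx, sy = smooth_closed(xs, sigma), smooth_closed(ys, sigma)
    signs = []
    for i in range(n):
        nxt = (i + 1) % n
        # Central differences; index i - 1 wraps around for i == 0.
        dx = (sx[nxt] - sx[i - 1]) / 2
        dy = (sy[nxt] - sy[i - 1]) / 2
        ddx = sx[nxt] - 2 * sx[i] + sx[i - 1]
        ddy = sy[nxt] - 2 * sy[i] + sy[i - 1]
        k = dx * ddy - dy * ddx  # has the same sign as the curvature
        if abs(k) > 1e-12:
            signs.append(k > 0)
    # Compare each sample with its predecessor, treating the contour as closed.
    return sum(signs[i] != signs[i - 1] for i in range(len(signs)))
```

For a sampled circle and a sampled ellipse the count is zero, so both would yield an empty CSS representation even though the shapes differ. A contour with concavities, such as a lobed curve, gives a non-zero count.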