1. Field of the Invention
The present invention relates to multimedia database and classification systems, and in particular to automatic classification and retrieval of multimedia files based on the features of the multimedia files.
2. Background Information
Automatic image classification has many important applications. Large image databases or collections require good indexing mechanisms so that images can be categorized effectively, browsed efficiently, and retrieved quickly. Conventional systems store and retrieve specific information from a database using, for example, descriptive information regarding the image file, such as file creation date, file name, file extension and the like. This form of image classification is not significantly different from the classification of any other digital information.
Relying on file information yields only cursory information about the file and nothing specifically related to the image itself. For example, an image file could have a name that has no relation to the features or content of the image; a black and white image could have the file name “color_image”. Other systems provide classification based on the content of the images, such as flowers, dogs, and the like. In practice, this is usually done by keyword annotation, which is a laborious task.
The evolution of the internet, low-cost devices for generating multimedia content (e.g., digital video cameras, digital cameras, video capture cards, scanners and the like), and low-cost storage (e.g., hard disks, CDs, and the like) have greatly increased the amount of multimedia information available today, and with it the need to classify and retrieve relevant multimedia data efficiently. Unlike text-based retrieval, where keywords are successfully used to index into documents, multimedia data retrieval has no easily accessed indexing feature.
One approach to navigating through a collection of images for the purpose of image retrieval is disclosed by Rubner, Y., “Perceptual Metrics for Image Database Navigation,” Ph.D. Dissertation, Stanford University, May 1999, which is incorporated herein by reference in its entirety. The appearance of an image is summarized by distributions of color or texture features, and a metric is defined between any two such distributions. This metric, called the “Earth Mover's Distance” (EMD), represents the least amount of work that is needed to rearrange the mass of one distribution into the other. The EMD measures perceptual dissimilarity, which is desirable for image retrieval. Multi-Dimensional Scaling (MDS) is employed to embed a group of images as points in a 2- or 3-dimensional (2D or 3D) Euclidean space so that their distances reflect the image dissimilarities. This structure allows the user to better understand the result of a database query and to refine the query. The user can iteratively repeat the process to zoom into the portion of the image space of interest.
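By way of illustration only, the “least amount of work” notion behind the EMD can be sketched for the simplest case. The sketch below is not taken from the cited dissertation; it assumes two one-dimensional histograms with the same number of bins, a ground distance of one unit between adjacent bins, and normalization of each histogram to unit mass. Under those assumptions the EMD reduces to the sum of absolute differences between the two cumulative distributions. The function name `emd_1d` is hypothetical.

```python
from itertools import accumulate


def emd_1d(p, q):
    """Earth Mover's Distance for two 1-D histograms (illustrative special case).

    Assumptions (not from the cited work): equal bin counts, adjacent bins
    are one unit apart, and each histogram is normalized to total mass 1.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("histograms must have positive total mass")

    # Normalize so both distributions carry the same amount of mass.
    p_norm = [x / total_p for x in p]
    q_norm = [x / total_q for x in q]

    # The mass that must cross each bin boundary equals the difference
    # between the cumulative sums at that boundary.
    cdf_p = accumulate(p_norm)
    cdf_q = accumulate(q_norm)
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


# All mass moves two bins to the right, so the work is 2 units.
distance = emd_1d([1, 0, 0], [0, 0, 1])
```

For multi-dimensional color or texture distributions, the EMD is generally computed by solving a transportation problem rather than by this closed form.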
Feature extraction is a key component in generating systems that can organize multimedia files based on their content. References that address image feature extraction include the following articles:
Niblack, et al., “The QBIC Project: Querying Images by Content Using Color, Texture, and Shape,” Proc. of SPIE, Storage and Retrieval for Image and Video Databases, Vol. 1908, February 1993, San Jose, pp. 173-187, which describes using color histograms for image distance measurement and is hereby incorporated by reference.
M. J. Swain and D. H. Ballard, “Color Indexing,” International Journal of Computer Vision, Vol. 7, No. 1, pp. 11-32, 1991, which describes histogram intersection techniques and is hereby incorporated by reference.
G. Pass and R. Zabih, “Histogram Refinement for Content-based Image Retrieval,” IEEE Workshop on Applications of Computer Vision, pp. 96-102, 1996, which describes a color coherence vector and is hereby incorporated by reference.
J. Huang, et al., “Image Indexing Using Color Correlogram,” IEEE Int. Conf. on Computer Vision and Pattern Recognition, pp. 762-768, Puerto Rico, June 1997, which describes the use of color correlograms as features in indexing images and is hereby incorporated by reference.
H. Tamura, S. Mori, and T. Yamawaki, “Texture Features Corresponding to Visual Perception,” IEEE Trans. on Systems, Man, and Cybernetics, Vol. 8, No. 6, June 1978, which describes the use of texture as features in image processing and is hereby incorporated by reference.
M. K. Hu, “Visual Pattern Recognition by Moment Invariants,” IEEE Computer Society, Los Angeles, Calif., 1977, which describes the use of moment invariants as features in image processing and is hereby incorporated by reference.
J. Mao and A. K. Jain, “Texture Classification and Segmentation Using Multiresolution Simultaneous Autoregressive Models,” Pattern Recognition, Vol. 25, No. 2, pp. 173-188, 1992, which describes Multiresolution Simultaneous Autoregressive Model (MRSAR) texture features and is hereby incorporated by reference.
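By way of illustration only, a color histogram and the histogram intersection measure of the kind described in the Swain and Ballard reference can be sketched as follows. The sketch assumes 8-bit RGB pixels supplied as (r, g, b) tuples and a uniform quantization into four bins per channel; these choices, and the function names `color_histogram` and `histogram_intersection`, are hypothetical and are not taken from the cited references.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Count pixels per quantized color cell.

    Assumes 8-bit channels (0-255); each channel is mapped uniformly onto
    bins_per_channel cells. The bin count is an illustrative choice.
    """
    return Counter(
        (
            r * bins_per_channel // 256,
            g * bins_per_channel // 256,
            b * bins_per_channel // 256,
        )
        for r, g, b in pixels
    )


def histogram_intersection(image_hist, model_hist):
    """Sum of bin-wise minima, normalized by the model histogram's total count.

    Returns 1.0 when every model pixel is matched in the image histogram
    and 0.0 when the two histograms share no occupied bins.
    """
    model_total = sum(model_hist.values())
    if model_total == 0:
        raise ValueError("model histogram is empty")
    overlap = sum(
        min(image_hist.get(cell, 0), count) for cell, count in model_hist.items()
    )
    return overlap / model_total


# Hypothetical 4-pixel images used only to exercise the functions.
red = color_histogram([(255, 0, 0)] * 4)
blue = color_histogram([(0, 0, 255)] * 4)
mixed = color_histogram([(255, 0, 0)] * 2 + [(0, 0, 255)] * 2)

same = histogram_intersection(red, red)
disjoint = histogram_intersection(blue, red)
partial = histogram_intersection(mixed, red)
```

Because the measure compares only color counts, it is insensitive to where in the image each color appears, which is the limitation that refinements such as color coherence vectors and color correlograms are intended to address.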
Because visualizing and retrieving large databases of multimedia files are complex tasks, it is desired to have a method for visualizing and retrieving data files that provides a distance calculation that starts at a coarse level of feature differences and progressively increases to a fine level of feature differences as the number of data files displayed is decreased. Additionally, it is desired to have an interactive real-time system that allows a user to select portions of the displayed data files in order to search and retrieve data files from large databases.