This invention relates generally to computer-based systems which provide access to multimedia databases, and more particularly to systems that visualize multimedia objects according to media characteristics.
Traditional browsing and navigating in a large multimedia database, for example, image, video, or audio databases, is often disorienting unless a user can form a mental picture of the entire database. Content-based visualization can provide an efficient approach for browsing and navigating multimedia databases.
Many browsing and retrieval systems are feature based. Typical features include color, texture, and structure for images; color and motion for videos; and cepstrum, pitch, zero crossing rate, and temporal trajectories for audio. Color is one of the most widely used features for content-based image/video analysis. It is relatively robust to background complication and independent of image size and orientation. Color histograms are the most commonly used color feature representation. While histograms are useful because they are relatively insensitive to position and orientation changes, they do not capture the spatial relationship of color regions, and thus color histograms have limited discriminating power.
One can also use color moments. There, the color distribution of an image is interpreted as a probability distribution, and the color distribution can be uniquely characterized by its moments. Characterizing a 1-D color distribution with the first three moments of color is more robust and more efficient than working with color histograms.
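As an illustration only, the following Python sketch computes three moments per color channel. The choice of mean, standard deviation, and the signed cube root of the third central moment is an assumption made for this example, as are the function names `color_moments` and `image_color_moments`; the text above does not fix the exact definitions.

```python
import math

def color_moments(values):
    """Return (mean, standard deviation, skewness) of one color channel."""
    n = len(values)
    mean = sum(values) / n
    # Second central moment, reported as a standard deviation.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    # Third central moment; the signed cube root keeps the channel's units.
    third = sum((v - mean) ** 3 for v in values) / n
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return mean, std, skew

def image_color_moments(pixels):
    """pixels: list of 3-tuples (one value per channel).

    Returns a 9-element feature vector: three moments for each channel.
    """
    features = []
    for channel in zip(*pixels):
        features.extend(color_moments(channel))
    return features
```

Under these assumptions an image is summarized by nine numbers regardless of its size, compared with one number per bin for a histogram.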
Texture refers to visual patterns that have properties of homogeneity and that do not result from the presence of a single color or intensity. Texture contains important information about the arrangement of surfaces and the relationship of the surfaces to the surrounding environment. Texture can be represented by wavelets: an image is passed through a wavelet filter bank, which decomposes the image into wavelet levels, each having a number of bands. Each band captures the features of some scale and orientation of the original image. For each band, the standard deviation of the wavelet coefficients can be extracted.
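A minimal sketch of such a texture feature follows. It assumes a Haar filter, which the text above does not specify, and a grayscale image given as a list of rows whose height and width are divisible by 2 raised to the number of levels. The names `haar_step`, `std`, and `texture_features` are hypothetical.

```python
import math

def haar_step(image):
    """One level of a 2-D Haar decomposition over 2x2 pixel blocks.

    Returns the approximation band as a 2-D list and the three detail
    bands as flat lists of coefficients.
    """
    approx = []
    details = {"horizontal": [], "vertical": [], "diagonal": []}
    for r in range(0, len(image) - 1, 2):
        row = []
        for c in range(0, len(image[0]) - 1, 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            row.append((a + b + d + e) / 2.0)
            # Top row minus bottom row: responds to horizontal edges.
            details["horizontal"].append((a + b - d - e) / 2.0)
            # Left column minus right column: responds to vertical edges.
            details["vertical"].append((a - b + d - e) / 2.0)
            details["diagonal"].append((a - b - d + e) / 2.0)
        approx.append(row)
    return approx, details

def std(values):
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def texture_features(image, levels=2):
    """Standard deviation of each detail band, plus the final approximation."""
    features = []
    current = image
    for _ in range(levels):
        current, details = haar_step(current)
        for name in ("horizontal", "vertical", "diagonal"):
            features.append(std(details[name]))
    features.append(std([v for row in current for v in row]))
    return features
```

With `levels=2` this yields seven values per image: three detail bands per level and one approximation band.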
Structure is a more general feature than texture and shape. Structure captures information such as rough object size, structural complexity, loops in edges, etc. Structure does not require a uniform texture region, nor a closed shape contour. Edge-based structure features can be extracted by a so-called "water-filling algorithm," see X. Zhou, Y. Rui and T. S. Huang, "Water-filling algorithm: A novel way for image feature extraction based on edge maps," in Proc. IEEE Intl. Conf. On Image Proc., Japan, 1999, and X. S. Zhou and T. S. Huang, "Edge-based structural feature for content-based image retrieval," Pattern Recognition Letters, Vol 22/5, April 2001, pp. 457-468.
The invention visualizes multimedia objects, such as multiple images, on an output device based on media features such as color, texture, structure, audio cepstrum, textual semantics, or any combination thereof. The visualization can use the actual objects, or visual icons representing the objects. The resulting arrangement of multimedia objects automatically clusters objects having similar features. An original high-dimensional feature space is reduced to display space, i.e., locations having coordinates x and y, by principal component analysis (PCA).
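The reduction to display coordinates can be sketched as below. This is an illustrative standard-library implementation that finds the two leading principal axes by power iteration with deflation; that numerical method, the fixed start vector, and the name `pca_2d` are assumptions of this example and are not taken from the description above.

```python
import math

def _mat_vec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def _normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else None

def pca_2d(features, iterations=200):
    """Project feature vectors onto their two leading principal axes."""
    n, dim = len(features), len(features[0])
    mean = [sum(f[j] for f in features) / n for j in range(dim)]
    centered = [[f[j] - mean[j] for j in range(dim)] for f in features]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(dim)]
           for i in range(dim)]
    axes = []
    for _axis_index in range(2):
        # Fixed start vector (an assumption); it must not be orthogonal
        # to the axis being sought, which holds for typical data.
        axis = _normalize([float(j + 1) for j in range(dim)])
        for _step in range(iterations):
            nxt = _normalize(_mat_vec(cov, axis))
            if nxt is None:  # no variance left along any direction
                break
            axis = nxt
        eigenvalue = sum(a * b for a, b in zip(axis, _mat_vec(cov, axis)))
        axes.append(axis)
        # Deflate: remove the variance explained by this axis.
        cov = [[cov[i][j] - eigenvalue * axis[i] * axis[j]
                for j in range(dim)] for i in range(dim)]
    # Each object becomes an (x, y) pair in display space.
    return [[sum(c[j] * axis[j] for j in range(dim)) for axis in axes]
            for c in centered]
```

Objects whose feature vectors are close in the original space receive nearby (x, y) locations, which is what produces the clustering of similar objects on the display.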
Furthermore, the invention provides a process that optimizes the display by maximizing visibility, while minimizing deviation from the original locations of the objects. Given the original PCA-based visualization, the constrained non-linear optimization process adjusts the location and size of the multimedia objects in order to minimize overlap while maintaining fidelity to the original locations of the objects, which are indicative of mutual similarities. Furthermore, the appearance of specific objects in the display can be enhanced using a relevancy score.
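To make the trade-off between overlap and fidelity concrete, the sketch below uses a hypothetical cost: total pairwise overlap area of square icons plus a weighted squared displacement from the original locations. The cost terms, the `weight` parameter, and the simple pattern-search optimizer are all assumptions chosen for illustration; they stand in for, and are not, the constrained non-linear optimization referred to above. For brevity the sketch adjusts locations only and holds sizes fixed.

```python
def overlap(a, b):
    """Overlap area of two axis-aligned squares given as (x, y, size),
    where (x, y) is the center."""
    (ax, ay, asz), (bx, by, bsz) = a, b
    w = min(ax + asz / 2, bx + bsz / 2) - max(ax - asz / 2, bx - bsz / 2)
    h = min(ay + asz / 2, by + bsz / 2) - max(ay - asz / 2, by - bsz / 2)
    return w * h if w > 0 and h > 0 else 0.0

def cost(layout, original, weight=0.1):
    """Hypothetical cost: overlap area plus weighted squared deviation."""
    total = 0.0
    for i in range(len(layout)):
        for j in range(i + 1, len(layout)):
            total += overlap(layout[i], layout[j])
        dx = layout[i][0] - original[i][0]
        dy = layout[i][1] - original[i][1]
        total += weight * (dx * dx + dy * dy)
    return total

def optimize_layout(original, weight=0.1, step=0.5, min_step=0.01):
    """Greedy pattern search: try axis-aligned moves, keep improvements,
    and halve the step when no object can be improved."""
    layout = [tuple(obj) for obj in original]
    best = cost(layout, original, weight)
    while step >= min_step:
        improved = False
        for i in range(len(layout)):
            x, y, size = layout[i]
            for dx, dy in ((step, 0), (-step, 0), (0, step), (0, -step)):
                trial = list(layout)
                trial[i] = (x + dx, y + dy, size)
                trial_cost = cost(trial, original, weight)
                if trial_cost < best:
                    layout, best, improved = trial, trial_cost, True
                    break
        if not improved:
            step /= 2
    return layout
```

A larger `weight` keeps objects closer to their PCA locations at the expense of more overlap; a smaller one spreads them out more freely.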
More particularly, the invention provides a method for visualizing image objects. The method assigns a feature vector to each image. The feature vector of each image is reduced to a location vector having the dimensionality of a display device. A cost function is evaluated to determine an optimal location vector for each image, and each image is displayed on the display device according to the optimal location vector. The reducing can use principal component analysis.