The exemplary embodiment relates to image processing. It finds particular application in connection with an apparatus and method which employ location relevance information in the generation of a representation of an image which may find use in image categorization, content-based image retrieval, and image clustering applications. It is to be appreciated that the representations generated are not limited to such applications.
In data processing, some useful operations include automatic or semi-automatic image categorization, automated or semi-automated retrieval of similar images, and clustering of images. For example, given an unorganized database of scanned images of documents, it may be useful to sort or categorize the images into classes such as the type of document.
Since images, such as photographs and scanned documents, contain a large amount of information, such as colorant values for each pixel in the image, it is desirable to generate a representation of the image which involves a reduced set of parameters, yet which provides sufficient information to allow categorization of the image into one or more of a set of classes and/or the assessment of similarities/differences among images. There has thus been a significant research effort into the design of visual words for image categorization, retrieval and clustering. Most image categorization methods follow a sequence which includes (i) feature detection, (ii) low-level feature extraction, (iii) global image or image region representation, and (iv) classification.
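By way of illustration only, the four-stage sequence described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular prior method: the regular sampling grid, the mean/standard-deviation patch descriptor, the histogram of nearest visual words, and the nearest-prototype classifier are all simplifications chosen for brevity, and every function and parameter name (dense_grid, patch_descriptor, bag_of_words, classify, step, half) is hypothetical.

```python
import math

def dense_grid(height, width, step):
    """(i) Feature detection: here simply a regular grid of patch centres."""
    return [(r, c)
            for r in range(step // 2, height, step)
            for c in range(step // 2, width, step)]

def patch_descriptor(image, r, c, half):
    """(ii) Low-level feature extraction: mean and standard deviation of
    the pixel values in a square patch (an assumed, very simple descriptor)."""
    rows = range(max(0, r - half), min(len(image), r + half + 1))
    cols = range(max(0, c - half), min(len(image[0]), c + half + 1))
    values = [image[i][j] for i in rows for j in cols]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return (mean, math.sqrt(var))

def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to the descriptor (Euclidean)."""
    return min(range(len(vocabulary)),
               key=lambda k: math.dist(descriptor, vocabulary[k]))

def bag_of_words(image, vocabulary, step=4, half=2):
    """(iii) Global image representation: normalised histogram counting
    how often each visual word is the nearest match to a patch."""
    hist = [0.0] * len(vocabulary)
    points = dense_grid(len(image), len(image[0]), step)
    for r, c in points:
        hist[nearest_word(patch_descriptor(image, r, c, half), vocabulary)] += 1
    return [h / len(points) for h in hist]

def classify(histogram, class_prototypes):
    """(iv) Classification: label of the nearest class prototype
    (a stand-in for a trained classifier)."""
    return min(class_prototypes,
               key=lambda name: math.dist(histogram, class_prototypes[name]))

# Usage: an 8x8 uniformly dark image and a two-word vocabulary.
image = [[0.0] * 8 for _ in range(8)]
vocabulary = [(0.0, 0.0), (1.0, 0.0)]
prototypes = {"dark": [1.0, 0.0], "bright": [0.0, 1.0]}
label = classify(bag_of_words(image, vocabulary), prototypes)
```

The point of the sketch is that the image is reduced from one value per pixel to one value per visual word before classification, which is the reduced set of parameters referred to above.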
Visual saliency models, for example, have been used for feature detection and to estimate regions of interest. Many of these methods are based on biological vision models, which aim to estimate which parts of images attract visual attention. Implementations of these methods in computer systems generally fall into one of two main categories: those that give a number of relevant discrete point positions, known as interest point (or key-point) detectors, and those that give a more continuous map of relevance, such as saliency maps. Interest point detectors have proven useful for obtaining reliable correspondences or matching. They have also been extensively evaluated as a means to sparsely select positions at which to extract image features for categorization or object recognition. However, it has been shown that strategies based on dense sampling (or even random sampling) provide better performance (both in accuracy and speed) than interest points. One of the reasons is that interest point detectors may disregard background regions of the image that carry important category information, such as the sky for airplane images. Another reason is their reduced robustness with respect to variations introduced, for instance, by shadows. Nevertheless, interest point detectors can complement dense feature extraction. For example, several sampling strategies may be applied and their results fused in a multiple kernel learning framework. A disadvantage of this approach is that additional sampling strategies increase the computational cost and the complexity of the classification system.
Saliency maps can provide richer information about the relevance of features throughout an image. While interest points are generally simplistic corner (Harris) or blob (Laplace) detectors, saliency maps can carry higher level information. Such methods have been designed to model visual attention and have been evaluated by their congruence with fixation data obtained from experiments with eye gaze trackers.
Recently, saliency maps have been used for object recognition or image categorization. For example, saliency maps have been used to control the sampling density for feature extraction. Alternatively, saliency maps can be used as foreground detection methods to provide regions of interest (ROI) for classification. It has been shown that extracting image features only around ROIs or on segmented foreground gives better results than sampling features uniformly throughout the image. The disadvantage is that such methods rely heavily on foreground detection and may miss important context information from the background.
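As a hedged illustration of using a saliency map to control sampling density, the following sketch draws patch positions with probability proportional to the saliency value at each pixel. It is an assumed, simplified scheme and not the method of any cited work. The function name sample_positions and its parameters are hypothetical; in particular, the floor parameter is an illustrative addition that gives every pixel a minimum weight, so that background regions are not excluded entirely.

```python
import random

def sample_positions(saliency, n_samples, floor=0.0, seed=None):
    """Draw (row, col) positions with probability proportional to
    saliency[row][col] + floor. `saliency` is a 2-D list of
    non-negative values; `floor` is a hypothetical background weight."""
    rng = random.Random(seed)
    positions = [(r, c)
                 for r in range(len(saliency))
                 for c in range(len(saliency[0]))]
    weights = [saliency[r][c] + floor for r, c in positions]
    if sum(weights) <= 0:
        # An all-zero map carries no preference: fall back to uniform sampling.
        weights = None
    return rng.choices(positions, weights=weights, k=n_samples)

# Usage: only the bottom-right pixel is salient.
saliency = [[0.0, 0.0],
            [0.0, 1.0]]
foreground_only = sample_positions(saliency, 5, seed=0)
with_background = sample_positions(saliency, 5, floor=0.5, seed=0)
```

With floor set to zero, every sample falls on the salient pixel, which mirrors the reliance on foreground detection noted above. A positive floor allows some samples to fall on the background, at the cost of fewer samples on the salient region.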
The exemplary embodiment provides a system and method for modeling an image incorporating location relevance information which provides an improvement over existing image processing techniques.