As multimedia databases of image files, video files, audio files, etc. on mobile devices have become progressively larger in recent years, the need for a comprehensive and accurate system for categorizing, searching and managing such databases has greatly increased. In earlier mobile devices, memory space was greatly limited, resulting in a relatively low number of multimedia objects being stored on the device. With only a few objects being stored, accurate categorizing, searching and management was not of substantial importance. However, as memory capabilities have increased, mobile device users have been provided with the ability to store hundreds and even thousands of objects on a single device such as a mobile telephone. With so many stored objects, users can have an exceptionally difficult time finding a previously stored object or organizing all of their multimedia files for later access.
In the image retrieval field, existing content-based image retrieval (CBIR) systems search for relevant images by looking for similar low-level features extracted from target images. One problem with this approach is that “similar low-level features” do not necessarily ensure “similar semantic contents.” This is due to several factors. First, two “similar semantic contents” may ultimately have different appearances. For example, there can be intra-class object variation (e.g., mountains do not necessarily look similar). Second, “similar low-level features” may correspond to conceptually dissimilar objects. For example, a color histogram cannot easily distinguish a red rose from a sunset. Third, images typically contain background clutter, which often interferes with low-level feature matching. While humans can easily identify prominent features in semantically similar contents (e.g., a face has dark elliptical regions representing eyes), it is still extremely difficult for computational algorithms to automatically separate prominent features from low-level features.
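The second factor can be illustrated with a minimal sketch. It assumes two tiny synthetic images (a hypothetical "rose" with a red block on an orange background, and a hypothetical "sunset" with a red band above an orange band), a coarse 4-bins-per-channel quantization, and histogram intersection as the similarity measure. None of these choices come from the systems discussed here; they are illustrative assumptions only.

```python
from collections import Counter

# Hypothetical colors chosen for illustration only.
RED = (200, 20, 30)
ORANGE = (230, 120, 40)


def color_histogram(image, bins_per_channel=4):
    """Normalized global color histogram; spatial layout is discarded."""
    step = 256 // bins_per_channel
    pixels = [px for row in image for px in row]
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_id: n / total for bin_id, n in counts.items()}


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; 1.0 means identical normalized histograms."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())


def make_rose(size=8):
    """Red square centered on an orange background."""
    lo, hi = size // 4, 3 * size // 4
    return [[RED if lo <= y < hi and lo <= x < hi else ORANGE
             for x in range(size)] for y in range(size)]


def make_sunset(size=8):
    """Red band across the top quarter, orange below."""
    return [[RED if y < size // 4 else ORANGE
             for x in range(size)] for y in range(size)]


rose = make_rose()
sunset = make_sunset()
similarity = histogram_intersection(color_histogram(rose),
                                    color_histogram(sunset))
print(rose != sunset, similarity)
```

Because both images contain the same proportion of red and orange pixels, the global histograms match even though the pixel arrangements differ, which is the kind of ambiguity the paragraph above describes.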
Although there have been a number of attempts to address the above issues through content-based image retrieval, each has its own drawbacks. For example, U.S. Pat. No. 5,893,095, issued to Jain et al., discloses a content-based image retrieval system based upon matching low-level features extracted from target images. Such “primitive” features include hue, saturation and intensity histograms, edge density, etc. However, as mentioned above, these low-level features do not always correspond to image semantics.
A general-purpose image identification/retrieval system was also previously developed for identifying an image according to four kinds of low-level features: average color, color histogram, texture and shape. Under this system, users were capable of manually adding user-defined shapes and/or regions of interest within images to refine the search results. These user-specified features could often be more meaningful and could generate accurate results. However, entering these features is tedious and difficult for most users.
In addition to the above, there have been a number of attempts to enable machine learning for the purpose of feature selection. For example, one system involves the training of a face detector using the AdaBoost (short for “Adaptive Boosting”) learning algorithm. Given a set of training face images, prominent facial features, such as high-contrast regions around the forehead and eyes, are automatically selected. Although this method demonstrates the feasibility of supervised learning for feature selection, it cannot be directly applied to image database retrieval due to the presence of background clutter.
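How boosting can act as a feature selector may be sketched as follows. This is a generic AdaBoost loop over single-feature threshold classifiers ("decision stumps") on made-up data, not the face detector referred to above. The three feature columns (a noise value, an "eye contrast" value and a "forehead contrast" value), the sample values and the function names are all hypothetical.

```python
import math


def train_adaboost(X, y, rounds=3):
    """Generic AdaBoost with decision stumps; labels must be +1 or -1."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []                          # (alpha, feature, threshold, polarity)
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for f in range(d):
            for t in sorted({row[f] for row in X}):
                for pol in (1, -1):
                    preds = [pol if row[f] >= t else -pol for row in X]
                    err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                    if best is None or err < best[0]:
                        best = (err, f, t, pol, preds)
        err, f, t, pol, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        # Up-weight misclassified samples, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, f, t, pol))
    return ensemble


def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (pol if row[f] >= t else -pol) for a, f, t, pol in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data: [noise, eye contrast, forehead contrast].
X = [[0.5, 0.9, 0.8], [0.1, 0.8, 0.3], [0.9, 0.7, 0.7], [0.3, 0.4, 0.9],
     [0.4, 0.2, 0.2], [0.8, 0.3, 0.4], [0.2, 0.1, 0.1], [0.6, 0.5, 0.2]]
y = [1, 1, 1, 1, -1, -1, -1, -1]           # +1 = face, -1 = non-face

ensemble = train_adaboost(X, y, rounds=3)
selected = [f for _, f, _, _ in ensemble]
print("features chosen per round:", selected)
```

On this toy set the stumps chosen in each round indicate which feature columns the learner found discriminative, which is the sense in which boosting performs feature selection. The sketch uses clean, hand-made feature vectors and does not model background clutter.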
In another approach, local prominent features are selected and then represented in a combined probabilistic model. This model effectively accommodates intra-class object variations. However, this method is computationally intensive, and the number of selected features is therefore limited (to only six features in one implementation). Consequently, this method cannot be directly applied to mobile applications.