Retrieval from multimedia content collections remains an important and challenging problem. Millions of photographs are added to the web every day, and users increasingly require mechanisms for managing and navigating these massive collections. The challenge in meeting this need lies, in part, in the fact that the raw information contained in multimedia content such as images and videos (e.g., matrices of pixels and/or streams of audio) does little to reveal the semantic meaning of the media.
Various approaches to multimedia information retrieval have relied almost exclusively upon content and contextual cues that can be extracted from the media itself and its associated metadata. The content cues, however, are typically limited to distributions of low-level features, such as color, texture, and/or edges in the images, while the contextual cues range from snippets of associated text terms to timestamps or geo-tags. Moreover, many of the contextual cues are provided from an individual's perspective and often offer minimal value to other individuals. That is, many of the contextual cues, such as keywords, tags, or the like, are viewed as being too noisy or too personalized to be relevant to many searches over multimedia content collections. Because of this semantic gap between the available cues and the meaning of the media, there remains a need for improved mechanisms for managing such multimedia content collections. Thus, it is with respect to these considerations and others that the present invention has been made.