The growth of digital photography and video over the last decade or so has created many new and interesting challenges relating to how people organize, store, and retrieve their multimedia repositories. Unlike with textual data, automatic methods for describing, indexing, and retrieving visual media, such as image and video content, remain limited. Existing multimedia search engines typically rely on manually generated text-based annotations, supported at most by EXIF data, such as the time at which the image was captured, the camera model used, etc.
Photo blogging sites such as Flickr further support location tagging through map tools, but here too the authors must manually position their user-generated content on a global map, which in practice is a tedious task. Some professional cameras integrate GPS receivers to provide automated geo-tagging of captured images. Similarly, a network-connected capturing device might connect to an external information source, such as a GPS-enabled phone, for geo-tagging assistance.
Still further, existing approaches to suggesting annotation tags for given digital image data include approaches based on group or collaborative data, such as common spatial, temporal, and social contexts. Such group information can be used to infer descriptors for given media content. Other approaches include content tagging based on speech recognition, wherein input speech is recognized and decoded according to a selected speech recognition lexicon. Such tagging depends on a close temporal relationship between receipt of the user's speech and capture of the media.
More broadly, existing approaches to media tagging commonly rely on low-level image or audio features as input to annotate or predict a tag for a photograph. Further, known approaches commonly rely on network-provided metadata from user communities (e.g., aggregated databases of tagging information), which is undesirable both for user privacy and because of the potential latency of data access and transfer.