1. Field of the Invention
The invention disclosed and claimed herein generally pertains to a method and apparatus for efficient annotation of multimedia content. More particularly, the invention pertains to a method and apparatus for speeding up the multimedia content annotation process by combining the common tagging and browsing interfaces into a hybrid interface.
2. Description of the Related Art
Recent increases in the adoption of devices for capturing digital media, together with the availability of mass storage systems, have led to explosive growth in the amount of multimedia data stored in personal collections or shared online. To effectively manage, access and retrieve multimedia data such as images and video, a widely adopted solution is to associate the image content with semantically meaningful labels. This process is known as "image annotation." In general, there are two types of image annotation approaches: automatic and manual.
Automatic image annotation, which aims to detect visual keywords automatically from image content, has attracted considerable attention from researchers in the last decade. For instance, Barnard et al. (Matching words and pictures, Journal of Machine Learning Research, 3, 2002) treated image annotation as a machine translation problem. J. Jeon, V. Lavrenko, and R. Manmatha (Automatic image annotation and retrieval using cross-media relevance models, In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 119-126, 2003) proposed an annotation model called the cross-media relevance model (CMRM), which directly computes the probability of annotations given an image. The ALIPR system (J. Li and J. Z. Wang, Real-time computerized annotation of pictures, In Proceedings of ACM Intl. Conf. on Multimedia, pages 911-920, 2006) uses advanced statistical learning techniques to provide fully automatic, real-time annotation of digital pictures. L. S. Kennedy, S.-F. Chang, and I. V. Kozintsev (To search or to label? Predicting the performance of search-based automatic image classifiers, In Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval, pages 249-258, New York, N.Y., USA, 2006) considered using image search results to improve annotation quality. These automatic approaches have achieved notable success, and they have been shown to be most effective when the keywords occur frequently and have strong visual similarity. However, it remains a challenge for them to accurately annotate keywords that are more specific and less visually similar. For example, P. Over, T. Ianeva, W. Kraaij, and A. F. Smeaton (TrecVid 2006 overview, In NIST TRECVID-2006, 2006) observe that the best automatic annotation systems achieve a mean average precision of only seventeen percent on thirty-nine semantic concepts for news video.
With regard to manual annotation, there has been a proliferation of image annotation systems for managing online or personal multimedia content. Examples include PhotoStuff (C. Halaschek-Wiener, J. Golbeck, A. Schain, M. Grove, B. Parsia, and J. Hendler, PhotoStuff - an image annotation tool for the semantic web, In Proc. of 4th International Semantic Web Conference, 2005) for personal archives, and Flickr for online content. The rise of manual annotation stems partly from the high annotation quality it provides for self-organization and retrieval, and partly from the associated social bookmarking functionality, which allows public search and self-promotion in online communities.
Manual image annotation approaches can be further categorized into two types. The most common approach is tagging, which allows users to annotate images with a chosen set of keywords ("tags") from a controlled or uncontrolled vocabulary. The other approach is browsing, which requires users to sequentially browse a group of images and judge their relevance to a pre-defined keyword. Both approaches have strengths and weaknesses, and in many ways they are complementary. Their successes in various scenarios have demonstrated that it is possible to annotate a massive number of images by leveraging human effort. However, manual image annotation remains a tedious and labor-intensive process.
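The two manual approaches differ mainly in which side is held fixed: tagging fixes one image and lets the user choose keywords, while browsing fixes one keyword and asks the user to judge many images in turn. The following Python sketch illustrates that contrast under stated assumptions only; the class `AnnotationStore` and its methods are hypothetical names chosen for illustration and do not describe any system named above. In this simplified model, browsing also records explicit negative judgments, whereas tagging records only positive labels, which is one way the two approaches can complement each other.

```python
from collections import defaultdict


class AnnotationStore:
    """Hypothetical in-memory store of image-keyword labels (illustrative only)."""

    def __init__(self) -> None:
        self.positive = defaultdict(set)  # image_id -> keywords judged relevant
        self.negative = defaultdict(set)  # image_id -> keywords judged not relevant

    def tag(self, image_id, keywords):
        """Tagging: the user selects any number of keywords for a single image."""
        for kw in keywords:
            kw = kw.strip().lower()
            if kw:  # skip empty or whitespace-only tags
                self.positive[image_id].add(kw)

    def browse(self, keyword, judgments):
        """Browsing: the user judges a sequence of images against one pre-defined keyword.

        `judgments` is an iterable of (image_id, is_relevant) pairs.
        """
        kw = keyword.strip().lower()
        for image_id, relevant in judgments:
            target = self.positive if relevant else self.negative
            target[image_id].add(kw)

    def images_with(self, keyword):
        """Return the sorted ids of images positively labeled with `keyword`."""
        kw = keyword.strip().lower()
        return sorted(i for i, kws in self.positive.items() if kw in kws)


store = AnnotationStore()
store.tag("img1", ["Beach", "sunset"])                       # image-centric input
store.browse("beach", [("img2", True), ("img3", False)])     # keyword-centric input
print(store.images_with("beach"))  # ['img1', 'img2']
```

Both inputs land in the same label store, which is the sense in which a single interface could, in principle, accept labels from either mode.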
What is needed, therefore, is an efficient system for annotating multimedia content.