Multimedia content is becoming more common on both the World Wide Web and local computers. As the corpus of multimedia content grows, indexing the features within that content becomes increasingly important. Observing audio and video simultaneously, and annotating that combined observation, results in a higher confidence level in the identification of features.
Existing multimedia tools provide capabilities to annotate either audio or video separately, but not as a whole. (An example of a video-only annotation tool is the IBM MPEG7 Annotation Tool, inventors J. Smith et al., available through www.alphaworks.ibm.com/tech/videoannex. Other conventional arrangements are described in: Park et al., “iMEDIA-CAT: Intelligent Media Content Annotation Tool”, Proc. International Conference on Inductive Modeling (ICIM) 2001, South Korea, November 2001; and Minka et al., “Interactive Learning using a Society of Models,” Pattern Recognition, Vol. 30, pp. 565, 1997, TR #349.)
It has long been recognized that annotating video or audio features in isolation results in lower confidence in the identification of those features.
In view of the foregoing, a need has been recognized in connection with providing improved systems and methods for observing and annotating multi-modal events, objects, scenes, and audio occurring in multimedia files.