Semantic tagging and indexing is a popular way of organizing information, especially on the Internet. For example, tags are used extensively for blog postings, product catalogs (e.g., of book sellers), and photo collections. Audio recordings are also becoming more popular as an information medium, with momentum on the Internet building around podcasting, audio books, and video. The taxonomy used for tagging this content is not pre-defined and evolves in an ad-hoc fashion, for example by following popular trends. This popular taxonomy can be referred to as a “folksonomy”.
There are practical problems with tagging this type of content. Knowledge of the current state of the folksonomy relies heavily on intuition. It is difficult to know exactly which tags are appropriate for a piece of data without guessing and then searching to validate the guess. If an appropriate tag for user content is intuitively obvious to other people, but not to the user, then the user may not use it, and other people will have difficulty finding the user content, if they find it at all.
Common tools for recording audio and/or video content (e.g., telephones and cameras) are not good text input devices, and do not lend themselves to easily attaching textual tags to content. Hence, a significant amount of audio and/or video content may go untagged if posted from these devices.
Audio and video content is often large in file size and should be reviewed serially at or near actual speed (or a small multiple thereof, such as double or triple speed) by a human in order to be tagged appropriately. This can lead to content not being tagged, or to only portions of the content being reviewed and hence to tags that are not representative of the content as a whole.