Audio and/or visual data come from a variety of sources, such as television broadcast, satellite broadcast, Internet-based broadcast, and radio broadcast, and the volume of such data has been growing steadily with the advancement of technology, especially in the fields of consumer electronic products and broadband Internet access. In many cases, for example at broadcasting companies, multimedia libraries are so vast that an efficient indexing mechanism allowing retrieval of specific footage is necessary. Such an indexing mechanism is even more important when specific multimedia footage, such as sports highlights or breaking news, must be retrieved rapidly. Similarly, for Internet-based filing or even for a personal media collection, the ability to index portions of a media file (e.g., funny or dramatic scenes of a video) is also needed.
Currently, a common method for generating an index of a media file involves manually entering indices, or tags, as the media file is being played. These tags are typically entered via an input device, such as a keyboard, and are often associated with the media's timeline. While effective, this manual post-processing of the multimedia footage can be extremely time-consuming and expensive.