The common practice known as “logging” a media source (e.g., a video tape) is a human-intensive, mostly manual process. The “logger” views the media source from time A to time B, writing down words that describe the content to create a log. The log is a text document of data or metadata describing the audio, video, and image contents of the media source at specific times between points A and B. A typical use of the log is by a video editor who looks for specific content based on the descriptions (data and metadata) in the log, then extracts the desired content from the media source to make a new video. Another use may be for a broadcaster to locate places in the media source at which to insert appropriate advertising.
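As a minimal sketch of the kind of log described above, a log can be pictured as a list of time-stamped text descriptions that an editor searches by keyword. The names `LogEntry` and `find_segments`, the field layout, and the sample entries are hypothetical and chosen only for illustration; they are not taken from any particular logging tool.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    """One hand-written log line: a time range plus a text description."""
    start_s: float      # start time in seconds from point A (assumed unit)
    end_s: float        # end time in seconds
    description: str    # words the logger wrote down for this range


def find_segments(log, keyword):
    """Return (start, end) pairs whose description mentions the keyword."""
    kw = keyword.lower()
    return [(e.start_s, e.end_s) for e in log if kw in e.description.lower()]


# Hypothetical log entries, invented for this example.
log = [
    LogEntry(0.0, 12.5, "Wide shot of stadium, crowd noise"),
    LogEntry(12.5, 40.0, "Interview with coach, speech"),
    LogEntry(40.0, 55.0, "Crowd cheering, music"),
]

# An editor looking for crowd footage gets the time ranges to extract.
segments = find_segments(log, "crowd")
```

A broadcaster could query the same structure for ranges suited to advertising; the manual cost lies in producing the descriptions, not in searching them.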
Currently, a number of applications are available to detect particular types of content within a media stream. The following are just a few of the currently available applications: face detection, dynamic image peak detection, color value detection, dynamic image change detection, face recognition, music beats detection, audio fingerprint detection, dynamic peaks detection, speech detection, and word and phrase detection.