The present invention relates to tagging an object in a video with metadata.
Currently, video consumption, navigation, and interaction are fairly limited. Users can watch a video with different audio dubbing or subtitles, fast forward or rewind the video, or drag the seek bar to skip to different segments of the video. This limited interaction is due to the lack of contextual metadata available for the video. While subtitles and audio transcription data can provide some context about the video, many of the interesting aspects of a video are not spoken, such as visual objects within the moving frames. Identifying and tracking these objects is laborious and requires frame-by-frame analysis.