Multimedia files, such as videos, can be associated with metadata describing the multimedia file or video itself, as well as information relevant to its content. For example, the adaptive bitrate streaming technique Dynamic Adaptive Streaming over HTTP (DASH), a Moving Picture Experts Group (MPEG) standard, uses a Media Presentation Description (MPD), in which metadata is described by separating a video's segment information into hierarchical layers, forming a data model suited to media transmission. Other MPEG standards, including MPEG-4 Part 20 (also known as MPEG-4 LASeR) and MPEG-7, allow video and audio multimedia to carry metadata in an XML format that annotates objects in the content with, for example, their location, size, time of appearance, and URLs of related information, thereby describing additional information relevant to the video content.
Interactive videos are one example of using metadata to describe relevant information in a video. In such interactive videos, objects in the video are given interactive capabilities based on existing metadata, permitting viewers, for example, to click on an object to reveal additional information related to the video, which they can then learn immediately and share with others. Currently, to create such an interactive video, a person must select a video of interest, separately obtain relevant information about the content of the video, and then manually create the video's metadata by annotating particular portions of the video with that information.
However, current techniques for creating metadata for interactive videos require considerable time and cost. Additionally, relevant information about video content, which is to be added to the video as metadata, can be difficult to generate from the video data alone. For example, even if characters or objects relevant to a story told in a video have been recognized, it remains difficult to designate titles for such characters or objects or to infer relationships among characters, objects, or events.