1. Field
Embodiments of the present invention generally relate to systems and methods for discovering and consuming video content, including annotating, organizing, searching, indexing and sharing information about such content. In particular, embodiments of the present invention relate to generation of a table of contents and/or an index for video content based on user-generated and/or automatically extracted labels.
2. Description of the Related Art
In recent years, digital distribution of videos and their viewing on devices connected to a network have become common. Videos include recordings, reproduction or broadcasting of moving visual images that may contain sound and/or music. Videos may contain a variety of content that may be for entertainment, informational or educational purposes. There are various limitations associated with the current video discovery and consumption experience provided by existing video players that frustrate viewers. For example, existing online videos are not accompanied by a table of contents (ToC) or an index that allows a viewer to directly jump to a segment of interest within the video. Therefore, the conventional approach to viewing online videos is that users either watch the video from start to end or move forward in the video at random by using the video controls (e.g., fast forward, rewind) of the video player. The video player is typically software that can play multiple video file formats, such as Audio Video Interleave (AVI), Moving Picture Experts Group (MPEG)-4 (MP4), Apple QuickTime Movies (e.g., a MOV file) and so forth. The video players can be implemented in various programming languages, including, but not limited to, Adobe Flash, C++, Java, JavaScript, Python or a version of HyperText Markup Language (HTML), such as HTML5.
Furthermore, at present, when watching an online video, users have no way to communicate and/or keep track of facts, opinions, and/or emotions about specific content and/or moments within the video. Tools, such as YouTube, allow users to consume video content but to share information, in the form of comments, only about the video as a whole.
While the concept of bookmarking exists for books, there is no useful tool that allows users to “bookmark” moments in videos and share these moments with others. In addition, there is no existing tool that allows users to skim through a video by skipping the parts that do not interest them. It is the lack of the ability to search, scan, and skim through videos that causes users to abandon a video without watching it in its entirety.
One way users work around this today is by randomly seeking ahead to the content they want by using the video controls. This may be an unsatisfying experience based on trial and error, with no guarantee that the users will find what they are looking for. Other ways include leveraging annotations manually added by the user who uploaded the video and searching through the closed captions or subtitles on content websites where these features are available, e.g., YouTube. Both of these methods are unreliable, as the vast majority of the users uploading videos do not add annotations to help their viewers and very few videos have closed captions available.
The automatic transcription services offered by sites like YouTube also suffer from poor audio-to-text quality unless the input audio is free of noise and there is a single person talking. Lastly, all methods based on keyword-based search suffer from a broader problem: they cannot differentiate between a mere mention of a keyword in the video and an authoritative section that discusses in detail the high-level concept described by the keyword. As a result of the shortcomings of the search methods available today, users end up recording the exact times within the video where interesting moments occur and may share those with others through discussion forums, email messages, or social networks.
The current lack of structural information within videos also adversely affects the ability to perform a meaningful search across videos. When users search for videos, they are currently provided with results that are based on the indexing of video titles, subtitles, keywords, etc. For example, if a user searches for “binary trees,” they might be provided with video results that have one or both of the words “binary” and “trees” in the title of the video. A longer video that has some valuable information on binary trees but mostly covers other topics might not be discoverable by the user because the title and metadata may not contain the words “binary trees” and therefore will not match the user query. It would be desirable to enable users to locate and label useful video segments from within the larger video, and make these labels searchable for other users.
With the increasing use of social networking and photo-sharing sites, users are becoming more familiar with the concept of liking or disliking content and also the tagging of content, e.g., providing the name of a person in a photograph on a photo-sharing site. Many existing social media applications do have like/dislike labels for the whole video; however, they do not give the user the option to view a ToC of a video or label a segment of the video. It would be desirable to enhance users' video discovery and consumption experience by integrating labels from multiple users (and optionally also from automatic video indexing methods) to create a consistent and interactive ToC for videos, thereby allowing viewers to easily identify and jump to segments of interest within videos.