Recently, Internet-based streaming of audio and video content has dramatically increased in popularity. This has resulted in a corresponding surge in efforts to merge Internet-based content with broadcast content. This concept, sometimes referred to as a “second screen,” “companion,” or “enhanced” experience, allows a television audience to interact with the content they are consuming, such as TV shows, movies, music, or video games. In a typical second screen experience, additional, supplemental data is displayed on a secondary, Internet-connected (and often portable) display device concurrently with the display of broadcast content (e.g., on a television). Other enhanced programming takes place on the television screen itself, with elements from the Internet displayed on top of or alongside the original broadcast content. This supplemental data is often synchronized with the broadcast content (referred to as media assets) being viewed, and is typically designed to heighten the user's viewing experience by increasing the level of user interaction with the displayed content. Examples of this supplemental data include additional scenes or other media content, relevant information, interaction with other users through social media tools, additional advertisements, or even an interface through which the user may directly or collectively affect (and/or change) the content actually displayed.
Generally, in second screen and other enhanced experiences, the companion device or Internet-connected screen executes an application that recognizes (e.g., via user input or automatic content recognition) the current or imminent display of a media asset. The application then queries an appropriate data server (via the Internet, for example) and receives supplemental data corresponding to the media asset, if available. The display of the supplemental data is synchronized with the display of the media asset, such that the broadcast content is delivered to the television (or other primary display device) simultaneously with scene- or time-specific supplemental content delivered to the companion display device or Internet-connected screen. The supplemental content is either delivered according to a pre-programmed track or, in the case of live events such as sporting events, performances, press events, or award shows, pushed to the application in real time.
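To illustrate the pre-programmed track case described above, the following is a minimal sketch, assuming the companion application already knows the current playback position of the media asset in seconds (for example, from automatic content recognition) and that the track is a list of timed cues. The names `Cue`, `SupplementalTrack`, and `active_at` are hypothetical and chosen for illustration only; this is not a standard format or the method of any particular system, and the real-time push case for live events is not modeled.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    """One hypothetical supplemental-content event on a pre-programmed track."""
    start: float   # seconds from the start of the media asset (inclusive)
    end: float     # seconds from the start of the media asset (exclusive)
    payload: str   # hypothetical identifier, e.g. for a trivia card or an advertisement


class SupplementalTrack:
    """Cues for one media asset, kept sorted by start time for lookup."""

    def __init__(self, cues):
        self.cues = sorted(cues, key=lambda c: c.start)
        self.starts = [c.start for c in self.cues]

    def active_at(self, position):
        """Return payloads whose [start, end) interval contains the playback position."""
        # Only cues that start at or before the playhead can be active.
        idx = bisect_right(self.starts, position)
        # Of those, keep the cues that have not yet ended.
        return [c.payload for c in self.cues[:idx] if c.end > position]


track = SupplementalTrack([
    Cue(0.0, 30.0, "intro-card"),
    Cue(12.5, 20.0, "cast-trivia"),
    Cue(45.0, 60.0, "poll"),
])
print(track.active_at(15.0))  # ['intro-card', 'cast-trivia']
print(track.active_at(50.0))  # ['poll']
```

Because every cue is expressed as an offset into the media asset, the same track stays synchronized no matter when the broadcast airs; the drawback, as the following paragraphs note, is that any edit to the underlying audio/video shifts those offsets and forces the cues to be retimed.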
However, generation and synchronization of this supplemental content can be difficult and/or inefficient, and currently, no unified standard exists for the generation and synchronization of supplemental content for produced media assets. Conventional techniques for generating synchronized supplemental content typically involve manually integrating the supplemental content with the produced video/audio tracks of the media asset, often without duplication and on an ad hoc basis. The resulting final output is generally a single, immutable sequence of broadcast content interspersed with supplemental content intended to be delivered as a series of pre-scheduled events. As such, changes to the audio/video tracks may be severely limited, and rapid or late modifications may be effectively precluded.
Moreover, this integration is typically performed only after the post-production of the media asset, and often by an editor or production team separate from the one that produced the media asset. As a result, supplemental content generated during the course of production of the broadcast or media asset may be lost or unavailable, or may require additional effort to integrate seamlessly within the produced media asset. In addition, creative decisions for the interactive content cannot be acted on while the actual edit process is occurring. With such a workflow, the interactive content must either be authored after the entire production and post-production process is completed (which can leave little time before the asset is scheduled to air) or risk requiring manual retiming of all elements should any of the audio/video be changed in the editing process.
Finally, authoring of dynamic metadata may present another issue during the creative process. For example, conventional mixed-format or mixed-standard media assets may be extremely difficult and/or inefficient to produce as a single, contiguous asset. Conventionally, the edit-to-tape process for mixed-format assets (e.g., television programs that contain both standard definition (SD) and high definition (HD) content) requires a separate edit for each Active Format Description (“AFD”) flag change from SD to HD programming. For some anthologies, this can make editing extremely labor- and time-intensive, since flags cannot be dynamically inserted into the file and passed through the transcode stage at the final part of the broadcast chain. Conventional editing software applications currently provide no means or standard for authoring AFD dynamically. Likewise, transcoder software has no way of authoring dynamic AFD and instead applies a single AFD flag to the entire program. Closed captioning is another example of metadata (text) that has a relationship to the video and audio being edited and subsequently played back.