The term “content” can be used to describe digital media such as audio, video, an image, a collection of images or a combination thereof. A digital content file can include several streams of audio, video and text in different tracks. Different tracks may contain different video programs, alternative audio tracks for different languages, and/or text for subtitles. Content is typically encoded or compressed for efficient transmission and storage. Common encoding formats include MPEG-2, VC-1 and H.264 for video and mp3 (i.e. MPEG-1 Audio Layer 3), AAC and Ogg Vorbis for audio.
“Sections” are parts of a content file. Typically, sections are temporal (i.e. a subsection of a piece of content in time, such as a number of video frames, a group of pictures or a run of audio samples). In many instances, sections contain several seconds of encoded content and the sections in a content file are of similar duration. In other variations, a single section contains the entire media file. Sections can also be created from other partitions of a file, including but not limited to areas of an image or video frame, pages of a text document, channels of a multi-channel video stream, channels of a multi-channel audio stream (such as stereo, surround sound or multi-language versions), channels of a stereo or 3D video, and/or groups of bytes in a bitstream.
Sections may be created by actual separation of a file into multiple files or they may be created within a file. Creation of sections within a file can be realized by creating pointers to sections or an index within a container file that allow for fast random access to individual sections. The creation of sections within a file can be further enhanced by grouping content within separate structures within the file. Sections can also be created on the fly where the section is determined just before it is requested or used.
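The pointer- or index-based approach described above can be sketched as a simple lookup table that maps each section to a byte range within the container file. The following is a minimal illustration only; the field names and layout are hypothetical and do not correspond to any particular container format:

```python
# Minimal sketch of an in-file section index: each entry maps a
# section to the byte range that holds it, enabling fast random
# access without scanning the whole file. (Hypothetical layout,
# for illustration only.)
from dataclasses import dataclass

@dataclass
class SectionEntry:
    section_id: int    # temporal position, e.g. T1, T2, T3
    offset: int        # byte offset of the section within the file
    length: int        # size of the section in bytes

def build_index(section_sizes):
    """Lay sections out back to back and record their byte ranges."""
    index, offset = [], 0
    for i, size in enumerate(section_sizes):
        index.append(SectionEntry(i, offset, size))
        offset += size
    return index

def locate(index, section_id):
    """Random access: return (offset, length) for one section."""
    entry = index[section_id]
    return entry.offset, entry.length
```

For example, `locate(build_index([1000, 1200, 900]), 2)` returns `(2200, 900)`, so a reader can seek directly to that byte range instead of parsing the file from the beginning.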
Chunks are sections that are prepared to be retrieved by a client via a “playlist”. Typically, several alternative chunks are created from every section. These chunks contain the same perceptual content (e.g. the same 2 seconds of content in a video file). Commonly those chunks differ in their encoding and/or bitrate (e.g. in adaptive bitrate streaming systems). Different bitrates are typically created by compressing using different levels of lossy compression and different video resolutions. Other variations used to create chunks from sections include: different compression codecs or numbers of channels, 2D versus 3D video, different bit depths of audio samples or video samples, additional audio channels such as stereo and surround sound channels, compression codecs that differ in decoding complexity in order to support adaptation to the processing performance of the playback client, different DRM systems, and different content that can be chosen by or targeted to an individual, as is the case in targeted advertisement.
A “client device” is typically an electronic device that implements a media player. It typically retrieves content from a server via a network but may also play back content from a local storage device or physical media such as a DVD, a Blu-ray disc, other optical discs, a USB memory stick, or another storage device. Client devices can include Set Top Boxes, desktop and laptop computers, cell phones, tablet devices, game consoles, mp3 players, portable media players and other media players. Because the client device is the entity that interprets the playlist, it may also reside at the head-end, where the playlist is executed to assemble content according to its destination.
The term “streaming” describes the playback of content on a client device, where the content is stored on a server and continuously sent to the client device over a network during playback. Typically, the client device stores a sufficient quantity of content in a buffer at any given time during playback to prevent disruption of playback due to the playback device completing playback of all the buffered content prior to receipt of the next portion of content. The client may also store the streamed content locally for later playback. Adaptive bit rate streaming or “adaptive streaming” involves detecting the present streaming conditions (e.g. the user's network bandwidth and CPU capacity) in real time and adjusting the quality of the streamed media accordingly. Typically, the source media is encoded at multiple bit rates and the client device switches between streaming the different encodings depending on available resources. Alternatively this choice may be made by the server as it evaluates the connection quality of an individual connection.
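The role of the buffer described above can be illustrated with a toy simulation: playback is disrupted only if the buffer empties before the next portion of content arrives. The rates, durations and starting buffer level below are arbitrary illustrative values, not parameters of any real streaming system:

```python
def simulate_playback(bandwidth_kbps, bitrate_kbps, seconds, start_buffer_s=4.0):
    """Toy buffering model: each wall-clock second the buffer gains
    bandwidth/bitrate seconds of downloaded content and loses one
    second to playback. Returns True if playback ever stalls
    (i.e. the buffer runs dry before the next content arrives)."""
    buffered = start_buffer_s  # seconds of content already buffered
    for _ in range(seconds):
        buffered += bandwidth_kbps / bitrate_kbps  # content downloaded
        buffered -= 1.0                            # content played back
        if buffered < 0:
            return True  # buffer ran dry: playback is disrupted
    return False
```

When the network delivers content faster than it plays (first call below), the buffer never empties; when the selected bitrate exceeds the available bandwidth, the buffer drains and playback eventually stalls, which is what motivates switching to a lower bitrate.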
Adaptive streaming may use common protocols like the Hypertext Transfer Protocol (HTTP), published by the Internet Engineering Task Force and the World Wide Web Consortium as RFC 2616, or Real Time Streaming Protocol (RTSP), published by the Internet Engineering Task Force as RFC 2326, to stream media between a server and a playback device. HTTP is a stateless protocol that enables a playback device to request a file or a byte range within a file. HTTP is described as stateless, because the server is not required to record information concerning the state of the playback device requesting information or the byte ranges requested by the playback device in order to respond to requests received from the playback device. RTSP is a network control protocol used to control streaming media servers. Playback devices issue control commands, such as “play” and “pause”, to the server streaming the media to control the playback of media files. When RTSP is utilized, the media server records the state of each client device and determines the media to stream based upon the instructions received from the client device.
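The byte-range mechanism described above is carried in the HTTP `Range` request header and the `Content-Range` response header. The helpers below sketch how such headers are built and parsed; this is a simplified illustration covering only the common well-formed `bytes` form:

```python
def range_header(first_byte, last_byte):
    """Build the Range request header a playback device sends to
    fetch one chunk's byte range from a larger file (per RFC 2616)."""
    return "Range: bytes=%d-%d" % (first_byte, last_byte)

def parse_content_range(value):
    """Parse a 'bytes first-last/total' Content-Range response value,
    e.g. 'bytes 0-499/10000'. Simplified sketch: assumes the common
    well-formed 'bytes' form and no wildcard fields."""
    unit, rest = value.split(" ", 1)
    if unit != "bytes":
        raise ValueError("unsupported range unit: " + unit)
    span, total = rest.split("/")
    first, last = span.split("-")
    return int(first), int(last), int(total)
```

Because HTTP is stateless, each such request is self-describing: the server needs nothing beyond the header itself to return the requested bytes.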
In adaptive streaming systems, the source media is often organized on a media server with the help of a top level index file pointing to a number of alternate streams that contain the actual video and audio data. Different adaptive streaming solutions can utilize different index and media containers. The Synchronized Multimedia Integration Language (SMIL) developed by the World Wide Web Consortium is utilized to create indexes in several adaptive streaming solutions including IIS Smooth Streaming developed by Microsoft Corporation of Redmond, Wash., and Flash Dynamic Streaming developed by Adobe Systems Incorporated of San Jose, Calif. HTTP Adaptive Bitrate Streaming developed by Apple Computer Incorporated of Cupertino, Calif. organizes media files via an extended M3U playlist file (.M3U8), which is a text file containing a list of URIs that typically identify media chunks. Today's most commonly used media container formats are the MP4 container format specified in MPEG-4 Part 14 (i.e. ISO/IEC 14496-14) and the MPEG transport stream (TS) container specified in MPEG-2 Part 1 (i.e. ISO/IEC Standard 13818-1). The MP4 container format is utilized in IIS Smooth Streaming and Flash Dynamic Streaming. The TS container is used in HTTP Adaptive Bitrate Streaming.
The term “playlist” can be used to describe a list of chunks or links to chunks for playback. A playlist may also contain selections that allow adapting to user preferences, such as playback order, language settings or bandwidth consumption, or to device capabilities such as 3D display, surround sound and DRM support. The M3U playlist file utilized in HTTP Adaptive Bitrate Streaming is an example of a file containing a playlist (i.e. a list of URLs pointing to TS container files that contain alternative chunks for the sections of a piece of content). A playlist can also be distributed across multiple files. For example, in IIS Smooth Streaming a list of URLs is provided as a SMIL file and each container referenced in the SMIL file contains an index. Together, the SMIL top level index file and the indexes within each of the container files can provide a complete index to all of the alternative chunks for the different sections of a piece of content. A playlist is typically used by a media player running on a client device but can also be used to assemble content in preparation for further distribution. Common formats for playlists include: M3U, RAM, Winamp B4S, Advanced Stream Redirector (ASX, with variants such as WVX), WPL, PLS, Kapsule and KPL, SMIL, iTunes Library, DAAP, Creative Commons RDF and XML Shareable Playlist Format (XSPF).
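A minimal extended M3U playlist of the kind described above, together with a sketch of how a client might extract the chunk URIs from it, is shown below. This handles only a small subset of the .M3U8 format (lines beginning with “#” are tags or comments; every other non-empty line is a chunk URI), and the URLs are hypothetical:

```python
# A tiny, hypothetical extended M3U (.M3U8) playlist: tag lines
# start with '#', the remaining lines are URIs of media chunks.
SAMPLE_M3U8 = """#EXTM3U
#EXT-X-TARGETDURATION:2
#EXTINF:2.0,
http://example.com/video/chunk_T1_high.ts
#EXTINF:2.0,
http://example.com/video/chunk_T2_high.ts
#EXT-X-ENDLIST
"""

def chunk_uris(playlist_text):
    """Return the chunk URIs from an extended M3U playlist.
    Simplified sketch: ignores tag semantics entirely and treats
    every non-empty line not beginning with '#' as a URI."""
    return [line.strip() for line in playlist_text.splitlines()
            if line.strip() and not line.strip().startswith("#")]
```

A real client would also honor tags such as `#EXTINF` (chunk duration), but the list of URIs alone already describes the sequence of chunks to fetch.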
A playlist is typically interpreted on the client but may also be used on the server to assemble content before final delivery to a client (e.g. Flash Dynamic Streaming). Examples where a playlist is interpreted on the server include individual content composition for each client and server-side adaptation to the available bitrate and client capabilities.
A “link” is an entry in a playlist pointing to a content file or content chunk. A link can be expressed as a URL or filename. It may point to a local or remote file and/or a location within a local or remote file. The file may be an existing file or it may be created and prepared on the fly when it is requested by the client.
A process that can be used to prepare content for use in a typical adaptive streaming system is illustrated in FIG. 1. The content is divided into temporal sections, depicted as T1, T2 and T3 (100). Each section is compressed with three different bitrates Low (131), Medium (132) and High (133). A first chunk 120 is a chunk for the low bitrate of section T3. The Low bitrate has the smallest data size and lowest quality. Consecutive chunks of different bitrates can be played consecutively without duplicated or skipped content.
For adaptive streaming, links (160) to all three chunks of each temporal section can be provided in a playlist (150) to the client device. The client device adapts the bitrate during playback according to its playback environment by selecting the best chunk for each section. The best chunk typically is the chunk with the highest bitrate that can be played without interruption. In the example shown in FIG. 1, the content (170) streamed to the client device is composed of the High, Med and High bitrate chunks for Sections T1, T2 and T3, respectively. The client device may have chosen to reduce the bitrate for section T2 because it did not have sufficient bandwidth to download the chunk with the High bitrate fast enough to play without interruption.
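The selection behavior in the FIG. 1 example can be sketched as follows: for each section, the client picks the highest-bitrate chunk that the currently measured bandwidth can deliver at least as fast as it plays back. The bitrate names and numbers below are illustrative only, not taken from any real system:

```python
# Available encodings, lowest to highest bitrate (kilobits/s).
# The labels and figures are illustrative assumptions.
BITRATES = {"Low": 500, "Med": 1500, "High": 3000}

def pick_chunk(available_bandwidth_kbps):
    """Choose the highest bitrate that can be downloaded at least
    as fast as it plays back; fall back to the lowest otherwise."""
    best = "Low"
    for name, rate in sorted(BITRATES.items(), key=lambda kv: kv[1]):
        if rate <= available_bandwidth_kbps:
            best = name
    return best

def assemble(bandwidth_per_section):
    """One chunk choice per temporal section, as in FIG. 1."""
    return [pick_chunk(bw) for bw in bandwidth_per_section]
```

With a mid-stream bandwidth dip, `assemble([4000, 2000, 4000])` yields `["High", "Med", "High"]`, matching the High, Med, High sequence streamed in the FIG. 1 example.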
The term “digital watermarking” can be used to describe processes that embed imperceptible, robust and secure information in content. One application is embedding recipient information in content in order to identify individuals who receive the content and distribute it in an unauthorized manner.
There are different approaches to applying digital watermarks to content files. One approach is to embed the mark on the server before the content is delivered to a client device. One possibility is to mark the content “on the fly” as the file is requested by the client, for example using an approach as described in U.S. patent Ser. No. 13/002,280, entitled “Efficient Watermarking Approaches of Compressed Media”, filed Dec. 30, 2010, the disclosure of which is incorporated by reference herein. Another approach is to prepare the content with different sections that are perceptually identical but contain different information, and to assemble the sections during delivery in a way that represents the information to be embedded, as described in U.S. Pat. No. 7,555,650, entitled “Techniques for reducing the computational cost of embedding information in digital representations”, filed Mar. 17, 2003. Several watermark systems that embed information in content files and that can be applied to sections and chunks have been described in the prior art. Such systems apply noise patterns, DCT transformations or luminance variations to the content in order to embed information. An overview can be found in Cox et al., “Digital Watermarking and Steganography” (2nd Ed., 2007). The information that is embedded using a digital watermark is called the payload. It often represents a number that relates to a user, device or content owner. Transformations such as encryption, error correction and/or error detection codes are typically applied to the payload for security and reliability. Payloads are often stored repeatedly within the content in order to enhance robustness through increased redundancy.
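The second approach above, preparing perceptually identical variants of each section and assembling them to represent the embedded information, can be sketched as follows. With two variants, “A” and “B”, of every section, the sequence of variants chosen during delivery spells out the payload bits. This is a simplified illustration of the general idea only, not the method of the cited patent, and omits the encryption and error-correction transformations typically applied to a real payload:

```python
def embed(payload_bits):
    """Choose one of two perceptually identical variants per section
    so that the delivered sequence encodes the payload. Variant 'A'
    encodes a 0 bit and variant 'B' a 1 bit (labels are hypothetical)."""
    return ["B" if bit else "A" for bit in payload_bits]

def extract(variant_sequence):
    """Recover the payload bits from the variants observed in a
    redistributed copy, identifying the original recipient."""
    return [1 if v == "B" else 0 for v in variant_sequence]
```

Because each section already exists in both variants on the server, embedding a per-recipient payload reduces to selecting which prepared chunks to deliver, with no re-encoding at delivery time.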
If content is delivered using a playlist and adaptive streaming, it is not known prior to streaming which parts of the content will be accessed. If the content is prepared for each client in all available bitrates and maintained for the entire duration of possible access by the client, the resulting overhead in processing and storage is significant, if not prohibitive.