A media content provider or distributor may deliver media content to subscribers or users using different encryption and/or coding schemes suited to different devices, e.g., televisions, notebook computers, and mobile handsets. The media content provider may support a plurality of media encoders and/or decoders (codecs), media players, video frame rates, spatial resolutions, bit-rates, video formats, or combinations thereof. A piece of media content may be converted from a source or original representation to various other representations to suit the different user devices.
A piece of media content may comprise a media presentation description (MPD) and a plurality of segments. An MPD may comprise elements and attributes that describe information regarding the media content. In Extensible Markup Language (XML), an element may comprise three parts: a start tag indicated by <element name>, an element content, and an end tag indicated by </element name>. Further, an element may contain one or more attributes and/or child elements. An attribute may comprise an attribute name and an attribute value. The MPD may be an XML file or document describing the media content, such as its various representations (defined below), uniform resource locator (URL) addresses, and other characteristics. For example, the media content may comprise several media components (e.g., audio, video, and text), each of which may have different characteristics that are specified in the MPD. Each media component comprises a plurality of segments containing parts of the actual media content, and the segments may be stored collectively in a single file or individually in multiple files. Each segment may contain a pre-defined number of bytes (e.g., 1,000 bytes) or a pre-defined interval of playback time (e.g., 2 or 5 seconds) of the media content. A segment may be the minimal individually addressable unit of data, i.e., the entity that can be downloaded using a URL advertised via the MPD.
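By way of illustration only, the relationship between elements, attributes, child elements, and segment URLs may be sketched in Python as follows. The sample document is a hypothetical, heavily simplified MPD: it omits the XML namespace and the intermediate levels of a real DASH MPD, and the identifiers, bandwidth values, and URLs are assumptions made for this example. The helper name `segment_urls_by_representation` is likewise illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified MPD. A real DASH MPD declares an XML namespace
# and carries many more elements and attributes than shown here.
SAMPLE_MPD = """
<MPD type="static">
  <Representation id="video-720p" bandwidth="2500000">
    <SegmentURL media="video/720p/seg1.m4s"/>
    <SegmentURL media="video/720p/seg2.m4s"/>
  </Representation>
  <Representation id="audio-en" bandwidth="96000">
    <SegmentURL media="audio/en/seg1.m4s"/>
  </Representation>
</MPD>
"""


def segment_urls_by_representation(mpd_text):
    """Map each Representation id to the list of segment URLs it advertises."""
    root = ET.fromstring(mpd_text.strip())
    urls = {}
    # Each Representation is a child element of the MPD root element.
    for rep in root.findall("Representation"):
        # "id" and "media" are attributes: a name paired with a value.
        urls[rep.get("id")] = [seg.get("media") for seg in rep.findall("SegmentURL")]
    return urls


if __name__ == "__main__":
    for rep_id, urls in segment_urls_by_representation(SAMPLE_MPD).items():
        print(rep_id, urls)
```

In this sketch, each `SegmentURL` child corresponds to one individually addressable segment that a client could request.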
Depending on the application, the media content may be divided into various hierarchies. For example, the media content may comprise multiple periods, where a period is a time interval relatively longer than a segment. For instance, a television program may be divided into several 5-minute-long program periods, which are separated by several 2-minute-long commercial periods. Further, a period may comprise one or multiple adaptation sets (ASs). An AS may provide information about one or multiple media components and its/their various encoded representations. A representation may be defined as a single encoded version of the complete asset, or of a subset of its components, e.g., an International Organization for Standardization (ISO) base media file format (ISO-BMFF) representation containing unmultiplexed 2.5 megabit per second (Mbps) 720p (720-line progressive scan) Advanced Video Coding (AVC) video, and separate ISO-BMFF representations for 96 kilobit per second (kbps) Moving Picture Experts Group-4 (MPEG-4) Advanced Audio Coding (AAC) audio in different languages. For instance, one AS may contain different bit-rates of a video component of the media content, while another AS may contain different bit-rates of an audio component of the same media content. A representation may be an encoded alternative of a media component, varying from other representations by bit-rate, resolution, number of channels, or other characteristics, or combinations thereof. Each representation comprises multiple segments, which are media content chunks in a temporal sequence. Moreover, sub-segments may be used to enable downloading a segment in multiple parts, each sub-segment having a specific duration and/or byte size. One skilled in the art will understand the various hierarchies that can be used to deliver media content.
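As a non-limiting illustration, the period, adaptation set, representation, and segment hierarchy may be modeled in Python roughly as follows. All class names, field names, and the helper functions `make_segments` and `build_period` are hypothetical and chosen for this sketch only; the 2-second segment duration and the specific bit-rates are assumptions drawn from the examples in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    index: int
    duration_s: float


@dataclass
class Representation:
    rep_id: str
    bandwidth_bps: int
    segments: List[Segment] = field(default_factory=list)


@dataclass
class AdaptationSet:
    content_type: str
    representations: List[Representation] = field(default_factory=list)


@dataclass
class Period:
    label: str
    duration_s: float
    adaptation_sets: List[AdaptationSet] = field(default_factory=list)


def make_segments(period_duration_s, segment_duration_s):
    """Split a period into consecutive segments; the last one may be shorter."""
    segments = []
    start = 0
    while start < period_duration_s:
        duration = min(segment_duration_s, period_duration_s - start)
        segments.append(Segment(len(segments), duration))
        start += duration
    return segments


def build_period(label, duration_s, segment_duration_s=2):
    """Build one period with a video AS (two bit-rates) and an audio AS."""
    def rep(rep_id, bps):
        return Representation(rep_id, bps, make_segments(duration_s, segment_duration_s))

    video = AdaptationSet("video", [rep("video-2500k", 2_500_000), rep("video-1000k", 1_000_000)])
    audio = AdaptationSet("audio", [rep("audio-96k", 96_000)])
    return Period(label, duration_s, [video, audio])


if __name__ == "__main__":
    program = build_period("program", 5 * 60)       # 5-minute program period
    commercial = build_period("commercial", 2 * 60)  # 2-minute commercial period
    for period in (program, commercial):
        count = len(period.adaptation_sets[0].representations[0].segments)
        print(period.label, "segments per representation:", count)
```

Under these assumptions, every representation within a period covers the same timeline, so a client may switch between representations of one AS at segment boundaries.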
In adaptive streaming, when delivering media content to a user device, the user device may select appropriate segments dynamically based on a variety of factors, such as network conditions, device capability, and user choice. Adaptive streaming may include various technologies or standards implemented or being developed, such as Dynamic Adaptive Streaming over Hypertext Transfer Protocol (HTTP) (DASH), HTTP Live Streaming (HLS), or Internet Information Services (IIS) Smooth Streaming. For example, the user device may select a segment with the highest quality (e.g., resolution or bit-rate) possible that can be downloaded in time for playback without causing stalling or rebuffering events in the playback. Thus, the user device may seamlessly adapt its media content playback to changing network conditions. To prevent tampering, attacks, and/or unauthorized access to media content, segments of the media content may need to be protected via authentication schemes, herein referred to as encryption or encoding schemes.
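For illustration, one simple way a user device might choose a representation for the next segment is sketched below in Python. This is a throughput-based heuristic assumed for this example, not a selection rule mandated by DASH, HLS, or Smooth Streaming; the function name `select_bitrate` and the `safety` margin of 0.8 are hypothetical.

```python
def select_bitrate(available_bps, throughput_bps, safety=0.8):
    """Pick the highest bit-rate that fits within a fraction of measured throughput.

    available_bps:  bit-rates (bits per second) offered for the media component
    throughput_bps: recently measured download throughput (bits per second)
    safety:         assumed headroom factor to reduce the risk of rebuffering
    """
    budget = throughput_bps * safety
    candidates = [rate for rate in available_bps if rate <= budget]
    # If nothing fits, fall back to the lowest bit-rate rather than stalling.
    return max(candidates) if candidates else min(available_bps)


if __name__ == "__main__":
    ladder = [500_000, 1_000_000, 2_500_000]
    for measured in (3_500_000, 1_200_000, 100_000):
        print(measured, "->", select_bitrate(ladder, measured))
```

A practical client would typically also consider buffer occupancy, device capability, and user choice, as noted above; the sketch captures only the network-condition factor.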
In adaptive streaming techniques such as the Moving Picture Experts Group (MPEG)-DASH standard, segments may be encrypted or encoded as part of an authentication scheme, e.g., to accommodate a pay-per-view video stream model. One example is the segment authentication scheme specified in a draft standard numbered ISO/IEC 23009-4 and entitled “Dynamic Adaptive Streaming over HTTP (DASH)—Part 4: Segment Encryption and Authentication” (ISO/IEC 23009-4), incorporated herein by reference, where IEC stands for International Electrotechnical Commission. Encoding the entire MPEG stream may produce larger streams than necessary, require comparatively more processing power, and/or introduce lag, and as a result some protocols encode only some segments. Conventional approaches have used alternation schemes to selectively encode segments, e.g., encoding even segments and leaving odd segments unencoded. Although such an approach encodes less than the full stream, it may nevertheless encode more segments than necessary. Conventional approaches have further relied on a single-algorithm approach for stream encoding, which may not allow content from different sources to be merged into a single MPEG stream, e.g., entertainment content from a first source and commercials from a second source. Conventional approaches have further included initialization vectors with the stream decoding or decrypting information, and thus have not accommodated the late binding of initialization vectors.
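The conventional alternation scheme described above may be sketched in Python as follows, purely for illustration. The sketch assumes zero-based segment numbering, so that "even" segments are those at indices 0, 2, 4, and so on. The names `alternate_protect` and `toy_protect` are hypothetical, and `toy_protect` is a placeholder transform only: it is not encryption and does not reflect the scheme of ISO/IEC 23009-4.

```python
from typing import Callable, List, Tuple


def alternate_protect(
    segments: List[bytes], protect: Callable[[bytes], bytes]
) -> List[Tuple[bool, bytes]]:
    """Apply `protect` to even-indexed segments and pass odd ones through.

    Returns (was_protected, payload) pairs in the original segment order.
    """
    output = []
    for index, payload in enumerate(segments):
        if index % 2 == 0:
            output.append((True, protect(payload)))
        else:
            output.append((False, payload))
    return output


def toy_protect(payload: bytes) -> bytes:
    # Placeholder only: XOR with a fixed byte is NOT a secure cipher.
    # A real system would substitute an actual encryption or authentication step.
    return bytes(b ^ 0x5A for b in payload)


if __name__ == "__main__":
    stream = [b"seg0", b"seg1", b"seg2", b"seg3", b"seg4"]
    result = alternate_protect(stream, toy_protect)
    protected = sum(1 for flag, _ in result if flag)
    print(f"{protected} of {len(stream)} segments protected")
```

The sketch makes the limitation noted above concrete: about half of all segments are processed regardless of how many actually need protection, because the choice depends only on segment position.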