A media content provider or distributor may deliver various media contents to subscribers or users using different coding schemes suited for different devices, such as televisions, notebook computers, and mobile handsets. The media content provider may support a plurality of media encoders and/or decoders (codecs), media players, video frame rates, spatial resolutions, bit-rates, video formats, or combinations thereof. A media content may be converted from a source or original representation to various other representations to suit the different user devices.
A media content may comprise a media presentation description (MPD) and a plurality of segments. The MPD may be an extensible markup language (XML) file or document describing the media content, such as its various representations, uniform resource locator (URL) addresses, and other characteristics. For example, the media content may comprise several media components (e.g., audio, video, and text), each of which may have different characteristics that are specified in the MPD. Each media component comprises a plurality of segments containing parts of the actual media content, and the segments may be stored collectively in a single file or individually in multiple files. Each segment may contain a pre-defined number of bytes (e.g., 1,000 bytes) or a pre-defined interval of playback time (e.g., 2 or 5 seconds) of the media content.
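As an illustration of the duration-based segmentation described above, the following minimal Python sketch divides a media component into fixed-length playback intervals. The function name `segment_time_ranges` and its parameters are hypothetical and are not taken from any standard; the final segment is assumed to be shorter when the total duration is not an exact multiple of the segment duration.

```python
import math


def segment_time_ranges(total_seconds, segment_seconds):
    """Return (start, end) playback intervals, in seconds, one per segment.

    Hypothetical helper for illustration only.
    """
    if segment_seconds <= 0:
        raise ValueError("segment_seconds must be positive")
    # Number of segments needed to cover the whole duration.
    count = math.ceil(total_seconds / segment_seconds)
    return [
        # The last interval is clipped to the end of the content.
        (i * segment_seconds, min((i + 1) * segment_seconds, total_seconds))
        for i in range(count)
    ]


if __name__ == "__main__":
    # An 11-second component cut into 5-second segments.
    print(segment_time_ranges(11, 5))  # [(0, 5), (5, 10), (10, 11)]
```

A byte-size-based division could be sketched the same way by substituting byte offsets for playback times.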
Depending on the application, the media content may be divided into various hierarchies. For example, the media content may comprise multiple periods, where a period is a time interval relatively longer than a segment. For instance, a television program may be divided into several 5-minute-long program periods, which are separated by several 2-minute-long commercial periods. Further, a period may comprise one or multiple adaptation sets (AS). An AS may provide information about one or multiple media components and their various encoded representations. For instance, an AS may contain different bit-rates of a video component of the media content, while another AS may contain different bit-rates of an audio component of the same media content. A representation may be an encoded alternative of a media component, varying from other representations by bit-rate, resolution, number of channels, or other characteristics, or combinations thereof. Each representation comprises multiple segments, which are media content chunks in a temporal sequence. Moreover, to enable downloading a segment in multiple parts, a segment may sometimes be divided into sub-segments, each having a specific duration and/or byte size. One skilled in the art will understand the various hierarchies that can be used to deliver a media content.
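The period, adaptation set, and representation hierarchy described above can be illustrated by walking a small MPD document. The following is a minimal Python sketch, not a complete MPD parser. The sample document, its identifiers (e.g., `program-1`, `v-low`), and the helper name `list_representations` are invented for illustration; the element names, the `bandwidth` and `contentType` attributes, and the namespace are assumed to follow the DASH MPD schema.

```python
import xml.etree.ElementTree as ET

# Namespace assumed to be the one used by DASH MPD documents.
NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Invented sample: one period with a video AS and an audio AS.
SAMPLE_MPD = """
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period id="program-1">
    <AdaptationSet contentType="video">
      <Representation id="v-low" bandwidth="500000"/>
      <Representation id="v-high" bandwidth="3000000"/>
    </AdaptationSet>
    <AdaptationSet contentType="audio">
      <Representation id="a-64k" bandwidth="64000"/>
    </AdaptationSet>
  </Period>
</MPD>
""".strip()


def list_representations(mpd_text):
    """Return (period id, content type, representation id, bit-rate) tuples."""
    root = ET.fromstring(mpd_text)
    rows = []
    for period in root.findall("mpd:Period", NS):
        for adaptation_set in period.findall("mpd:AdaptationSet", NS):
            for rep in adaptation_set.findall("mpd:Representation", NS):
                rows.append((
                    period.get("id"),
                    adaptation_set.get("contentType"),
                    rep.get("id"),
                    int(rep.get("bandwidth")),  # bits per second
                ))
    return rows


if __name__ == "__main__":
    for row in list_representations(SAMPLE_MPD):
        print(row)
```

Segments and sub-segments are omitted from the sketch for brevity; in practice each representation would also carry the information needed to locate its segments.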
In adaptive streaming, when delivering a media content to a user device, the user device may select appropriate segments dynamically based on a variety of factors, such as network conditions, device capability, and user choice. Adaptive streaming may include various technologies or standards implemented or being developed, such as Dynamic Adaptive Streaming over Hypertext Transfer Protocol (HTTP) (DASH), HTTP Live Streaming (HLS), or Internet Information Services (IIS) Smooth Streaming. For example, the user device may select a segment with the highest quality (e.g., resolution or bit-rate) possible that can be downloaded in time for playback without causing stalling or rebuffering events in the playback. Thus, the user device may seamlessly adapt its media content playback to changing network conditions. To prevent tampering with or attacks on a media content, segments of the media content need to be protected via authentication schemes. Various attacks (e.g., replication attacks with segments from unexpected representations) may need to be prevented, even when those segments are correct in terms of source and scheduling/timing.
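The dynamic selection described above, in which the user device picks the highest quality that can be downloaded in time for playback, can be sketched as a simple throughput-based rule. The following Python sketch is one possible heuristic and is not the selection logic of DASH, HLS, or any particular player. The function name `select_bandwidth`, the `safety_factor` parameter, and its default of 0.8 are assumptions introduced for illustration.

```python
def select_bandwidth(available_bps, measured_throughput_bps, safety_factor=0.8):
    """Pick the highest representation bit-rate that fits the throughput.

    available_bps: bit-rates (bits per second) of the representations.
    measured_throughput_bps: recent network throughput estimate.
    safety_factor: assumed margin that leaves headroom against stalls.
    """
    if not available_bps:
        raise ValueError("at least one representation is required")
    budget = measured_throughput_bps * safety_factor
    affordable = [b for b in available_bps if b <= budget]
    # Fall back to the lowest bit-rate when nothing fits the budget.
    return max(affordable) if affordable else min(available_bps)


if __name__ == "__main__":
    ladder = [500_000, 1_500_000, 3_000_000]
    print(select_bandwidth(ladder, 2_000_000))  # 1500000
    print(select_bandwidth(ladder, 400_000))    # 500000
```

Calling such a rule before each segment request would let playback move up or down the available bit-rates as the measured throughput changes.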