An HTTP Streaming client uses HTTP GET requests to receive a media presentation. The presentation is described in a presentation description, e.g., an XML document called a Media Presentation Description (MPD), which is described in 3GPP TS 26.247. From the MPD, the client can learn in which formats the media content is encoded (e.g., bitrates, codecs, resolutions, languages, and so forth). The client then chooses a format, which may be based on one or more of screen resolution, channel bandwidth, channel reception conditions, language preferences of a user, and so forth.
With HTTP Streaming, the media is received a portion at a time. This is necessary for live content so that media playout of the content does not fall too far behind live encoding. It also enables the client to switch to a different encoding adaptively according to channel conditions, etc. Segments in 3GPP HTTP Adaptive Streaming are downloadable portions of the media whose locations (URL and possibly a byte range) are described in the MPD. In other words, the client is informed how to access the segments via the MPD.
In 3GPP, the HTTP Streaming client assumes the use of the 3GPP file format and movie fragments. The 3GPP file format is based on the ISO/IEC 14496-12 ISO Base Media File Format. Media files, in accordance with the 3GPP file format, comprise a series of objects called boxes. Boxes can contain media data or metadata. Each box has an associated boxtype (typically a four-character name, for 32 bits total) and an associated size (typically a 32-bit unsigned integer). In non-fragmented files, a moov metadata box contains all of the codec information, timing information, and location information needed to play the media data. For fragmented media files of HTTP Streaming, the moov box contains only codec information, and all of the timing information and location information is contained within the movie fragments themselves. Movie fragments typically comprise one or more pairs of a moof box and an mdat box. The moof box contains metadata for the movie fragment and the mdat box contains media data for the movie fragment. The use of fragmented files enables the encoder to write, and the client to receive, the media a portion at a time. This minimizes startup delay by including metadata in the moof boxes of the movie fragments as opposed to up front in the moov box. The moov box still contains a description of the codecs used for encoding, but does not contain any specific information about the media samples such as timing, offsets, etc. The moof boxes are only allowed to contain references to the codecs listed in the moov box. If a new codec needs to be used that has not been defined in the moov box, then a new moov box needs to be created in a new file, because two moov boxes are not valid within a single ISO Base Media File Format file.
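As an illustration of the box layout described above, the following is a minimal Python sketch that walks the top-level boxes of a buffer by reading each 32-bit size and four-character boxtype. The names iter_boxes and make_box are hypothetical helpers introduced here for illustration only. The handling of size values 1 (a 64-bit size follows the boxtype) and 0 (the box runs to the end of the data) follows ISO/IEC 14496-12 and is not discussed in the text above.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, offset: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (boxtype, payload_start, box_end) for each box in data[offset:end]."""
    if end is None:
        end = len(data)
    while offset + 8 <= end:
        # Common case: 32-bit big-endian size followed by a 4-character boxtype.
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit "largesize" follows the boxtype.
            if offset + 16 > end:
                raise ValueError(f"truncated box header at offset {offset}")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing data.
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError(f"malformed box at offset {offset}")
        yield box_type.decode("latin-1"), offset + header, offset + size
        offset += size


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Hypothetical helper: build a box with a 32-bit size field."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


if __name__ == "__main__":
    # A toy movie fragment: one moof box followed by one mdat box.
    fragment = make_box(b"moof", b"\x00" * 4) + make_box(b"mdat", b"abc")
    for name, start, stop in iter_boxes(fragment):
        print(name, start, stop)
```

A client could use such a walker to locate the moof and mdat pair of each movie fragment in a received segment; nested boxes can be visited by calling iter_boxes again on a payload range.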
A presentation description (such as an MPD) comprises at least one period, a period comprises at least one representation, and a representation comprises at least one segment. A segment contains one or more movie fragments. All of the segments of one particular encoded format are referred to as a representation. Each representation has one corresponding initialization segment, which may be common amongst different representations (containing a moov box in the case of 3GPP). Each period implies a new moov box and a decoder initialization.
Currently in the specifications for HTTP Streaming, there is no requirement that the client fetch an updated MPD at a regular interval during live streaming. The client is informed of the addresses of media segments via the MPD. When a playlist structure of the MPD is used with live streaming, the MPD may be updated with the locations of the newly encoded segments one or a few at a time. The client fetches an updated MPD by issuing an HTTP GET or partial GET (a GET that uses the Range request header to specify a byte range). So, if a client has fetched the MPD 30 minutes into a live presentation and the user wants to watch from the beginning of the presentation, the client has all of the segment locations needed for the next 30 minutes (assuming that the time shift buffer is at least 30 minutes in duration). Hence, the client does not need to download a new MPD for approximately 30 minutes (at which point it would run out of data). Also, if segment locations are advertised in the MPD before the segments are created, then the client can know the locations of segments into the future and does not need to fetch an MPD every time that a new segment gets encoded. Thus, the client may fetch an updated MPD very infrequently. In both of these cases, namely (1) the client tunes in late to a live presentation and wants to watch or listen to the presentation from the beginning, and (2) the segment locations are advertised in advance of their existence so that the client may know segment locations well into the future, the client may go a relatively long interval without fetching an updated MPD under the current specification.
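The difference between a GET and a partial GET of the MPD can be sketched as follows, using Python's standard urllib.request module. The function name build_mpd_request and the example URL are assumptions made for illustration; the request is only constructed here, not sent.

```python
import urllib.request
from typing import Optional


def build_mpd_request(url: str, first_byte: Optional[int] = None,
                      last_byte: Optional[int] = None) -> urllib.request.Request:
    """Build a GET for the MPD, or a partial GET if a byte range is given."""
    headers = {}
    if first_byte is not None:
        # An open-ended range ("bytes=500-") asks for everything from first_byte on.
        end = "" if last_byte is None else str(last_byte)
        headers["Range"] = f"bytes={first_byte}-{end}"
    return urllib.request.Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    # Hypothetical MPD location, used only for illustration.
    mpd_url = "http://example.com/live/presentation.mpd"
    partial = build_mpd_request(mpd_url, 0, 1023)
    print(partial.get_method(), partial.get_header("Range"))
```

Passing the returned object to urllib.request.urlopen would issue the request; a server that honors the byte range would be expected to answer with a 206 Partial Content response.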
The information provided in the MPD guides the 3GPP Adaptive HTTP streaming client. A client successfully fetches an MPD when in response to a request for the latest MPD, the client either receives an updated MPD or receives an indication that allows the client to verify that the MPD has not been updated since the previous fetch.
In accordance with existing standards, a client is forced to fetch an updated MPD only when it receives multiple HTTP 404 error codes or when it runs out of segment location information. Thus, a server cannot gracefully cause a client to update its MPD and cannot gracefully migrate its media files or segments to different locations.
Additionally, clients assume that media segment locations can be advertised even before the segments are actually available. So the client may know a media segment's future location before the media segment actually exists. Therefore, the client must calculate, for each and every segment, the time at which the segment becomes available. The client cannot assume that, because a media segment location is available in the MPD, the media segment itself is available.
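The per-segment availability calculation mentioned above might look like the following Python sketch. The text does not give a formula, so this example assumes one: all segments have the same duration, the presentation has a known availability start time, and a segment becomes available once it has been fully encoded. The names segment_available_at and is_segment_available are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def segment_available_at(availability_start: datetime, segment_index: int,
                         segment_duration: timedelta) -> datetime:
    """Assumed rule: segment i (0-based) is available once it is fully encoded."""
    return availability_start + (segment_index + 1) * segment_duration


def is_segment_available(availability_start: datetime, segment_index: int,
                         segment_duration: timedelta, now: datetime) -> bool:
    """Return True if the segment may be requested at time `now`."""
    return now >= segment_available_at(availability_start, segment_index,
                                       segment_duration)


if __name__ == "__main__":
    start = datetime(2011, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    duration = timedelta(seconds=10)
    now = start + timedelta(seconds=25)
    for index in range(4):
        print(index, is_segment_available(start, index, duration, now))
```

Under these assumptions a client would request a segment only after its computed availability time, even though the segment location already appears in the MPD.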
To gracefully require a client to update its MPD, a client may be notified that an MPD update has occurred by one or more indications in a box within a segment of a media file. In other words, a client may be notified inband that an MPD update has occurred. An example MPD update box is shown below:
 aligned(8) class MPDUpdateBox extends FullBox('mupe') {
     unsigned int(3) mpd_information_flags;
     unsigned int(1) new_location_flag;
     unsigned int(28) latest_mpd_update_time;
     // The following are optional fields
     string mpd_location;
 }
For this example, the following semantics are used:
mpd_information_flags: contains the logical OR of zero or more of the following reason codes: 0x00 Media Presentation Description update now, 0x01 Media Presentation Description update ahead, 0x02 End-of-presentation, and 0x03-0x07 reserved;
new_location_flag: if set to 1, then the new Media Presentation Description is available at a new location specified in mpd_location;
latest_mpd_update_time: indicates the time, in milliseconds, by which an MPD update is necessary, relative to when the latest MPD was issued (i.e., the MPD issue time), so that a client may update the MPD at any time between now and latest_mpd_update_time; and
mpd_location: present if and only if the new_location_flag is set and provides a Uniform Resource Locator (URL) for the new Media Presentation Description.
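To make the field layout of the example box concrete, the following Python sketch parses the payload of such a box (the bytes after the size and boxtype). It is an illustration under stated assumptions, not a definitive implementation: it assumes the usual FullBox prefix of an 8-bit version and 24-bit flags, that the three declared fields are packed most-significant-bit first into one 32-bit big-endian word, and that mpd_location is a null-terminated UTF-8 string. The names MPDUpdate and parse_mpd_update_payload are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class MPDUpdate:
    version: int
    mpd_information_flags: int      # 3-bit reason code
    new_location_flag: bool         # 1 bit
    latest_mpd_update_time_ms: int  # 28 bits, milliseconds
    mpd_location: Optional[str]     # present only if new_location_flag is set


def parse_mpd_update_payload(payload: bytes) -> MPDUpdate:
    """Parse the bytes that follow the size and boxtype of an MPD update box."""
    if len(payload) < 8:
        raise ValueError("payload too short for an MPD update box")
    # Assumed layout: FullBox version/flags word, then the 3+1+28-bit word.
    version_and_flags, word = struct.unpack_from(">II", payload, 0)
    new_location = bool((word >> 28) & 0x1)
    location = None
    if new_location:
        raw = payload[8:]
        terminator = raw.find(b"\x00")
        if terminator < 0:
            raise ValueError("mpd_location is not null-terminated")
        location = raw[:terminator].decode("utf-8")
    return MPDUpdate(
        version=version_and_flags >> 24,
        mpd_information_flags=word >> 29,
        new_location_flag=new_location,
        latest_mpd_update_time_ms=word & 0x0FFFFFFF,
        mpd_location=location,
    )


if __name__ == "__main__":
    # Reason code 0x01 (update ahead), new location, update within 60 seconds.
    word = (0x01 << 29) | (1 << 28) | 60000
    payload = struct.pack(">II", 0, word) + b"http://example.com/new.mpd\x00"
    print(parse_mpd_update_payload(payload))
```

Note that a 28-bit millisecond field can express offsets of up to about 74.5 hours from the MPD issue time.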