A Hypertext Transfer Protocol (HTTP) streaming client uses HTTP GET requests to download one or more presentations of media. A presentation is described in an Extensible Markup Language (XML) document (e.g., as defined in the 3GPP SA4 specification), which may be referred to as a Media Presentation Description (MPD). From the MPD, the client can learn in what formats the media content is encoded (e.g., bitrates, codecs, resolutions, languages). The client then chooses a format based on one or more of screen resolution, channel bandwidth, channel reception conditions, language preference of the user, etc.
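The format selection described above can be illustrated with a minimal Python sketch. It assumes the MPD has already been parsed into a list of candidate encodings; the `Representation` class, its field names, and the `choose_representation` function are hypothetical and are not taken from the 3GPP specification. The policy shown (prefer the user's language, then take the highest bitrate that fits the channel and the screen) is only one possible choice.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Representation:
    """Hypothetical record for one encoding advertised in an MPD."""
    rep_id: str
    bandwidth_bps: int
    width: int
    height: int
    language: str


def choose_representation(
    reps: List[Representation],
    available_bps: int,
    screen_width: int,
    screen_height: int,
    preferred_language: str,
) -> Optional[Representation]:
    """Pick one encoding using an illustrative (assumed) policy."""
    # Prefer the user's language; fall back to all encodings if none match.
    candidates = [r for r in reps if r.language == preferred_language] or list(reps)
    if not candidates:
        return None
    # Keep encodings that fit both the channel bandwidth and the screen.
    fitting = [
        r for r in candidates
        if r.bandwidth_bps <= available_bps
        and r.width <= screen_width
        and r.height <= screen_height
    ]
    if fitting:
        # Highest bitrate that still fits.
        return max(fitting, key=lambda r: r.bandwidth_bps)
    # Nothing fits: degrade to the lowest-bitrate candidate.
    return min(candidates, key=lambda r: r.bandwidth_bps)
```

A real client would re-run such a selection as channel conditions change, which is what makes the streaming adaptive.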
With HTTP streaming, the media is downloaded a portion at a time. This is necessary for live content so that playout of the content does not fall too far behind live encoding. It also enables the client to switch adaptively to a different content encoding according to channel conditions, etc. Segments, in accordance with 3GPP HTTP adaptive streaming, are downloadable portions of the media whose locations (a URL and possibly a byte range) are described in the MPD. In other words, the MPD informs the client how to access the segments.
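As an illustration of how a segment location (a URL and an optional byte range) maps onto an HTTP GET request, the following Python sketch builds such a request with the standard `urllib.request` module. The function name `build_segment_request` and the example URL are hypothetical; the byte range is expressed with the standard HTTP `Range` header, whose first and last byte positions are inclusive.

```python
import urllib.request
from typing import Optional, Tuple


def build_segment_request(
    url: str,
    byte_range: Optional[Tuple[int, int]] = None,
) -> urllib.request.Request:
    """Build (but do not send) an HTTP GET request for one segment.

    byte_range is an inclusive (first_byte, last_byte) pair, or None
    when the segment is the whole resource at the given URL.
    """
    headers = {}
    if byte_range is not None:
        first, last = byte_range
        if first < 0 or last < first:
            raise ValueError("invalid byte range")
        # HTTP range requests use inclusive byte positions.
        headers["Range"] = f"bytes={first}-{last}"
    return urllib.request.Request(url, headers=headers, method="GET")


# Hypothetical segment location, as it might be derived from an MPD.
request = build_segment_request("http://example.com/media/seg1.3gs", (0, 999))
```

Passing the returned object to `urllib.request.urlopen` would issue the request; a server that honors the range would normally answer with status 206 (Partial Content).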
In accordance with 3GPP, the HTTP streaming client assumes the use of the 3GPP file format and movie fragments, wherein a segment contains one or more movie fragments. The 3GPP file format is based on the ISO/IEC 14496-12 ISO Base Media File Format. Files consist of a series of objects called boxes. Boxes can contain media or metadata. Each box has an associated boxtype (typically a four-character name, 32 bits in total) and an associated size (typically a 32-bit unsigned integer). Movie fragments may consist of “moof”/“mdat” box pairs. The “moof” box contains metadata for a movie fragment and the “mdat” box contains media data for the movie fragment. The use of fragmented files enables the client to download the media a portion at a time, while minimizing startup delay by including metadata in the “moof” boxes of the media fragments as opposed to up front in the “moov” box. The “moov” box still contains a description of the codecs used for encoding, but does not contain any specific information about the media samples such as timing, offsets, etc.
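The box structure described above can be illustrated with a short Python sketch that walks the top-level boxes of a downloaded segment. It relies on the ISO Base Media File Format header layout, in which each box starts with a 32-bit big-endian size followed by the four-character type; a size of 1 indicates that a 64-bit size follows the type, and a size of 0 indicates that the box extends to the end of the data. The function names `iter_boxes` and `make_box` are hypothetical helpers, and the sketch does not descend into nested boxes.

```python
import struct
from typing import Iterator, Tuple


def iter_boxes(data: bytes) -> Iterator[Tuple[str, int, bytes]]:
    """Yield (boxtype, total_size, payload) for each top-level box."""
    offset = 0
    while offset + 8 <= len(data):
        # 32-bit unsigned size, then the four-character boxtype.
        size, raw_type = struct.unpack_from(">I4s", data, offset)
        header_len = 8
        if size == 1:
            # A 64-bit "largesize" field follows the boxtype.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header_len = 16
        elif size == 0:
            # The box runs to the end of the available data.
            size = len(data) - offset
        if size < header_len or offset + size > len(data):
            break  # truncated or malformed box; stop parsing
        payload = data[offset + header_len:offset + size]
        yield raw_type.decode("ascii", errors="replace"), size, payload
        offset += size


def make_box(box_type: str, payload: bytes) -> bytes:
    """Build a box with a plain 32-bit size field (for illustration only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


# A synthetic movie fragment: a "moof" box followed by an "mdat" box.
fragment = make_box("moof", b"\x00" * 4) + make_box("mdat", b"media")
box_types = [box_type for box_type, _, _ in iter_boxes(fragment)]
```

For the synthetic fragment above, `box_types` is `["moof", "mdat"]`. A client would hand the "moof" payload to a metadata parser and use it to locate samples within the "mdat" payload.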