An important application of transmission networks like the Internet or mobile telephone networks is media delivery from a server to a client. Media may be, for example, audio and video.
Media delivery in IP (Internet Protocol) based networks may use different transport protocols. Traditionally, either RTP (Real-time Transport Protocol) over UDP (User Datagram Protocol) is used for real-time and packet-based streaming, or HTTP (Hypertext Transfer Protocol) over TCP (Transmission Control Protocol) is used for download of whole files, mostly for later consumption but also for live streaming. RTP allows for dynamic adaptation to the available bit rate as measured by the client. A drawback of RTP and the associated control protocol RTSP (Real Time Streaming Protocol) is the need for specialized and more complicated server software, while HTTP can use widely deployed and inexpensive HTTP server software. A recent development, Adaptive HTTP Streaming (AHS), aims at combining the advantages of both approaches. AHS is standardized in 3GPP (Third Generation Partnership Project), and has also been adopted and slightly extended in the Open IPTV Forum (OIPF). MPEG (Moving Picture Experts Group) is also working on AHS.
In AHS, the content is encoded in different versions, usually corresponding to different bit rates. If the content is, for example, a video with a video track and an audio track, the video track could be encoded in three versions, each with a different bit rate, and the audio track in a high-quality stereo version and a mono version. Each version is further divided into segments of a few seconds' duration. For example, the video versions can be divided into many consecutive segments of 10 seconds' duration each. The segments may be formatted according to the MPEG-4 file format, or according to the MPEG-2 transport stream format.
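The segmentation described above can be illustrated with a minimal sketch. It assumes fixed-length segments and a constant bit rate per version; the function names and the numbers in the usage note are illustrative assumptions and are not taken from any specification.

```python
import math


def segment_boundaries(content_duration_s, segment_duration_s=10.0):
    """Return (start, end) playout times in seconds for consecutive segments.

    Assumption: all segments have the same nominal duration; only the last
    segment may be shorter.
    """
    count = math.ceil(content_duration_s / segment_duration_s)
    return [
        (i * segment_duration_s,
         min((i + 1) * segment_duration_s, content_duration_s))
        for i in range(count)
    ]


def approx_segment_bytes(bitrate_bps, start_s, end_s):
    """Approximate segment size in bytes, assuming a constant bit rate."""
    return int(bitrate_bps * (end_s - start_s) / 8)
```

For instance, a 25-second content item with 10-second segments would yield three segments, the last one covering only 5 seconds, and a 10-second segment of a version encoded at an assumed 500 kbit/s would be roughly 625,000 bytes.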
The actual transmission of the video and audio tracks is performed by downloading one segment after the other, with each download initiated by the client. In this procedure the client downloads a segment using a standard HTTP request, unpacks, decodes, and renders it, and then does the same for the next segment, and so on. The client has knowledge of the available quality versions and of the segment separation over time by means of a media description, the so-called Media Presentation Description (MPD). The MPD format as defined in 3GPP and OIPF is an XML (eXtensible Markup Language) encoded file containing appropriate information and attributes to describe the media. The MPD is the first resource transmitted to a client in order to start an AHS based media delivery. The MPD as specified by 3GPP comprises the different available qualities and information on how they are arranged into segments.
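The role of the media description can be sketched as follows. The XML element and attribute names used here (`MediaDescription`, `Version`, `Segment`, `segmentDuration`, `bitrate`, `url`) and the example.com URLs are invented for illustration only; they are not the MPD schema defined by 3GPP or OIPF. The `fetch` parameter is a hypothetical callable standing in for a standard HTTP GET.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified description; NOT the real 3GPP/OIPF MPD schema.
DESCRIPTION_EXAMPLE = """\
<MediaDescription segmentDuration="10">
  <Version id="video-high" bitrate="1000000">
    <Segment url="http://example.com/video-high/seg1.mp4"/>
    <Segment url="http://example.com/video-high/seg2.mp4"/>
  </Version>
  <Version id="video-low" bitrate="250000">
    <Segment url="http://example.com/video-low/seg1.mp4"/>
    <Segment url="http://example.com/video-low/seg2.mp4"/>
  </Version>
</MediaDescription>
"""


def parse_description(xml_text):
    """Return (segment duration in seconds, {version id: bitrate and URLs})."""
    root = ET.fromstring(xml_text)
    versions = {}
    for version in root.findall("Version"):
        versions[version.get("id")] = {
            "bitrate": int(version.get("bitrate")),
            "segments": [s.get("url") for s in version.findall("Segment")],
        }
    return float(root.get("segmentDuration")), versions


def download_in_order(versions, version_id, fetch):
    """Fetch the segments of one version one after the other.

    `fetch` is an injected callable (URL -> bytes) that performs the
    HTTP request; unpacking, decoding and rendering are left out.
    """
    return [fetch(url) for url in versions[version_id]["segments"]]
```

In a real client, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read()`; passing it in as a parameter keeps the sketch independent of any network access.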
Each segment is downloaded at the maximum speed available under the present operating conditions of the network used for transmission, and the client monitors the download speed it experiences. Based on the experienced download speed, the client selects the most appropriate of the available quality versions. This may be a different version from segment to segment, so the client can download different qualities depending on the present operating conditions; hence the attribute “adaptive” in adaptive HTTP streaming. FIG. 1 visualizes the principle and shows different media representations for adaptive HTTP streaming of a content item as a function of the playout time. The three representations in FIG. 1 may correspond to a high, medium and low bitrate representation, respectively, of a content item, i.e. stream. The beginning and end of the playout time of the stream segments coincide across the different representations, so that smooth switching between the representations is possible. The vertical scale in FIG. 1 illustrates the data size of the different stream representations, e.g. their bit rate. Depending on the client implementation, enhanced selection procedures are possible for switching between the representations, e.g. including a hysteresis in order to avoid excessive quality fluctuations when viewing or listening to a stream.
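One possible client-side selection procedure with hysteresis can be sketched as follows. This is only an illustration of the idea, not a procedure defined by 3GPP or OIPF: the `Representation` type, the `up_margin` parameter and the rule "switch down immediately, switch up only with headroom" are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    name: str
    bitrate_bps: int  # nominal encoding bit rate of this version


def select_representation(reps, measured_bps, current=None, up_margin=1.2):
    """Choose the representation for the next segment.

    Down-switches happen as soon as the measured download speed no longer
    covers the current bit rate. Up-switches require the measured speed to
    exceed the target bit rate by the factor `up_margin` (the hysteresis),
    which damps quality fluctuations around a switching threshold.
    """
    ordered = sorted(reps, key=lambda r: r.bitrate_bps)
    # Highest version whose bit rate fits the measured download speed;
    # fall back to the lowest version if none fits.
    fitting = [r for r in ordered if r.bitrate_bps <= measured_bps]
    candidate = fitting[-1] if fitting else ordered[0]
    if current is None or candidate.bitrate_bps <= current.bitrate_bps:
        return candidate
    # Candidate is higher than the current version: demand extra headroom.
    safe = [r for r in ordered if r.bitrate_bps * up_margin <= measured_bps]
    best_safe = safe[-1] if safe else ordered[0]
    return best_safe if best_safe.bitrate_bps > current.bitrate_bps else current
```

With assumed versions of 250, 500 and 1000 kbit/s and a client currently on the 500 kbit/s version, a measured speed of 1.1 Mbit/s would not trigger an up-switch (1000 kbit/s times 1.2 exceeds it), whereas 1.3 Mbit/s would. The measured speed itself could be taken as the number of bits in the last segment divided by its download time.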
Another trend in multimedia communication is the usage of the IP Multimedia Subsystem (IMS) for the initiation and control of multimedia sessions. Within 3GPP, standardized solutions for IMS controlled RTP streaming as well as for IMS controlled HTTP progressive download are defined in 3GPP TS 26.237 V9.3.0 (2010-06) with the title IP Multimedia Subsystem (IMS) based Packet Switch Streaming (PSS) and Multimedia Broadcast/Multicast Service (MBMS) User Service; Protocols. These solutions benefit from the standardized features offered by IMS like charging, authentication or QoS (Quality of Service) reservation.
FIG. 2 shows the different signaling steps in case of IMS controlled HTTP progressive download as defined in 3GPP TS 26.237. The session is initiated with a SIP (Session Initiation Protocol) INVITE message which includes SDP (Session Description Protocol) information. The HTTP URL (Uniform Resource Locator) for download is delivered to the user equipment (UE), i.e. client, via the SIP 200 OK message. In addition, a QoS reservation for the HTTP progressive download session may be carried out. The progressive download itself is initiated by the UE with an HTTP GET command towards the HTTP server, which in return responds with the requested content file. In more detail, the following steps are performed:
1. The UE initiates the progressive download session by sending a SIP INVITE to the IM CN (IP Multimedia Core Network) subsystem, including an SDP offer.
2. The IM CN subsystem forwards the SIP INVITE message to the SCF (Service Control Function).
3. The SCF verifies the user rights for the requested content, selects an HTTP/SIP adapter, and forwards the SIP INVITE message to the HTTP/SIP adapter.
4. The HTTP/SIP adapter selects an HTTP server, and sends an HTTP POST message to the HTTP server, including the IP address of the UE.
5. The HTTP server answers to the HTTP/SIP adapter with an HTTP 200 OK response.
6. The HTTP/SIP adapter sends the SIP 200 OK answer to the SCF, including the download URL of the requested content file in the SDP answer.
7. The SCF forwards the SIP 200 OK to the IM CN subsystem.
8. The IM CN subsystem forwards the SIP 200 OK to the UE.
9. The UE sends an HTTP request to the URL obtained from the SIP 200 OK message.
10. The HTTP server delivers the content file in the HTTP response to the UE.
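The ten signaling steps can be written down as plain data, which makes the hop-by-hop structure of the exchange easy to inspect. The sketch below only restates the steps listed above; the tuple layout and the helper names `format_trace` and `is_chained` are assumptions made for illustration and do not implement SIP or HTTP.

```python
# (step, sender, receiver, message), restating the ten steps listed above.
STEPS = [
    (1, "UE", "IM CN subsystem", "SIP INVITE (SDP offer)"),
    (2, "IM CN subsystem", "SCF", "SIP INVITE"),
    (3, "SCF", "HTTP/SIP adapter", "SIP INVITE"),
    (4, "HTTP/SIP adapter", "HTTP server", "HTTP POST (UE IP address)"),
    (5, "HTTP server", "HTTP/SIP adapter", "HTTP 200 OK"),
    (6, "HTTP/SIP adapter", "SCF", "SIP 200 OK (download URL in SDP answer)"),
    (7, "SCF", "IM CN subsystem", "SIP 200 OK"),
    (8, "IM CN subsystem", "UE", "SIP 200 OK"),
    (9, "UE", "HTTP server", "HTTP request to download URL"),
    (10, "HTTP server", "UE", "HTTP response (content file)"),
]


def format_trace(steps):
    """Render each step as one line of a textual message sequence."""
    return [f"{n:>2}. {src} -> {dst}: {msg}" for n, src, dst, msg in steps]


def is_chained(steps):
    """True if every message is sent by the receiver of the previous one."""
    return all(prev[2] == cur[1] for prev, cur in zip(steps, steps[1:]))
```

Calling `print("\n".join(format_trace(STEPS)))` gives a compact textual counterpart to FIG. 2, and `is_chained(STEPS)` confirms that the sequence forms one unbroken request/response chain that starts and ends at the UE.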
The current AHS concept, as specified e.g. in 3GPP TS 26.234, Transparent end-to-end Packet-switched Streaming Service (PSS), in Open IPTV Forum - Release 2 Specification, HTTP Adaptive Streaming, DRAFT V0.06 - Jun. 7, 2010, or in proprietary solutions like Microsoft Smooth Streaming or Apple streaming (see R. Pantos, HTTP Live Streaming, http://tools.ietf.org/html/draft-pantos-http-live-streaming-01), specifies only the media packaging, media description and download mechanisms. No provision is made for combining these mechanisms with resource or QoS reservation mechanisms. Thus, even in managed systems where QoS reservation and control are possible, AHS works on a best-effort basis and will therefore in general still require adaptation.