Streaming, on the one hand, refers to the ability of an application residing in a client to play back synchronized media streams, such as speech, audio and video streams, in a continuous way while those streams are being transmitted to the client over a data network. On the other hand, streaming also refers to real-time low-delay applications such as conversational applications.
Applications that can be built on top of streaming services can be classified into on-demand and live information delivery applications. Examples of the first category are music and news-on-demand applications. Live delivery of radio and television programs are examples of the second category. Real-time low-delay applications are, for example, multimedia (video) telephony, Voice over IP and any type of conversational multimedia application.
Streaming over fixed Internet Protocol (IP) networks is already a major application today. While the Internet Engineering Task Force (IETF) and the World Wide Web Consortium (W3C) have developed a set of protocols used in fixed-IP streaming services, no complete standardized streaming framework has yet been defined. For Third Generation (3G) mobile communications systems according to the standards developed by the Third Generation Partnership Project (3GPP), the 3G Packet-switched Streaming Service (PSS, 3GPP TS 26.233, TS 26.234) fills the gap between the 3G Multi-media Messaging Service (MMS), for instance downloading applications and multimedia content, and conversational & streaming services.
The PSS enables mobile streaming applications, wherein the complexity of the terminals is lower than that required for conversational services, because no media input devices and encoders are required, and because less complex protocols can be used. The PSS includes a basic set of streaming control protocols, transport protocols, media codecs and scene description protocols.
FIG. 1 schematically depicts the PSS protocol stack 1 that controls the transfer of both streamable and non-streamable content between a content or media server and a client.
Streamable content 101, such as video, audio and speech, is first converted to the payload format of the Real-time Transport Protocol (RTP) 102 in an adaptation layer 103. Said RTP as defined by the IETF provides means for sending real-time or streaming data by using the services of an underlying User Datagram Protocol (UDP) 104, which in turn uses the services of an underlying IP protocol 105.
Non-streamable content 106, such as multimedia content which is not created for streaming purposes (e.g. MMS clips recorded on a terminal device), still images, bitmap and vector graphics, text, timed text and synthetic audio, is transferred by the Hypertext Transfer Protocol (HTTP) 107, which uses the services of the underlying Transmission Control Protocol (TCP) 108 and the further underlying IP 105.
Whereas for the non-streamable content 106, the built-in session set-up and control capabilities of the HTTP 107 are sufficient to transfer the content, in case of streamable content 101, an advanced session set-up and control protocol has to be invoked, for instance to start, stop and pause a streaming video that is transferred from the content server to the client via the RTP/UDP/IP. This task is performed by the Real-time Streaming Protocol (RTSP) 109, which may either use the underlying TCP 108 or the underlying UDP 104. RTSP requires at least a presentation description 110 to set up a streaming session. Such a presentation description 110 may for instance be available in the form of a Session Description Protocol (SDP) file. Said SDP file contains the description of the session, for instance the session name and author, the type of media to be presented, the information needed to receive said media, such as addresses, ports and formats, and the bitrate of the media.
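To make the contents of such a presentation description concrete, the following minimal sketch splits a small SDP text into session-level fields and per-media entries. The sample session, addresses, ports and bitrates are invented for illustration, and the hypothetical helper `parse_sdp` handles only the few line types mentioned above; it is not a complete SDP parser.

```python
# Invented sample description; all values are placeholders for illustration.
SAMPLE_SDP = """\
v=0
o=- 0 0 IN IP4 192.0.2.10
s=Example clip
c=IN IP4 192.0.2.10
t=0 0
m=audio 4000 RTP/AVP 97
b=AS:12
m=video 4002 RTP/AVP 98
b=AS:64
"""


def parse_sdp(text):
    """Split an SDP description into session-level fields and media entries."""
    session = {}
    media = []
    for line in text.splitlines():
        if len(line) < 2 or line[1] != "=":
            continue  # skip blank or malformed lines
        key, value = line[0], line[2:].strip()
        if key == "m":
            # Media line: <type> <port> <transport> <format list>
            kind, port, proto, fmt = value.split(maxsplit=3)
            media.append({"type": kind, "port": int(port),
                          "proto": proto, "format": fmt})
        elif key == "b" and media:
            # Bandwidth line of the most recent media entry, e.g. "AS:64"
            modifier, _, amount = value.partition(":")
            media[-1]["bandwidth"] = (modifier, int(amount))
        elif not media:
            # Lines before the first media line describe the whole session
            session[key] = value
    return session, media
```

In this sketch, `session["s"]` carries the session name, while each media entry carries the media type, the receiving port, the transport and the bitrate.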
If streaming content is to be viewed at the client side, for instance at a mobile terminal, the user of said terminal is first provided with a Universal Resource Identifier (URI) to specific content that suits his terminal. This URI may come from a WWW server, a Wireless Application Protocol (WAP) server, or may have been entered manually via the keyboard of the terminal. This URI specifies a streaming or RTSP server and the address of the content on that or another content server. The corresponding SDP file may now be obtained in a number of ways. It may be provided in a link inside the HTML page that the user downloads, for instance via an embed tag, or may also be directly obtained by typing it as a URI. The SDP file, i.e. the presentation description 110, is then transferred via the HTTP 107 as indicated in the middle column of the protocol stack of FIG. 1. Alternatively, it may also be obtained through RTSP 109 signaling, for instance by using the DESCRIBE method of the RTSP 109, as indicated by the right column of the protocol stack in FIG. 1. Note that the presentation description may equally well be transmitted by said RTP 102. However, for simplicity of presentation, this possibility was not included in FIG. 1.
The subsequent session establishment is the process in which the browser or the user of the mobile terminal invokes a streaming client to set up the session against the content server. The terminal is expected to have an active radio bearer that enables IP-based packet transmission at the start of session establishment signaling.
The subsequent set-up of the streaming service is done by sending an RTSP SETUP message for each media stream chosen by the client. The response to each SETUP message returns the UDP 104 and/or TCP 108 port to be used for the respective media stream. The client then sends an RTSP PLAY message to the content server, which starts to send one or more streams over the IP network.
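The order of the SETUP and PLAY messages can be sketched as plain request text. The sketch below is illustrative only: the helper names, the example URL, the per-track control URL convention (base URL plus track name) and the client ports are assumptions, and the session identifier, which a server would normally return in its response to the first SETUP, is simply passed in as a parameter.

```python
def rtsp_request(method, url, cseq, headers=None):
    """Format one RTSP/1.0 request as text (request line, CSeq, extra headers)."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"


def session_requests(base_url, tracks, session_id):
    """Build one SETUP per chosen media stream, followed by a single PLAY.

    tracks is a list of (track_name, client_rtp_port) pairs (hypothetical).
    """
    requests = []
    cseq = 1
    for track, rtp_port in tracks:
        headers = {
            # RTP port plus the next port for the companion RTCP stream
            "Transport": f"RTP/AVP;unicast;client_port={rtp_port}-{rtp_port + 1}",
        }
        if requests:
            # Later SETUPs join the session created by the first one
            headers["Session"] = session_id
        requests.append(rtsp_request("SETUP", f"{base_url}/{track}", cseq, headers))
        cseq += 1
    play_headers = {"Session": session_id, "Range": "npt=0-"}
    requests.append(rtsp_request("PLAY", base_url, cseq, play_headers))
    return requests
```

Calling `session_requests("rtsp://example.com/clip", [("audio", 4588), ("video", 4590)], "12345")` yields two SETUP requests and one PLAY request, in the order in which a client would send them.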
In order to offer service providers in PSS systems means to evaluate the end user streaming experience, streaming service quality metrics have been introduced in PSS systems, as presented in 3GPP Technical document (Tdoc) S4-040073: “Draft Rel-6 PSS Quality Metrics Permanent Document v.0.11”, which refers to 3GPP TSG-SA4 meeting #30 in Malaga, Spain, Feb. 23-27, 2004. The streaming client measures and feeds back information on the quality of the actual streaming application (Quality of Experience, QoE) to a streaming server, wherein said quality is defined in terms of said quality metrics. Said streaming server may for instance be an RTSP server, and said quality metrics may for instance be transported by using said RTSP and SDP.
Because the service is transparent to the type of Radio Access Network (RAN) and Core Network (CN), only the streaming client and the streaming server are impacted by the PSS quality metrics. One consequence of this is that the measurements may not rely on information from protocol layers below the RTP layer (e.g. UDP, IP, PDCP, SNDCP, LLC, RLC, MAC, Physical Layer).
The terminal in a PSS system with quality feedback is responsible for performing the quality measurements in accordance with the measurement definition, aggregating them into streaming client quality metrics and reporting the metrics to the streaming server. This requirement does not preclude the possibility for the streaming client to report raw quality measurements to be processed by the streaming server into quality metrics.
The streaming server is responsible for signaling the activation of the streaming client's quality metrics reporting and for gathering the streaming client's quality metrics. The streaming server may process the received streaming client's quality metrics to build aggregated quality metrics. For example, it could receive a raw lost packets report and build the minimum (Min), maximum (Max), average (Avg) and standard deviation (Std) of the packet loss rate for a particular streaming client.
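The server-side aggregation mentioned above can be sketched as follows. The input layout, one (packets lost, packets expected) pair per reporting period, and the function name are assumptions made for illustration. The sketch uses the population standard deviation, which is one possible choice; the source does not say which estimator is intended.

```python
import statistics


def summarize_loss_rates(reports):
    """Aggregate raw loss reports of one client into Min/Max/Avg/Std loss rate.

    reports: list of (packets_lost, packets_expected), one per reporting period.
    Returns None when no period contains any expected packets.
    """
    rates = [lost / expected for lost, expected in reports if expected > 0]
    if not rates:
        return None
    return {
        "min": min(rates),
        "max": max(rates),
        "avg": statistics.fmean(rates),
        # Population standard deviation over the reported periods (assumption)
        "std": statistics.pstdev(rates),
    }
```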
The following four quality metrics are defined by Tdoc S4-040073:
Corruption Duration
Corruption duration is the time period from the first corrupted frame to the first subsequent good frame or the end of the reporting period (whichever is sooner). This metric is expressed in seconds and can be a fractional value.
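The definition of corruption duration can be illustrated with a short sketch. It assumes that the client already knows, for each frame in playback order, a timestamp in seconds and whether the frame is corrupted; how corruption is detected is outside the scope of this sketch, and the function name is hypothetical.

```python
def corruption_durations(frames, period_end):
    """Return one duration (in seconds) per corruption event.

    frames: list of (timestamp_seconds, is_corrupted) in playback order.
    period_end: end of the reporting period in seconds.
    """
    durations = []
    start = None  # timestamp of the first corrupted frame of the current event
    for timestamp, corrupted in frames:
        if corrupted and start is None:
            start = timestamp
        elif not corrupted and start is not None:
            # First subsequent good frame closes the event
            durations.append(timestamp - start)
            start = None
    if start is not None:
        # No good frame followed: the event ends with the reporting period
        durations.append(period_end - start)
    return durations
```

Because the same event can occur more than once in a reporting period, the sketch returns a list rather than a single value.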
Rebuffering Duration
This metric is only applicable for audio, video and speech, and is not applicable to other media types. This metric is expressed in seconds and can be a fractional value. Rebuffering is defined as any stall in playback time due to any involuntary event at the client side.
Initial Buffering Time
Initial buffering is the time from receiving the first RTP packet until playback starts. This metric is expressed in seconds and can be a fractional value.
Number of Content Packets Lost in Succession
The number of content packets lost in succession per media channel.
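One way to derive this metric at the RTP layer is to look at gaps in the 16-bit RTP sequence numbers of one media channel. The sketch below is a simplification under stated assumptions: packets are taken in arrival order, no reordering occurs, and the function name is hypothetical. Duplicated packets are ignored.

```python
def loss_runs(sequence_numbers):
    """Return the length of each run of consecutively lost packets.

    sequence_numbers: RTP sequence numbers of the received packets of one
    media channel, in arrival order (assumed to be in-order delivery).
    """
    runs = []
    for prev, cur in zip(sequence_numbers, sequence_numbers[1:]):
        # Modulo arithmetic handles the wrap-around from 65535 to 0
        missing = (cur - prev) % 65536 - 1
        if missing > 0:
            runs.append(missing)
    return runs
```

For the received sequence 65533, 65534, 1, 2, 6, the sketch reports two runs, of two and three lost packets respectively.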
The objective of the above quality metric definition is to obtain consistent measurements across content type, terminals, and types of Radio Access Network (RAN).
The constraints are to minimize both the size of the quality metrics report that will be sent to the streaming server and the complexity for the terminal.
The actual quality metrics feedback can be conveyed to the PSS server by using the SET_PARAMETER method of the RTSP with a feedback header 2 as depicted in FIG. 2 (with reference to IETF Request for Comments (RFC) document 2327). In particular cases, however, it is more efficient to use other methods to carry the information, for instance the TEARDOWN message or the PAUSE message.
In the feedback header 2 of FIG. 2, Stream-url is the RTSP session or media control URL identifier for the feedback parameter. The Metrics field in the Parameters definition contains the name of the metrics/measurements (for instance corruption duration, etc.). The Value field indicates the results. There is the possibility that the same event occurs more than once during a monitoring period. In that case the metrics value can occur more than once, which indicates the number of events to the server. The optional Range field indicates the reporting period.
The optional Timestamp field in the feedback header 2 of FIG. 2 indicates the time when the event (or measurement) occurred or when the metric was calculated since the beginning of the session.
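The way such feedback might be attached to a SET_PARAMETER request can be sketched as follows. The normative header syntax is the one depicted in FIG. 2, which is not reproduced here; the header name `X-Example-Feedback`, the separators and the field order used below are placeholders chosen purely for illustration, as are the function name, the URLs and the metric name.

```python
def feedback_request(url, session_id, cseq, stream_url, metric, values,
                     report_range=None, timestamp=None):
    """Format an RTSP SET_PARAMETER request carrying one metric report.

    The header layout is hypothetical; see FIG. 2 for the actual syntax.
    """
    # A metric value may occur more than once if the event occurred repeatedly
    fields = [stream_url, metric + "=" + ",".join(str(v) for v in values)]
    if report_range is not None:
        fields.append(f"Range={report_range}")  # optional reporting period
    if timestamp is not None:
        fields.append(f"Timestamp={timestamp}")  # optional event time
    lines = [
        f"SET_PARAMETER {url} RTSP/1.0",
        f"CSeq: {cseq}",
        f"Session: {session_id}",
        "X-Example-Feedback: " + ";".join(fields),  # placeholder header name
    ]
    return "\r\n".join(lines) + "\r\n\r\n"
```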
The four quality metrics defined by Tdoc S4-040073 only allow for a coarse characterization of the quality of the playback of multimedia streams as experienced by a user. For instance, if two streaming sessions have the same values of the four quality metrics, and if in the first of said sessions a perfect synchronization between audio and video data exists, whereas in the second of said sessions said synchronization between audio and video has been lost, the reported quality based on the four quality metrics is the same while the actually experienced quality of playback is quite different. Furthermore, the four quality metrics defined by Tdoc S4-040073 do not differentiate between the different frame types contained in said multimedia stream, so that, for instance, the loss of frame types that are of crucial importance for the experienced quality of the playback cannot be differentiated from the loss of less important types of frames when reporting quality.