Streaming refers to the ability of an application residing in a client to play synchronized media streams, such as audio and video streams, continuously while those streams are being transmitted to the client over a data network.
Applications that can be built on top of streaming services can be classified into on-demand and live information delivery applications. Examples of the first category are music and news-on-demand applications. Live delivery of radio and television programs is an example of the second category.
Streaming over fixed Internet Protocol (IP) networks is already a major application today. While the Internet Engineering Task Force (IETF) and the World Wide Web Consortium (W3C) have developed a set of protocols used in fixed-IP streaming services, no complete standardized streaming framework has yet been defined. For Third Generation (3G) mobile communications systems, according to the standards developed by the Third Generation Partnership Project (3GPP), the 3G Packet-switched Streaming Service (PSS, 3GPP TS 26.233) fills the gap between the 3G Multimedia Messaging Service (MMS), for instance downloading applications, and conversational services.
The PSS enables mobile streaming applications, wherein the complexity of the terminals is lower than that required for conversational services, because no media input devices and encoders are required, and because less complex protocols can be used. The PSS includes a basic set of streaming control protocols, transport protocols, media codecs and scene description protocols.
FIG. 1 schematically depicts the PSS protocol stack 1 that controls the transfer of both streamable and non-streamable content between a content or media server and a client.
Streamable content 101, such as video, audio and speech, is first converted to the payload format of the Real-time Transport Protocol (RTP) 102 in an adaptation layer 103. Said RTP as defined by the IETF provides means for sending real-time or streaming data by using the services of an underlying User Datagram Protocol (UDP) 104, which in turn uses the services of an underlying Internet Protocol (IP) 105.
Non-streamable content 106, such as still images, bitmap and vector graphics, text, timed text and synthetic audio, is transferred by the Hypertext Transfer Protocol (HTTP) 107, which uses the services of the underlying Transmission Control Protocol (TCP) 108 and the further underlying IP 105.
Whereas for the non-streamable content 106, the built-in session set-up and control capabilities of the HTTP 107 are sufficient to transfer the content, in case of streamable content 101, an advanced session set-up and control protocol has to be invoked, for instance to start, stop and pause a streaming video that is transferred from the content server to the client via the RTP/UDP/IP. This task is performed by the Real-time Streaming Protocol (RTSP) 109, which may either use the underlying TCP 108 or the underlying UDP 104. RTSP requires at least a presentation description 110 to set up a streaming session. Such a presentation description 110 may for instance be available in the form of a Session Description Protocol (SDP) file. Said SDP file contains the description of the session, for instance the session name and author, the type of media to be presented, the information needed to receive said media (such as addresses, ports and formats), and the bitrate of the media.
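As a rough illustration of such a presentation description, the following sketch parses a hypothetical SDP text into session-level fields and per-media entries. The sample session name, address, track identifiers and payload type numbers are invented for illustration and are not taken from the PSS specification; only the SDP line types (s=, c=, m=, b=, a=) are standard.

```python
# Hypothetical SDP body; names, addresses and payload types are illustrative only.
SAMPLE_SDP = """\
v=0
o=- 0 0 IN IP4 192.0.2.1
s=Example clip
c=IN IP4 0.0.0.0
t=0 0
m=audio 0 RTP/AVP 97
b=AS:13
a=control:trackID=1
m=video 0 RTP/AVP 98
b=AS:64
a=control:trackID=2
"""


def parse_sdp(text):
    """Split an SDP description into session-level fields and media sections."""
    session = {}
    media = []
    current = None  # None while still in the session-level part
    for line in text.splitlines():
        if len(line) < 2 or line[1] != "=":
            continue  # skip blank or malformed lines
        key, value = line[0], line[2:]
        if key == "m":
            # m=<media> <port> <proto> <fmt> ...
            kind, port, proto, *formats = value.split()
            current = {"type": kind, "port": int(port), "proto": proto,
                       "formats": formats, "attributes": []}
            media.append(current)
        elif current is None:
            session.setdefault(key, []).append(value)
        elif key == "b":
            # b=<modifier>:<bandwidth in kbit/s>
            modifier, _, kbps = value.partition(":")
            current["bandwidth"] = (modifier, int(kbps))
        elif key == "a":
            current["attributes"].append(value)
        else:
            current.setdefault(key, []).append(value)
    return session, media
```

Calling `parse_sdp(SAMPLE_SDP)` yields the session name under the key `"s"` and one dictionary per media stream, each carrying its media type, transport protocol and declared bitrate.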
If streaming content is to be viewed at the client side, for instance at a mobile terminal, the user of said terminal is first provided with a Uniform Resource Identifier (URI) to specific content that suits his terminal. This URI may come from a WWW server, a Wireless Application Protocol (WAP) server, or may have been entered manually via the keyboard of the terminal. This URI specifies a streaming or RTSP server and the address of the content on that or another content server. The corresponding SDP file may now be obtained in a number of ways. It may be provided in a link inside the HTML page that the user downloads, for instance via an embed tag, or may also be directly obtained by typing it as a URI. The SDP file, i.e. the presentation description 110, is then transferred via the HTTP 107 as indicated in the middle column of the protocol stack of FIG. 1.
Alternatively, it may also be obtained through RTSP 109 signaling, for instance by using the DESCRIBE method of the RTSP 109, as indicated by the right column of the protocol stack in FIG. 1. Note that the presentation description may equally well be transmitted by said RTP 102. However, for simplicity of presentation, this possibility was not included in FIG. 1.
The subsequent session establishment is the process in which the browser or the user of the mobile terminal invokes a streaming client to set up the session against the content server. The terminal is expected to have an active radio bearer that enables IP-based packet transmission at the start of session establishment signaling.
The subsequent set-up of the streaming service is done by sending an RTSP SETUP message for each media stream chosen by the client. This returns the UDP 104 and/or TCP 108 port to be used for the respective media stream. The client sends an RTSP PLAY message to the content server that then starts to send one or more streams over the IP network.
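The message sequence described above can be sketched as follows. This is a minimal illustration that only formats RTSP/1.0 request text; the URL, track names, client ports and session identifier are hypothetical, and a real client would read the session identifier from the response to the first SETUP request rather than receive it as an argument.

```python
def rtsp_request(method, url, cseq, headers=None):
    """Format one RTSP/1.0 request as text (request line, CSeq, extra headers)."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"


def session_setup_requests(base_url, tracks, session_id):
    """Return one SETUP request per media stream, followed by a PLAY request.

    tracks: list of (track_name, client_rtp_port) pairs (hypothetical values).
    session_id: assumed to be known here; normally taken from the first
    SETUP response.
    """
    requests = []
    cseq = 1
    for track, client_port in tracks:
        headers = {
            "Transport": f"RTP/AVP;unicast;client_port={client_port}-{client_port + 1}"
        }
        if requests:
            # later SETUPs join the session created by the first one
            headers["Session"] = session_id
        requests.append(rtsp_request("SETUP", f"{base_url}/{track}", cseq, headers))
        cseq += 1
    requests.append(rtsp_request(
        "PLAY", base_url, cseq, {"Session": session_id, "Range": "npt=0-"}))
    return requests
```

For a clip with one audio and one video track, the function returns two SETUP requests and one PLAY request, in the order in which the client would send them.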
In order to offer service providers in PSS systems means to evaluate the end user streaming experience, streaming service quality metrics have been introduced in PSS systems, as presented in 3GPP Technical document (Tdoc) S4-030860: “Draft Rel-6 PSS Quality Metrics Permanent Document v.0.10”, which refers to 3GPP TSG-SA4 meeting #29 in Tampere, Finland, Nov. 24-28, 2003. The streaming client measures the quality of the actual streaming application and feeds this information back to a streaming server, wherein said quality is defined in terms of said quality metrics. Said streaming server may for instance be an RTSP server, and said quality metrics may for instance be transported by using said RTSP and SDP.
Because the service is transparent to the type of RAN and CN, only the streaming client and the streaming server are impacted by the PSS quality metrics. One consequence of this is that the measurements may not rely on information from protocol layers below the RTP layer (e.g. UDP, IP, PDCP, RLC).
The terminal in a PSS system with quality feedback is responsible for performing the quality measurements in accordance with the measurement definition, aggregating them into streaming client quality metrics and reporting the metrics to the streaming server. This requirement does not preclude the possibility for the streaming client to report raw quality measurements to be processed by the streaming server into quality metrics.
The streaming server is responsible for signaling the activation of the streaming client's quality metrics reporting and for gathering the streaming client's quality metrics. The streaming server may process the received streaming client's quality metrics to build aggregated quality metrics. For example, it could receive a raw lost packets report and build the Min, Max, Avg and Std packet loss rate for a particular streaming client.
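A server-side aggregation of this kind might look like the sketch below. The input layout, a list of (packets lost, packets expected) pairs with one pair per reporting period, is an assumption made for illustration; the text does not prescribe how the raw reports are structured, nor whether the standard deviation is a population or a sample figure (the population form is used here).

```python
import statistics


def aggregate_loss_rates(reports):
    """Build Min/Max/Avg/Std packet loss rate from raw per-period reports.

    reports: list of (packets_lost, packets_expected) pairs (assumed layout).
    Returns None if no period contains any expected packets.
    """
    rates = [lost / expected for lost, expected in reports if expected > 0]
    if not rates:
        return None
    return {
        "min": min(rates),
        "max": max(rates),
        "avg": statistics.fmean(rates),
        "std": statistics.pstdev(rates),  # population standard deviation
    }
```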
The objective of the quality metric definition is to obtain consistent measurements across content type, terminals, and types of Radio Access Network (RAN).
The constraints are to minimize both the size of the quality metrics report that will be sent to the streaming server and the complexity for the terminal.
The quality metrics can be divided into three types:
A first set of metrics are computed from terminal-based media quality measurements (measured within the decoder or predicted at the decoder input), e.g. the corruption duration, which is defined as the time from the start of the first corrupted media (audio/speech/video) decoded frame to the start of the first subsequent decoded good frame or the end of the reporting period (whichever is sooner), not including the buffering freezes/gaps and pause freezes/gaps.
A second set of metrics is computed by the terminal based on the general PSS protocol and the operation of the player that renders the streaming application, e.g. abnormal termination of a session.
A third set of quality metrics is computed based on terminal-measured network characteristics, e.g. the number of packets lost in succession.
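Two of the example metrics named above, the corruption duration and the number of packets lost in succession, can be sketched as follows. Both functions are illustrative readings of the definitions, not implementations taken from Tdoc S4-030860. The first assumes that frame start times are given in playback time with buffering and pause gaps already removed, and that each frame has already been classified as good or corrupted. The second assumes that RTP sequence numbers are available in arrival order and does not attempt to handle heavy reordering.

```python
def corruption_durations(frames, period_end):
    """Return one corruption duration per corruption event.

    frames: list of (start_time_s, is_good), sorted by start time (assumed input).
    period_end: end of the reporting period, in the same time base.
    """
    durations = []
    corruption_start = None
    for start, is_good in frames:
        if start >= period_end:
            break
        if not is_good and corruption_start is None:
            corruption_start = start  # first corrupted frame of an event
        elif is_good and corruption_start is not None:
            durations.append(start - corruption_start)  # first subsequent good frame
            corruption_start = None
    if corruption_start is not None:
        # no good frame arrived before the reporting period ended
        durations.append(period_end - corruption_start)
    return durations


def max_consecutive_lost(seq_numbers):
    """Largest run of missing packets between successively received packets.

    RTP sequence numbers are 16 bits wide, so differences are taken modulo 2**16.
    """
    longest = 0
    for prev, cur in zip(seq_numbers, seq_numbers[1:]):
        gap = (cur - prev) % 65536 - 1  # packets missing between prev and cur
        if 0 < gap < 32768:  # ignore duplicates and reordered packets
            longest = max(longest, gap)
    return longest
```

Note that the second function uses only RTP-layer information, in line with the constraint that measurements may not rely on protocol layers below RTP.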
As already mentioned, in PSS systems RTSP is used for the feedback of quality reports according to the quality metrics. FIG. 2a lists the definition of an RTSP protocol data unit header 2a QoE-Metrics for the negotiation of the quality metrics between the streaming client and the streaming server, and FIG. 2b lists the definition of an RTSP protocol data unit header 2b QoE-Feedback for the actual feedback of quality metrics from the streaming client to the server, wherein QoE stands for “Quality of Experience”.
The negotiation header 2a of FIG. 2a can be used in two ways:
1. If only the Off parameter is used, this is an indication that either the streaming server or the streaming client wants to cancel the quality metrics monitoring and reporting.
2. If the header 2a contains other parameters, then the quality metrics transmission is requested to start (or restart in case of mid-session monitoring).
If the negotiation header 2a is used with the RTSP Session Control url information, then QoE-Metrics is used at the session level. If the url is an RTSP Media Control url, then QoE-Metrics is used at the media level and each media gets its own QoE-Metrics line.
The sending rate must be set. If the Sending-rate value is 0, then the streaming client can send feedback messages at any time depending on events occurring in the streaming client. Values ≥ 1 indicate a precise message-sending interval. The shortest interval is once a second and the longest interval is undefined. The feedback sending interval can be different for different media, but it is recommended to keep the intervals roughly synchronized, to avoid extra traffic in the uplink direction. The value End indicates that only one message is sent at the end of the session. The Range field can be used to define the time limit of feedback sending. In this way it is possible to decide the monitoring time range during the negotiation phase.
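The three cases of the Sending-rate value can be summarized in a small sketch. The function name and the returned mode labels are hypothetical; the interval is interpreted in seconds, which follows from the statement that the shortest interval is once a second.

```python
def feedback_schedule(sending_rate):
    """Map a Sending-rate value to a (mode, interval_seconds) pair.

    sending_rate: the string "End" or an integer (or numeric string) >= 0.
    Mode labels are illustrative names, not protocol tokens.
    """
    if sending_rate == "End":
        return ("end_of_session", None)  # a single report when the session ends
    rate = int(sending_rate)  # raises ValueError for non-numeric input
    if rate == 0:
        return ("event_driven", None)  # client reports whenever events occur
    if rate >= 1:
        return ("periodic", rate)  # one report every `rate` seconds
    raise ValueError(f"invalid Sending-rate: {sending_rate!r}")
```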
The actual quality metrics feedback can be conveyed to the PSS server by using the SET_PARAMETER method of the RTSP with the feedback header 2b of FIG. 2b. 
In the feedback header 2b of FIG. 2b, Stream-url is the RTSP session or media control URL identifier for the feedback parameter. The Metrics field in the Parameters definition contains the name of the metrics/measurements (for instance corruption duration, etc.) and it shall be the same as the Metrics field in the negotiation QoE header 2a (QoE-Metrics). It is recommended to keep the order of metrics the same to simplify parsing. The Value field indicates the results. The same event may occur more than once during a monitoring period. In that case the metrics value can occur more than once, which indicates the number of events to the server. The optional Timestamp indicates the time, measured from the beginning of the session, at which the event (or measurement) occurred or the metric was calculated. It is also possible to report that no events occurred (using SP, i.e. the space character). The optional Range indicates the reporting period.
Quality metrics reporting is normally done by the PSS client using the SET_PARAMETER method of the RTSP. However, in particular cases, it is more efficient to use other methods to carry the information, such as the TEARDOWN message or the PAUSE message.
Turning back to the above-stated quality metrics definition of the corruption duration as a representative of a first set of quality metrics that are computed from terminal-based media quality measurements, it is readily seen that, apart from the dependency of this quality metrics definition on the further definition of a “corruption” and a “reporting period”, this quality metrics definition particularly depends on a definition of a “good frame”.
A good frame is a media (audio/speech/video) decoded frame that is not corrupted, i.e. that doesn't contain any freezes/gaps or quality degradations. To declare a video or audio frame as good, in Tdoc S4-030860, the following definition is introduced: “A good frame is the earlier of N frames after last loss or a complete I-frame, where N is either (a) signaled or (b) defaults to ∞ (for video) or 1 (for audio)”.
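One possible reading of the quoted definition is sketched below. It is an interpretation made for illustration only: it assumes that each frame is already marked as lost or received and as a complete I-frame or not, and that "N frames after last loss" means the N-th received frame following the most recent loss. Other readings of the definition are conceivable.

```python
import math


def classify_frames(frames, media, n=None):
    """Mark each frame as good (True) or not good (False).

    frames: list of (is_lost, is_complete_i_frame) pairs (assumed input).
    media: "video" or "audio"; selects the default N when n is not signaled.
    n: signaled N, or None to use the default (infinity for video, 1 for audio).
    """
    if n is None:
        n = math.inf if media == "video" else 1
    good = []
    since_loss = None  # received frames since the last loss; None = not in error
    for is_lost, is_i_frame in frames:
        if is_lost:
            since_loss = 0
            good.append(False)
            continue
        if since_loss is not None:
            since_loss += 1
            # recovery: a complete I-frame or the N-th frame after the loss,
            # whichever comes first
            if is_i_frame or since_loss >= n:
                since_loss = None
        good.append(since_loss is None)
    return good
```

With the video default, only a complete I-frame ends a corruption; with the audio default, the first frame received after a loss is already counted as good.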
The application of this definition is not mandatory, resulting in a wide range of interpretations of the definition of a good frame. Thus different streaming clients may report different streaming qualities, because for the same quality metric (for instance corruption duration), different definitions for a “good frame” are applied. A similar ambiguity arises when different terminals use different error tracking algorithms, so that even when using the same definition of a “good frame”, the reported streaming quality in terms of the same quality metric may differ across the terminals. These ambiguities cause the reported quality metrics to be imprecise and effectively worthless.