1. Field of the Invention
The present invention pertains to structure and operation of terminals which receive transmissions in the form of media streams.
2. Related Art and Other Considerations
Public Land Mobile radio Network (PLMN) is a generic term for a mobile wireless network that is centrally operated and administrated by an organization and uses land-based radio frequency transmitters or base stations as network hubs. PLMNs can stand alone and interconnect with one another or connect to a fixed system such as the PSTN.
In the near future there will be an increasing traffic load on the packet switched part of the PLMNs, such as GSM/GPRS, UMTS (WCDMA) and CDMA2000. One service that utilizes packet switched bearers is referred to as Push to talk over Cellular (PoC). Push to talk over Cellular (PoC) is currently being standardized and agreed upon in an industry consortium known as the Open Mobile Alliance (OMA) forum. See, OMA PoC User Plane, OMA-UP-POC=V0—1-20041005-D, Draft Version 1.0.9 Oct. 2004, incorporated herein by reference.
Push to talk over Cellular (PoC) is being developed for handsets in networks such as GSM/GPRS networks, EDGE networks, UMTS, and CDMA systems. PoC is basically a voice chat for cellular telecommunication systems. PoC provides quick one-to-one or group communication, something like a short instant messaging service with the feel of a “walkie-talkie”.
PoC enabled handsets will most likely be equipped with a PoC button. The PoC button may (for example) be: a dedicated hardware button; an assigned button on a standard keypad; or a software button on, e.g., a pressure-sensitive screen. When the PoC button is pressed, the handset is connected directly to another user or user group. The first releases of PoC provide half-duplex service, although full duplex may be available at a later stage.
Combinational services enrich the Circuit-Switched (CS) voice service of today with images and video-clips. The images and/or video-clips would utilize the packet switched (PS) part of the PLMNs when being transferred from one user's client to another user's client.
Much effort and investment has been made to develop a fully packet switched solution for voice communication. Such a solution is often referred to as Voice over IP (VoIP) since it is assumed that the Internet Protocol (IP) will be used to carry the media. Now this work will be reused to further enhance VoIP. It is anticipated that in the near future it will be possible to offer combinations of, for example, PoC with video and/or images, and VoIP with video and/or images, even over currently deployed PLMNs.
Services that combine voice and image/video (regardless of whether the voice is packet switched or circuit switched) sometimes go under the name Push to Show services.
Devices that receive media streams (including media streams which are provided or are part of Push to talk over Cellular (PoC) and/or Push to Show services) generally have a buffer, commonly known as a jitter buffer, for temporary storage and (when necessary) reordering of packets. The jitter buffer typically serves to smooth out interruptions in the media stream in order to provide downstream equipment in the receiver, e.g., a speech decoder, with an essentially continuous stream of data. Conventionally the jitter buffer has a play out pointer which locates or identifies a position in the jitter buffer from which data of the media stream is to be read out or “rendered”. Jitter buffers are generally known in the context of reception of media streams and elsewhere, as evidenced by the following (all of which are incorporated herein by reference in their entireties): US Patent Application Publication US 2003/0152093; US Patent Application Publication US 2004/0037320; US Patent Application Publication US 2004/0062260; US Patent Application Publication US 2004/0073692; US Patent Application Publication US 2004/0076190; US Patent Application Publication US 2004/0156622; US Patent Application Publication US 2002/0120749; U.S. Pat. No. 6,747,999; U.S. Pat. No. 6,684,273; U.S. Pat. No. 6,658,027; U.S. Pat. No. 6,418,125; U.S. Pat. No. 5,350,271.
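By way of illustration only, the temporary storage and reordering role of a jitter buffer can be sketched as follows. This is a minimal sketch and not an implementation from any of the cited references; the class name JitterBuffer, the depth parameter, and the rule of starting play out once depth packets are stored are all assumptions made for the example.

```python
import heapq

class JitterBuffer:
    """Toy jitter buffer (hypothetical): reorders packets by sequence number."""

    def __init__(self, depth):
        self.depth = depth    # packets to store before play out starts (assumed policy)
        self.heap = []        # min-heap of (sequence number, payload)
        self.started = False

    def push(self, seq, payload):
        """Store a received packet; packets may arrive out of order."""
        heapq.heappush(self.heap, (seq, payload))

    def pop(self):
        """Return the next payload in sequence order, or None if nothing is ready."""
        if not self.started:
            if len(self.heap) < self.depth:
                return None   # still filling up to the play out point
            self.started = True
        if not self.heap:
            return None       # underrun: the decoder would hear a gap here
        return heapq.heappop(self.heap)[1]

buf = JitterBuffer(depth=2)
buf.push(2, "frame-2")
first = buf.pop()             # buffer still filling
buf.push(1, "frame-1")
played = [buf.pop(), buf.pop()]
```

In this sketch the depth plays the role of the play out point: a larger depth absorbs more delay variation at the cost of added latency.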
Adaptive jitter buffers presently have only a single play out point that is estimated and changed during a session. This means that such jitter buffers have one algorithm that continuously tries to estimate the optimal amount of data that should be in the jitter buffer. One common approach is for the adaptive jitter buffer algorithm to use averages of statistical measures like standard deviation and variance to find the optimal play out point for the jitter buffer at every point in time. The drawback is that such “averaging” algorithms do not react well to changes of channel settings, media settings or other settings that abruptly change the characteristics of the transport or the media.
Algorithms for adaptive play out buffers commonly adapt the size of the buffer prior to the session, and try to keep the same buffer size from there on by adaptively changing either the transmission rate or the encoding rate of the media stream. The basic idea is that the receiving side continuously sends information about its jitter buffer status to the streaming server. The streaming server can then adapt the rate of the media stream according to the received information. The drawback with the streaming approach is that it needs relatively large jitter buffers (on the order of a few seconds) to perform the adaptation due to the “rather slow” mechanism of reporting back the buffer status, which makes this approach less useful for real-time services.
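The slow reaction of an averaging algorithm to an abrupt change can be illustrated with a generic estimator. The sketch below is an assumption chosen for illustration (an exponentially weighted mean and mean deviation of the packet delay), not an algorithm taken from any particular prior art device; the function name and the values of alpha and k are hypothetical.

```python
def ewma_playout_delays(delays_ms, alpha=0.9, k=4.0):
    """Return the play out delay chosen just before each packet arrives.

    The target is mean + k * deviation, where both statistics are
    exponentially weighted averages of the observed packet delays
    (an assumed, generic averaging estimator).
    """
    mean = delays_ms[0]
    dev = 0.0
    targets = []
    for d in delays_ms:
        targets.append(mean + k * dev)   # decision is based on past packets only
        dev += (1 - alpha) * (abs(d - mean) - dev)
        mean += (1 - alpha) * (d - mean)
    return targets

# Hypothetical trace: the path delay jumps from 50 ms to 250 ms at packet 20,
# e.g. because of a change of transport related settings.
delays = [50.0] * 20 + [250.0] * 20
targets = ewma_playout_delays(delays)
late = sum(1 for d, t in zip(delays, targets) if d > t)
print("packets arriving after their play out time:", late)
```

With this trace the estimator is still tuned to the old delay when the jump occurs, so the first packets after the jump miss their play out time before the averages catch up.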
Applications utilizing the Real-time Transport Protocol (RTP) use the RTP Control Protocol (RTCP) for synchronizing RTP streams, for example an audio stream with a video stream as in video-telephony service. Real-time Transport Protocol (RTP) is described, e.g., in IETF, “RTP: A Transport Protocol for Real-Time Applications”, RFC 3550, July 2003, incorporated herein by reference.
One problem is how to accurately set the media (e.g. audio, video, image) playback/rendering point to optimize the end-to-end (E2E) content delivery performance. This problem may arise in various situations. For example, the delay of the path of transfer may drastically change due to changes of transport related settings or states in the nodes involved in the transport. As a second example, the media type may change to a type that needs more or fewer bits in the jitter buffer to work properly. As a third example, a media type may be added during the media session, which calls for added delay in the jitter buffer due to synchronization.
A channel type switch such as that which occurs in wideband code division multiple access (WCDMA) is one illustration of the first example problem situation for a packet switched audio service, such as VoIP or PoC. WCDMA is described, e.g., in 3GPP, “Technical Specification Group Radio Access Network; Radio Resource Control (RRC), Protocol Specification”, TS 25.331 V4.13.0, March 2004. Consider FIG. 4, which depicts the Radio Resource Control (RRC) state machine of WCDMA. The RRC state starts up in idle mode. When data is to be transmitted, the RRC state may go to CELL_DCH or to CELL_FACH. When the transmitter throughput drops below a certain limit during a certain time period, a channel type down switch to CELL_FACH is executed. After a further period without any new data, the RRC state will switch down further to idle mode. However, if data is received prior to the down switch to idle mode, then depending on the amount of data (e.g., the Radio Link Control (RLC) buffer reaches a certain threshold), the radio access bearer (RAB) is switched to RRC state CELL_DCH. The problem for the audio is that some media will be transferred during the CELL_FACH state, and when the state switch occurs there will be a delay in the transmission of the media, resulting in an annoying gap in the play out of audio to the recipient.
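The state transitions just described can be sketched as a small state machine. The sketch covers only the three states named in the text and is not a rendering of TS 25.331; the event names, the function next_rrc_state, the threshold value, and the choice of target state when leaving idle mode are assumptions made for illustration.

```python
IDLE, CELL_FACH, CELL_DCH = "idle", "CELL_FACH", "CELL_DCH"

def next_rrc_state(state, event, rlc_buffer_bytes=0, upswitch_threshold=256):
    """Return the RRC state after an event (hypothetical event names)."""
    if event == "data":
        if state == IDLE:
            # The text allows either target state; choosing by buffer fill
            # level is an assumption of this sketch.
            return CELL_DCH if rlc_buffer_bytes >= upswitch_threshold else CELL_FACH
        if state == CELL_FACH and rlc_buffer_bytes >= upswitch_threshold:
            return CELL_DCH   # up switch: this is where the media gap occurs
        return state
    if event == "low_throughput" and state == CELL_DCH:
        return CELL_FACH      # channel type down switch
    if event == "inactivity" and state == CELL_FACH:
        return IDLE           # further down switch after no new data
    return state

# A talk burst ends, the channel is switched down, then a new burst starts.
state = CELL_DCH
trace = []
for event, fill in [("low_throughput", 0), ("data", 40), ("data", 400)]:
    state = next_rrc_state(state, event, rlc_buffer_bytes=fill)
    trace.append(state)
```

In the trace, the first packets of the new burst are carried in CELL_FACH and the up switch to CELL_DCH happens only once the buffer fills, which corresponds to the gap described above.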
PoC includes a concept called “user plane adaptation” which provides an illustration of the second example problem situation. The user plane adaptation algorithm collects information about the capacity of each terminal's downlink using the Session Description Protocol (SDP). From that information the PoC server informs all terminals of how much bandwidth the media stream can consume.
The bandwidth of the media stream is altered in PoC by changing the number of speech coder frames in one IP packet. The SDP parameter used for this purpose is the ‘ptime’ (packet time) parameter. The ptime parameter describes the amount of time the playback of the media in the IP packet will take. By altering the value of ptime from 20 ms to 160 ms, the bit rate of an IP stream conveying AMR 5.15 frames can be reduced from 22.0 kbps to 7.6 kbps.
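The effect of ptime on the bit rate can be checked with a short calculation. The text does not state which packetization it assumes; the sketch below assumes 40 bytes of uncompressed IPv4/UDP/RTP headers per packet and the octet-aligned AMR RTP payload format (one mode request byte per packet, one table of contents byte per frame, and each 103-bit AMR 5.15 frame padded to 13 bytes). The function name is hypothetical.

```python
import math

def amr_ip_bitrate_kbps(ptime_ms, frame_bits=103, frame_ms=20, header_bytes=40):
    """IP-level bit rate of an AMR stream for a given ptime (assumed packet layout)."""
    frames = ptime_ms // frame_ms                 # speech frames per IP packet
    speech_bytes = math.ceil(frame_bits / 8)      # 103 bits padded to 13 bytes
    payload_bytes = 1 + frames * (1 + speech_bytes)
    packet_bits = 8 * (header_bytes + payload_bytes)
    return packet_bits / ptime_ms                 # bits per millisecond equals kbps

rate_20 = amr_ip_bitrate_kbps(20)     # one frame per packet
rate_160 = amr_ip_bitrate_kbps(160)   # eight frames per packet
print(rate_20, rate_160)
```

Under these assumptions the calculation gives 22.0 kbps for a ptime of 20 ms and about 7.65 kbps for a ptime of 160 ms, in line with the figures quoted above; the saving comes from spreading the fixed per-packet header overhead over eight frames instead of one.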
The implication for the jitter buffer when changing the ptime parameter is that both the frequency of media reception and the amount of media received in each packet change. Therefore different ptime values call for different jitter buffer depths. A drastic change of ptime may happen if Mobile IP handover is performed so that RObust Header Compression (ROHC) is enabled.
An illustration of the third problem situation occurs when a service is ongoing and sending one type of media and another media type is activated, e.g. a combination of VoIP and real-time video. Under such circumstances of adding a new media type, the play out point in the jitter buffer for the media stream may have to be changed. The reason is that video typically needs longer buffering time than voice. For instance, a low bandwidth scenario may have a video rate of four frames per second and therefore each frame corresponds to 250 ms of media. If the jitter buffer must hold three frames to achieve reasonable quality this means that 750 ms of video is stored in the jitter buffer. Therefore, when adding synchronized real-time video to VoIP the application has to delay the speech in the jitter buffer for as long as the buffering of the video stream by adjusting the play out point.
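The arithmetic of this example can be written out as follows. The video figures are those given above; the 60 ms of speech already held in the jitter buffer is a hypothetical value chosen only to complete the example, and the function names are likewise assumptions.

```python
def buffered_media_ms(frame_rate_fps, frames_held):
    """Duration of media represented by the frames held in the jitter buffer."""
    return frames_held * 1000.0 / frame_rate_fps

def extra_speech_delay_ms(video_buffer_ms, speech_buffer_ms):
    """Additional delay the speech play out point needs to stay in sync with video."""
    return max(0.0, video_buffer_ms - speech_buffer_ms)

video_ms = buffered_media_ms(frame_rate_fps=4, frames_held=3)   # 3 frames of 250 ms
speech_ms = 60.0                                                # hypothetical speech buffering
shift_ms = extra_speech_delay_ms(video_ms, speech_ms)
print(video_ms, shift_ms)
```

With these numbers the video buffer holds 750 ms of media, so the speech play out point would have to be moved back by 690 ms when the video is added.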
What is needed, therefore, and an object of the present invention, is an improved technique for reading out media stream data from a jitter buffer.