Transmission of a video media stream from a first terminal to at least one other terminal will be used in many services in future mobile communication systems. For some services the video media stream will be combined with other media streams, such as voice, i.e. the video stream will be transmitted from a sending terminal to a receiving terminal at least partly simultaneously with the other media stream, e.g. during a voice call between the users of the two terminals. Also, there may be other services in which different media streams are transmitted at least partly simultaneously from a sending terminal to a receiving terminal.
An example of such a service is Push-to-Show (PtS) Video. PtS is a so-called combinational service that utilizes an IP Multimedia Subsystem (IMS) as the service layer platform. The technical specification 3GPP TS 23.279 V1.0.0, published by 3GPP in February 2005, further describes how PtS is enabled. This document provides the architectural details for using a circuit switched voice call in association with an IMS packet data session, and describes in detail how circuit switched services and IMS services can be combined into a combinational service. Basically, the PtS service is an enriched phone call. In PtS Video, the enrichment is live-streamed video that is transferred between terminals during the phone call.
If a service is described as a combinational service it also means that the service uses an ordinary circuit switched (CS) channel for voice while the enrichment, in this case the live-streamed video, uses a packet switched (PS) channel. The live-streamed video enrichment is started by just a push of a button on one of the terminals involved in a phone call. This means that PtS Video differs from ordinary video telephony in that PtS offers the possibility of changing service, i.e. going from voice only to video and voice, during an ongoing call.
It is believed that in most PtS sessions, the live-streamed video is sent “one-way” (or simplex) from one user to another in order to enable a “See what I see” type of service. Below is a typical user scenario for PtS Video:
A PtS user is in a store and wants to buy a shirt. Before buying the shirt the PtS user wants a friend's opinion and calls him. When the PtS user gets connected to his friend, the PtS user enables the live-streamed video enrichment by a push of a button. After that the PtS user records the shirt using the built-in video camera in the phone. The live-streamed video is sent to the friend, who views the shirt. After having formed an opinion on whether the PtS user should buy the shirt, the friend gives his opinion using the CS voice channel.
An issue in the PtS Video scenario is that the voice stream and the video stream are not sent over the same path in the communication system, and thus there is a problem of synchronizing the presentation of the voice and the video data streams at the receiving terminal. Since the voice stream and the video stream use different radio bearers and do not traverse an identical set of nodes in the access networks and core networks of the mobile communication system, the flows will have different end-to-end delay characteristics. An end-to-end delay is defined as the time from transmitting a part of a media stream, such as a data packet in a PS media stream, from the sending terminal until that part of the media stream is presented at the receiving terminal. The end-to-end delay comprises transmission time and buffering time, wherein buffering time is the time a received part of the media stream is stored in a buffer in the receiving terminal before it is presented. The buffering time for CS voice is normally very short, whereas it may be substantial for PS data, such as video, as will be shown below. Also, the transmission time for CS voice is normally shorter than the transmission time for PS data. The end-to-end delay may also comprise a buffering time at the sending terminal before the part of the media stream is actually transmitted.
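As a rough illustration of the definition above, the end-to-end delay of a stream can be modeled as the sum of its components. The following Python sketch is only that: the class name StreamDelay, its field names, and all per-component millisecond values are assumptions chosen for illustration and are not taken from any specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamDelay:
    """Delay components of one media stream, in milliseconds (hypothetical model)."""
    sender_buffering_ms: int    # time held at the sending terminal before transmission
    transmission_ms: int        # time spent in the access and core networks
    receiver_buffering_ms: int  # time held in the receiving terminal's buffer

    @property
    def end_to_end_ms(self) -> int:
        # From transmission at the sender until presentation at the receiver.
        return (self.sender_buffering_ms
                + self.transmission_ms
                + self.receiver_buffering_ms)


# Illustrative values only (assumptions): short buffering for CS voice,
# long receiver-side buffering for PS video.
cs_voice = StreamDelay(sender_buffering_ms=0, transmission_ms=180, receiver_buffering_ms=20)
ps_video = StreamDelay(sender_buffering_ms=100, transmission_ms=400, receiver_buffering_ms=1500)

print(cs_voice.end_to_end_ms, ps_video.end_to_end_ms)
```

With these assumed values, the receiver-side buffering dominates the end-to-end delay of the PS stream, whereas the CS stream is dominated by transmission time.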
Synchronizing the presentation of the two flows at the receiving terminal, by prolonging the transmission time and/or the buffering time of one flow and/or shortening the transmission time and/or the buffering time of the other flow, is difficult. Inducing extra delay on a CS voice stream might not be desired, since the experienced CS voice quality is highly dependent on the end-to-end delay. Furthermore, the PS domain in 2G networks, such as GSM/GPRS and EDGE networks, currently lacks a proper handover mechanism. This means that rather long interruptions of the PS data transfer may happen in normal operation during handovers. The variation of the radio conditions and retransmissions of radio blocks are also factors that contribute to interruptions of the PS data transfer, which result in delay variations, or jitter, in the PS data transfer. To handle such interruptions the receiving client uses a jitter buffer. This means that an additional buffer delay has to be included in the end-to-end delay of packet switched data, such as live-streamed video, to achieve a good data quality at the receiving terminal.
In order to have a smooth playback of the received PS video when deploying Push-to-Show over mobile networks, especially 2G networks, it is believed that a fairly large buffer is needed in the receiving client. This buffering is needed to overcome the sudden radio outages and delay variations in the PS data transfer explained above. All in all, the end-to-end delay, i.e. the time from when a packet of the video stream is transmitted from the sending terminal until the packet is displayed on the screen of the receiving terminal, needs to be rather long. At the same time, the end-to-end voice delay is short, as it uses a CS channel that favors constant low delay over successful transmission of radio blocks, i.e. retransmissions of radio blocks are turned off in CS channels.
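The trade-off between buffering and smooth playback described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of any actual PtS client: the function name late_packets and the per-packet delay values are hypothetical, and each packet is simply assumed to be scheduled for playout a fixed time after it was sent.

```python
from typing import Sequence


def late_packets(network_delays_ms: Sequence[int], playout_delay_ms: int) -> int:
    """Count packets that arrive after their scheduled playout time.

    Each packet is scheduled for playout `playout_delay_ms` after it was sent,
    so a packet is late when its network delay exceeds that budget.
    """
    return sum(1 for d in network_delays_ms if d > playout_delay_ms)


# Hypothetical per-packet network delays, including an outage-like spike
# such as might occur around a handover, and a retransmission-like delay.
delays_ms = [300, 350, 320, 1800, 400, 900, 310]

for budget_ms in (500, 1000, 2000):
    print(budget_ms, late_packets(delays_ms, budget_ms))
```

In this sketch, a larger playout budget leaves fewer packets late, and hence fewer freezes in the played-out video, at the cost of a correspondingly longer end-to-end delay.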
Typically, the end-to-end delay for video is about 2 seconds in a 2G network and the end-to-end delay for circuit switched voice is about 0.2 seconds. This mismatch may make it difficult for the user of the receiving terminal to follow the conversation if the user of the sending terminal talks about what he is recording, which is something the user of the receiving terminal will only see a couple of seconds later. This is illustrated by the following example:
PtS User A talks to PtS User B over the phone. PtS User A pushes the PtS button and records a bowling competition between a few of his friends. Every time a friend throws the bowling ball, PtS User A comments on the style of the friend throwing the ball as well as the behavior of the ball on the lane. However, the PS connection between PtS User A and PtS User B has a fairly long media path delay (maybe several seconds). This may be due to slow retransmissions of the PS data blocks over the air interface, long buffering time in the receiving PtS Client in order to prevent freezing of the played-out video stream, or congestion in the PS core network. The lack of synchronization between the comments over the CS voice channel and the actual played-out video of the friends who are bowling is perceived as rather annoying by PtS User B.
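Using the typical figures quoted above (about 2 seconds for PS video in a 2G network and about 0.2 seconds for CS voice), the size of the mismatch can be computed directly. The Python sketch below is illustrative only; the constant names and the function name presentation_skew_ms are assumptions introduced here.

```python
VIDEO_DELAY_MS = 2000  # typical PS video end-to-end delay in a 2G network (from the text)
VOICE_DELAY_MS = 200   # typical CS voice end-to-end delay (from the text)


def presentation_skew_ms(video_delay_ms: int, voice_delay_ms: int) -> int:
    """Return how long after the matching voice the video is presented."""
    return video_delay_ms - voice_delay_ms


skew = presentation_skew_ms(VIDEO_DELAY_MS, VOICE_DELAY_MS)
print(skew)  # 1800
```

With these figures, a spoken comment is heard roughly 1.8 seconds before the video it refers to is displayed at the receiving terminal.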
However, it is not only in the case of CS voice and PS video that a solution to the synchronization problem is needed. In the future, mobile networks will also offer PS voice and PS video services. Therefore, a possible service may be a real-time PS voice call that is enriched with the transmission of a video clip. Here this service is referred to as PtS Clip. The transmission of the video clip may be a so-called progressive download. This means that the receiving user can consume the content of the transmitted file while it is still being transmitted. An example of using PtS Clip is shown below:
PtS User A talks to PtS User B over the phone. PtS User A pushes a PtS button and sends a stored video clip to PtS User B. The video clip shows the bowling competition mentioned above. PtS User A wants to comment on the style used every time a person throws the bowling ball. Therefore, the video clip is also presented to PtS User A on the display of his terminal. However, the transfer delay between PtS User A and PtS User B until PtS User B can start watching the video clip at his terminal is fairly long, maybe several seconds. This may be due to slow retransmissions of the PS data blocks over the air interface, long buffering time in the receiving PtS Client in order to prevent freezing of the played-out video stream, or congestion in the PS core network. The lack of synchronization between the comments presented on PtS User B's terminal as a PS voice call and the actual play-out of the video clip on PtS User B's terminal is perceived as rather annoying by PtS User B.
Further, the sender may want to synchronize other types of data streams with the receiver. One such example may be a whiteboard session together with a voice call commenting the whiteboard session.
As shown above, there is a need for a solution for synchronizing the presentation of a first media stream, such as video, at a receiving terminal to the presentation of a second media stream, such as voice, at the receiving terminal, when the first media stream has a first end-to-end delay and the second media stream has a second end-to-end delay substantially shorter than the first end-to-end delay. The synchronization should be made such that the user of the receiving terminal perceives the information in the first media stream and the information in the second media stream as synchronized.