Streaming of digital media content, such as video or audio, in compressed form over IP networks, e.g. the Internet, is perceived by the viewer as instant downloading during playback of the content. In a live distribution, the video signal is converted into a compressed digital signal and transmitted from a master server as unicast or multicast, simultaneously sending a single file to multiple user client devices.
One of the most common ways to distribute video over the Internet is HLS (HTTP Live Streaming), in which the video stream is chunked into 10-second video files, so that the video consists of a series of these 10-second files. The client device requests these files using normal HTTP and, to ensure that it always has video data to present, keeps at least 3 of these files buffered in the device. This buffering therefore imposes at least 30 seconds of delay. In addition, a content player downloads and stores data in the receiving device to compensate for network problems such as packet loss and jitter. At start-up the buffer is filled to a certain level, typically 30 seconds. If packets are lost, the transport protocol TCP requests the data again and also reduces the bitrate of outgoing traffic to avoid congestion. Since this process takes time, and takes longer the more errors occur, the buffer needs to grow, which results in a larger delay. The delay will thus increase over time. Another consequence is that, since different devices experience different packet loss and jitter, the buffer fill level will differ between client devices, meaning that their presentation of the video will be delayed by different amounts and out of sync.
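The buffering arithmetic above can be illustrated with a minimal sketch. The function names and the example buffer levels for the two clients are hypothetical and chosen only for illustration; the 10-second segment length and the 3 buffered segments are the figures given above.

```python
def min_buffer_delay_seconds(segment_duration_s, buffered_segments):
    """Lower bound on the delay imposed by keeping whole segments buffered."""
    return segment_duration_s * buffered_segments


def playout_skew_seconds(buffer_level_a_s, buffer_level_b_s):
    """Difference in presentation time between two clients whose buffers
    hold different amounts of media (hypothetical helper)."""
    return abs(buffer_level_a_s - buffer_level_b_s)


# 3 segments of 10 seconds each, as in the HLS example above.
print(min_buffer_delay_seconds(10, 3))  # 30

# Assumed buffer levels: one client at 30 s, another that has grown to 42 s
# after retransmissions. The two clients are then 12 s out of sync.
print(playout_skew_seconds(30, 42))  # 12
```

The sketch only captures the lower bound from segment buffering; encoding, packaging and transport delays add to it.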
For online streaming over the Internet, primarily to mobile devices, tablets or Internet-connected TV sets, the delay in transporting the media content to different viewers watching (or listening) on different devices can vary significantly. This has the effect that, e.g., a live TV channel or a live sports event can be viewed with a difference of seconds and, in the case of current Over-The-Top (OTT) delivery, even several minutes. This can ruin the experience for the viewer with the longest delay if people are sitting close to each other, as in a bar or on a train, watching the same content, or if, in parallel with viewing, they are having a social conversation via e.g. phone, Facebook, Twitter, SMS or chat. Further, the absolute delay throughout the distribution is of course in itself a problem in real-time multimedia communication. The actual transport delay over the Internet from the source to different receiving client devices typically varies from a few to several hundred milliseconds, depending on the type of underlying network infrastructure, the last-mile technology and network congestion.
Typically, TV/video over the Internet also involves the use of Adaptive Bit Rate (ABR), meaning that the same program is sent in different versions with different qualities/bitrates, and additional buffering is needed to handle switching between the different bitrates in a seamless manner.
A commonly used protocol for synchronizing the playout of media content files is the Real-time Transport Protocol, in the following referred to as RTP. RTP includes a control protocol, RTCP, which provides synchronization information including timestamps and control packets from the master server and feedback information from the client device. When RTP media data packets are sent, the timestamp in each media data packet describes the moment in time at which the packet was created, in time units of the sample clock. The main intention of RTP is to provide re-synchronization of video and audio content received in a client device as separate streams. RTP/RTCP does not synchronize between different devices, since the timestamps have no relation to a global clock such as UTC.
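As a minimal sketch of what "time units of the sample clock" means, the following converts the difference between two RTP timestamps into elapsed media time. The function name is hypothetical, and the 90 kHz clock rate is an assumption (it is the rate conventionally used for video payloads). The RTP timestamp field is 32 bits wide and starts from an arbitrary value, so only differences within one stream are meaningful.

```python
RTP_TS_MODULUS = 2 ** 32  # the RTP timestamp field is 32 bits and wraps around


def rtp_elapsed_seconds(timestamp, base_timestamp, clock_rate_hz):
    """Media time elapsed between two RTP timestamps of the same stream.

    The subtraction is done modulo 2**32 so that a wrap of the timestamp
    counter between the two packets is handled correctly.
    """
    ticks = (timestamp - base_timestamp) % RTP_TS_MODULUS
    return ticks / clock_rate_hz


VIDEO_CLOCK_HZ = 90_000  # assumed sample clock rate for video

# Two packets 90000 ticks apart are one second apart in media time.
print(rtp_elapsed_seconds(93_000, 3_000, VIDEO_CLOCK_HZ))  # 1.0

# The same holds when the counter wraps between the two packets.
print(rtp_elapsed_seconds(45_000, 2 ** 32 - 45_000, VIDEO_CLOCK_HZ))  # 1.0
```

The result is a relative offset within one stream. Because the starting value of the timestamp is arbitrary, it cannot on its own be mapped to a time of day that is shared between devices.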
A prior art document, WO 2012/021747, discloses a method comprising transmitting a playback session identifier to a content server, the playback session identifier being associated with a unique playback session for a digital content title; receiving a server-side event that includes a playback command and a specified time for executing the playback command; and scheduling the playback command for execution at the specified time based on a local time signal that has been synchronized to a time reference signal generated by a remote time server. While the document sets forth a method for playing a digital content title with different individual viewing preferences synchronously across multiple endpoint devices, the disclosed method is suitable for video-on-demand services and not for broadcasting or real-time TV distribution.