Rich media applications are becoming increasingly popular as users take on active roles within them. With the convergence of the TV network and the Internet, this trend will move from the computer to the TV screen. Rich media applications owe much of their appeal to their superior user interaction: viewers can not only watch multiple types of collaborative media content (video, audio, pictures, animation, text, etc.) simultaneously, but also freely switch or transfer media flows among multiple devices. For example, a user can obtain an advertisement video about a car while that car is being presented in a movie, or can enter a virtual world of a museum in parallel with a video of that museum. In this vision of future TV programs, a single display device cannot provide enough display space for several simultaneous media renderings. A common practice is to divide the TV screen into multiple rendering spaces, or simply to switch between multiple media renderings.

Traditionally, a rich media application is executed by a rich media player on a single device, for example a Flash player on a TV set-top box (STB), tablet, or other type of terminal. When a user interacts with one piece of media content, the rich media player can interpret the interaction event and respond on another piece of media according to a rule defined in the rich media format. If two or more pieces of media content are rendered on a single device, synchronizing them is comparatively easy. Another synchronization method for a single device is SMIL (Synchronized Multimedia Integration Language), which is widely deployed in mobile multimedia messaging. SMIL allows a set of independent multimedia objects to be integrated into a synchronized multimedia presentation. Using SMIL, an author can 1) describe the temporal behavior of the presentation; 2) describe the layout of the presentation on a screen; and 3) associate hyperlinks with media objects.
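The three SMIL capabilities listed above can be illustrated with a minimal sketch of a SMIL document. The element and attribute names (`region`, `par`, `seq`, `dur`) follow the SMIL specification, while the media file names and region identifiers are placeholders invented for illustration:

```xml
<smil xmlns="http://www.w3.org/ns/SMIL">
  <head>
    <layout>
      <!-- 2) layout: the screen is divided into named regions -->
      <root-layout width="640" height="480"/>
      <region id="video_region" top="0" left="0" width="640" height="360"/>
      <region id="text_region" top="360" left="0" width="640" height="120"/>
    </layout>
  </head>
  <body>
    <!-- 1) temporal behavior: <par> plays children in parallel,
         <seq> plays them one after another -->
    <par>
      <video src="museum.mp4" region="video_region"/>
      <seq>
        <text src="caption1.txt" region="text_region" dur="5s"/>
        <text src="caption2.txt" region="text_region" dur="5s"/>
      </seq>
    </par>
  </body>
</smil>
```

Note that such a presentation is interpreted by a single SMIL player on a single device; the timing relationships it declares do not extend across devices.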
However, neither of the above approaches provides a solution for synchronizing the media flows of multiple collaborative media content items within a rich media set across multiple display devices.
Several conventional methods exist for media synchronization over multiple devices.
The first is global timing synchronization by a hardware clock system or the Network Time Protocol (NTP). NTP provides Coordinated Universal Time (UTC). NTPv4 can usually maintain time to within 10 milliseconds (1/100 s) over the public Internet, and can achieve accuracies of 200 microseconds (1/5000 s) or better in local area networks under ideal conditions. Although NTP can guarantee accurate physical-layer synchronization, it does not reflect the synchronization requirements of media playback at the application layer. To map the media playback timeline onto the physical timeline, the terminal needs to check the system clock frequently, which adds overhead and complexity to the software implementation in the terminal.
The second method is deployed for quality-of-service (QoS) guarantees, such as bandwidth or delay guarantees. When a viewer watches multiple display devices simultaneously, a certain amount of delay is tolerable; however, the delay should be bounded and predictable through a media transmission QoS control protocol, such as RTCP (RTP Control Protocol) used in conjunction with RTP (Real-time Transport Protocol). RTP carries the media streams (e.g., audio and video), while RTCP monitors transmission statistics and QoS information. In RTCP, report packets are sent periodically to convey transmission and reception statistics for all RTP packets sent during a time interval. This type of protocol can guarantee synchronization at the packet level from one sender to multiple receivers, but it still cannot guarantee the playback synchronization finally presented to the viewer, because of differences in packet processing, buffer control, audio/video decoding, and player rendering across the display devices. The viewer's final subjective impression of synchronization depends entirely on the media frames actually displayed on the screens of the display devices.
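The packet-level synchronization that RTCP offers can be made concrete with a small sketch. An RTCP Sender Report pairs an NTP wall-clock timestamp with the RTP timestamp of the same instant, so a receiver can map any RTP timestamp onto a common clock; the function below is an illustrative simplification (the name, the simplified NTP time as a float of seconds, and the default 90 kHz video clock rate are assumptions, not a library API):

```python
def rtp_to_wallclock(rtp_ts: int, sr_rtp_ts: int, sr_ntp_seconds: float,
                     clock_rate: int = 90000) -> float:
    """Map an RTP timestamp to wall-clock seconds using the (NTP time,
    RTP timestamp) pair from the most recent RTCP Sender Report.

    RTP timestamps are 32-bit and wrap around, so the difference is
    taken modulo 2**32 before converting ticks to seconds.
    """
    elapsed_ticks = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    return sr_ntp_seconds + elapsed_ticks / clock_rate
```

This mapping aligns packets on a shared timeline, but, as noted above, it says nothing about when the decoded frame actually reaches the screen: each device's buffering, decoding, and rendering delay still differs, so presentation-level synchronization is not guaranteed.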
Therefore, a method for synchronized content playback at the level of content presentation on display devices is required, so that the viewer is not given an impression of non-synchronization across multiple display devices.