The field of the present invention is related to systems and methods for playing back media data and in particular to a master element for a distributed playback architecture, a slave element for the distributed playback architecture, a method for a synchronized playback in the distributed playback architecture and a computer readable digital storage medium usable for causing a processor to perform the method for synchronized playback.
Distributed playback enables the synchronized presentation and interactive control of linear multimedia content by multiple devices over a relatively low bandwidth network. The bandwidth of the network is considered to be relatively low when it would be insufficient for transmitting the entire amount of data that shall ultimately be presented at the various terminals. In other words, while the network bandwidth may well be sufficient for transmitting the multimedia content in a compressed format, it may not be sufficient for transmitting the multimedia content in an uncompressed format (e.g. data in a video interface standard such as DVI).
An example use case may be found in automotive front and rear-seat entertainment units, where each independently functioning unit is required to simultaneously present the same media content, e.g. a DVD video disc inserted into a single device. For economic reasons, it is desirable to make use of an existing network infrastructure, rather than the dedicated high-bandwidth equipment, cables, etc., needed for the transmission of audio and video signals to and from each unit, especially when the audio and video signals are provided in a “raw” format.
FIG. 1 demonstrates the basic concept of distributed playback. A number of devices are connected to a network 120. The device which owns or controls the media content being currently played back is the master 110. The media content is represented by a DVD 108 in FIG. 1. All other devices which simultaneously present the same content are slaves 112.
In the field of infotainment and/or entertainment units, especially for the automotive industry, the distribution of a video stream to the various video consumers within the vehicle has seen increased interest from entertainment unit manufacturers and car manufacturers. Original equipment manufacturers (OEMs) have started to ask for a separation between the video source and the video consumer(s) due to requirements for flexibility, network transparency and standardization, while avoiding extensive and proprietary wiring.
The research and development challenges imposed on the suppliers are considerable, as the questions to be solved include those regarding a guaranteed bandwidth, low latencies, distributed A/V synchronization, block synchronization and compensation of differing travelling times, as well as frame-synchronous presentation on several displays.
A number of technical requirements and boundary conditions may be taken into account for the architecture of distributed video solutions in vehicles. In FIG. 2, a typical system configuration is represented, comprising a front console (typically acting as the master element 110) and a rear console (typically acting as the slave element 112). The system configuration further comprises a main amplifier 222 for outputting multi-channel audio and, as the case may be, a digital/analog TV receiver 250 and a rear camera 240. The digital/analog TV receiver 250 is connected to an antenna 252. All components are connected via an automotive-suitable high-speed ring bus structure 120 (e.g. Media Oriented Systems Transport (MOST), such as MOST150). The front unit 110 comprises a display 212 and user interface elements 214. The rear unit 112 is shown as being connected to two displays 232-1, 232-2 on which the video content may be displayed. Typically, the distance between the rear unit 112 and the displays 232-1, 232-2 is relatively short so that the rear unit 112 may send the uncompressed datastream to the displays via a suitable connection. The rear unit 112 may also be capable of rendering audio data and of providing the rendered audio data to the displays 232-1, 232-2. The displays 232-1, 232-2 may have integrated loudspeakers or, as depicted in FIG. 2, a jack for headphones 234. As an alternative to rendering the audio data via headphones 234, the audio data may be output by means of a loudspeaker 224 connected to the main amplifier 222, which may also be responsible for decoding the audio data.
The design of an infotainment system typically is subject to a number of paradigms such as network transparency, standardized networking, frame-synchronous decoding, and audio/video synchronization and latency.
Network transparency: the overall system, comprising various single components, is intended to provide an integrated infotainment offering. This means that the user shall be enabled to access, from the local console, all media in the system. A DVD in a disc drive of the front unit (“data source”) should behave in the same manner as a DVD in the rear unit. This means that the contents of the medium, including all network data such as title information, artist, genre or album, are available at each console so that the user can access them and control the playback function, including so-called trick play commands (e.g. fast forward, slow motion, freeze frame, rewind, etc.). Finally, the (decoded) audio and video contents need to be sent to the console of the user (“data consumer”). This property is known as network transparency and presents a core feature of ergonomic multi-access entertainment systems.
Standardized networking: all units of the overall system are mutually connected via a standard bus. Typically, Media Oriented Systems Transport (MOST) is used as a media bus with a high bandwidth in vehicles. In the alternative, an Ethernet-based network may be used. In contrast to dedicated point-to-point video connections along with their corresponding connection points, protective screenings, driver components and input components, economies in terms of costs and space requirements, as well as an increase in flexibility regarding the application scenarios and the possible installation options, may be realized.
Frame-synchronous decoding: even bus systems having a high bandwidth, such as MOST150 with 150 Mbit/s, necessitate the transmission of compressed video formats (5 to 50 Mbit/s for high definition formats) rather than uncompressed video formats, as they also transport other signals such as multi-channel PCM audio. This typically necessitates a consumer-side decoding of e.g. MPEG-2 or MPEG-4 datastreams, while the data medium (DVD, USB device, or SD card) is located in the source. The ring topology of the MOST architecture allows several consumers to be supplied concurrently with the same signals, so that a single data source can be presented in parallel to several consumers. As several displays are often visible at the same time in the vehicle, in particular from the rear seats, frame-synchronous decoding and presentation is desired.
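The bandwidth argument above can be illustrated with a short calculation. The following is a minimal sketch only: the 150 Mbit/s bus rate and the 50 Mbit/s upper bound for compressed high definition video are taken from the text, whereas the 1280x720 resolution, 24 bits per pixel and 30 frames per second are illustrative assumptions, and the function names are hypothetical.

```python
# Bus rate stated in the text for MOST150.
MOST150_BITS_PER_S = 150_000_000

# Upper end of the compressed high definition range stated in the text.
COMPRESSED_HD_MAX_BITS_PER_S = 50_000_000


def uncompressed_bitrate(width, height, bits_per_pixel, fps):
    """Raw video bit rate in bit/s, ignoring blanking and protocol overhead."""
    return width * height * bits_per_pixel * fps


def fits_on_bus(stream_bps, bus_bps=MOST150_BITS_PER_S, reserved_bps=0):
    """True if the stream fits into the bus rate left after other traffic."""
    return stream_bps <= bus_bps - reserved_bps


# Assumed example format (not from the text): 1280x720, 24 bit/pixel, 30 fps.
raw_720p = uncompressed_bitrate(1280, 720, 24, 30)

print(raw_720p)                                   # 663552000 bit/s
print(fits_on_bus(raw_720p))                      # False
print(fits_on_bus(COMPRESSED_HD_MAX_BITS_PER_S))  # True
```

Under these assumptions the uncompressed stream needs more than four times the bus rate, while even the highest compressed rate named in the text leaves room for other signals such as multi-channel PCM audio.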
Audio/video synchronization and latency: for lip-synchronous video presentation, the synchronization of audio and video streams is needed due to differing audio and video paths and decoder processing times. It has been found that most spectators do not mind audio lagging the video content by up to approximately 120 ms. In the opposite case, however, when video lags behind the audio signal, it is typically annoying to the spectator for values between 5 and 10 ms and up. It would also be desirable to be able to synchronize the audio output to the presentation of video content at two or more display devices. Especially in connection with video content having frequent changes between brighter scenes and darker scenes, even a small temporal offset between the two or more display devices may be disturbing to the spectators.
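The asymmetric tolerance described above may be expressed as a simple check. This is a sketch under stated assumptions: the 120 ms limit comes from the text, the 5 ms limit is an assumed conservative choice from the stated range of 5 to 10 ms, and the function name and sign convention are hypothetical.

```python
# Approximate limit from the text for audio lagging behind video.
AUDIO_LAG_LIMIT_MS = 120.0

# Assumption: the lower end of the 5 to 10 ms range named in the text.
VIDEO_LAG_LIMIT_MS = 5.0


def offset_is_tolerable(av_offset_ms):
    """Check an audio/video offset against the asymmetric limits.

    av_offset_ms > 0: audio is presented later than the matching video.
    av_offset_ms < 0: video is presented later than the matching audio.
    """
    if av_offset_ms >= 0:
        return av_offset_ms <= AUDIO_LAG_LIMIT_MS
    return -av_offset_ms < VIDEO_LAG_LIMIT_MS


print(offset_is_tolerable(100.0))   # True: audio lags by 100 ms
print(offset_is_tolerable(-10.0))   # False: video lags by 10 ms
```

The asymmetry suggests that a synchronization scheme may prefer to err on the side of slightly late audio rather than slightly late video.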
Some of the above-mentioned paradigms may contradict each other. For example, audio/video synchronization typically becomes more difficult if the network exhibits strongly varying network latencies for the various clients. Limited network bandwidth may result in the need to perform consumer-side decoding of video formats, as already mentioned above in the paragraph about frame-synchronous decoding. However, consumer-side decoding takes time, and the time needed may again vary from client to client and over time, for example depending on the current workload at the client. Buffers may be introduced at the server and/or the clients for providing a capability to synchronize the playback at the server and at least one client. However, the introduction of buffers makes the playback architecture sluggish, which means that a considerable amount of time may pass between the instant at which a user enters a command and the instant of its execution.
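The sluggishness caused by buffering can be estimated with a simplified model. The sketch below assumes that buffered frames are played out rather than flushed when a command arrives, so that each buffer adds its fill level divided by the frame rate to the reaction time. The buffer sizes, frame rate, network delay and function names are illustrative assumptions, not values from the text.

```python
def buffer_delay_ms(buffered_frames, fps):
    """Time in ms needed to play out the frames currently held in a buffer."""
    return 1000.0 * buffered_frames / fps


def command_latency_ms(server_frames, client_frames, fps, network_ms=0.0):
    """Delay between a user command and its visible effect.

    Assumes no buffer is flushed: content already queued at the server and
    at the client is presented before the command takes effect.
    """
    return (buffer_delay_ms(server_frames, fps)
            + network_ms
            + buffer_delay_ms(client_frames, fps))


# Assumed example: 10 frames at the server, 20 at the client, 25 fps.
print(command_latency_ms(10, 20, 25))  # 1200.0 ms
```

In this model, larger buffers absorb more variation in network and decoding time but directly lengthen the delay before a command such as pause or fast forward becomes visible.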