1. Field of the Invention
The invention relates to a method, a media source, a media sink and a media processing system to enable a synchronous play-out of media data packets.
2. Description of the Related Art
A human being uses two parameters of sound to determine the position of a sound source: the amplitude and the phase of the sound. Since the intensity of sound decreases as it travels through air, the ear further away from the sound source receives a lower sound level than the ear closer to it. Further, because sound needs time to travel through air, the ear further away receives the signal later than the closer ear. Experiments have shown that human beings perceive a phase difference between the two channels of more than 6-20 microseconds (μs) as a displacement of the sound source, and two signals with a phase difference of more than 35-40 milliseconds (ms) are perceived as two distinct sounds.
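To illustrate how small these thresholds are, the following sketch (a hypothetical example; the function name and the speed-of-sound value are assumptions, not taken from the text above) converts a path-length difference between two sound sources and the listener into the resulting inter-channel delay:

```python
# Illustrative sketch (assumption, not from the source): relating a
# path-length difference between two loudspeakers and the listener to
# the inter-channel delay, assuming sound travels at roughly 343 m/s in air.

SPEED_OF_SOUND_M_PER_S = 343.0  # approximate speed of sound in air at 20 °C

def interchannel_delay_us(path_difference_m: float) -> float:
    """Delay in microseconds caused by a given path-length difference."""
    return path_difference_m / SPEED_OF_SOUND_M_PER_S * 1e6

# A path-length difference of only 1 cm already produces a delay of
# roughly 29 us, above the 6-20 us localization threshold:
delay = interchannel_delay_us(0.01)
```

This shows that even a centimeter-scale asymmetry corresponds to a delay above the localization threshold, which is why play-out timing must be controlled so tightly.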
For audio systems that play-out (emit) audio sound, this means that an audio signal belonging to one channel of a multi-channel signal, e.g. a stereo signal, should be played at exactly the same time, i.e. exactly the same moment in time, as all other corresponding audio signals belonging to the same multi-channel signal, e.g. the same stereo signal. In other words, a tight synchronization of different audio output devices, e.g. loudspeakers, is necessary so that the time relation between different channels of a multi-channel signal is preserved during the output. Similar requirements may also occur in other audio applications, such as Dolby Surround systems, or in audio-video applications.
The mentioned tight synchronization must also be achieved in digital transmission audio systems, where audio signals are transmitted from the media source to the audio output devices (in the following also referred to more generally as media sinks, which include devices that process a received multi-channel signal in any other way) in the form of media data packets (in the following also referred to as media packets). Each audio output device must play-out the sound of a media data packet (play-out the media data packet) at exactly the right time, i.e. at the moment another media output device plays out a corresponding media data packet, e.g. one belonging to the same stereo signal but to another channel. If the media data packets are not played-out well synchronized, i.e. if corresponding media data packets of different channels belonging to the same stereo signal are played-out at different times in different media output devices, the above mentioned problems occur: the stereo sound may be perceived as coming from another direction, or even two distinct sounds may be perceived (these problems are in the following referred to as hearing distortions).
The Internet Engineering Task Force (IETF) has specified a Transport Protocol for Real-Time Applications (RTP) in its Request for Comments RFC 1889, in the following referred to as RTP. The Real-Time Transport Protocol (RTP) includes a control protocol, RTCP, which provides synchronization information from data senders and feedback information from data receivers. Regarding the synchronization of streams for media distribution, this protocol provides so-called Sender Reports (SR), which establish a correlation between a sampling clock and a global clock.
The Sender Reports (SR) are sent from the media source to the media sink(s) and contain two timestamps. One timestamp indicates a moment in time in time units of the local sampling clock (local sampling clock time) and the other indicates the same moment in time in time units of the global clock (global clock time). Both timestamps of the SR are created at the same moment. The assumption is made that the global clock time is available to both the media source and the media sink(s) between which the media stream is transmitted. A media sink thus has access to the global clock time and can therefore adjust its sampling clock to the global clock.
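The correlation carried by a Sender Report can be sketched as follows. This is a hedged illustration only: the function and parameter names and the fixed 44.1 kHz sampling rate are chosen for the example, not taken from RFC 1889.

```python
# Hedged sketch of how a media sink can use the two timestamps of an
# RTCP Sender Report to map sampling-clock timestamps onto the global
# clock. The names and the fixed clock rate are illustrative assumptions.

SAMPLING_CLOCK_HZ = 44_100  # e.g. CD-quality audio sampling rate

def sampling_to_global_time(sample_ts: int, sr_sample_ts: int,
                            sr_global_s: float) -> float:
    """Convert a sampling-clock timestamp to global clock time (seconds),
    using the correlation point (sr_sample_ts, sr_global_s) carried by
    a Sender Report."""
    elapsed_samples = sample_ts - sr_sample_ts
    return sr_global_s + elapsed_samples / SAMPLING_CLOCK_HZ

# A packet stamped 44100 sampling ticks after the SR correlation point
# corresponds to a global time one second after the SR's global time:
t = sampling_to_global_time(sample_ts=100_000, sr_sample_ts=55_900,
                            sr_global_s=10.0)
```

With such a mapping a sink can relate any sampling-clock timestamp to the shared global clock, which is the mechanism the SR provides.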
The main intention of RTP is to provide means for video conferencing over the Internet and to re-synchronize video and audio that are received as separate streams at one and the same media sink. The protocol is not intended to ensure the synchronous play-out of media data packets in separate media sinks of a digital transmission audio system. Therefore, when this protocol is used for sending media data packets to media sinks, the media data packets may not be played-out well synchronized in different media sinks, i.e. media data packets belonging to the same stereo signal may not be played-out at the same moment in different media sinks, e.g. loudspeakers. Thus, the above mentioned hearing distortions may occur when only RTP is used for digital transmission audio systems.
The problem of hearing distortions may also result from unreliable and imprecise clock information present in most non real-time source devices such as personal computers (PCs) or personal digital assistants (PDAs). These devices assume that the global clock information (global clock time) meets all requirements set by the application scenario. However, this may not be the case. A non real-time device usually obtains the actual time (global clock time) for creating timestamps for media data packets via an external connection, e.g. USB or RS232. Because the bus systems generally used for this kind of external connection are not designed to allow a transport with very small guaranteed delivery times, the clock information (global clock time) may lose its accuracy by the time it is used by the PC or PDA, e.g. to determine a timestamp for a media data packet. This means the global clock time indicated by a timestamp of a media data packet may be wrong with respect to the actual global clock time at which the media data packet is actually sent out. Further, the time difference between two times indicated by two timestamps may vary, even though the time difference between the two corresponding actual global clock times does not vary. The reason for this may be that the time required by the external connection to transport the global clock information to the application varies. Since the timestamps of the media data packets are generally used by the media sinks to determine a play-out time for each packet, the inaccurate and statistically varying time indicated by the timestamps may lead to the mentioned hearing distortions, since media data packets belonging to the same stereo signal may be played-out at different times by the different media sinks.
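The effect of a variable transport delay on the external connection can be sketched as follows. This is a hypothetical illustration: the function name and the delay values are assumptions for the example, not measurements of any real bus.

```python
# Hedged illustration (assumption, not from the source): the device reads
# the global clock over a slow external connection (e.g. USB), so the
# value it stamps into a packet is already stale by the (variable) bus
# transport delay. The delay figures below are made up for the example.

def stamped_time(actual_global_s: float, bus_delay_s: float) -> float:
    """Clock value the application sees: the true global time minus the
    time the clock reading spent in transit on the external connection."""
    return actual_global_s - bus_delay_s

# Two packets are actually created exactly 0.020 s apart, but each clock
# read suffers a different (hypothetical) transport delay ...
delay_1, delay_2 = 0.0012, 0.0041
stamped_interval = stamped_time(1.020, delay_2) - stamped_time(1.000, delay_1)
# ... so the interval indicated by the two timestamps deviates from the
# true 0.020 s, and a media sink deriving play-out times from the
# timestamps inherits this statistical variation.
```

Because each clock read suffers an independent delay, the error between stamped and actual time varies from packet to packet, which is exactly the statistically varying inaccuracy described above.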