The Haas Effect, also known as the precedence effect, describes the human psycho-acoustic phenomenon of correctly identifying the direction of a sound source that is heard in both ears but arrives at each ear at a different time. Due to the head's geometry, the direct sound from any source first enters the ear closest to the source, then the ear farthest away. The Haas Effect tells us that humans localize a sound source based upon the first arriving sound, provided the subsequent arrivals follow within 25-35 milliseconds. If the later arrivals are delayed by more than this, then two distinct sounds are heard.
The Haas Effect can produce an apparent increase in the volume from one of a pair of stereo speakers. In a high quality Hi-Fi system, this effect can be perceived at time differences of only 1-2 ms between the two stereo channels, becoming increasingly noticeable up to 25 ms or so. Greater time lags will be perceived as echoes and reverberations. The necessity to synchronize time between networked multi-channel speakers should therefore be clear.
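To make the timing figures above concrete, the following minimal sketch converts an inter-channel offset into a sample count and estimates how long two free-running playback clocks would take to drift apart by that offset. The 44.1 kHz sample rate and the 50 ppm relative clock skew are illustrative assumptions and do not come from the text; the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 44_100  # assumed CD-quality sample rate (not from the text)


def offset_in_samples(offset_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of audio samples spanned by a time offset given in milliseconds."""
    return offset_ms / 1000.0 * sample_rate_hz


def seconds_until_offset(offset_ms, skew_ppm):
    """Time for two clocks whose rates differ by skew_ppm (parts per million)
    to accumulate a relative offset of offset_ms, starting in sync."""
    if skew_ppm <= 0:
        raise ValueError("skew_ppm must be positive")
    return (offset_ms / 1000.0) / (skew_ppm * 1e-6)


if __name__ == "__main__":
    for offset_ms in (1.0, 25.0):
        print(
            f"{offset_ms} ms = {offset_in_samples(offset_ms):.1f} samples; "
            f"reached after {seconds_until_offset(offset_ms, 50):.0f} s at 50 ppm"
        )
```

Under these assumptions a 1 ms offset corresponds to roughly 44 samples and accumulates in about 20 seconds, while the 25 ms level is reached after about 500 seconds. This suggests why a one-time manual adjustment is unlikely to hold without ongoing correction.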
US patent application 2003/0200001 describes a method for synchronizing the playback of a digital audio broadcast on a plurality of network output devices using a microphone near a source, embedded control codes, and the audio patterns from the network output devices. An optional, additional manual adjustment method relies on a graphical user interface for adjustment and audible pulses from the devices which are to be synchronized. Synchronization of the audio is accomplished with clock synchronization of the network output devices.
US patent application 2004/0059446 to Goldberg et al discloses synchronizing remote audio devices that are coupled together with a bus. The synchronization mechanism compares a signal on the bus with a clock signal on the audio device and adjusts the clock in response to the comparison. This allows the synchronization mechanism to accurately synchronize remote audio devices without requiring high precision clocks or other complicated solutions. The synchronization technique is particularly applicable to synchronizing remote audio devices in a distributed audio system that digitally samples and broadcasts for communication purposes. In Goldberg, the synchronization mechanism synchronizes the sampling and outputting of each audio device on the bus, which improves audio quality by reducing the distortion that occurs as a result of varying sample times. This provides much more precise audio synchronization but requires a wired bus, and is therefore not suitable for implementation between two wireless networked speakers.
In the home networking market, a major catalyst is the recent emergence of 802.11 WLAN technology for wireless home networking. The cost of 802.11g access points is rapidly falling below 100 Euro which will further drive the market for networked CE products as consumers begin to perceive the benefits and simplicity of these new wireless networking technologies.
In FIG. 1 we show the home networking environment [101] that next-generation CE appliances [102, 104] will “live” in. Basically a local network of CE appliances will interoperate over wired islands [103] which are glued together by a global wireless 802.11 network [105]. This local network is connected, in turn, via a gateway appliance [108] to an external wide area network (WAN) [106], effectively the broadband connection to the home.
Accordingly, consumers will expect state-of-the-art home CE devices to be wireless network enabled so that they can install new devices and peripheral subsystems without a requirement for wires or cables.
Wireless speakers are already known in the market. However, these are not true networked subsystems; they typically carry an audio signal from a master unit over a wired point-to-point connection. Thus, synchronization of the audio can be achieved on the master unit.
US patent application no. 2003/0210796 to McCarty et al discloses communicating audio signals between an input device and an output device via a network. The output device can include loudspeakers and headphones. In some embodiments an output device, for example a center channel speaker, transmits audio signals to other output devices. In some embodiments, the output device is coupled to, or combined with, a speaker stand or speaker bracket. The network can be wireless, wired, infrared, RF, and powerline. However McCarty et al do not explicitly disclose achieving high quality audio synchronization between stereo speakers. (They do mention that manual synchronization may be achieved by adjusting phase or delay between channels; however this cannot compensate for very fine time delays between channels or for ongoing drift between the two channels.)
A related field of prior art is that of determining the clock skew of an audio subsystem. Examples include “A New Audio Skew Detection and Correction Algorithm” to Akester et al, presented at ICME 2002, which teaches accurately estimating the clock skew based on time-stamped RTP packets and a determination of the audio playback rate from system interrupts. U.S. Pat. No. 6,327,274 to Ravikanth discloses determining relative clock skews between two devices connected to a packet based data network.
It is evident from the prior art that each device may determine a local clock skew rate and may modify the received multimedia stream accordingly. However, in such cases, the client device must:
(i) perform real-time interrupt-driven analysis in order to measure real-time clock skew;
(ii) at the same time, decode the digital audio and, where necessary, de-interleave the received audio stream; and finally,
(iii) in order to increase or decrease the size of a received audio data packet to compensate for the determined clock skew, perform an analysis of each received audio packet in order to determine where, and how, to insert (or remove) audio samples.
All of these steps must, furthermore, be implemented in real time, placing a high computational load on the client device.
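As an illustration of the per-packet analysis in step (iii), the following minimal sketch lengthens or shortens a packet by one sample at the point where two neighbouring samples differ least, on the assumption that a change there is least audible. This heuristic, the function name `adjust_packet`, and the representation of a packet as a list of numeric samples are all assumptions made for illustration; none is taken from the cited prior art.

```python
def adjust_packet(samples, delta):
    """Return a copy of samples with one sample inserted (delta=+1),
    one removed (delta=-1), or unchanged (delta=0).

    The edit is made between the two adjacent samples whose values are
    closest, a simple heuristic for keeping the discontinuity small.
    """
    if delta not in (-1, 0, 1):
        raise ValueError("delta must be -1, 0 or +1")
    out = list(samples)
    if delta == 0 or len(out) < 2:
        return out

    # Scan every adjacent pair once: this is the per-packet analysis cost.
    i = min(range(len(out) - 1), key=lambda k: abs(out[k + 1] - out[k]))

    if delta > 0:
        # Insert the midpoint of the closest pair (linear interpolation).
        out.insert(i + 1, (out[i] + out[i + 1]) / 2)
    else:
        # Drop the second sample of the closest pair.
        del out[i + 1]
    return out


if __name__ == "__main__":
    packet = [0, 10, 11, 30]
    print(adjust_packet(packet, +1))
    print(adjust_packet(packet, -1))
```

Even this simple approach scans every sample of every packet it adjusts, which hints at the real-time load described above.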
Step (i) above can be realized using techniques such as those described in U.S. Pat. No. 6,327,274 to Ravikanth, where the relative clock skew between the server originating the data stream and each client may be determined, on the client, from the timestamp data on each RTP packet and a statistical analysis of the jitter and delay of a plurality of packets from the received data stream. However, this still places significant real-time computational demands on the client device.
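The general idea of estimating relative skew from packet timestamps can be sketched as follows. This is a plain least-squares fit used as a stand-in, not the method of the Ravikanth patent; the function name, the use of seconds as the unit, and the synthetic data are assumptions made for illustration. The slope of the observed offset (arrival time minus sender timestamp) against the sender timestamp approximates the relative skew, while network jitter appears as noise around the fitted line.

```python
def estimate_skew(send_ts, recv_ts):
    """Estimate relative clock skew as the least-squares slope of
    (recv - send) versus send. Both inputs are sequences of seconds.

    A result of 1e-4 means the receiver clock runs about 100 ppm fast
    relative to the sender clock.
    """
    n = len(send_ts)
    if n < 2 or n != len(recv_ts):
        raise ValueError("need two equal-length sequences of at least 2 points")

    offsets = [r - s for s, r in zip(send_ts, recv_ts)]
    mean_x = sum(send_ts) / n
    mean_y = sum(offsets) / n

    sxx = sum((x - mean_x) ** 2 for x in send_ts)
    if sxx == 0:
        raise ValueError("sender timestamps must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(send_ts, offsets))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic example: 100 ppm skew plus a fixed 20 ms network delay.
    send = [float(k) for k in range(10)]
    recv = [s * (1 + 100e-6) + 0.020 for s in send]
    print(f"estimated skew: {estimate_skew(send, recv) * 1e6:.1f} ppm")
```

With real traffic the delay varies from packet to packet, so many packets must be collected and filtered before the estimate is usable, consistent with the computational demands noted above.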