1. Field of the Invention
This invention relates to spatially extending a sound stage beyond the positions of two loudspeakers for enhanced enjoyment of two-channel stereo recordings.
2. Description of the Related Art
Music recorded over the last four decades has been made almost exclusively in the two-channel stereo format, which consists of two independent tracks, one for a left channel L and another for a right channel R. The two tracks are intended for playback over two loudspeakers, and they are mixed to provide a desired spatial impression to a listener positioned centrally in front of two loudspeakers that ideally span 60 degrees (i.e., relative to the vantage point of the listener, the loudspeakers are at angles of ±30 degrees). A limited spatial impression can also be experienced from other listening positions. The two-channel stereo format is also used for the final delivery of many other types of entertainment audio, such as MPEG-2 digital television broadcasts with multiple digital sound channels, digital versatile discs (DVDs), videotapes, CDs, audiocassettes, and video games.
In many situations, it is advantageous to be able to modify the inputs to the two loudspeakers in such a way that the listener perceives the sound stage as extending beyond the positions of the loudspeakers at both sides. This is particularly useful when a listener wants to play back a stereo recording over two loudspeakers that are positioned quite close to each other. The loudspeakers contained in a stereo television, for example, or positioned on either side of a computer monitor usually span significantly less than the recommended 60 degrees. Nevertheless, a widening of the sound stage is generally perceived as a pleasant effect regardless of the position of the loudspeakers, and many stereo widening schemes have been developed for this task over the years.
It is well known that when the polarity of one of the two loudspeakers in a conventional stereo setup is reversed, the sound stage becomes blurred in a way which is generally perceived to be undesirable. Nevertheless, this phenomenon demonstrates that it is possible to achieve a spatial effect simply by feeding the two loudspeakers with two coherent signals that are out of phase. It can be shown that at very low frequencies the signals fed to the two loudspeakers must be almost exactly out of phase in order to make the sound stage extend beyond the loudspeakers [Kirkeby et al., Virtual Source Imaging using the Stereo Dipole, the 103rd Convention of the Audio Engineering Society in New York, Sep. 26-29, 1997, AES preprint no. 4574-J10].
A stereo widening processing scheme generally works by introducing cross-talk from the left input to the right loudspeaker, and from the right input to the left loudspeaker. The audio signals transmitted along the direct paths, from the left input to the left loudspeaker and from the right input to the right loudspeaker, are usually also modified before being output from the left and right loudspeakers.
As described in U.S. Pat. Nos. 4,748,669 and 5,412,731, sum-difference processors can be used as a stereo widening processing scheme, mainly by boosting a part of the difference signal, L minus R, in order to make the extreme left and right parts of the sound stage appear more prominent. Because the difference signal is emphasized relative to the sum signal, sum-difference processors tend to weaken the center image considerably and therefore do not provide high spatial fidelity. They are very easy to implement, however, since they do not rely on accurate frequency selectivity. Some simple sum-difference processors can even be implemented with analogue electronics without the need for digital signal processing.
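The sum-difference principle can be illustrated with a minimal sketch. It is not the circuit of either cited patent: the function name `widen_sum_difference` and the parameter `side_gain` are hypothetical, and the sketch boosts the whole difference signal rather than only a part of it.

```python
def widen_sum_difference(left, right, side_gain=2.0):
    """Illustrative sum-difference widener (assumed, simplified form).

    left, right: sequences of samples for the L and R channels.
    side_gain:   hypothetical gain applied to the difference signal;
                 1.0 leaves the input unchanged.
    """
    out_left, out_right = [], []
    for l, r in zip(left, right):
        mid = 0.5 * (l + r)                # sum signal (center content)
        side = 0.5 * (l - r) * side_gain   # boosted difference signal
        out_left.append(mid + side)
        out_right.append(mid - side)
    return out_left, out_right


if __name__ == "__main__":
    # First sample is panned hard left, second sample is centered.
    print(widen_sum_difference([1.0, 0.5], [0.0, 0.5], side_gain=2.0))
```

With `side_gain=2.0`, the hard-left sample produces 1.5 at the left output and -0.5 at the right output, so an inverted copy of the left input appears as cross-talk in the right loudspeaker signal, while the centered sample passes unchanged at 0.5 in both outputs. The center content is therefore weaker relative to the side content than it was in the input.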
Another type of stereo widening processing scheme is an inversion-based implementation, which generally comes in two guises: cross-talk cancellation networks and virtual source imaging systems. A good cross-talk cancellation system can make a listener hear sound in one ear while there is silence at the other ear, whereas a good virtual source imaging system can make a listener hear a sound coming from a position somewhere in space at a certain distance away from the listener. Both types of systems essentially work by reproducing the right sound pressures at the listener's ears, and in order to be able to control the sound pressures at the listener's ears it is necessary to know the effect of the presence of a human listener on the incoming sound waves. U.S. Pat. No. 3,236,949 discloses an inversion-based implementation in the form of a simple cross-talk cancellation network based on a free-field model, in which there are no appreciable effects on sound propagation from obstacles, boundaries, or reflecting surfaces. Later implementations use sophisticated digital filter design methods that can also compensate for the influence of the listener's head, torso and pinnae (outer ears) on the incoming sound waves. See, e.g., U.S. Pat. Nos. 4,975,954, 5,666,425, 5,727,066, 5,862,227, and 5,917,916.
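The idea behind a free-field inversion can be sketched at a single frequency. The sketch below is an assumed textbook-style model and not the network disclosed in any of the cited patents: the loudspeakers are treated as point sources, the ears as two points in free space with no head between them, and the geometry values (`half_span_deg`, `distance`, `ear_spacing`) as well as `SPEED_OF_SOUND` are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def free_field_path(src, ear, freq_hz):
    """Point-source transfer function in free field: exp(-j*k*r) / r."""
    r = math.dist(src, ear)
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    return cmath.exp(-1j * k * r) / r


def plant_matrix(freq_hz, half_span_deg=10.0, distance=1.0, ear_spacing=0.18):
    """2x2 matrix C with C[ear][speaker]; index 0 is left, 1 is right."""
    a = math.radians(half_span_deg)
    speakers = [(-distance * math.sin(a), distance * math.cos(a)),
                (distance * math.sin(a), distance * math.cos(a))]
    ears = [(-ear_spacing / 2.0, 0.0), (ear_spacing / 2.0, 0.0)]
    return [[free_field_path(s, e, freq_hz) for s in speakers] for e in ears]


def invert_2x2(m):
    """Inverse of a 2x2 complex matrix: the cancellation network H."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_2x2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


if __name__ == "__main__":
    C = plant_matrix(1000.0)
    H = invert_2x2(C)
    # C * H is close to the identity: each input reaches only one ear.
    print(matmul_2x2(C, H))
```

Each column of `H` holds the two loudspeaker inputs needed to deliver a signal to one ear only. Under this model the two entries of a column are close to opposite in phase at low frequencies, which is consistent with the out-of-phase observation cited above from Kirkeby et al.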
As an alternative to the rigorous filter design techniques that are usually required for an inversion-based implementation, U.S. Pat. No. 5,046,097 derives a suitable set of filters from experiments and empirical knowledge. This implementation is therefore based on tables whose contents are the result of listening tests.
It is common to all the implementations mentioned above that they process a substantial part of the audio frequency range. U.S. Pat. No. 4,975,954 restricts the processing to affect only frequencies below 10 kHz; Gardner suggests a processing cut-off at 6 kHz [W. G. Gardner, 3-D Audio Using Loudspeakers, Kluwer Academic Publishers, 1998, pp. 68-78]; and it is mentioned that the techniques described in U.S. Pat. No. 5,046,097 still work even if the processing is restricted to affect only frequencies between 200 Hz and 7 kHz. Ward and Elko [S. L. Gay and J. Benesty (Editors), Acoustic Signal Processing for Telecommunication, pp. 313-317 of Chapter 14, Kluwer Academic Publishers, 2000] suggest splitting up the processing into four different frequency bands: low (<500 Hz), low-mid (500 Hz<f<1.5 kHz), high-mid (1.5 kHz<f<5 kHz), and high (>5 kHz). Only the mid frequencies are processed (500 Hz<f<5 kHz), but it is necessary to use four loudspeakers for the reproduction, two closely spaced (±7 degrees recommended) and two widely spaced (±30 degrees recommended).
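The four-band split attributed to Ward and Elko can be written out as a small lookup. This is only an illustrative sketch: the function names are hypothetical, and because the band limits above are given as strict inequalities, assigning each exact band edge to the upper band is an assumption.

```python
def ward_elko_band(freq_hz):
    """Return the band name for a frequency in Hz.

    Band edges follow the text; putting each exact edge (500 Hz,
    1.5 kHz, 5 kHz) in the upper band is an assumption.
    """
    if freq_hz < 500.0:
        return "low"
    if freq_hz < 1500.0:
        return "low-mid"
    if freq_hz < 5000.0:
        return "high-mid"
    return "high"


def is_processed(freq_hz):
    """Only the two mid bands (500 Hz to 5 kHz) are processed."""
    return ward_elko_band(freq_hz) in ("low-mid", "high-mid")


if __name__ == "__main__":
    for f in (100.0, 1000.0, 3000.0, 8000.0):
        print(f, ward_elko_band(f), is_processed(f))
```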
The widening of the sound stage usually comes at a price. It is difficult to achieve a convincing spatial effect without introducing spectral coloration of the original recording (i.e., certain parts of the sound spectrum become emphasized relative to other parts). Reflections from the acoustic environment, such as the walls and furniture in an ordinary living room, tend to make this undesirable spectral coloration even more noticeable. Consequently, a stereo widening processing scheme often degrades the quality of the original recording, particularly at positions away from the “sweet spot” (the optimal listening position for which the stereo widening scheme is designed). At non-ideal listening positions, which may be only a matter of centimeters away from the sweet spot, the processing provides the listener with little or no spatial effect, yet the spectral coloration remains noticeable. Ideally, though, a listener who is not in the sweet spot should not be able to tell whether the processing is “on” or “off”. It would therefore be advantageous to have a transparent stereo widening algorithm for loudspeakers that maximizes the spatial effect for a listener sitting in the sweet spot while preserving the quality of the original recording.