This invention relates generally to sound reproduction systems and, more specifically, to the enhancement of multichannel sound reproduction through improved speaker arrangement and the relation of this arrangement to audio signal processors and their algorithms.
A number of systems have been proposed for expanding the stereo image present in stereo source material. These systems employ a number of techniques and algorithms to expand the stereo image beyond the confines of the left and right speakers. Such systems have also been adapted to source material with more than two independent input channels, and for use with more than two speakers. These find application in computer sound playback, home and car audio systems, and many other applications based on material from any of the many computer storage systems, video and audio cassettes, compact discs, FM broadcasts, and all other available stereo and multichannel media.
The generic stereo or two output channel arrangement of the prior art is shown in FIG. 1. A listener 10 is positioned some distance D away from the midpoint between a pair of speakers 13 and 14. This midpoint is taken as the origin of the reference coordinates (x,y), with the X-axis extending as shown toward the primary listening area. In a general placement, each of the speakers, 13 and 14, will be a different distance from the listener 10 and, in particular, a different distance from each of the listener's ears 11 and 12. The signals to the right speaker 14 and the left speaker 13 are supplied from an audio signal processor 17 along lines 16 and 15, respectively. The signal processor produces the output signals along 15 and 16 based upon the audio signals input from lines 18. In the case of a 2 input, 2 output, or 2-2, signal processor, there are only two input lines 18.
In the simplest case, the signal processor is absent and a pair of input lines 18 from a stereo audio source are then the same as lines 15 and 16 and there is no enhancement of the stereo signals. When a signal is transmitted from a single speaker, say the right speaker 14, the listener identifies the location of the speaker as (xr,yr) based on the difference between what is perceived at the right ear 12 and what is perceived at the left ear 11. This difference in perception is due, firstly, to the difference in path lengths between the right speaker and the right ear, drr, and between the right speaker and the left ear, drl, and to a difference in audio level. The path-length difference produces a corresponding delay in the signal at the left ear, as the signal must propagate the additional distance Δdr=drl−drr. There are also additional effects, which arise because the head of the listener 10 is not acoustically transparent to the sound waves and will alter them as they propagate around the head to the left ear 11. This filtering effect is described in terms of Head Related Transfer Functions (HRTFs). This combination of signal delay and alteration as perceived by the listener contributes to how the source of the sound is identified as being at the point (xr,yr).
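As a rough illustration of the path-length term only, the additional distance Δdr=drl−drr and the resulting delay can be computed from the FIG. 1 geometry. The sketch below is not taken from any cited system: the function name, the coordinate values, and the speed of sound are assumptions chosen for illustration, and the head-related filtering described by the HRTFs is deliberately left out.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def extra_path_delay(speaker, near_ear, far_ear, c=SPEED_OF_SOUND_M_S):
    """Return (extra distance in metres, extra delay in seconds).

    speaker, near_ear and far_ear are (x, y) points in the FIG. 1
    coordinates. For the right speaker 14, near_ear is the right ear 12
    and far_ear is the left ear 11, so the first result is drl - drr.
    """
    d_near = math.dist(speaker, near_ear)  # e.g. drr
    d_far = math.dist(speaker, far_ear)    # e.g. drl
    delta = d_far - d_near
    return delta, delta / c


# Hypothetical layout: a speaker 0.3 m to one side of the origin and a
# listener 1 m along the X-axis with ears 0.16 m apart.
delta_d, delay_s = extra_path_delay((0.0, 0.3), (1.0, 0.08), (1.0, -0.08))
print(f"extra path {delta_d * 100:.2f} cm, delay {delay_s * 1e6:.0f} microseconds")
```

A speaker placed on the X-axis is equidistant from both ears in this model, so the function returns zero extra distance and zero delay for it.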
To produce a sound that the listener will perceive as being located at an arbitrary point (x,y), a speaker 19 would ideally, but impractically, be placed at each such position (x,y). To produce the sounds across the entire front field of the listener, such as is desired for home theater, computer games, or many other uses, would therefore require a vast number of speakers and a corresponding number of independent signals for this surround sound or multichannel effect. To mimic this effect, the psycho-acoustical mechanisms that allow the listener to fix the location of a sound source can be exploited through delay and HRTFs.
A number of different algorithms exist for this purpose and are widely known in the art. Examples and sources include Dolby Laboratories, Q-Sound Corporation, Spatializer Corporation, Aureal Semiconductor, Harman International, and SRS True Surround. These would then be employed inside the signal processor 17 to produce output signals on lines 15 and 16. There may be more than two input signals, for instance in the case of a 5.1 home theater system, which employs left, right, and center forward channels as well as left and right surround channels. These algorithms rely upon encoding/decoding schemes to create a spatial representation of recorded materials, allowing them to place the sound at the perceived location (x,y) of a virtual speaker 19 without requiring a physical speaker at this location.
These signal processing algorithms employ delay, HRTFs, inter-aural crosstalk cancellation, and other methods known in the field of binaural hearing using two speakers. A generic example of such a prior art signal processor is shown in FIG. 2 as a block diagram for the case of two input signals 18. For a signal L entering the left input channel of 17, this signal is also supplied to the right output channel at the adder 28 after going through the inverter 22 and having its amplitude diminished and delayed by block 25. By including this out-of-phase, delayed, and diminished version of the signal L in the right output signal R′ and transmitting it to the right speaker in addition to supplying the signal L to the left speaker, the perceived source of the sound is de-localized from the left speaker. A similar process, based on inverter 21 and block 24, produces a signal from the right input R that adder 27 combines with L to form output signal L′ that de-localizes signals from the right channel. By further incorporating HRTFs into blocks 24 and 25, along with similar processing in the blocks 23 and 26, it is possible to simulate the psycho-acoustic stimuli of multichannel or surround stereo with only a pair of speakers. Additionally, by a proper construction of HRTFs, variations in the vertical position, a suppressed z direction in FIG. 1, may also be mimicked.
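The cross-feed path of FIG. 2 can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the circuit of any particular prior art processor: blocks 24 and 25 are reduced to a whole-sample delay and a single gain, blocks 23 and 26 are treated as pass-through, no HRTF filtering is applied, and the function names and default values are hypothetical.

```python
def delay_and_attenuate(signal, delay_samples, gain):
    """Simplified stand-in for block 24 or 25: delay, then scale."""
    delayed = [0.0] * delay_samples + list(signal)
    return [gain * s for s in delayed[:len(signal)]]


def cross_feed(left, right, delay_samples=2, gain=0.5):
    """Form L' and R' from L and R as in the FIG. 2 block diagram."""
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    inverted_left = [-s for s in left]    # inverter 22
    inverted_right = [-s for s in right]  # inverter 21
    into_right = delay_and_attenuate(inverted_left, delay_samples, gain)  # block 25
    into_left = delay_and_attenuate(inverted_right, delay_samples, gain)  # block 24
    left_out = [l + x for l, x in zip(left, into_left)]      # adder 27
    right_out = [r + x for r, x in zip(right, into_right)]   # adder 28
    return left_out, right_out


# An impulse on the left input only (hypothetical test signal).
l_out, r_out = cross_feed([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0])
print(l_out, r_out)
```

With an impulse on the left input alone, the left output carries the impulse unchanged while the right output carries an inverted, delayed, and reduced copy, which is the signal the text describes as de-localizing the sound from the left speaker.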
Although these algorithms as embodied in a signal processing circuit can be effective in enhancing stereo reproduction to produce virtual multichannel or surround sound, there are a number of shortcomings. A primary one of these is inherent in the algorithms themselves: To produce the output signals L′, R′ from the input signals L, R requires a number of assumptions to be made about both the location of the speakers 13 and 14 as well as the actual speakers themselves. For the various processing blocks 23, 24, 25, and 26 to provide the correct delays, HRTFs, and so on requires the algorithm to assume a particular speaker separation and alignment modeled on point-like speakers. It must also make a series of assumptions about speaker response, particularly about the differential response of one speaker relative to the other.
As these assumptions are built into the signal processor, it is important that the speakers are spaced correctly and, preferably, slightly above the listener: For the proper psycho-acoustical response, the physical speaker separation is more important than the Y location of the listener, with the listener's X position even less critical. Users frequently place speakers in an arbitrary manner for any number of practical or aesthetic reasons, because the size or purpose of the correct physical separation is not known, or based on the incorrect assumption that a wider physical separation produces a better result. Additionally, for some computer monitors and other uses, the speakers are often fixed, but in a position that may be incorrect as the algorithm used may have been based on the speaker position of, say, a car. These defects undermine the algorithm at the core of the signal processor and are a serious limitation in the prior art.
The alignment, or azimuthal angle, of the speaker axis also affects the sound received by the listener. The above example of speaker placement in a car compared to that in a home computer system is also illustrative of this problem: Car speakers are often placed in the doors of the automobile where the sound will come from the listener's sides, while personal computer applications usually place the speakers to the front of the listener. Aside from any change in relative delay or amplitude this may cause, these two placements will require different HRTFs as the sound will propagate around the listener on a different path. Even with the alignment of the application for which the algorithm was designed, aligning one speaker askew to the other speaker will create another differential response that will undermine the algorithm.
The assumptions about the speakers themselves include idealizing them as having the same response to a given input signal. Whether through using improperly matched speakers, differences in how they are connected, or even manufacturing variations, actual speaker pairs will, to one degree or another, have relative variations. Such variations will degrade not only the enhanced stereo algorithms described above, but also more “traditional” or non-enhanced stereo reproduction. Some of the more basic differences resulting from differences in things such as speaker or enclosure compliance can be addressed by balance controls or graphic equalizers, but these are not concerned with the sort of dynamic signal processing, related to phase or other such parameters, that is used for virtual speaker placement.
One method known in the art for improving such enhanced stereo schemes is to employ one of the matrix encoding-decoding processes known in the literature for creating a spatial representation of recorded material, examples including ProLogic, Circle Surround, and Logic 7. Such schemes are dependent on special source material encoding. Generically, these processes start with n distinct sound channels that are matrix encoded into l channels for an n:l encoding. At the reproduction stage, these l channels are then subjected to l:m matrix decoding to produce m output signals. Aside from other shortcomings, these algorithms still suffer from the need for proper speaker placement, but now have the additional complication that the signal processor must be able to handle the proper decoding scheme, which may or may not be compatible with other input material for the processor.
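The generic n:l encoding followed by l:m decoding amounts to two matrix multiplications applied to each frame of samples. The sketch below assumes n=3, l=2, and m=3 with made-up coefficients; the matrices are purely illustrative and are not the coefficients of ProLogic, Circle Surround, Logic 7, or any other named scheme.

```python
def apply_matrix(matrix, frame):
    """Multiply one frame of channel samples by a mixing matrix."""
    if any(len(row) != len(frame) for row in matrix):
        raise ValueError("each matrix row needs one coefficient per input channel")
    return [sum(c * s for c, s in zip(row, frame)) for row in matrix]


# Hypothetical 3:2 encoder: (left, center, right) -> (Lt, Rt).
ENCODE_3_TO_2 = [
    [1.0, 0.5, 0.0],
    [0.0, 0.5, 1.0],
]

# Hypothetical 2:3 decoder: (Lt, Rt) -> (left, center, right).
DECODE_2_TO_3 = [
    [1.0, 0.0],
    [0.5, 0.5],
    [0.0, 1.0],
]

source_frame = [1.0, 0.0, 0.0]  # signal present on the left channel only
encoded = apply_matrix(ENCODE_3_TO_2, source_frame)
decoded = apply_matrix(DECODE_2_TO_3, encoded)
print(encoded, decoded)
```

Because l is smaller than n, a fixed decoding matrix cannot in general recover the original channels exactly; in this toy case a left-only source reappears partly in the decoded center channel.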
One way to overcome some of these limitations is, of course, to introduce more independent sound channels and the corresponding speakers, as is done for instance in the Dolby Digital, Sony SDS, or DTS 5.1 channel cinema sound recording or Direct X computer game sound. All of these examples employ a pair of rear channels to provide stereo sound from the back. Although this may improve sound from the rear to produce a more realistic representation, it still leaves the previous limitations for the more important front sound channels. Additionally, although the psycho-acoustic localization of sound from the rear is less acute than from the front, the inclusion of rear speakers now introduces all of the speaker placement problems inherent in enhanced stereo algorithms to rear speakers as well as the front, though less critically so.
Similarly, such multichannel or matrix sound systems would benefit from an increase in the number of actual speakers, although a method would be needed to produce the signals suitable for these extra speakers. Once again, proper placement of these speakers is needed for the best results.
Therefore, one objective of the present invention is to reduce these limitations by presenting an audio signal processor responsive to information on speaker placement and response. A second objective of the present invention is to reduce these limitations in such a manner as to not require intentional pre-encoding of the source material and is, therefore, of immediate use and applicability to current stereo recordings. Such improvements would also have applicability for producing virtual multichannel enhanced stereo as well as for non-enhanced, conventional multichannel sound.
Other objectives are to present a speaker mechanism that holds the speakers in a set spatial relationship to each other, either fixed or adjustable, and that includes a sensor mechanism to provide data about this relationship and other relative speaker information. A further objective is to use this information to effect variation in the algorithm employed by the audio signal processor.
An additional objective of the present invention is to extend these other objectives beyond two channel stereo to matrix or multichannel audio systems by extending the same techniques to rear sound channels, and, furthermore, by such an application to produce a virtual rear center channel when only a left and right rear channel signal are provided.
A further object is to use such algorithms to provide audio signals to an even greater number of speaker pairs to flood an enclosed listening space with sounds from a greater number of directions.