This invention relates to audio-imaging systems and, in particular, to such systems which may be used with a stereo source to provide frontal imaging with or without ambient sound. The invention may also be applied to three- or four-channel sources in order to produce true full-direction imaging.
Systems with three recorded channels of audio, used to drive left, center, and right speakers, can produce a significantly superior front-stereo image, one that is more realistic or more accurate, over a wide range of listening positions. While this format is typically utilized for film soundtracks, other recorded formats such as CD, cassette, radio, and video utilize only two program channels, one intended for a left speaker and the other for a right speaker. This approach of recording just two channels is established as a cost-effective compromise for distribution of stereo audio program material.
The standard sound reproduction implementation, which uses two speakers, left and right, to reproduce the program, has serious limitations in imaging capability. In particular, it is necessary for the listener to be located in a specific area equidistant from the speakers in order to achieve the desired imaging performance. If the listener is off-center, then the sound of the nearer speaker dominates and distorts the stereo image. Existing audio-imaging schemes for a two-channel source do not effectively address this particular problem. There are other problems with existing two-speaker systems, such as the inferior definition of phantom sources compared to direct sources, the effects of room acoustics, and the need for close speaker matching, all of which can result in image distortion. The present invention is intended to address all of these problems using a third speaker, appropriate speaker placement, and an optimized matrix system used for deriving the outputs.
Systems have been designed to utilize three speakers to reproduce a stereo program source, but with various limitations. One of the earlier published approaches, Klipsch, "Circuits for Three-Channel Stereophonic Playback Derived From Two Sound Tracks," IRE Transactions, November-December 1959, uses a left-plus-right sum signal or, for specially made recordings, a difference signal to drive a center speaker which is placed between the left and right speakers. In this system, the left and right speakers are driven directly with the left and right program signals. While the left-plus-right sum signal provides a viable means of creating an appropriate center channel, this approach compromises the left-right separation and degrades the overall imaging performance under most conditions.
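The separation penalty of a derived center channel of this kind can be illustrated with a minimal sketch. The function names and the center gain of 0.5 are assumptions chosen only for illustration; the coefficients of the cited paper are not reproduced here.

```python
import math

def derive_center_outputs(left, right, center_gain=0.5):
    """Drive left/right directly and feed a scaled L+R sum to the center.

    center_gain is a hypothetical value, not taken from the cited paper.
    """
    center = center_gain * (left + right)
    return left, center, right

def separation_db(wanted, leaked):
    """Level difference in dB between a wanted output and a leaked one."""
    return 20.0 * math.log10(abs(wanted) / abs(leaked))

# A source recorded in the left channel only.
l_out, c_out, r_out = derive_center_outputs(1.0, 0.0)
left_to_center_db = separation_db(l_out, c_out)
```

Under these assumed values, a left-only source is also reproduced by the center speaker only about 6 dB below the left speaker, whereas in a two-speaker system the only other speaker would be silent. This is one way to picture the loss of left-right separation described above.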
Further developments published by B. B. Bauer, "Phasor Analysis of Some Stereophonic Phenomena," J. Acous. Soc. Am., Vol. 33, No. 11, November, 1961, p. 1536, suggest that, to a large degree, two-channel stereo imaging can be explained using phasor analysis. Each of the two acoustic sources is treated as producing a vector signal, and the two vectors are added together to give a sum vector representing a phantom image which seems to arrive at the listener from a direction other than that of either speaker. This explains the effectiveness of an amplitude-panning potentiometer in placing a sound-source location between two speakers, as well as the ability to create sound-source locations outside the area between the speakers, using an inverted signal of appropriate amplitude on one channel. Bauer further described in "Some Techniques Toward Better Stereophonic Perspective," IEEE Trans. Audio, May-June 1963, a method of using phase shifting to spread a sound-source location to a wider angle, or range of location, than the theoretical point source which results from pure amplitude panning.
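The amplitude-panning behavior described above can be sketched numerically. The example below assumes the relation commonly called the stereophonic law of sines, sin(image angle) = (L - R) / (L + R) x sin(speaker half-angle), as a simplified low-frequency model. The function name, the 30-degree speaker half-angle, and the gain values are hypothetical and are not taken from the cited papers.

```python
import math

def phantom_image_angle_deg(left_gain, right_gain, speaker_half_angle_deg=30.0):
    """Estimate the perceived image angle in degrees (positive = toward left).

    Uses sin(image) = (L - R) / (L + R) * sin(speaker half-angle), an
    assumed low-frequency model; the 30-degree half-angle is hypothetical.
    """
    total = left_gain + right_gain
    if total == 0.0:
        raise ValueError("gains cancel; the model gives no defined direction")
    ratio = (left_gain - right_gain) / total
    sine = ratio * math.sin(math.radians(speaker_half_angle_deg))
    if abs(sine) > 1.0:
        raise ValueError("gain ratio is outside the range the model can place")
    return math.degrees(math.asin(sine))

centered = phantom_image_angle_deg(1.0, 1.0)      # equal gains
hard_left = phantom_image_angle_deg(1.0, 0.0)     # left speaker only
beyond_left = phantom_image_angle_deg(1.0, -0.2)  # inverted right channel
```

In this model, equal gains place the image at the center, a single driven speaker places it at that speaker, and a small inverted signal on the opposite channel moves the predicted image beyond the speaker, at roughly 49 degrees for the assumed values.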
In Scheiber U.S. Pat. No. 3,746,792, typically four audio channels are encoded to two channels for recording or transmission and decoded upon playback to produce four output channels of audio information for driving four speakers. In this type of system, the speakers are placed peripherally around the listener with the intention of adding to traditional two-channel stereo an ambient sound field which allows reproduction of the acoustic effects of the original room or, in some cases, special effects. In theory, a sound source in any direction may be localized by the listener. While this approach greatly expands the range of possible source locations, it does nothing to improve the front imaging; in fact, it arguably degrades the localization of sounds intended to originate in front of the listener.
Another approach, as described by R. B. Lackey, J. W. Hull, and H. D. Colson, in "Three-Channel Audio Recording and Playback Via Two-Channel Transmission With Absolute Minimum Cross-Talk," Audio Engineering Society Preprint 1293, 58th Convention, November 4-7, 1977, derives three signals from two using a sum-difference linear matrix. This approach uses a fixed-matrix encoding and decoding process with the intent of providing depth, an added dimension, to the playback of a two-channel recording. The fixed matrix is suggested to be optimized for electrical separation using an encoding-and-decoding process, and the results are used to provide a rear-channel signal which can assist in creating a plane of sound enveloping the listener. This system does not improve the front-imaging capability of the audio system.
Another approach, as described in Ranga U.S. Pat. No. 4,132,859, uses a speaker-connection arrangement which can implement a linear matrix for driving four speakers in a surround-sound setup. This system allows proper decoding of the four-channel matrix with a special speaker connection. However, it does not improve the front-imaging capability of the system.
Another triple-speaker approach to stereo imaging is described in Klayman U.S. Pat. No. 4,819,269, in which sum and difference signals are combined acoustically to create a strong center sound-source location surrounded by dual ambient-sound-field sources. This will create a mono source with ambience, but will be lacking in left-to-right separation or localization.
A theory of multi-dimensional decoding matrices has been proposed and analyzed by Michael A. Gerzon in "Optimum Reproduction Matrices for Multispeaker Stereo," J. Audio Engineering Society, July/August, 1992. An analysis aimed at preserving energy levels and incorporating intensity-localization theories results in frequency-dependent fixed matrix functions for a variety of multi-dimensional decoding schemes. The coefficients in this system are precisely and specifically set to satisfy a mathematical theory of energy preservation, and various psychoacoustic localization theories.
An example of an energy preservation system is described in Price U.S. Pat. No. 5,119,422. Price attempts to maintain the original mix, or balance, of various sound-source locations by maintaining an equal total energy in each speaker. Part of the goal is to maintain the original perspective, or balance of direct and ambient information. The energy preservation theory makes an assumption that the acoustic sources are uncorrelated and, thus, that the amplitude of a center phantom image between two speakers is always 3 dB higher than that of either speaker individually. This theory is based on a similar cosine-response theory used for pan controls, which maintains the loudness of the resulting signal as a pan control is moved from left to right. In practice, it is rare that a recording engineer moves a pan control during a program production. In most music applications, sound-source locations are fixed. The loudness of a signal in the mix depends not on a constancy of total energy, but simply on what the recording engineer hears, since he will adjust the channel's fader to achieve the desired level.
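The cosine-response pan law and the 3 dB figure mentioned above can be checked with a short sketch. The function names and the 0-to-1 position scale are assumptions made for illustration, and the summation follows the uncorrelated-source assumption that the energy preservation theory itself makes.

```python
import math

def constant_power_pan(position):
    """Return (left_gain, right_gain) for position 0.0 (left) to 1.0 (right)."""
    angle = position * math.pi / 2.0
    return math.cos(angle), math.sin(angle)

def level_db(gain):
    """Convert an amplitude gain to decibels."""
    return 20.0 * math.log10(gain)

# Pan control at the center position.
left_gain, right_gain = constant_power_pan(0.5)
total_power = left_gain ** 2 + right_gain ** 2  # uncorrelated (power) sum
per_speaker_db = level_db(left_gain)            # level of each speaker
phantom_vs_one_speaker_db = 10.0 * math.log10(total_power / left_gain ** 2)
```

With the control centered, each speaker is driven about 3 dB below full level, the summed power stays at unity, and the summed result is about 3 dB above either speaker alone. This holds only under the power-summation assumption, which is the point questioned in the discussion above.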
In addition, many recordings are made with center-image signals intentionally attenuated somewhat as a compromise intended to provide a degree of compatibility of the mix with monophonic playback systems. Hence, the goal of preserving equal total energy for phantom and real sources, based on a sum of individual channel energy levels, is not necessarily the ideal condition. This is especially true if the recording was made with a different number of monitor speakers than is used for playback. Further, the energy preservation theory does not adequately address discrepancies caused by the placement of speakers in the room relative to the listening area. In particular, if the three front-imaging speakers are placed in an arc, or if the listener is relatively far from the speakers, namely farther from the speakers than the distance between the left and right speakers, then the center speaker needs to be at an equal or slightly higher level to balance correctly with the side speakers. In the other situation, where the speakers are in a straight line and the listening area is mostly closer to the speakers than the distance between the left and right speakers, the center speaker needs to be at a slightly lower level than the side speakers to balance correctly with them.
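The dependence of center-speaker balance on geometry can be illustrated with a rough sketch. It assumes equal drive levels, identical speakers, and free-field inverse-distance loss, and it ignores room acoustics. The function name, the layout labels, and the example distances (in arbitrary but consistent units) are hypothetical.

```python
import math

def center_level_offset_db(speaker_spacing, listener_distance, layout="line"):
    """Level of the center speaker at the listener relative to a side speaker.

    speaker_spacing: distance between the left and right speakers.
    listener_distance: distance from the listener to the center speaker,
        measured along the center line.
    layout: "line" puts all three speakers on one straight line; "arc"
        puts them at equal distance from the listener.
    Assumes equal drive levels and free-field inverse-distance loss.
    """
    if layout == "arc":
        side_distance = listener_distance
    elif layout == "line":
        side_distance = math.hypot(listener_distance, speaker_spacing / 2.0)
    else:
        raise ValueError("layout must be 'line' or 'arc'")
    return 20.0 * math.log10(side_distance / listener_distance)

near_line_db = center_level_offset_db(3.0, 2.0, "line")  # closer than spacing
far_line_db = center_level_offset_db(3.0, 6.0, "line")   # farther than spacing
arc_db = center_level_offset_db(3.0, 2.0, "arc")
```

For the assumed distances, the center speaker of a straight-line layout arrives nearly 2 dB louder than a side speaker at the near listening position, which is consistent with trimming the center level downward there. The difference shrinks to a fraction of a decibel at the far position and vanishes for the arc layout. This simple model does not account for the other factors that may call for a slightly higher center level in those cases.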
None of the previous systems provides optimum matrix coefficients, or adjustment for them, to work effectively in providing both electrical separation and acoustic separation-enhancement, which are needed for good imaging, using typical existing stereo program material.