1. Field of the Invention
This invention relates to sound signal processing and reproduction, specifically to reproduction of a sound image using 3 or more loudspeakers, spaced apart and placed forward of the listener, to independently produce sounds separated from a stereo (2-channel) source according to the relative locations of the sound sources in the stereo mix.
2. Description of the Prior Art
I am not aware of any patents in the field of sonic separation into more than 3 forward channels. The more broadly related fields of stereo imaging, triphonic, quadraphonic, and surround sound are therefore reviewed. FIGS. 1A through 1D illustrate the relative loudspeaker and listener locations used with such sound reproduction systems. In these Figures, the names of inventors mentioned herein with respect to such systems are found on the associated diagrams.
Since the beginning of sound reproduction, inventors and engineers have attempted to make reproduced sound as similar as possible to its original source sound. Continued improvements in the state of the art have come about in many areas. Various types of distortion have been reduced. Frequency response has been made both broader and flatter. Unwanted noise has been greatly reduced. Various signal recording systems have been developed, including records, tapes, and optical discs. Monophonic sound reproduction has advanced to where a single loudspeaker in an anechoic room can be made to sound almost indistinguishable from a single instrument or vocal sound source.
The reproduction of multiple sound sources, however, has been less successful. It was recognized early that 2 loudspeakers, each with its own signal, could create a better sound image than could a single loudspeaker. It was also shown by Clark, Dutton, and Vanderlyn in their article, "The `Stereosonic` Recording and Reproducing System," in the Jul.-Aug., 1957 issue of the IRE Transactions on Audio, that if sounds were properly recorded, and the listener properly located relative to the loudspeakers, then the location of the original sounds could be approximated by an apparent or virtual image between the loudspeakers within a limited frequency range. The preferred listener location is equidistant from both loudspeakers, at a distance greater than the distance between the speakers.
There has been a great deal of research done on human hearing, acoustics in general, and psychoacoustics in particular, to better understand how sound localization takes place. An example of this research applied to audio imaging is found in an article by Bauer titled "Phasor Analysis of Some Stereophonic Phenomena," published in the Nov., 1961 issue of The Journal of the Acoustical Society of America. Bauer and other inventors have used this research to improve and expand the virtual image. This image, however, is different from the true image. The difference is in the reproduced sound field. In a live music performance, the various sounds come from many different locations in front of the listener. The locations of these sound sources can be heard from any listener location. When music is recorded in stereo, the sounds from all sources are mixed into only 2 channels, left and right. This is done in such a way that sounds from the left are heard more loudly from the left loudspeaker and sounds from the right are heard more loudly from the right loudspeaker. Sounds from the middle are mixed more equally into both channels. Research has shown that at the correct listener location, the sound pressures at the ears of the listener can be made to approximate the corresponding pressures at a live performance, thus creating a good virtual image. Unfortunately, the stereo sound field approximates the live sound field only at that location. That is where the listener must be to hear the virtual image correctly.
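Mixing by level difference (a source on the left louder in the left channel, a source in the middle mixed equally into both channels) is commonly implemented with a pan law. The following is a minimal sketch that assumes a constant-power sine/cosine law. The function name `pan_gains` and the -1 to +1 position scale are illustrative assumptions, and actual recording consoles may use other laws.

```python
import math


def pan_gains(position: float) -> tuple[float, float]:
    """Return (left_gain, right_gain) for a source at `position`.

    position: -1.0 is hard left, 0.0 is center, +1.0 is hard right.
    Assumes a constant-power (sine/cosine) pan law for illustration.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be between -1.0 and +1.0")
    # Map [-1, +1] onto [0, pi/2] so that left**2 + right**2 == 1 everywhere.
    theta = (position + 1.0) * math.pi / 4.0
    return math.cos(theta), math.sin(theta)


for pos in (-1.0, -0.5, 0.0, 0.5, 1.0):
    left, right = pan_gains(pos)
    print(f"position {pos:+.1f}: left {left:.3f}, right {right:.3f}")
```

Under this assumed law, a center source receives a gain of about 0.707 in each channel, which corresponds to the equally mixed case described above.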
Due to the phasor nature of the virtual image, it is also unstable with respect to both motion and attitude (direction) of the listener. That is, if the listener either moves from side to side or turns the head away from pointing directly forward, the virtual image will also move. This, of course, is not true of the real image observed in a live performance. In fact, motion of the head is normally used by the brain to pinpoint the location of sound sources and distinguish them from their echoes in an echo-rich environment.
Another disadvantage of 2-loudspeaker systems is that when loudspeakers are placed more than about 30 degrees apart, as viewed by the listener, the virtual image between them is weakened. The result is that if the loudspeakers are spaced far enough apart to include the breadth of live sound sources, such as an orchestra which may span 90 degrees, then there is a significant hole in the middle from which very little sound seems to come. Even sounds which are mixed equally into both left and right channels seem to come from the 2 separate loudspeakers thus spaced and not from between them.
For these reasons, stereo systems only image well when the listener is motionless, facing directly forward on the centerline between the loudspeakers, and at a sufficient distance from the loudspeakers. A further disadvantage of these constraints is that stereo systems do not fit well into most listening rooms. See FIG. 1A. To avoid early reflections from walls that will obscure the weak virtual image, both the loudspeakers 21 and 22 and the listener 20 must be placed away from the walls. This means that the loudspeakers and listener must be located near the middle of the room. In addition, to produce a good virtual image, the loudspeakers need to be at least 10 feet away from the listener and about half that distance apart. For best performance in a rectangular room 23 of normal proportions, the 2 loudspeakers must be located across a narrow end of the room, several feet from all walls, and the listener located at the other narrow end, several feet from the back wall. With these constraints, it is often impossible to achieve proper spacing. Movement through the listening room, which is often a living room, is made more difficult by the centrally located furniture. In general, then, the acoustical requirements for good stereo reproduction do not match the usual living requirements for the same room.
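The 10-foot listening distance and roughly 5-foot loudspeaker spacing given above can be checked against the 30-degree limit. A minimal sketch, assuming the listener sits on the centerline and the distance is measured perpendicular to the line joining the loudspeakers (the function name is illustrative):

```python
import math


def subtended_angle_deg(spacing: float, distance: float) -> float:
    """Angle in degrees between two loudspeakers, as seen by the listener.

    spacing: distance between the loudspeakers.
    distance: perpendicular distance from the listener to the line joining
    them, with the listener on the centerline. Units need only match.
    """
    if spacing <= 0 or distance <= 0:
        raise ValueError("spacing and distance must be positive")
    return math.degrees(2.0 * math.atan((spacing / 2.0) / distance))


print(f"{subtended_angle_deg(5.0, 10.0):.1f} degrees")   # 5 ft apart, 10 ft away
print(f"{subtended_angle_deg(20.0, 10.0):.1f} degrees")  # spacing for a 90-degree image
```

With these figures the loudspeakers subtend about 28 degrees, just inside the limit. Spreading 2 loudspeakers across a 90-degree image at the same distance would require 20 feet of spacing, well beyond the 30-degree limit.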
Various attempts at improving the stereo image have been made. Systems have been designed to reflect sound off walls to broaden and fill in the virtual image. Other systems that add phase shifted left and right signals to the opposite channels to cancel acoustic crosstalk at the listener's ears have been built and successfully marketed. Such systems often improve the image for the properly placed listener in the right acoustic environment, but are sometimes even more sensitive to listener placement than is regular stereo.
One more problem with stereo sound is that a great deal of the original ambient sound is obscured in the reproduction process. This seems to be a result of the weakness of the virtual image and its confinement to the region between the loudspeakers. Sound reflections from the listening room easily overpower the weak virtual image of reflected sounds from the original environment. A large and profitable industry has been built around devices to generate artificial ambience for both recording and reproduction of sound. These range from spring-type reverberators to digital processors that simulate the measured echo environments of specific concert halls.
In spite of all its shortcomings, stereo (2-channel) recording has become the industry standard. Even with such sweeping changes in the audio industry as the development of compact discs and high speed digital signal processing, stereo recordings remain the standard.
Various advancements have been made in the area of quadraphonic sound. See FIG. 1C. The quadraphonic system uses 4 loudspeakers 30, 31, 32, and 33 arranged in a square around the listener 29 to create the illusion of the listener being completely surrounded by sound. The sounds thus reproduced seem to come from many directions. The effect of discrete quadraphonic sound can be a pleasant and startling one, but does not accurately represent what is heard at a live concert, where the sounds originate from the stage and orchestra pit in front of the listener.
In 1976, Willcocks disclosed, in U.S. Pat. No. 3,944,735, a system for decoding 4 sound channels recorded onto 2 channels using various types of encoding. His invention works well for quadraphonically encoded sources, but such recordings are rare since stereo recording is the standard. Stereo-mixed recordings were never intended for reproduction through 4 loudspeakers surrounding a listener. Rather, the sounds thus mixed were intended to be heard from 2 loudspeakers located in front of the listener to simulate the location of the original performers. It is here that Willcocks' system and various other subsequent quadraphonic systems, such as those disclosed by Cooper (1979) in U.S. Pat. No. 4,149,031 and Christensen (1982) in U.S. Pat. No. 4,316,058, all fall short. They may decode encoded signals, but they were never intended to separate sounds from stereo-mixed recordings or improve the forward image.
Listener location requirements are more stringent for quadraphonic sound than for stereo. The listener must be equidistant from all 4 loudspeakers which must be at the 4 corners of a square. For this reason, quadraphonic systems have room fitting problems as great as those for stereo systems. Most listening rooms 34 are neither square nor large enough to provide sufficient spacing between the loudspeakers and the listener. With either quadraphonic or stereo sound, 2 people cannot enjoy the same sound image together because listener placement is so critical.
In 1978, Doi and Wakabayashi disclosed, in their U.S. Pat. No. 4,069,394, a device for improving the stereo image using only 2 loudspeakers. Their FIGS. 6 and 8 show circuits which could, if used properly, perform some functions similar to some of those of my invention. Their FIG. 6 is a circuit diagram of a pair of voltage dividers. Their FIG. 8 is a circuit diagram of two differential amplifiers connected in parallel. These simple circuits, however, are not unique to their design or to mine and can be found in many texts on basic electronics such as Walter G. Jung's "Audio IC Op-Amp Applications," first published in 1975 by Howard W. Sams & Company. My invention and many others make use of similar voltage differencing circuitry.
The stated object of Doi et al in using the circuitry of their FIGS. 6 and 8 is to produce, from left and right inputs (L and R), outputs equivalent to L − ΔR and R − ΔL, where Δ is a fraction less than 1. They specifically state that for "The circuits shown in FIGS. 6 and 8 . . . the quality of the sound image provided thereby is the same as that provided by an ordinary 2-channel stereophonic system." The inadequacy of these embodiments results from Doi et al's failure to recognize and satisfy the conditions of optimality which I define hereinafter relative to my invention.
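Both outputs of this processing are still fixed linear mixes of L and R, which helps explain why the image remains that of ordinary stereo. A minimal sketch, assuming sampled signals and a constant Δ (named `delta` here); the function name is hypothetical and does not come from Doi et al:

```python
def crosstalk_subtract(left: list[float], right: list[float], delta: float):
    """Return (L - delta*R, R - delta*L), computed sample by sample.

    delta is the fraction of the opposite channel that is subtracted;
    restricting it to 0 <= delta < 1 is an assumption of this sketch.
    """
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must satisfy 0 <= delta < 1")
    out_left = [l - delta * r for l, r in zip(left, right)]
    out_right = [r - delta * l for l, r in zip(left, right)]
    return out_left, out_right


# Sample 0: a center source (L == R). Sample 1: a hard-left source (R == 0).
print(crosstalk_subtract([1.0, 0.5], [1.0, 0.0], 0.25))
```

For a center source, both outputs become (1 − Δ) times the input, so the phantom center is only attenuated. For a hard-left source, the right output carries an inverted copy at level −Δ. No additional channel is created: 2 loudspeakers still reproduce 2 mixes of the same left and right inputs.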
Other embodiments of Doi et al's invention are frequency dependent and employ both filters and phase compensation. These are required to compensate for frequency dependency of the virtual image as noted by Clark et al. Further, with the Doi et al invention, listener location is as critical as with regular stereo.
A similar system to that of Doi et al was disclosed in 1980 by Kogure et al in U.S. Pat. No. 4,219,696. Their device attempts to simulate the sound of a quadraphonic system using only 2 front loudspeakers. This would seem to have little value for music reproduction, since 2 front loudspeakers naturally produce a virtual image of a music performance that is as accurate as that of a quadraphonic system. Their invention does not attempt to separate mixed forward sounds by location. In addition, since only 2 loudspeakers are used to simulate 4, it is more sensitive to listener location than a similar quadraphonic system would be.
In 1985 Watanabe disclosed, in U.S. Pat. No. 4,524,451, a device for manually positioning single or multiple monophonic sound sources between many loudspeakers surrounding a listener. If all the original sound sources were available on separate channels, his device, if properly adjusted manually, would reproduce them very well. It does not, however, separate those sound sources out of 2 stereo channels once they are mixed.
Various surround sound systems have been developed and used primarily to improve the sound of movies. See FIG. 1D. Many movie sound tracks are encoded into left 37 and right 39 channels. Sounds to be heard from the screen are encoded by recording them in phase in both channels. Sounds intended to come from behind the audience are encoded by recording them out of phase in the left and right channels. Surround sound decoders create a synthesized center channel 38 by adding the left and right signals. The derived center channel places all in-phase sounds near the center of the screen. Rear or "surround" channels 36 and 40 are decoded by differencing the left and right signals.
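Sum-and-difference decoding of this kind can be sketched as a simplified passive matrix. This sketch assumes a 1/2 scaling on the sum and difference signals and a single difference signal feeding both rear channels; commercial decoders add delay, filtering, and steering that are not modeled here, and the function name is illustrative.

```python
def passive_matrix_decode(left: list[float], right: list[float]):
    """Decode 2 matrix-encoded channels into (left, center, right, surround).

    center = (L + R) / 2 and surround = (L - R) / 2; the 1/2 scaling is an
    assumption of this sketch.
    """
    center = [(l + r) / 2 for l, r in zip(left, right)]
    surround = [(l - r) / 2 for l, r in zip(left, right)]
    return list(left), center, list(right), surround


# Sample 0: in-phase (screen) sound. Sample 1: out-of-phase (rear) sound.
_, c, _, s = passive_matrix_decode([0.5, 0.25], [0.5, -0.25])
print("center:", c, "surround:", s)
```

An in-phase sound appears only in the center output, and an out-of-phase sound only in the surround output. A sound present in only one channel, however, appears in the left, center, and surround outputs alike.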
In 1986, Blackmer and Townsend disclosed, in their U.S. Pat. No. 4,589,129, a device for reproducing surround sound from encoded 2-channel recordings. Their system produces L, R, L+R, and L-R output signals, with various amplitude, phase, and frequency adjustments. It is very effective for movie sound tracks which have been encoded to simulate everyday sounds coming from all directions. But as with quadraphonic sound, surround sound does not accurately represent what is heard at a live music performance. Music is generally not intended to surround a listener 35, but to come from in front of the listener. Systems such as theirs, which use only whole combinations of left and right signals, lack the subtlety of imagery needed for accurate music reproduction. The surround sound listening room 41 must be rather large to provide sufficient distance between the loudspeakers and listener.
The result of all virtual image systems, whether stereo or quadraphonic, is that they produce a rather poor forward image. This is the major difference between live and reproduced sound. See FIG. 1B. Several triphonic systems have been developed to improve the forward image by adding a synthesized center channel 26 similar to that used in surround sound systems. Adequate listening room 28 spacing is often possible with such a system because a true central image is less vulnerable to wall reflections than is a virtual image.
In 1986 Rosen disclosed, in his U.S. Pat. No. 4,594,730, a device for producing a center channel from the left and right stereo channels. His center channel is used to reproduce "direct" or monaural sounds, while the other 2 channels 25 and 27 reproduce "indirect" or ambient sounds. Such separation of "direct" and "indirect" sounds is accomplished by subtracting the signal generated for the center channel from both the left and right channels. Because the center channel is frequency band limited, however, the cancellation and resultant separation is not complete. Such frequency dependency is a very undesirable characteristic for a separation device. This is especially true for a center channel which is supposed to reproduce "direct" sounds. A listener 24 should hear the full spectrum of sound for each instrument or voice independent of its location. A greater problem with his approach is that, in fact, all original sound sources are "direct" and monaural, yet they come from many locations in a live performance, not just from the center. Even in a stereo recording, a monaural source can be recorded entirely in either the left or right channel. "Direct" does not mean directly in front.
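The incomplete cancellation caused by a band-limited center channel can be illustrated with a simplified spectral model in which each channel is a set of frequency components (amplitudes only, phase ignored). This sketch assumes an ideal passband for the center channel and a 1/2 scaling of L + R. These are illustrative assumptions, not Rosen's circuit values, and the function name is hypothetical.

```python
def band_limited_center_split(left: dict, right: dict, band: tuple):
    """Derive a band-limited center channel and subtract it from L and R.

    left, right: dicts mapping frequency in Hz -> amplitude.
    band: (low_hz, high_hz) passband, modeled as an ideal filter.
    Returns (new_left, center, new_right).
    """
    low, high = band
    freqs = set(left) | set(right)
    # The center channel keeps only in-band components of (L + R) / 2.
    center = {f: (left.get(f, 0.0) + right.get(f, 0.0)) / 2
              for f in freqs if low <= f <= high}
    new_left = {f: left.get(f, 0.0) - center.get(f, 0.0) for f in freqs}
    new_right = {f: right.get(f, 0.0) - center.get(f, 0.0) for f in freqs}
    return new_left, center, new_right


voice = {100: 1.0, 1000: 1.0, 8000: 1.0}  # equal in both channels (mono)
new_left, center, new_right = band_limited_center_split(voice, dict(voice), (200, 4000))
print("center:", center)
print("left after subtraction:", dict(sorted(new_left.items())))
```

With a mono "direct" source containing components at 100 Hz, 1 kHz, and 8 kHz, and a center passband of 200 Hz to 4 kHz, only the 1 kHz component is removed from the side channels. The 100 Hz and 8 kHz components of the same source remain in the left and right outputs, so the source is split by frequency between the center and side loudspeakers.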
Still another weakness of the Rosen invention is its use of variable resistances that let the listener control the image. This is the wrong approach if accurate sonic separation and sound field reproduction are the goals, because at a live performance the image is not listener-controlled.
Rosen also disclosed two 4-channel embodiments of his invention. In one of these, 2 loudspeakers are sent time delayed signals to enhance the ambient sound. This does nothing, however, to either separate the forward sounds or improve the image. The other 4-channel embodiment uses forward loudspeakers of which he states, "Acoustical center channel mixing is achieved when each individual channel of the 2 channel stereophonic source is fed to its own individual reproducer (therefore requiring at least 2 such reproducers) and when these reproducers are separated by a distance that is small when compared to the distance from the reproducers to a preferred listening location." His goal is clearly to emulate a 3 channel system by acoustical mixing, not to separate the sounds into more than 3 channels.
Latshaw, in his 1987 U.S. Pat. No. 4,685,136, disclosed an invention that uses 3 or 4 forward loudspeakers. He states that when 4 loudspeakers are used, "The first center speaker and the second center speaker are located at the center of the front of the room as closely together as practical, so that as a close approximation, the acoustical power of the speakers is perceived as coming from substantially the same location." Like Rosen, Latshaw's goal is to emulate a 3 channel system by acoustical mixing. This again is contrary to the concept of sonic separation in which the loudspeakers are spread out to avoid mixing sounds and to enhance separation.
Latshaw's device computes a time-varying "commonality index" based on left and right time-averaged signal envelopes. This is used to determine the mixture of left and right inputs in each of the output channels. Thus the image created by his device is both time varying and program dependent. His device also employs many directionality tests based on left and right signal envelope strength. These tests control switches in the signal processing path. Not only does his processing change with time due to the varying commonality index, but it changes discontinuously due to switching. The result is that sounds of lesser volume fail to hold their locations in the presence of louder sounds. That is, all sounds are erroneously steered in the direction of the loudest sounds. Even the louder sounds jump around as the various switching thresholds are crossed. In addition, the automatic balance feature of his design means that there are no true left or right locations; all locations are relative to the momentary center, or average, between left and right. His invention is yet another example of a frequency-dependent device which is not optimal for musical sound reproduction.
In 1988, Tofte disclosed in his U.S. Pat. No. 4,747,142, another device for generating a center channel and modified left and right channels. His is the only device of which I am aware that purports to approach the sonic separation problem. Tofte says that his invention "could be likened to a reversal of the studio's mix-down process, where many separate microphone signals are `panned` onto a final master tape through a mixing console equipped with individual balance controls for changing the apparent position of each microphone in the stereo image." His device uses logarithmic compression and expansion. Between the compression and expansion, frequency band limited signals from the left and right channels are added together. In addition to the deleterious effects of filtering, the effect of this log-add-antilog process is that the output contains a product, instead of a sum, of left and right signals. This nonlinearity enhances separation, but greatly increases distortion of the thus separated sounds. In addition, the sonic balance between loud and soft sounds is upset in the process. This results in a serious loss of realism for the listener.
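The effect of a log-add-antilog stage follows from the identity exp(ln a + ln b) = ab. The sketch below assumes positive input values (as with signal magnitudes), since the logarithm of zero or a negative value is undefined, and uses hypothetical function names. It illustrates the identity, not Tofte's circuit, and also checks numerically that multiplying two tones yields new tones at their sum and difference frequencies, the kind of distortion a product introduces.

```python
import math


def log_add_antilog(a: float, b: float) -> float:
    """Compress two positive values logarithmically, add, then expand."""
    if a <= 0 or b <= 0:
        raise ValueError("inputs must be positive for the logarithm")
    return math.exp(math.log(a) + math.log(b))


def product_of_tones(f1: float, f2: float, t: float) -> float:
    """Product of two unit sine tones at time t (seconds)."""
    return math.sin(2 * math.pi * f1 * t) * math.sin(2 * math.pi * f2 * t)


def sum_and_difference_tones(f1: float, f2: float, t: float) -> float:
    """Same signal via sin x sin y = (cos(x - y) - cos(x + y)) / 2,
    i.e. tones at f1 - f2 and f1 + f2 that neither input contained."""
    x, y = 2 * math.pi * f1 * t, 2 * math.pi * f2 * t
    return 0.5 * (math.cos(x - y) - math.cos(x + y))


print(log_add_antilog(2.0, 3.0))  # close to 6.0 (the product), not 5.0 (the sum)
t = 0.00123
print(product_of_tones(440.0, 660.0, t), sum_and_difference_tones(440.0, 660.0, t))
```

With tones at 440 Hz and 660 Hz, the product contains components at 220 Hz and 1100 Hz, neither of which was present in the original signals.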
The work of Rosen, Latshaw, and Tofte shows that imaging improvements are possible using triphonic systems that remove part of a frequency band limited derived center channel from the left and right channels. Such systems work adequately for spoken voices, but fail to reproduce the full audio frequency spectrum from all channels. This limits their effectiveness for reproducing musical sounds.
Because loudspeakers must be spaced no more than 30 degrees apart to maintain proper imaging between them, at least 4 loudspeakers are required to cover the full 90 degrees of the forward image. If fewer than 4 are used, then the breadth of the image must be reduced or the quality of the image between the loudspeakers is compromised.
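The count of 4 follows from dividing the 90-degree image into gaps of at most 30 degrees, with a loudspeaker at each edge. A minimal sketch under those assumptions (loudspeakers at both ends of the image, so n loudspeakers leave n - 1 gaps); the function name is illustrative:

```python
import math


def min_loudspeakers(image_width_deg: float, max_gap_deg: float) -> int:
    """Fewest loudspeakers spanning the image with no gap wider than max_gap_deg.

    Assumes loudspeakers at both edges of the image, so n loudspeakers
    create n - 1 gaps.
    """
    if image_width_deg <= 0 or max_gap_deg <= 0:
        raise ValueError("angles must be positive")
    gaps = math.ceil(image_width_deg / max_gap_deg)
    return gaps + 1


print(min_loudspeakers(90.0, 30.0))  # full 90-degree orchestral image
print(min_loudspeakers(60.0, 30.0))  # a narrower image
```

With only 3 loudspeakers at 30-degree spacing, the image is limited to 60 degrees, which matches the trade-off stated above.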
None of the prior art known to me and described above teaches the separation into more than 3 channels of forward sounds mixed in stereo. Those who have mentioned more than 3 forward channels (Rosen and Latshaw) have done so with regard to acoustical mixing of right and left channels to produce a middle channel, not with regard to a 4, 5, 6, or more channel separation of sounds.