Human listeners are readily able to estimate the direction and range of a sound source. This ability is remarkable in many respects. A human being has only two ears, and is thus apparently sensing with only two degrees of freedom. To locate a sound in three-dimensional space requires three degrees of freedom, for example azimuth angle, altitude angle, and range. In translating from two to three degrees of freedom we would expect on theoretical grounds that ambiguities would commonly arise, but such ambiguities are rarely experienced. When multiple sound sources are distributed in space around the listener, the position of each may be perceived independently and simultaneously. This is true even when the sources are of a generally similar nature, as in a crowd of people all speaking at once at a cocktail party. Despite substantial and continuing research over many years, no satisfactory theory has yet been developed to account for all of the perceptual abilities of the average listener.
A process which measures the pressure or velocity of a sound wave at a single point, and reproduces that sound effectively at a single point, preserves the intelligibility of speech and much of the identity (and pleasure) of music. Such a system, however, discards all of the information needed to locate the sound in space; thus an orchestra, reproduced by such a system, is perceived as if all instruments were playing at the single point of reproduction. Early in the history of sound reproduction it became clear that such a system removed a substantial part of the pleasure of listening. Exercising the ability to perceive the location, as well as the nature, of a sound source is pleasurable to the listener.
Efforts were therefore directed to preserving the directional cues during transmission and reproduction. In the continued absence of a satisfactory theory to elucidate the nature of such cues, these efforts were perforce empirical. It seemed reasonable to assume that, since sensing with two ears is vital to perception of sound location, two transmission channels should be provided. U.S. Pat. No. 2,093,540, issued to Alan D. Blumlein in September 1937 (and filed in 1932), gives substantial detail for such a system. This landmark patent covers methods in use today for optical stereo soundtracks on motion picture film, stereo recording on phonograph discs, stereo microphone techniques, and stereo loudspeaker placement. It describes in detail the artificial emphasis of the difference between the stereo channels as a means of broadening the stereo image, which is the basis of many present stereo sound enhancement techniques. It also sets out, in considerable mathematical detail, the basic acoustical relationships required to place a stereo sound image in coincidence with a visual image across the lateral dimension of a motion picture film.
From the nineteen-thirties to the present day, continual improvement and refinement have been applied to the basic stereo system exemplified in Blumlein's work. For example, in U.S. Pat. No. 4,118,599, issued to Makoto Iwahara et al. in October 1978, great efforts are made to ensure that the sound pressures at the ears of a single listener, critically placed and oriented with respect to the loudspeakers, ". . . faithfully represent what a person actually located in the position of the microphone would hear . . ." (Col. 3 lines 4-6). Similarly, in U.S. Pat. No. 4,524,451, issued to Koji Watanabe in June 1985, we see analysis founded on a similar concern: "If the front speakers are driven by signals which would produce the same sound pressures at the listener's ears as . . ." (Col. 6 lines 42-44). Such systems do not seem to have come into widespread use, despite their obvious potential for accuracy, possibly because the analysis on which they are based depends critically on the position, angle and dimensions of the listener's head.
It would appear that this concern for accurate, detailed reproduction of the spatial cues present when a real sound source is heard first emerged from work at the Bell Telephone Laboratories, as detailed in U.S. Pat. No. 3,236,949 issued to Bishnu Atal et al. in February 1966. The goal is explicitly stated: "It is in accordance with the present invention to provide at the listener's left and right ears, the appropriate sound pressure waves which would reach his ears from such a source of sound 3, from the two fixed position loudspeakers 1 and 2." (Col. 3 lines 9-13). This has clearly been the goal of many later inventors.
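At any single frequency, the goal quoted above amounts to solving two simultaneous equations: the pressure at each ear is the sum of the two loudspeaker signals, each weighted by the complex acoustic transfer function from that loudspeaker to that ear. The minimal Python sketch below solves that 2×2 system directly. It illustrates the principle only and is not the circuitry of Atal et al. or of any later patent cited here; the function names and the transfer values are hypothetical, and real transfer functions vary with frequency and would have to be measured.

```python
import cmath


def solve_speaker_signals(h, target):
    """Return loudspeaker signals (x0, x1) such that h @ x == target.

    h[i][j] is a (hypothetical) complex transfer function from
    loudspeaker j to ear i at one frequency (index 0 = left, 1 = right);
    target[i] is the pressure wanted at ear i.
    """
    (a, b), (c, d) = h
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("transfer matrix is singular")
    # Explicit inverse of a 2x2 matrix: [[d, -b], [-c, a]] / det
    x0 = (d * target[0] - b * target[1]) / det
    x1 = (-c * target[0] + a * target[1]) / det
    return x0, x1


def ear_pressures(h, x):
    """Pressures at the two ears produced by loudspeaker signals x."""
    return tuple(h[i][0] * x[0] + h[i][1] * x[1] for i in range(2))


def symmetric_paths(cross_gain, cross_phase):
    """Invented symmetric layout: unit direct paths, weaker delayed cross paths."""
    cross = cross_gain * cmath.exp(-1j * cross_phase)
    return [[1.0, cross], [cross, 1.0]]


# Design for sound at the left ear only, assuming a fixed head position.
h_design = symmetric_paths(0.6, 0.8)
target = (1.0, 0.0)
x = solve_speaker_signals(h_design, target)
at_design = ear_pressures(h_design, x)     # close to (1, 0)

# The listener moves slightly: the cross-path phase changes a little.
h_moved = symmetric_paths(0.6, 1.0)
after_move = ear_pressures(h_moved, x)     # right ear no longer silent
```

With these invented values the computed loudspeaker signals deliver the target pressure to the left ear and cancel it at the right. A small change in the cross-path phase, such as a slight head movement might cause, leaves a residue of roughly a tenth of the target level at the ear that should be silent, which illustrates why such systems depend so critically on the listener's position.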
A different line of improvement has sought to enhance or expand the scope of the perceived stereo image, which normally lies entirely along a line joining the centres of the loudspeakers. Typical of such approaches is the work described in U.S. Pat. No. 4,355,203 issued to Joel Cohen in October 1982. This patent describes elegant modern circuitry to emphasise the difference between the left and right stereo channels, ". . . for either increasing stereo separation or enhancing perimeter sound images, or both . . ." (Col. 1 lines 14-15). Similarly, U.S. Pat. No. 4,748,669 issued to Arnold Klayman in May 1988 describes elaborate "sum and difference" signal processing circuitry which ". . . is particularly directed to a stereo enhancement system which broadens the stereo image, and provides for an increased stereo listening area . . ." (Col. 1 lines 11-13).
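Emphasising the difference between channels is conventionally described in sum-and-difference (mid/side) terms. The minimal Python sketch below, with a hypothetical `width` parameter, splits each sample pair into sum and difference components, scales the difference, and recombines. It is a generic illustration of the principle, not the circuitry of Cohen, Klayman or Blumlein.

```python
def widen(left, right, width):
    """Scale the difference (side) component of a stereo pair.

    width = 1.0 leaves the signal unchanged; width > 1.0 emphasises the
    difference between channels; width = 0.0 collapses to mono.
    """
    out_l, out_r = [], []
    for l, r in zip(left, right):
        mid = 0.5 * (l + r)    # sum component, common to both channels
        side = 0.5 * (l - r)   # difference component
        side *= width          # emphasise (or reduce) the difference
        out_l.append(mid + side)
        out_r.append(mid - side)
    return out_l, out_r
```

Expanding the algebra, each output is a cross-coupled mix of both inputs, L' = ((1 + w)/2)L + ((1 - w)/2)R and symmetrically for R', so a width above 1 subtracts a fraction of the opposite channel. With w = 2, for example, the sample pair (1.0, 0.0) becomes (1.5, -0.5), while a pair with no difference, (0.5, 0.5), passes through unchanged.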
Several patents have been issued covering inexpensive circuitry to expand the somewhat confined stereo image created within an automobile; typical are U.S. Pat. Nos. 4,394,536 and 4,394,537, issued to Kenji Shima et al. in July 1983; 4,329,544, issued to Akitoshi Yamada in May 1982; and 4,349,698, issued to Makoto Iwahara in September 1982. All of these patents rely on cross-coupling the stereo channels in one way or another to emphasise the existing cues to spatial location contained in a stereo recording.
These enhancing or broadening circuits are usually more empirically based than the precision reproduction circuits, and they relax the demands on the listening configuration. They have enjoyed greater popularity, particularly in automobile installations, where the faults caused by the environment are major and the listening conditions are less critical. Pushing such techniques perhaps to the limit, U.S. Pat. No. 3,560,656 issued to Roswell Gilbert in February 1971 shows ingenious circuitry for use with a monophonic input and stereo output in a dictating machine. The device ". . . created a sound output which gave a distinct impression of `breadth` and reality." (Col. 3 lines 48-49). Here the goal is clearly and frankly the provision of a pleasant experience, without thought for "accuracy".
Common to all these and many other "improvements" to the basic stereo sound system is an underlying dissatisfaction with its performance. The stereo sound image is at best limited and one-dimensional, confined to a line between the loudspeakers or small extensions of that line. Much of the pleasure and excitement of being amongst the sound sources is lost. At worst, the image breaks down entirely and the sound is merely perceived as emitted by two sources, the loudspeakers.
In attacking these problems, inventors have tried systems with four independent channels (quadraphonic sound) or with a multiplicity of loudspeakers. U.S. Pat. No. 4,410,761, issued to Willi Schickedanz in October 1983, shows a scheme for a television set with eight loudspeakers fed from two independent channels.
An alternative approach has been to attempt to produce sound images free of the constraints of conventional stereophony. Some such systems eschew entirely the pursuit of a stable, realistic image. For example, U.S. Pat. No. 4,208,546, issued to Robert Laupman in June 1980, cites as an advantage that ". . . the auditor on the medium perpendicular will obtain a position impression, which means that he will experience a variable impression of the position of the instrument or singer. This increases the unreal character of the result achieved."
Tighter control of sound images is sought by Takuyo Kogure et al. in U.S. Pat. No. 4,219,696 of August 1980. They set out the mathematics that would allow a sound image to be placed anywhere in the plane containing the two loudspeakers and the listener's head, using modified stereo replay equipment with two or four loudspeakers. The system relies on accurate characterisation, matching, and electrical compensation of the complex acoustic frequency response between the signal driving each loudspeaker and the sound pressure at each ear of the listener. Perhaps because this response varies dramatically with small changes in the position, angle or dimensions of the listener's head, no practical applications of this patent appear to be in widespread use. Loudspeaker characteristics also vary considerably, even between two apparently identical units produced consecutively on a mass production line. This variation would be sufficient to impair the accuracy of a critical system such as Kogure describes, so individual tuning to match each loudspeaker might well be necessary.
Similarly, in U.S. Pat. No. 4,524,451 issued to Koji Watanabe in June 1985, precise characterisation and compensation of complex frequency responses is shown as a basis for the creation of "phantom sound sources" lateral to or behind the listener. In this case, the use of real sound sources to replace the "phantom" ones is also detailed; this is probably a more practical scheme.
A most interesting line of development has been pursued at Northwestern University, and is reported in U.S. Pat. No. 4,731,848 issued to Gary Kendall et al. in March 1988. In this work the entire reverberant environment of a listening room is carefully and accurately modelled. Each possible echo path is simulated by a delayed signal, with filtering in the delay feedback path to simulate the more rapid absorption of higher frequencies in the air and the environment. For the direct path, and for each echo path, directions are individually assigned; first-order simulated reflections are emphasised to mask those due to the real listening environment. Directions are assigned to signals using the method of Kogure et al., cited above; the Kogure patent is incorporated into the Kendall patent by reference for this purpose (Col. 6 lines 45-48). The Kendall reverberator may provide the most accurate known simulation of indoor environments. Presumably it will not model sounds imaged to an outdoor environment, since such an environment generally lacks reverberation. The mathematical derivation of the numerous parameters in Kendall's invention relies on intimate knowledge of the room shape, its dimensions, the listener position, and the direction in which the listener is facing.
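The basic element described here, a recirculating delay with filtering in its feedback path, can be sketched in a few lines of Python. This is a generic lowpass-feedback comb filter of the kind common in digital reverberation, offered as an illustration under stated assumptions and not as a reproduction of Kendall's design; the parameter names `delay_samples`, `feedback` and `damping` are hypothetical.

```python
def lowpass_comb(signal, delay_samples, feedback, damping):
    """Recirculating delay with a one-pole lowpass in the feedback path.

    Each pass around the loop is attenuated by `feedback` and smoothed by
    the lowpass, so successive echoes decay, and lose high frequencies
    faster than low ones. `damping` in [0, 1): 0 means no extra HF loss.
    """
    buf = [0.0] * delay_samples   # circular delay line
    idx = 0
    lp_state = 0.0                # one-pole lowpass memory
    out = []
    for x in signal:
        delayed = buf[idx]        # sample written delay_samples steps ago
        # One-pole lowpass: y[n] = (1 - d) * x[n] + d * y[n-1]
        lp_state = (1.0 - damping) * delayed + damping * lp_state
        buf[idx] = x + feedback * lp_state   # recirculate the filtered echo
        idx = (idx + 1) % delay_samples
        out.append(delayed)
    return out


impulse = [1.0] + [0.0] * 12
clean = lowpass_comb(impulse, 4, 0.5, 0.0)   # echoes at samples 4, 8, 12
damped = lowpass_comb(impulse, 4, 0.5, 0.5)  # later echoes smeared out
```

With `damping` at 0 an impulse returns as clean echoes every `delay_samples` samples, each scaled by `feedback`. With `damping` above 0, each echo after the first is spread across the following samples, which is the time-domain signature of the additional high-frequency loss on every round trip.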
Kendall's patent mentions the use of "pinna cues" for direction, though the schematics shown incorporate no apparent means for their insertion. The pinna is the external flap of the human ear, and it modifies incoming sound according to its direction of arrival. In an article published in the Journal of the Audio Engineering Society in September 1977 (vol. 25 no. 9 pages 560-565), P. J. Bloom reports the use of simulated pinna cues to give an impression of sound source elevation in a monophonic environment. He modified broadband signals with a narrowband notch filter, and was able to produce a variable impression of elevation by varying the centre frequency of the notch. These fascinating results could not be applied to a narrowband signal, as the notch would merely cause a level change, so that the required spectral cues would not be present in the processed signal.
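Bloom's observation that a notch conveys elevation only when there is broadband energy for it to remove can be checked numerically. The Python sketch below uses the widely published biquad notch from R. Bristow-Johnson's "Audio EQ Cookbook" as a stand-in, since the article's exact filter is not described here; the sample rate, Q and notch frequencies are arbitrary assumptions. It computes the filter's gain at chosen frequencies rather than processing audio.

```python
import cmath
import math


def notch_coeffs(f0, q, fs):
    """Biquad notch centred on f0 Hz, in the form given in R. Bristow-Johnson's
    'Audio EQ Cookbook', normalised so that a[0] == 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def gain_at(b, a, f, fs):
    """Magnitude of the biquad's frequency response at f Hz."""
    z1 = cmath.exp(-2j * math.pi * f / fs)   # z^-1 evaluated on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return abs(num / den)


FS = 44100.0   # assumed sample rate, Hz
Q = 4.0        # assumed notch sharpness

# Broadband view: a notch at 8 kHz removes a band and leaves the rest.
b8, a8 = notch_coeffs(8000.0, Q, FS)
broadband = {f: gain_at(b8, a8, f, FS) for f in (1000.0, 8000.0, 16000.0)}

# Narrowband view: a steady 7 kHz tone through notches at several centres.
# The output is still a 7 kHz tone; only its level differs.
tone_gain = {}
for f0 in (6000.0, 7000.0, 8000.0):
    b, a = notch_coeffs(f0, Q, FS)
    tone_gain[f0] = gain_at(b, a, 7000.0, FS)
```

In this sketch the notch carves a band out of a broadband spectrum while leaving distant frequencies almost untouched, so moving its centre frequency reshapes the spectrum. For the single tone, moving the notch only alters the tone's level, from near-silence when the notch sits on the tone to a modest attenuation when it sits 1 kHz away.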
It is clear that the more recent refinements of the stereo system have not produced great improvement in the systems presently in widespread use for entertainment. This may be because their impressive towers of acoustical theory are built on an insufficiently stable foundation. Real listeners like to sit at ease, move or turn their heads, and place their loudspeakers to suit the convenience of room layout and to fit in with other furniture. Furthermore, the stereo loudspeaker system already contains deep-seated, and perhaps irremediable, compromises towards convenience at the expense of accuracy. Impressive sound images are available if two microphones, placed in a dummy head, feed strictly separate signals to a pair of headphones, so that signals are never mixed between channels. Once the acoustic signals are mixed by loudspeaker reproduction, their practical re-separation may be a problem comparable with unscrambling eggs.
With approaches based on acoustic theory proving increasingly sterile, and no solution in sight to the analysis of human perception, a return to the earlier empiricism seems indicated. It is noteworthy that all the approaches detailed above, and indeed many others, rest on the foundation laid in the Blumlein patent cited. In making a fresh empirical departure, we remain today in the position so ably documented by Blumlein: "The operation of the ears in determining the direction of a sound source is not yet fully known but it is fairly well established that the main factors having effect are phase differences and intensity differences between the sounds reaching the two ears, the influence which each of these has depending upon the frequency of the sounds emitted." (Col. 2 lines 25-32).