This invention relates generally to the field of audio-signal processing and more particularly to a system and method for stereo audio-signal processing and stereo sound reproduction incorporating head-diffraction compensation, which provides improved sound-source imaging and accurate perception of desired source-environment acoustics and equalization to ensure a natural sound quality under a variety of listener-environment conditions while maintaining relative insensitivity to listener position and movement.
There is a wide variety of prior-art stereo systems, most of which fall within three general categories or types of systems. The first type of stereo system utilizes two omnidirectional microphones usually spaced approximately one half to two meters apart and two loudspeakers placed in front of the listener towards his left and right sides in correspondence one for one with the microphones. The signal from each microphone is amplified and transmitted, often via a recording, through another amplifier to excite its corresponding loudspeaker. The one-for-one correspondence is such that sound sources toward the left side of the pair of microphones are heard predominantly in the left loudspeaker and right sounds in the right. For a multiplicity of sources spread before the microphones, the listener has the impression of a multiplicity of sounds spread before him in the space between the two speakers, although the placement of each source is only approximately conveyed, the images tending to be vague and to cluster around loudspeaker locations.
The second general type of stereo system utilizes two unidirectional microphones spaced as closely as possible, and turned at some angle towards the left for the leftward one and towards the right for the rightward one. The reproduction of the signals is accomplished using a left and right loudspeaker placed in front of the listener with a one-for-one correspondence with the microphones. There is very little difference in timing for the emission of sounds from the loudspeakers compared to the first type of stereo system, but a much more significant difference in loudness because of the directional properties of the angled microphones. Moreover, such difference in loudness translates to a difference in time of arrival, at least for long wavelengths, at the ears of the listener. This is the primary cue at low frequencies upon which human hearing relies for sensing the direction of a source. At higher frequencies (i.e., above 600 Hz), directional hearing relies more upon loudness differences at the ears, so that high-frequency sounds in such stereo systems tend to give the impression of being localized close to the loudspeaker positions rather than spread as the original sources had been.
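The conversion of a loudness difference between the loudspeakers into a timing difference at the ears can be illustrated numerically. The following is a minimal sketch under a deliberately simplified model that is not taken from this disclosure: each ear is assumed to receive the nearer loudspeaker without delay and the farther loudspeaker after a fixed extra delay, with no head-shadow attenuation. The function names, the 200 Hz test frequency and the 0.25 ms delay are hypothetical values chosen only for illustration.

```python
import cmath
import math


def interaural_phase_lead(gain_left, gain_right, freq_hz, cross_delay_s):
    """Phase in radians by which the left-ear signal leads the right-ear signal.

    Simplified model (assumption): the near loudspeaker arrives undelayed,
    the far loudspeaker arrives cross_delay_s later, and the head does not
    attenuate either path. Valid only at low frequencies, where the phase
    values stay well inside (-pi, pi] and do not wrap.
    """
    far = cmath.exp(-1j * 2.0 * math.pi * freq_hz * cross_delay_s)
    left_ear = gain_left + gain_right * far    # left speaker near, right speaker far
    right_ear = gain_right + gain_left * far   # right speaker near, left speaker far
    return cmath.phase(left_ear) - cmath.phase(right_ear)


def equivalent_time_lead(gain_left, gain_right, freq_hz, cross_delay_s):
    """Convert the interaural phase lead into an equivalent time lead in seconds."""
    phase = interaural_phase_lead(gain_left, gain_right, freq_hz, cross_delay_s)
    return phase / (2.0 * math.pi * freq_hz)


if __name__ == "__main__":
    # Equal gains give no interaural lead; a louder left channel makes the left ear lead.
    print(interaural_phase_lead(1.0, 1.0, 200.0, 0.00025))
    print(equivalent_time_lead(1.0, 0.5, 200.0, 0.00025))
```

Under these assumptions, equal loudspeaker gains produce no interaural lead, while a louder left channel makes the left-ear signal lead by a fraction of the cross delay, even though both loudspeakers emit at the same instant.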
The third general type of stereo system synthesizes an array of stereo sources, by means of electrical dividing networks, whereby each source is represented by a single electrical signal that is additively mixed in predetermined proportions into each of the two stereo loudspeaker channels. The proportion is determined by the angular position to be allocated for each source. The loudspeaker signals have essentially the same characteristic as those of the second type of stereo system.
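The additive mixing performed by such dividing networks may be sketched as follows. The disclosure states only that the proportions are determined by the angular position allocated to each source; the constant-power sine/cosine law used here is an assumption made for illustration, as are the function names and the normalized position range of -1 (full left) to +1 (full right).

```python
import math


def pan_gains(position):
    """Return (left_gain, right_gain) for a position in [-1.0, 1.0].

    Assumption: a constant-power sine/cosine pan law, so that
    left_gain**2 + right_gain**2 is 1 for every position.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must lie in [-1.0, 1.0]")
    theta = (position + 1.0) * math.pi / 4.0   # maps -1..+1 onto 0..pi/2
    return math.cos(theta), math.sin(theta)


def mix_sources(sources):
    """Additively mix (samples, position) pairs into left and right channels."""
    length = max(len(samples) for samples, _ in sources)
    left = [0.0] * length
    right = [0.0] * length
    for samples, position in sources:
        gain_left, gain_right = pan_gains(position)
        for i, sample in enumerate(samples):
            left[i] += gain_left * sample
            right[i] += gain_right * sample
    return left, right


if __name__ == "__main__":
    # One source hard left, one source centered.
    print(mix_sources([([1.0, 1.0], -1.0), ([0.5, 0.5], 0.0)]))
```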
Based upon these three general types of stereo systems, there are many variants. For example, the first type of system may use more than two microphones and some of these may be unidirectional or even bidirectional, and a mixing means as used in the third type of system may be used to allocate them in various proportions between the loudspeaker channels. Similarly, a system may be primarily of the second type of stereo system and may use a few further microphones placed close to certain sources for purposes of emphasis with signals to be proportioned between the channels. Another variant of the second type of stereo system makes use of a moderate spacing, for example 150 mm, between the microphones with the left-angled microphone spaced to the left, and the right-angled microphone spaced to the right. Another variant uses one omnidirectional microphone coincident, as nearly as possible, with a bidirectional microphone. This is the basic form of the MS (middle-side) microphone technique, in which the sum and difference of the two signals are substantially the same as the individual signals from the usual dual-angled microphones of the second type of system.
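The sum-and-difference relation of the MS technique may be sketched as follows. The function names are hypothetical, and the unscaled sum and difference (with a factor of one half on the inverse) is an assumed convention; practical systems may apply other gains.

```python
def ms_to_lr(mid, side):
    """Derive left and right channel samples as the sum and difference of mid and side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def lr_to_ms(left, right):
    """Recover mid and side samples from left and right (inverse of ms_to_lr)."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


if __name__ == "__main__":
    left, right = ms_to_lr([1.0, 0.5], [0.25, -0.5])
    print(left, right)
    print(lr_to_ms(left, right))
```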
Each of these systems has its advantages and disadvantages and tends to be favored and disfavored according to the desires of the user and according to the circumstances of use. Each fails to provide localization cues at frequencies above approximately 600 Hz. Many of the variants represent efforts to counter the disadvantages of a particular system, e.g., to improve the impression of uniform spread, to more clearly emulate the sound imaging, to improve the impression of "space" and "air," etc. Nevertheless, none of these systems adequately reckons with the effects upon a soundwave of propagation in the space close to the head in order to reach the ear canal. This head diffraction substantially alters both the magnitude and phase of the soundwave, and causes each of these characteristics to be altered in a frequency-dependent manner.
The use of head-diffraction compensation to make greatly improved stereo sound in a loudspeaker system was demonstrated by M. R. Schroeder and B. S. Atal to emulate the sounds of various concert halls with extraordinary accuracy. Schroeder measured the values of head-related transfer functions for an artificial or "dummy" head (i.e., a physical replica of a head mounted on a fully-clothed manikin) that had microphones placed in its ear canals. This information was used to process two-channel sound recorded using a second artificial head (i.e., to process a binaural recording). Since each ear hears both speakers, the system used crosstalk cancellation to cancel the effects of sound traveling around the listener's head to the opposite ear. Crosstalk cancellation was performed over the entire audio spectrum (i.e., 20 Hz to 20 kHz).
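The principle of crosstalk cancellation may be illustrated at a single frequency. The following is a minimal sketch, not a reconstruction of the Schroeder and Atal implementation: it assumes a left-right symmetric listening arrangement, denotes the complex same-side loudspeaker-to-ear transfer function by S and the opposite-side (crosstalk) transfer function by A, and inverts the resulting two-by-two matrix. The symbols S and A and all function names are assumptions introduced for illustration.

```python
def canceller_coefficients(S, A):
    """Return (direct, cross) terms of the inverse of [[S, A], [A, S]] at one frequency."""
    det = S * S - A * A
    if det == 0:
        raise ValueError("acoustic matrix is singular at this frequency")
    return S / det, -A / det


def apply_canceller(S, A, binaural_left, binaural_right):
    """Compute the loudspeaker signals that pre-compensate for the crosstalk."""
    direct, cross = canceller_coefficients(S, A)
    speaker_left = direct * binaural_left + cross * binaural_right
    speaker_right = cross * binaural_left + direct * binaural_right
    return speaker_left, speaker_right


def ear_signals(S, A, speaker_left, speaker_right):
    """Model the acoustic paths: each ear hears both loudspeakers."""
    ear_left = S * speaker_left + A * speaker_right
    ear_right = A * speaker_left + S * speaker_right
    return ear_left, ear_right


if __name__ == "__main__":
    S, A = 1.0 + 0.0j, 0.4 - 0.3j          # hypothetical values at one frequency
    speakers = apply_canceller(S, A, 1.0 + 0.0j, 0.0j)
    print(ear_signals(S, A, *speakers))     # left ear near 1, right ear near 0
```

In this idealized model a signal intended for the left ear alone arrives at the left ear unchanged and is cancelled at the right ear. The cancellation depends on S and A matching the actual paths to the listener's ears, which is consistent with the sensitivity to listener position described for the prior-art system.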
For a listener whose head reasonably well matched the characteristics of the manikin head, the result was a great improvement in characteristics such as spread, sound-image localization and space impression. However, the listener had to be positioned in an exact "sweet spot," and if the listener turned his head more than approximately ten degrees, or moved more than approximately 6 inches, the illusion was destroyed. Thus, the system was far too sensitive to listener position and movement to be utilized as a practical stereo system.
In addition, in the prior art, several equalization doctrines may be found. In one of these, a coupler for fitting microphones into an artificial head provides an acoustic equalization corresponding to a flat ear-drum pressure response. Another doctrine specifies a flat response with respect to a diffuse sound field. These two approaches are indicated in a paper by M. Killion, "Equalization Filter for Eardrum Pressure Recording Using KEMAR Manikin," J. Audio Engr. Soc., vol. 27, pp. 13-16 (1979 Jan./Feb.). Yet another doctrine demands a flat pressure response at the ear-canal entrance, as used in certain known artificial heads (e.g., in the Neumann KU-80). On the other hand, Schone, et al., U.S. Pat. No. 4,338,494, teaches that the microphone response should be equalized flat with reference to a free-field, plane wave, incident at 0°.
The role of the equalization is to remove those frequency characteristics of the artificial head that would be essentially repeated, but should not be, in the listener's head. These are the resonances of the cavities in the external ear, the pinna, and, if included in the artificial head, the ear canal. The prior art is not correct, however, for incidence angles greater than 0°. For example, it might be desirable, under some circumstances, to place the loudspeakers so that they provide incidence angles of ±90° at an elevation angle of 45°. The frontal, 0° incidence for free-field equalization in the prior art would then prove to be incorrect.
It is accordingly an object of the invention to provide a novel stereo system which provides enhanced sound-imaging localization which is relatively independent of listener position and movement utilizing a novel equalization.
It is another object of the invention to provide a novel stereo system for adapting sound signals utilizing head-diffraction functions, and crosscoupling with filtering to substantially limit the frequency range of such processing to substantially below approximately ten kilohertz to provide enhanced source imaging and accurate perception of simulated acoustics in such frequency range wherein equalization separate from the crosscoupling is provided.
It is a further object of the invention to provide means of utilizing head-diffraction functions and head-diffraction function related equalization so that they may be simulated by means of simple electrical analog or digital filters, in most cases of the minimum-phase type.
It is a further object of the invention to provide a specific combination of free-field signals to be used for respective specific incidence angles, and to specify these angles in relation to the angles to be used for loudspeaker placement, which combination is to be equalized to make for a flat microphone-signal response specifically for that combination.
It is a further object of the invention to provide an equalization method for modifying the signals to or from a crosstalk compensation means by filtering with an equalization transfer function whose magnitude is approximately proportional to the square root of the sum of the squares of the magnitudes of the acoustic transfer functions utilized for the crosstalk filters.
Briefly, according to one embodiment of the invention, an equalization method is provided for an audio processing system that generates compensated audio signals suitable for reproduction to a listener through a loudspeaker system. The audio processing system includes source means for providing two channels of audio signals having head-related transfer functions imposed thereon, and compensation means for providing an inverse crosstalk characteristic of loudspeaker-to-ear listener transmission paths by employing a two port input, and two port output, cross-coupled filter system having transfer functions which approximately simulate acoustic transfer functions of the propagation paths from a loudspeaker to a first ear of the listener and from the loudspeaker to the second ear of the listener. The equalization method is characterized by the step of modifying signals at both ports of either the input or the output of said compensation means by transmission of each signal through a filter that is essentially the same for each of the signals. The filter simulates an equalization transfer function whose magnitude is approximately proportional to the square root of the sum of squares of the magnitudes of the acoustic transfer functions.
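The magnitude of the equalization transfer function described above may be computed per frequency as sketched below. The sketch assumes two acoustic transfer functions sampled on a common frequency grid, here called same_side and cross_side; those names, the function name and the proportionality constant k are hypothetical. Only the magnitude is computed; the design of a filter realizing it (for example, of the minimum-phase type) is outside the scope of the sketch.

```python
import math


def equalization_magnitude(same_side, cross_side, k=1.0):
    """Return k * sqrt(|S|^2 + |A|^2) for each frequency sample.

    same_side and cross_side are sequences of complex transfer-function
    values on the same frequency grid (assumption); k is an arbitrary
    proportionality constant.
    """
    if len(same_side) != len(cross_side):
        raise ValueError("transfer functions must share one frequency grid")
    return [
        k * math.sqrt(abs(s) ** 2 + abs(a) ** 2)
        for s, a in zip(same_side, cross_side)
    ]


if __name__ == "__main__":
    # Hypothetical values at two frequencies.
    print(equalization_magnitude([3.0 + 0.0j, 1.0 + 0.0j], [0.0 + 4.0j, 0.0j]))
```

Because the same magnitude is applied to both ports, the equalization alters the overall spectral balance without changing the relationship between the two channels that the cross-coupled filters establish.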