This invention relates generally to the field of audio-signal processing and more particularly to a system for stereo audio-signal processing and stereo sound reproduction incorporating head-diffraction compensation, which provides improved sound-source imaging and accurate perception of desired source-environment acoustics while maintaining relative insensitivity to listener position and movement.
There is a wide variety of prior-art stereo systems, most of which fall within three general categories or types of systems. The first type of stereo system utilizes two omnidirectional microphones usually spaced approximately one half to two meters apart and two loudspeakers placed in front of the listener towards his left and right sides in correspondence one for one with the microphones. The signal from each microphone is amplified and transmitted, often via a recording, through another amplifier to excite its corresponding loudspeaker. The one-for-one correspondence is such that sound sources toward the left side of the pair of microphones are heard predominantly in the left loudspeaker and sound sources toward the right side in the right loudspeaker. For a multiplicity of sources spread before the microphones, the listener has the impression of a multiplicity of sounds spread before him in the space between the two speakers, although the placement of each source is only approximately conveyed, the images tending to be vague and to cluster around loudspeaker locations.
The second general type of stereo system utilizes two unidirectional microphones spaced as closely as possible, and turned at some angle towards the left for the leftward one and towards the right for the rightward one. The reproduction of the signals is accomplished using a left and right loudspeaker placed in front of the listener with a one-for-one correspondence with the microphones. There is very little difference in timing for the emission of sounds from the loudspeakers compared to the first type of stereo system, but a much more significant difference in loudness because of the directional properties of the angled microphones. Moreover, such difference in loudness translates to a difference in time of arrival, at least for long wavelengths, at the ears of the listener. This is the primary cue at low frequencies upon which human hearing relies for sensing the direction of a source. At higher frequencies (i.e., above 600 Hz), directional hearing relies more upon loudness differences at the ears, so that high-frequency sounds in such stereo systems tend to give the impression of being localized close to the loudspeaker positions rather than spread as the original sources had been.
The third general type of stereo system synthesizes an array of stereo sources, by means of electrical dividing networks, whereby each source is represented by a single electrical signal that is additively mixed in predetermined proportions into each of the two stereo loudspeaker channels. The proportion is determined by the angular position to be allocated for each source. The loudspeaker signals have essentially the same characteristic as those of the second type of stereo system.
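The proportional mixing performed by such a dividing network can be illustrated with a minimal sketch. The sine/cosine (constant-power) law used here is an assumption chosen for illustration, since the text above does not state which proportioning law a given prior-art system uses; the function names `pan_gains` and `mix_sources` and the position scale from -1.0 to +1.0 are likewise hypothetical.

```python
import math


def pan_gains(position):
    """Return (left_gain, right_gain) for a source position in [-1.0, 1.0].

    -1.0 is full left, 0.0 is center, +1.0 is full right.
    Assumption: a sine/cosine (constant-power) law; other laws are possible.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must lie in [-1.0, 1.0]")
    theta = (position + 1.0) * math.pi / 4.0  # maps position to [0, pi/2]
    return math.cos(theta), math.sin(theta)


def mix_sources(sources):
    """Additively mix single-signal sources into two loudspeaker channels.

    sources: list of (samples, position) pairs, where samples is a list of
    floats and position is the allocated placement of that source.
    Returns (left_channel, right_channel) as lists of floats.
    """
    length = max(len(samples) for samples, _ in sources)
    left = [0.0] * length
    right = [0.0] * length
    for samples, position in sources:
        g_left, g_right = pan_gains(position)
        for i, x in enumerate(samples):
            left[i] += g_left * x
            right[i] += g_right * x
    return left, right
```

Under this assumed law a centered source receives equal gains in both channels, and the two channels differ only in level, not in timing, which is the characteristic shared with the second type of stereo system.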
Based upon these three general types of stereo systems, there are many variants. For example, the first type of system may use more than two microphones and some of these may be unidirectional or even bidirectional, and a mixing means as used in the third type of system may be used to allocate them in various proportions between the loudspeaker channels. Similarly, a system may be primarily of the second type of stereo system and may use a few further microphones placed close to certain sources for purposes of emphasis with signals to be proportioned between the channels. Another variant of the second type of stereo system makes use of a moderate spacing, for example 150 mm, between the microphones with the left-angled microphone spaced to the left, and the right-angled microphone spaced to the right. Another variant uses one omnidirectional microphone coincident, as nearly as possible, with a bidirectional microphone. This is the basic form of the MS (middle-side) microphone technique, in which the sum and difference of the two signals are substantially the same as the individual signals from the usual dual-angled microphones of the second type of system.
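The sum-and-difference relationship of the MS technique can be sketched as follows. This is an illustration under stated assumptions: the microphones are idealized, the omnidirectional (middle) response is 1 in every direction, the bidirectional (side) response is sin(theta) with theta measured from straight ahead and positive toward the left, and the function names are hypothetical.

```python
import math


def ms_decode(mid, side):
    """Form left and right signals as the sum and difference of M and S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def ms_encode(left, right):
    """Recover M and S from left and right (inverse of ms_decode)."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def sum_difference_patterns(theta_deg):
    """Directional sensitivity of the sum and difference signals.

    Assumes an ideal omnidirectional response of 1 and an ideal
    bidirectional response of sin(theta), positive lobe toward the left.
    Returns (left_sensitivity, right_sensitivity).
    """
    s = math.sin(math.radians(theta_deg))
    return 1.0 + s, 1.0 - s
```

With these idealized patterns, a source directly to the left (theta of 90 degrees) appears only in the sum signal, while a source straight ahead appears equally in both, which is the behavior expected of a pair of unidirectional microphones angled left and right.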
Each of these systems has its advantages and disadvantages and tends to be favored and disfavored according to the desires of the user and according to the circumstances of use. Each fails to provide localization cues at frequencies above approximately 600 Hz. Many of the variants represent efforts to counter the disadvantages of a particular system, e.g., to improve the impression of uniform spread, to more clearly emulate the sound imaging, to improve the impression of "space" and "air," etc. Nevertheless, none of these systems adequately reckons with the effects upon a soundwave of propagation in the space close to the head in order to reach the ear canal. This head diffraction substantially alters both the magnitude and phase of the soundwave, and causes each of these characteristics to be altered in a frequency-dependent manner.
The use of head-diffraction compensation to make greatly improved stereo sound in a loudspeaker system was demonstrated by M. R. Schroeder and B. S. Atal to emulate the sounds of various concert halls with extraordinary accuracy. Schroeder measured the values of head-related transfer functions for an artificial or "dummy" head (i.e., a physical replica of a head mounted on a fully-clothed manikin) that had microphones placed in its ear canals. This information was used to process two-channel sound recorded using a second artificial head (i.e., to process a binaural recording). Since each ear hears both speakers, the system used crosstalk cancellation to cancel the effects of sound traveling around the listener's head to the opposite ear. Crosstalk cancellation was performed over the entire audio spectrum (i.e., 20 Hz to 20 kHz).
For a listener whose head matched the characteristics of the manikin head reasonably well, the result was a great improvement in characteristics such as spread, sound-image localization and space impression. However, the listener had to be positioned in an exact "sweet spot," and if the listener turned his head more than approximately ten degrees, or moved more than approximately 6 inches, the illusion was destroyed. Thus, the system was far too sensitive to listener position and movement to be utilized as a practical stereo system.
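The principle of crosstalk cancellation in such a system can be illustrated at a single frequency with a minimal sketch. It assumes a symmetric listening arrangement in which each loudspeaker reaches the same-side ear through a complex transfer value S and the opposite ear through a complex transfer value A; the numerical values, the symbols S and A, and the function names are hypothetical and are not measurements from the system described above. The canceller is the inverse of the 2x2 matrix formed by S and A, so that each ear receives only its intended binaural signal.

```python
import cmath


def crosstalk_canceller(S, A):
    """Return the 2x2 inverse of [[S, A], [A, S]] at one frequency.

    S: complex transfer value, loudspeaker to same-side ear (assumed).
    A: complex transfer value, loudspeaker to opposite ear (assumed).
    """
    det = S * S - A * A
    if abs(det) < 1e-12:
        raise ValueError("transfer matrix is singular at this frequency")
    return ((S / det, -A / det), (-A / det, S / det))


def apply2x2(matrix, vector):
    """Multiply a 2x2 matrix (tuple of rows) by a 2-element vector."""
    return (
        matrix[0][0] * vector[0] + matrix[0][1] * vector[1],
        matrix[1][0] * vector[0] + matrix[1][1] * vector[1],
    )


# Hypothetical values: unit same-side path, and an opposite-side path
# that is attenuated by half and delayed by 0.6 radians of phase.
S = 1.0 + 0.0j
A = 0.5 * cmath.exp(-0.6j)

binaural = (1.0 + 0.0j, 0.0j)        # signal intended for the left ear only
speakers = apply2x2(crosstalk_canceller(S, A), binaural)
ears = apply2x2(((S, A), (A, S)), speakers)
```

In this sketch `ears` reproduces `binaural` only while S and A match the listener's actual acoustic paths. Because the phase of A changes with head position, and changes faster at shorter wavelengths, even a small movement upsets the cancellation, which is consistent with the sensitivity to listener position noted above.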
It is accordingly an object of the invention to provide a novel stereo system which provides enhanced sound-imaging localization which is relatively independent of listener position and movement.
It is another object of the invention to provide a novel stereo system for adapting sound signals utilizing head-diffraction functions, and crosscoupling with filtering to substantially limit the frequency range of such processing to substantially below approximately ten kilohertz to provide enhanced source imaging and accurate perception of simulated acoustics in such frequency range.
It is a further object of the invention to provide means of utilizing head-diffraction functions so that they may be simulated by means of simple electrical analog or digital filters, in most cases of the minimum-phase type.
Briefly, according to one embodiment of the invention, an audio processing system is provided including means for providing two channels of audio signals having head-related transfer functions imposed thereon. In addition, means are provided for cross-talk cancellation, and means for naturalization compensation to correct for the cross-talk cancellation and for propagation path distortions, including filtering means for substantially limiting the cross-talk cancellation and naturalization compensation to frequencies substantially below ten kilohertz. In another embodiment, means are provided for simulating the two channels of audio signals from a single channel of audio signals by processing the single channel of audio signals to generate synthetic head signals for each ear, respectively, utilizing head-diffraction compensation for a selected set of synthetic source bearing angles. According to another aspect of the invention, a reformatter is provided for reformatting audio signals generated for reproduction at a first set of stereo speaker bearing angles to a format for reproduction at a second selected set of stereo speaker bearing angles.