This invention relates generally to the field of audio-signal processing and more particularly to a system for stereo audio-signal processing and stereo sound reproduction incorporating head-diffraction compensation, which provides improved sound-source imaging and accurate perception of desired source-environment acoustics while maintaining relative insensitivity to listener position and movement.
There is a wide variety of prior-art stereo systems, most of which fall within three general categories or types of systems. The first type of stereo system utilizes two omnidirectional microphones usually spaced approximately one half to two meters apart and two loudspeakers placed in front of the listener towards his left and right sides in correspondence one for one with the microphones. The signal from each microphone is amplified and transmitted, often via a recording, through another amplifier to excite its corresponding loudspeaker. The one-for-one correspondence is such that sound sources toward the left side of the pair of microphones are heard predominantly in the left loudspeaker and right sounds in the right. For a multiplicity of sources spread before the microphones, the listener has the impression of a multiplicity of sounds spread before him in the space between the two speakers, although the placement of each source is only approximately conveyed, the images tending to be vague and to cluster around loudspeaker locations.
The second general type of stereo system utilizes two unidirectional microphones spaced as closely as possible, and turned at some angle towards the left for the leftward one and towards the right for the rightward one. The reproduction of the signals is accomplished using a left and right loudspeaker placed in front of the listener with a one-for-one correspondence with the microphones. There is very little difference in timing for the emission of sounds from the loudspeakers compared to the first type of stereo system, but a much more significant difference in loudness because of the directional properties of the angled microphones. Moreover, such difference in loudness translates to a difference in time of arrival, at least for long wavelengths, at the ears of the listener. This is the primary cue at low frequencies upon which human hearing relies for sensing the direction of a source. At higher frequencies (i.e., above 600 Hz), directional hearing relies more upon loudness differences at the ears, so that high-frequency sounds in such stereo systems tend to give the impression of being localized close to the loudspeaker positions rather than spread as the original sources had been.
The third general type of stereo system synthesizes an array of stereo sources, by means of electrical dividing networks, whereby each source is represented by a single electrical signal that is additively mixed in predetermined proportions into each of the two stereo loudspeaker channels. The proportion is determined by the angular position to be allocated for each source. The loudspeaker signals have essentially the same characteristic as those of the second type of stereo system.
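As an illustration only, the additive mixing of the third type of system can be sketched as follows. The sine/cosine (constant-power) proportioning law, the position scale from -1 (full left) to +1 (full right), and the function names `pan_gains` and `mix_sources` are assumptions made for this sketch; the text above does not specify how the proportions are derived from the angular position.

```python
import math


def pan_gains(position):
    """Return (left_gain, right_gain) for a source position in [-1, 1].

    Assumption: a constant-power sine/cosine law; -1 is full left,
    +1 is full right, 0 is centered.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must lie in [-1, 1]")
    theta = (position + 1.0) * math.pi / 4.0  # maps [-1, 1] onto [0, pi/2]
    return math.cos(theta), math.sin(theta)


def mix_sources(sources):
    """Additively mix (samples, position) pairs into two loudspeaker channels."""
    length = max((len(samples) for samples, _ in sources), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for samples, position in sources:
        g_left, g_right = pan_gains(position)
        for i, x in enumerate(samples):
            left[i] += g_left * x
            right[i] += g_right * x
    return left, right
```

A source at position 0 receives equal gains in both channels, while a source at -1 appears in the left channel only.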
Based upon these three general types of stereo systems, there are many variants. For example, the first type of system may use more than two microphones and some of these may be unidirectional or even bidirectional, and a mixing means as used in the third type of system may be used to allocate them in various proportions between the loudspeaker channels. Similarly, a system may be primarily of the second type of stereo system and may use a few further microphones placed close to certain sources for purposes of emphasis with signals to be proportioned between the channels. Another variant of the second type of stereo system makes use of a moderate spacing, for example 150 mm, between the microphones with the left-angled microphone spaced to the left, and the right-angled microphone spaced to the right. Another variant uses one omnidirectional microphone coincident, as nearly as possible, with a bidirectional microphone. This is the basic form of the MS (middle-side) microphone technique, in which the sum and difference of the two signals are substantially the same as the individual signals from the usual dual-angled microphones of the second type of system.
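The sum-and-difference relationship of the MS technique can be sketched in a few lines. The function names `ms_to_lr` and `lr_to_ms`, the unity gains, and the factor of one half in the inverse are assumptions chosen for this sketch so that the two conversions undo one another; practical systems may scale the signals differently.

```python
def ms_to_lr(mid, side):
    """Form left/right signals as the sum and difference of mid and side."""
    if len(mid) != len(side):
        raise ValueError("mid and side must have the same length")
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def lr_to_ms(left, right):
    """Recover mid/side from left/right (inverse of ms_to_lr)."""
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    return mid, side
```

Here `mid` stands for the omnidirectional microphone signal and `side` for the bidirectional microphone signal.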
Variants are also known that focus on loudspeaker arrangements. A well-known example has a third loudspeaker centered between the stereo pair, to be driven by a signal proportional to the so-called mono sum, the sum of the stereo signals, a style of connection also known as bridging. Use of this loudspeaker is supposed to remedy a lack of stereo imaging in the center, a so-called hole in the middle, and also to stabilize the imaging against varying listener position. The center loudspeaker is common in cinema-sound arrangements in which it is centered behind the acoustically transparent screen. Such centered loudspeakers are discussed in W. B. Snow, "Basic Principles of Stereophonic Sound," J. Soc. Mot. Pict. and Telev. Eng., Vol. 61 (November 1953). Cinema sound now often uses special circuits called "logic" to steer the mono sum wholly into this center channel for dialog, which would otherwise be so imprecisely localized as to be distracting. Surround-sound arrangements are not pursued here in favor of frontal arrangements that may, however, include four loudspeakers.
Each of these systems has its advantages and disadvantages and tends to be favored and disfavored according to the desires of the user and according to the circumstances of use. Each fails to provide localization cues at frequencies above approximately 600 Hz. Many of the variants represent efforts to counter the disadvantages of a particular system, e.g., to improve the impression of uniform spread, to more clearly emulate the sound imaging, to improve the impression of "space" and "air," etc. Nevertheless, none of these systems adequately reckons with the effects upon a soundwave of propagation in the space close to the head in order to reach the ear canal. This head diffraction substantially alters both the magnitude and phase of the soundwave, and causes each of these characteristics to be altered in a frequency-dependent manner.
The use of head-diffraction compensation to make greatly improved stereo sound in a loudspeaker system was demonstrated by M. R. Schroeder and B. S. Atal to emulate the sounds of various concert halls with extraordinary accuracy. Schroeder measured the values of head-related transfer functions for an artificial or "dummy" head (i.e., a physical replica of a head mounted on a fully-clothed manikin) that had microphones placed in its ear canals. This information was used to process two-channel sound recorded using a second artificial head (i.e., to process a binaural recording). Since each ear hears both speakers, the system used crosstalk cancellation to cancel the effects of sound traveling around the listener's head to the opposite ear. Crosstalk cancellation was performed over the entire audio spectrum (i.e., 20 Hz to 20 kHz).
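The principle of crosstalk cancellation can be illustrated at a single frequency. The sketch below assumes a left-right symmetric listener, with a complex same-side transfer value `S` (loudspeaker to the near ear) and a complex alternate-side value `A` (loudspeaker to the far ear); these symbols, the symmetry assumption, and the function names are introduced here for illustration and are not taken from the Schroeder and Atal work. Under those assumptions the canceller is simply the inverse of the 2x2 matrix of head transfer values.

```python
def crosstalk_canceller(S, A):
    """Return the 2x2 inverse of [[S, A], [A, S]] at one frequency.

    S and A are hypothetical complex transfer values for the same-side
    and opposite-side paths of a symmetric listener.
    """
    det = S * S - A * A
    if abs(det) < 1e-12:
        raise ValueError("transfer matrix is singular at this frequency")
    return ((S / det, -A / det), (-A / det, S / det))


def apply_2x2(matrix, first, second):
    """Multiply a 2x2 matrix by the column vector (first, second)."""
    return (matrix[0][0] * first + matrix[0][1] * second,
            matrix[1][0] * first + matrix[1][1] * second)


def ears_from_speakers(S, A, spk_left, spk_right):
    """Model the ear signals produced by two loudspeaker signals."""
    return apply_2x2(((S, A), (A, S)), spk_left, spk_right)
```

Applying the canceller to a pair of binaural signal values and then passing the result through `ears_from_speakers` returns the original binaural values at the ears, which is the sense in which the opposite-ear paths are cancelled.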
For a listener whose head reasonably well matched the characteristics of the manikin head, the result was a great improvement in characteristics such as spread, sound-image localization and space impression. However, the listener had to be positioned in an exact "sweet spot" and if the listener turned his head more than approximately ten degrees, or moved more than approximately 6 inches the illusion was destroyed. Thus, the system was far too sensitive to listener position and movement to be utilized as a practical stereo system.
Head simulation and head compensation used together also permit loudspeaker reformatting. A loudspeaker reformatter converts input signals intended for a specific loudspeaker bearing angle (e.g., ±30°) into a format for presentation at another loudspeaker bearing angle (e.g., ±15°). One application of a reformatter exists in television stereo, wherein it is very difficult to mount loudspeakers in the television cabinet so that they would be placed at bearing angles as large as ±30° for a viewer. Another application may be found in a listening room that is too narrow for ±30° placement because of a need to place a substantial distance between each loudspeaker and its corresponding sidewall, together with a desire to be seated not too close to the loudspeakers. In this way, it is possible to be forced to accept a small angle, perhaps ±15°, for loudspeaker placement, yet retain the imaging more nearly characteristic of ±30° by using a reformatter. A narrow angular range for loudspeaker placement (narrow speaker base) also permits a wide range in listener position.
As improved television standards, including those for higher picture definition, wider-aspect pictures, and enhanced sound quality, are developed, the need for enhanced sound-image stability increases. Narrow-base speaker arrays with image-spread reformatting are an attractive application of this technology, almost regardless of the stereo technology to be employed.
It is accordingly an object of the invention to provide a novel stereo system which provides enhanced sound-imaging localization which is relatively independent of listener position and movement.
It is another object of the invention to provide a novel stereo system for adapting sound signals utilizing head-diffraction functions, and crosscoupling with filtering to substantially limit the frequency range of such processing to substantially below approximately ten kilohertz to provide enhanced source imaging and accurate perception of simulated acoustics in such frequency range.
It is a further object of the invention to provide means of utilizing head-diffraction functions so that they may be simulated by means of simple electrical analog or digital filters, in most cases of the minimum-phase type.
Briefly, according to one embodiment of the invention, an audio processing system for reformatting is provided including means for providing two channels of binaural signals. In addition, means are provided for cross-talk cancellation and for naturalization compensation, which corrects for the cross-talk cancellation and for propagation-path distortions, to produce a sum filtered signal and a difference filtered signal; these means include filtering means for substantially limiting the cross-talk cancellation and naturalization compensation to frequencies substantially below approximately ten kilohertz. Summing and differencing means are provided for generating a sum output, a difference output and at least one other output from the sum and difference filtered signals.
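One way to see why filtering of sum and difference signals suffices for cross-talk cancellation is sketched below for a single frequency. It covers only the cancellation step and omits the naturalization compensation and the band-limiting filter. The sketch assumes a left-right symmetric listener with hypothetical complex transfer values `S` (same-side path) and `A` (alternate-side path); the function name and the factor of one half are likewise assumptions and are not taken from the embodiment described above.

```python
def sum_difference_canceller(S, A, bin_left, bin_right):
    """Cross-talk cancellation computed on sum and difference signals.

    S, A: hypothetical complex same-side and alternate-side transfer
    values at one frequency for a symmetric listener.
    Returns the (left, right) loudspeaker signal values.
    """
    if abs(S + A) < 1e-12 or abs(S - A) < 1e-12:
        raise ValueError("sum or difference path cannot be inverted")
    # Form sum and difference of the binaural inputs, filter each once.
    sum_filtered = (bin_left + bin_right) / (S + A)
    diff_filtered = (bin_left - bin_right) / (S - A)
    # Sum and difference again to recover the two loudspeaker feeds.
    left = 0.5 * (sum_filtered + diff_filtered)
    right = 0.5 * (sum_filtered - diff_filtered)
    return left, right
```

For a symmetric head the 2x2 matrix of transfer values is diagonalized by the sum and difference operation, so this arrangement needs only two filters, 1/(S + A) and 1/(S - A), and gives the same loudspeaker feeds as inverting the full matrix.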