This invention relates generally to the fields of audio signal reproduction and audio signal processing, and more particularly to a system for increasing the area over which a satisfactory audio illusion is created and maintained, relative to prior art audio reproduction systems. The system may employ a multi-way loudspeaker pair with drivers operating over diverse frequency ranges arrayed generally in a horizontal dimension (for a normally-oriented head of a listener), with higher-frequency drivers generally closer together and displaced more towards the center of the listening space than lower-frequency drivers, together with specially-adapted signal processing components to create or maintain desirable audio imaging.
The history of stereophonic sound ("stereophony" or, more commonly and colloquially, "stereo") includes a number of methods of recording sounds and another number of methods for playing those recorded signals back to a listener or listeners. While it has always been an accepted idea that a listener should be "transported" to another acoustical space, such as the acoustic space occupied by an audience member at a live concert or a more synthetic, more conceptual, space for many modern popular recordings in which there was no actual performance in front of a live audience, the methods used for this "transporting" have largely failed in that goal. A reason for the failure has been that no systematic, rigorous method was usually applied in designing the various systems, designers and recording personnel frequently instead relying primarily upon largely unscientific principles and serendipity to achieve their goals. That billions of commercial recordings have been sold and broadcast is more a statement of the appeal of the content of the recordings than the ability to transport the listener into another space. However, workers such as Schroeder and Atal, and Cooper and Bauck, have devised playback systems which employ signal processing methods which are firmly footed in engineering science and based on the concept that particular signals will be placed in and around the ears of one or more listeners so that it becomes the task of the producer of the program material to provide a version of the "desired" signals.
These latter-day methods, though currently far in the minority of systems and recordings purchased by consumers to date, can be extraordinarily effective in transporting the listener to another, believable, acoustic space, when properly designed. Perhaps an indication of the failure of traditional systems to perform as hoped, and of the success of the latter-day systems, is that the newer systems are often called "3D audio," "3D sound," and the like ("3D" meaning three-dimensional), and the vernacular use of "stereophonic" often refers to the earlier systems. A simple translation of "stereophonic" from its Greek-root components means "of or relating to three-dimensional sound." Thus, with the advent of practical implementations of the latter-day systems, the audio community found it necessary to coin a new phrase, thus "3D audio" and the like.
In keeping with current usage, we will use the current term, 3D audio, to refer to the latter-day systems. These systems typically employ some kind of circuitry or algorithm which compensates for the fact that sound emanating from each of two loudspeakers impinges on both ears of a listener, so that, for example, sound radiating from a left-placed loudspeaker of a pair of loudspeakers travels to the left ear of a listener, but also travels to the right ear of a listener, this latter sound being called crosstalk. The transmission from each loudspeaker to each ear can be anticipated in designing the circuitry or algorithm, from knowledge of so-called head-related transfer functions (HRTFs), so that the circuit or algorithm, taken together with at least two loudspeakers as a unit, can separately and distinctly control the sounds at the ears of one or more listeners. It is also possible to correct for frequency response aberrations caused by the diffraction of the listener's head so that a natural timbre is perceived by the listener.
It is known in the art, especially in the patents of Cooper and Bauck, that improved performance can be achieved by deliberately modifying the filters comprising the crosstalk cancelling circuitry or algorithms or related circuitry or algorithms from their strict specifications from HRTFs. For example, it may be necessary in some cases to modify the filters in such a way as to make them stable or otherwise realizable. Other modifications are known in the art, such as using HRTFs measured from a model mannequin head rather than the listener's own head, the use of minimum phase transfer functions, the use of simplified head models such as smoothed HRTFs, spheres, or two points in free space (for ears), and the use of delays to convert noncausal filters into causal filters. Some deviations from the full HRTF specifications may be quite extreme, for instance, following the HRTF specification up to only some 600 Hz and allowing factors other than the most precise imaging to specify the response above 600 Hz. Any such modification, while deviating from the strict specification of the listener's own HRTFs, may be considered to be advantageous, either for the sake of performance or economies or both. Also, such modifications may result in less than perfect cancellation of crosstalk and/or less than perfect correction of timbre. Nonetheless, we will refer to all such devices as crosstalk cancellers. Crosstalk cancellers are the heart of most 3D audio systems, allowing predetermined control of signals at the ears of the listener or listeners, thus removing many elements of luck from the playback experience. It is therefore an object of the invention that any crosstalk canceller with any of the several described modifications or other modifications may be used either explicitly or implicitly as the imaging component of the invention.
One application of crosstalk cancellation is in playing back recordings made with an acoustical mannequin, a dummy head with microphones placed in its ear canals or thereabouts. Such a recording-playback system results in the most realistic impression of being transported to another space.
Another application of crosstalk cancellers is as part of an imaging circuit or algorithm, a so-called speaker-spreader or layout reformatter such as described by Schroeder and Atal, and Cooper and Bauck. In this application, the listener can receive the impression that, for example, a pair of loudspeakers which is placed on the sides of a television receiver cabinet, much too close for perceiving any readily noticeable amount of stage width, appears to be farther apart, with well-defined sounds apparently emanating from points in space where there are no actual loudspeakers, a "virtual loudspeaker" impression. In this application, it is most common for the input signals to be any kind of ordinary stereo; the input signals may also be provided by a home theater or multichannel television audio decoder, providing five or more channels of audio signals.
Still another application of the principle of crosstalk cancellation is in the creation of interactively-controlled sound sources (and their reflections in an acoustic environment, if desired) such as would exist in computer-based or game-console-based games, when the sounds for those games are presented to the player or players over loudspeakers.
So it is seen that a crosstalk canceller is a basic component of controlling signals at the ears of a listener, usable with either binaurally recorded programs or with any kind of traditional stereo programs, for the general enhancement thereof.
Playback systems which do not effectively use a crosstalk canceller are also sometimes known as 3D. Such systems can create the impression that sound is arriving from points in space where there are no actual loudspeakers, but rather than provide the impression that there are virtual loudspeakers or other spatially discrete or distinct sources, the impression is that of a wall of sound with little or no impression of spatially discrete sources. To the extent that these systems benefit from placing loudspeakers close together (as described below), they may also benefit from the invention. And of course, enhancement of these "nondiscrete" systems is possible by the use of the virtual loudspeaker concept.
It is an aspect of the invention that it may be used or combined with any type of 3D imaging circuit or algorithm, whether "discrete 3D" or "wall-of-sound 3D."
With the general framework established, we may now begin to discuss a specific problem that exists in essentially all prior-art audio systems, whether of the traditional or 3D variety. Essentially all such systems have a listening area in which the sound impression is best. Listeners in that area receive an impression that is better than at any other place in the playback room or listening space. Typically, there are two loudspeakers and the favored area is on a line bisecting a line segment drawn between the two loudspeakers, and more particularly at a specified distance or other geometrical relationship to the loudspeakers. Wherever the favored region is, it is commonly called the "sweet spot," and we will use that terminology here, even though "spot" may tend to imply "point" rather than "region." The sweet spot is restricted in its extent, frequently being so small that only one person can enjoy the best spatial impression at one time, whether for traditional or 3D stereo; the sweet spot size is sometimes so small that even a single listener may feel constrained as to where he or she should hold his or her head to fully enjoy the sweet spot. Usually the sweet spot is an elongated region, really rather oblate ellipsoidal in shape, allowing listeners to move in and out along the bisecting line, or up and down while remaining mostly in the bisecting plane, but being very unforgiving with respect to listener movement to the left and right, over wide variations in a standard two-loudspeaker setup. This is the most unfortunate direction in which to have a small extent of the sweet spot, since it is most commonly desired that multiple listeners be seated abreast of one another and not lined up nose-to-nape.
With the advent of practical 3D audio systems and the associated ability to precisely control the sounds at the listener's ears, it is common for listeners to perceive that the sweet spot is smaller than they are accustomed to with prior experience listening to ordinary stereo systems. It has been conjectured by Bauck and Cooper (such conjecture borne out informally by the experience of many listeners to such 3D systems), that the sweet spot is not actually smaller, but, since it is much sweeter, listeners tend to feel more deprived upon moving out of the sweet spot. Also, the rate of deprivation with respect to movement away from the optimum position would appear to be greater, perhaps lending even more feeling that the sweet spot is rather small.
Regardless of the nature of the playback system (traditional or 3D), it is always desirable to make the sweet spot larger. It is, therefore, an object of this invention to do so.
One reason that there is a sweet spot is that with reproduction with two or more loudspeakers, the signals at the listener's ears are formed by the interference (summation) of acoustic waves emanating from the loudspeakers. With two loudspeakers, the field can be controlled precisely (assuming the absence of resonant structures) at only two points. Presumably, those points are to be at the listener's ears. Whether the ear signals are a result of a so-called 3D system or any other technique, if the listener moves his or her head so that the ears are no longer at the designated positions, image distortion will appear, caused by unintended ear signals created by unanticipated interference. The primary causes of the changing interference are differing times-of-arrival due to differing loudspeaker-to-listener distances, followed in importance by amplitude variations of the impinging waves due to the same varying distances (aggravated by the listener sitting close to the loudspeakers), and reflections from any uncompensated reflecting surfaces (improved by the listener sitting close to the loudspeakers).
An important aspect of this interference problem is that, for a given amount of movement of the listener's head from the designed-for position, its severity is wavelength dependent. Ear signals at higher frequencies are affected relatively more than those at lower frequencies because the given amount of movement is a larger fraction of a wavelength (or larger number of wavelengths) at the higher frequencies.
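The wavelength dependence can be made concrete with a rough calculation. The sketch below is only an illustration, not part of the described system: it expresses a fixed change in path length as a number of wavelengths at several frequencies. The speed of sound (343 m/s), the 5 cm path change, and the function name are assumptions chosen for the example.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly room temperature (assumed value)


def path_change_in_wavelengths(path_change_m, freq_hz):
    """Express a change in path length as a number of wavelengths."""
    wavelength_m = SPEED_OF_SOUND / freq_hz
    return path_change_m / wavelength_m


if __name__ == "__main__":
    # Hypothetical 5 cm change in one loudspeaker-to-ear path length.
    path_change_m = 0.05
    for freq_hz in (100.0, 1000.0, 5000.0):
        fraction = path_change_in_wavelengths(path_change_m, freq_hz)
        print(f"{freq_hz:7.0f} Hz: {fraction:.3f} wavelengths")
```

The same physical movement is a small fraction of a wavelength at 100 Hz but a large fraction at 5 kHz, which is why the higher frequencies are disturbed more.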
In prior art systems, the effects of a listener moving out of the sweet spot are well-known, even by casual listeners. For example, a vocalist who initially appears as a centered phantom image midway between two loudspeakers when the listener is on the bisecting line then appears to subsequently shift towards the nearer loudspeaker when the listener moves away from the bisecting line. The effect is so pronounced that the sound image collapse into the nearer loudspeaker is nearly complete when the listener's head is only a few inches closer to the one loudspeaker than the other. This is the well-known precedence effect, sometimes called the Haas effect after one of its early researchers. It is usually thought to be a psychoacoustic effect, perhaps with its origins in the processing of the inner ear or brain. If that is the case, it may be an evolutionary adaptation to allow accurate localization of sounds in reflective environments. However, it is possible that the effect is also rooted in physical acoustics, a hypothesis that has not been fully investigated. In any event, the amount of image shift as a function of time-of-arrival differences from two sources has been studied thoroughly, with the result that the farther the listener is from the bisecting line, the farther the perceived shift of the center phantom image. It should be noted that the perceived image distortion due to this effect is not, strictly speaking, a shift, but is accompanied by an increase in the spatial extent of the image, or, more oddly, a kind of ambiguity or uncertainty as to the actual location of the image.
One prior art method attempts to reduce the shifting of phantom images by the use of specially designed loudspeakers. Researchers investigating the precedence effect found that the shift of a previously centered phantom image could be partially compensated by increasing the level of the later-arriving sound, that is, by increasing the signal gain of the more distant loudspeaker of the pair. In fact, experimentally derived plots have been published which show how much the gain has to be increased, as a function of time-of-arrival differences, to bring the image back to the center, or approximately so. Such compensations, though not precise and not resulting in a well-formed re-centered phantom, have been found useful enough by a few loudspeaker manufacturers that they have made loudspeakers with radiation patterns such that as a listener moves from the bisecting line, he or she moves more directly into the main lobe of the more distant loudspeaker. A version of this plan has the listener orienting his or her conventionally-designed loudspeakers so that their main radiation lobes cross in front of the specified listening position (over toe-in). Some found either technique to be helpful, but the compensation is only approximate, and less effective at low frequencies due to the relative impossibility of creating directional radiation patterns at those frequencies. Nonetheless, it is an object of this invention that this type of radiation control may be combined with the novel techniques described herein to accommodate more types of solutions to the sweet spot problem.
Another prior art technique, introduced by Cooper and Bauck, used a method (which is independent of the present invention) of alleviating the perceived sweet spot problem in 3D systems by modifying the responses of the acoustically-specified imaging filters at the higher frequencies, effectively allowing gradual transition to "default" imaging of the affected frequencies at the loudspeakers. Listeners seem to prefer having the higher frequencies remain mostly stationary with head movements than to have them flitting around or be otherwise poorly imaged. Indeed, the sweet spot can in fact be enlarged by modifying the filters down to lower frequencies, but at the expense of more and more of the higher frequencies falling into the loudspeakers, a trade-off in sweet spot size for "sweetness." It is an object of the invention that it may be combined with such prior art methods.
A crucial observation is that the time-of-arrival differences from two loudspeakers to either ear of a listener, as he or she moves about on either side of the bisecting line, are diminished if the loudspeakers are close together. A simple plot of time-of-arrival differences is shown in FIG. 1, for a single point in space. The hyperbolic curves represent contours of equal time-of-arrival differences, in milliseconds. The horizontal and vertical axes are positions of the point in space, in meters. The small, heavy circles represent the locations of the two loudspeakers, modeled as point sources. A is calculated for loudspeakers at a distance of 1.5 meters, while B is calculated for a loudspeaker distance of 0.5 meters. (For convenience, the loudspeaker spacing and the line between loudspeakers will be referred to as the baseline distance, or simply the baseline.) It is apparent from these contour plots that the short-baseline array results in smaller time-of-arrival anomalies for the same amount of displacement from the center line. While this simple model and analysis does not include the effects of the listener's HRTFs or indeed the fact that a normally-endowed listener has two ears, it nevertheless illustrates the basic principle. An analysis using a more realistic model will be explored in detail shortly.
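The single-point model behind such contour plots can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the calculation actually used for the figure: the loudspeakers are point sources on the x axis, symmetric about the origin, the speed of sound is taken as 343 m/s, and the listening point (0.2 m off center, 2 m in front of the baseline) and function name are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed value)


def toa_difference_ms(x, y, baseline):
    """Arrival-time difference (left minus right) in milliseconds.

    Loudspeakers are modeled as point sources at (-baseline/2, 0) and
    (+baseline/2, 0); (x, y) is a single point in space, in meters.
    A positive result means the left loudspeaker's sound arrives later.
    """
    half = baseline / 2.0
    d_left = math.hypot(x + half, y)
    d_right = math.hypot(x - half, y)
    return 1000.0 * (d_left - d_right) / SPEED_OF_SOUND


if __name__ == "__main__":
    x, y = 0.2, 2.0  # hypothetical point, 0.2 m off the center line
    for baseline in (1.5, 0.5):
        dt = toa_difference_ms(x, y, baseline)
        print(f"baseline {baseline} m: {dt:.3f} ms")
```

For the same off-center displacement, the 0.5 m baseline yields a markedly smaller arrival-time difference than the 1.5 m baseline, in line with the trend described for the two plots. Points sharing the same difference lie on a hyperbola with the loudspeakers at its foci.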
The fact that short-baseline arrays hold some advantages was noticed some years ago by Cooper and Bauck. Other researchers have more recently studied the advantages of this approach. The ultimate short-baseline array is the monopole-dipole ("middle-side") array of Lauridsen, and its improvements as taught by Cooper and Bauck. Of course, with the technology of virtual loudspeakers, one may consider deliberately creating a short-baseline array, then expanding the apparent stage width with the appropriate signal processing, for an expanded sweet spot, but at a cost of trading stability of outlying images for improved stability of near-center images, depending upon the details of a particular design. In other cases, a short-baseline array may be dictated by other needs, such as the need to attach loudspeakers on the sides of a television or computer video monitor, or the practical difficulty of locating the several loudspeakers common in current home theaters in their optimum locations.
It is nearly universal practice in loudspeaker design to configure the tweeters and woofers of a two-way loudspeaker, or more generally the various transducing drive units (acoustical emitters) covering different frequency bands in a multiway loudspeaker, in a primarily vertical direction. While there are exceptions, in which for example a midrange driver may be located beside a tweeter, perhaps with one or both of them comprising a "line source" or ribbon-style driver, such side-by-side placement is usually accepted as a compromise in the pursuit of other design goals, and it is usually desired that those drivers should be as close together as possible, horizontally, to maintain signal integrity at the listeners' ears.
There have been attempts to create loudspeaker arrays using horizontally-oriented multi-passband drive units. Electromagnetic versions of such arrays are also used from time to time in communications and radar antennas. In either application, the intent is to control, at least partly, the radiation pattern at various frequencies, usually with the intent that it maintain a constant shape, or beamwidth, at all frequencies of the intended range of operation. Such a goal can be attained, at least partially, by creating an array which is effectively the same length at all frequencies, as measured in number of wavelengths at each frequency. The normal procedure for doing this is to progressively low pass filter the feed signals to the elements of the array more severely for elements lying more towards the ends of the array. This technique remains largely an obscure curiosity in the field of audio reproduction due to the enormous range of frequencies normally encountered (some ten octaves for high fidelity reproduction) and the fact that to attain significant control of the radiation pattern over important portions of the audible spectrum would require arrays of such a large size as to render them impractical. Some proposals have been more modest, suggesting that beamwidth control over as little as an octave can be effective for applications such as sound reinforcement, but this is not an application in which more than one loudspeaker is used to form effective audio phantom images (the field addressed by the invention) nor does the invention teach the formation of constant radiation patterns over frequency variation, although there may well be a tendency towards such behavior as a side effect.
Another prior art loudspeaker employing a horizontal array is that of Polk. However, the drive units of this device are arrayed in this manner for other purposes, do not employ imaging circuitry, do not enlarge the sweet spot, and in other ways do not anticipate the invention.
While the use of a short baseline alleviates the sweet spot problem, another problem arises; the degree of the problem depends on the location and frequency content of a virtual image in a 3D system. Consider that if a natural image containing large amounts of low frequencies relative to other frequencies appears towards a listener's left-hand side (90° counterclockwise from above from the nose which is considered to be at 0°), the dominant air particle motion in the vicinity of the head is to and fro, parallel to a line through the ears. In order for two front-placed loudspeakers to recreate such a low-frequency motion, they must operate with substantially opposite polarity on similar signals. This constitutes, at the lower frequencies, an approximation to the well-known and much-studied acoustic dipole. The problem with this arrangement is that the two loudspeakers tend to cancel one another's low frequency sound (i.e., relatively little low frequency energy is radiated towards the listener). Consequently, potentially large signals must be applied to the loudspeakers, requiring large amplification factors and large excursions of the loudspeaker radiating surfaces, a practical problem in applications of such 3D systems as virtual home theaters and video games reproducing side-placed low frequency sound effects, as well as in 3D processing and playback of many recordings of ordinary music. Another disadvantage is that large amounts of acoustic energy are reflected around the room before finally arriving at the listener, introducing still more unaccounted-for factors into this playback method.
A closely related scenario is that of a binaural recording being played over a crosstalk canceller. In this application, the low-frequency problem at first appears to be even worse, since the filter specification is for even more bass signal for a virtual source towards the listener's left, as taught most clearly by Cooper and Bauck in their explanation of sum-and-difference style of signal processing. Depending on the loudspeaker angle (as seen by the listener), the bass response of the left-minus-right (L−R) component, that which is predominant in the placement of the left-oriented image, at first inspection seems to be such as to make the whole enterprise nearly impractical, showing a first-order increasing slope (20 dB per decade of frequency) with decreasing frequency, and with the onset of the slope occurring at a higher frequency with more closely-spaced loudspeakers. However, it is important to realize that a naturally-occurring image at 90° necessarily contains relatively little L−R information in the low frequencies, since the ear signals are nearly identical in both amplitude and phase. Therefore, although a large L−R gain might be specified, the L−R signal is small, so the filtered signal might still be of a reasonable size, assuming that the loudspeakers are not too close together. The only practical problem is maintaining a good signal to noise ratio, but this is generally not a problem with either analog or digital implementations. The net result is that the extent of the problem is essentially the same as creating a virtual source as described in the preceding paragraph.
More severe scenarios are easily imagined. It is quite easy to conceive or create a stereo signal which does not correspond to any natural sound image and which will wreak havoc when played through, for example, a loudspeaker-spreader or other layout reformatter or crosstalk canceller, all examples of 3D audio systems. For example, a bass guitar in one originating channel of a conventional stereo formatted signal, with silence in the other channel, when played over a crosstalk canceller, is highly unnatural; the playback system attempts to place the sound of a bass guitar in one ear of the listener and silence in the other ear, an extremely demanding task at any reasonable playback volume.
In any of the examples described above, the demands on low-frequency signal excursions in both the amplifiers and loudspeakers increase, that is, get worse, the closer together the loudspeakers are placed. Thus, the desirable effects of a short-baseline array are offset by the greatly increased signal handling capacity required to realize the necessary signals, both electronic and acoustic.
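The trend can be illustrated with a deliberately simplified free-field model, offered here as a sketch under stated assumptions rather than as the analysis relied upon in this description. Two point-source loudspeakers are driven with equal amplitude and opposite polarity, the head and distance-related amplitude differences are ignored, and the level at a single ear position is computed relative to one loudspeaker alone. The speed of sound (343 m/s), the ear position (0.09 m off center, 2 m in front of the baseline), and the function names are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed value)


def path_difference_m(ear_x, ear_y, baseline):
    """Left-minus-right path length from two point sources on the x axis."""
    half = baseline / 2.0
    return math.hypot(ear_x + half, ear_y) - math.hypot(ear_x - half, ear_y)


def dipole_level_db(freq_hz, ear_x, ear_y, baseline):
    """Level at the ear, in dB relative to one loudspeaker alone.

    Equal-amplitude, opposite-polarity drive gives a summed magnitude of
    |1 - exp(-j*k*delta)| = 2*|sin(k*delta/2)|, where delta is the path
    difference. Undefined (log of zero) exactly on the bisecting line.
    """
    delta = path_difference_m(ear_x, ear_y, baseline)
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    magnitude = abs(2.0 * math.sin(k * delta / 2.0))
    return 20.0 * math.log10(magnitude)


if __name__ == "__main__":
    ear_x, ear_y = 0.09, 2.0  # hypothetical ear position in meters
    for baseline in (1.5, 0.5):
        for freq_hz in (50.0, 500.0):
            level = dipole_level_db(freq_hz, ear_x, ear_y, baseline)
            print(f"baseline {baseline} m, {freq_hz:5.0f} Hz: {level:6.1f} dB")
```

In this model the acoustic output falls by about 20 dB for each decade of decreasing frequency, and falls further as the baseline shrinks, so the drive signal needed to reach a given level at the ear rises correspondingly.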
This circumstance, that of increased low-frequency signal capacity requirements with shorter-baseline arrays, is extremely unfortunate, as it compounds with another aspect of low-frequency reproduction of audio signals. As is well known by audio engineers, in an ordinary stereo set, (i.e., one not required to place low-frequency images outside the spatial extent of the usual two-loudspeaker layout or, for that matter, any single-loudspeaker audio reproduction system), the adequate reproduction of the lower frequencies a priori requires that much larger volumes of air be moved. This is normally accomplished by using drivers of larger diameter and with a much larger linear excursion capability than needed for reproduction of higher frequencies. Similarly, most of the linear signal excursion range of the associated amplifiers is used up by the lower frequencies, with the smaller higher frequency components appearing to ride atop the more slowly undulating low frequency components.
The problems of creating adequate signal levels for low-frequency program material placed as virtual images generally outside the extent of a two-loudspeaker array are not merely hypothetical examples. The inventor has observed precisely the behavior that he describes on numerous occasions when demonstrating various types of 3D audio programs. In today's commercial environment, with consumers demanding more realism from their audio systems, with the popularity of home theaters and games and the associated preponderance of high-level, low-frequency, side-placed sound images of special effects (frequently played over actual loudspeakers in a full home theater setup), and the emergence of computers with attached audio systems intended for playing games and simulating home theater systems as "virtual theaters," the scenarios described herein are very real indeed. It would surprise many just how quickly even a rugged, well-designed loudspeaker system can reach its limits under such circumstances, not to mention the inexpensively-made loudspeakers often associated with computers.
Another problem compounds with short-baseline arrays. To overcome the large low-frequency excursion requirements, one might decide to use larger woofers. However, if the woofers are round, it becomes rather nonsensical to find a way to place them close together. One may resort to oval or rectangular radiating surfaces, but these tend to have still other problems. Also, placing large loudspeakers close together may unacceptably compromise the practical and aesthetic design of a product such as a television or computer video monitor.
Briefly, according to one embodiment of the invention, an audio reproduction system is provided including means for providing any number of audio inputs, means for providing audio imaging using a crosstalk canceller, and a pair of two-way loudspeaker systems, each arrayed with woofer and tweeter substantially horizontally for normally-oriented heads of one or more listeners, such loudspeaker systems comprising frequency-selective crossover circuits to separate and route signals into a left woofer and tweeter pair and a right woofer and tweeter pair, the woofer and tweeter of each pair arranged so that the left and right tweeters are closer together than the left and right woofers, so that time-of-arrival differences from the tweeters vary less with off-center listeners than do time-of-arrival differences from the woofers, for similarly off-center listeners.