Human hearing is spatial and three-dimensional in nature. That is, a listener with normal hearing knows the spatial location of objects which produce sound in his environment. For example, in FIG. 1 the individual shown could hear the sound at S1 upward and slightly to the rear. He senses not only that something has emitted a sound, but also where it is even if he can't see it. Natural spatial hearing is also called binaural hearing; it allows us to hear the musicians in an orchestra in their separate locations, to separate the different voices around us at a cocktail party, and to locate an airplane flying overhead.
Scientific literature relating to binaural hearing shows that the principal acoustic features which make spatial hearing possible are the position and separation of the ears on the head and also the complex shape of the pinnae, the external ears. When a sound arrives, the listener senses the direction and distance of its source by the changes these external features have made in the sound when it arrives as separate left and right signals at the respective eardrums. Sounds which have been changed in this manner can be said to have binaural location cues: when they are heard, the sounds seem to come from the correct three-dimensional spatial location. As any listener can readily test, our natural binaural hearing allows hearing many sounds at different locations all around and at the same time.
Binaural sound and commercial stereophonic sound are both conveyed with two signals, one for each ear. The difference is that commercial stereophonic sound usually is recorded without spatial location cues; that is, the usual microphone recording process does not preserve the binaural cuing required for the sound to be perceived as three-dimensional. Accordingly, normal stereo sounds on headphones seem to be inside the listener's head, without any fixed location, whereas binaural sounds seem to come from correct locations outside the head, just as if the sounds were natural.
There are numerous applications for binaural sound, particularly since it can be played back on normal stereo equipment. Consider music where instruments are all around the listener, moved or "flown" by the performer; video games where friends or foes can be heard coming from behind; interactive television where things can be heard approaching offscreen before they appear; and loudspeaker music playback where the instruments can be heard above or below the speakers and outside them.
One well-known early development in this field consisted of a dummy head ("kunstkopf") with two recording microphones in realistic ears: binaural sounds recorded with such a device can be compellingly spatial and realistic. A disadvantage of this method is that the sounds' original spatial locations can be captured, but not edited or modified. Accordingly, this earlier mechanical means of binaural processing would not be useful, for example, in a video game where the sound needs to be interactively repositioned during game play or in a cockpit environment where the direction of an approaching missile and its sound could not be known in advance.
Recent developments in binaural processing use a digital signal processor (DSP) to mathematically emulate the dummy head process in real time but with positionable sound location. Typically, the combined effect of the head, ears, and pinnae is represented by a left-right pair of head-related transfer functions (HRTFs) corresponding to spherical directions around the listener, usually described angularly as degrees of azimuth and elevation relative to the listener's head as indicated in FIG. 1. The said HRTFs may arise from laboratory measurements or may be derived by means known to those skilled in the art. A mathematical process known as convolution is then applied, wherein the digitized original sound is convolved in real time with the left- and right-ear HRTFs corresponding to the desired spatial location; left- and right-ear binaural signals are thereby produced which, when heard, seem to come from the desired location. To reposition the sound, the HRTFs are changed to those for the desired new location. FIG. 2 is a block diagram illustrative of a typical binaural processor.
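By way of illustration only, the convolution step described above may be sketched as follows. This is a minimal sketch and not the processor of FIG. 2: the names convolve, render_at and HRIR_TABLE are hypothetical, the table is indexed by an assumed (azimuth, elevation) pair in degrees, and the three-sample impulse responses are made-up placeholders rather than measured HRTF data (a practical response would be far longer).

```python
def convolve(signal, impulse_response):
    """Direct-form convolution; output has len(signal) + len(impulse_response) - 1 samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


# Hypothetical table: (azimuth_deg, elevation_deg) -> (left-ear response, right-ear response).
# Values are placeholders, NOT measured data. For the entry assumed to lie toward the
# right ear, the right-ear response is louder and earlier than the left-ear response.
HRIR_TABLE = {
    (0, 0): ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),
    (90, 0): ([0.0, 0.0, 0.25], [1.0, 0.0, 0.0]),
}


def render_at(mono, azimuth_deg, elevation_deg, table=HRIR_TABLE):
    """Convolve one mono sound with the left/right pair stored for the requested direction."""
    hrir_left, hrir_right = table[(azimuth_deg, elevation_deg)]
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


mono = [1.0, 0.5]
left, right = render_at(mono, 90, 0)
# Repositioning the sound amounts to selecting a different pair from the table.
left_front, right_front = render_at(mono, 0, 0)
```

The nested loop performs one multiply-accumulate per input sample per response coefficient, for each ear, which suggests why the cost of this approach grows quickly with response length and with the number of sounds.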
DSP-based binaural systems are known to be effective but are costly because the required real time convolution processing typically consumes about ten million instructions per second (MIPS) signal processing power for each sound. This means, for example, that using real time convolution to create the binaural sounds for a video game with eight objects, not an uncommon number, would require over eighty MIPS of signal processing. Binaurally presenting a musical composition with thirty-two sampled instruments controlled by the Musical Instrument Digital Interface (MIDI) would require over three hundred MIPS, a substantial computing burden.
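The arithmetic behind these figures may be checked with a short calculation. This sketch assumes only the approximate rate of ten MIPS per sound quoted above and that the cost scales linearly with the number of sounds; the function name convolution_mips is hypothetical.

```python
MIPS_PER_SOUND = 10  # approximate per-sound figure quoted in the text


def convolution_mips(num_sounds, mips_per_sound=MIPS_PER_SOUND):
    """Nominal processing load, assuming cost scales linearly with the number of sounds."""
    return num_sounds * mips_per_sound


print(convolution_mips(8))   # video game with eight objects: prints 80
print(convolution_mips(32))  # thirty-two MIDI-controlled instruments: prints 320
```

Because the per-sound rate is itself approximate, these nominal totals of 80 and 320 MIPS are estimates consistent with the figures given above rather than exact requirements.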
The present invention was developed as an economical means to bring these applications and many others into the realm of practicality. Rather than needing a DSP and real time binaural convolution processing, the present invention provides means to achieve real time, responsive binaural sound positioning with inexpensive small computer central processing units (CPUs), typical "sampler" circuits widely used in the music and computer sound industries, or analog audio hardware.