This invention relates to a method and apparatus for the presentation of spatialized sound over loudspeakers.
Sound localization is a term which refers to the ability of a listener to estimate the direction and distance of a sound source originating from a point in three-dimensional space, based on the brain's interpretation of signals received at the eardrums. Research has indicated that a number of physiological and psychological cues exist which determine our ability to localize a sound. Such cues may include, but are not necessarily limited to, interaural time delays (ITDs), interaural intensity differences (IIDs), and spectral shaping resulting from the interaction of the outer ear with an approaching sound wave.
Audio spatialization, on the other hand, is a term which refers to the synthesis and application of such localization cues to a sound source in such a manner as to make the source sound realistic. A common method of audio spatialization involves the filtering of a sound with the head-related transfer functions (HRTFs): position-dependent filters which represent the transfer functions of a sound source at a particular position in space to the left and right ears of the listener. The result of this filtering is a two-channel signal that is typically referred to as a binaural signal. This situation is depicted by the prior art illustration at FIG. 1. Here, HI represents the ipsilateral response (loud or near side) and HC represents the contralateral response (quiet or far side) of the human ear. Thus, for a sound source to the right of a listener, the ipsilateral response is the response of the listener's right ear, whereas the contralateral response is the response of the listener's left ear. When played back over headphones, the binaural signal will give the listener the perception of a source emanating from the corresponding position in space. Unfortunately, such binaural processing is computationally very demanding, and playback of binaural signals is only possible over headphones, not over loudspeakers.
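The HRTF filtering step described above can be illustrated with a minimal sketch. The function and the impulse responses below are hypothetical stand-ins, not part of the invention: a real system would use measured HRTF data, whereas here the ipsilateral path is simply louder and earlier than the contralateral path.

```python
import numpy as np

def binaural_synthesize(mono, h_ipsi, h_contra):
    """Filter a mono source with ipsilateral/contralateral HRTF impulse
    responses to produce the two channels of a binaural signal."""
    ipsi = np.convolve(mono, h_ipsi)      # near-ear (loud) channel
    contra = np.convolve(mono, h_contra)  # far-ear (quiet) channel
    return ipsi, contra

# Toy stand-ins for measured HRTFs: the contralateral response is
# attenuated and delayed by two samples relative to the ipsilateral one.
mono = np.array([1.0, 0.5, 0.25])
h_ipsi = np.array([1.0])               # loud, no delay
h_contra = np.array([0.0, 0.0, 0.5])   # quieter, two-sample delay
ipsi, contra = binaural_synthesize(mono, h_ipsi, h_contra)
```

For a source to the listener's right, `ipsi` would feed the right ear and `contra` the left ear, per the convention of FIG. 1.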
Presenting a binaural signal directly over a pair of loudspeakers is ineffective due to loudspeaker crosstalk, i.e., the part of the signal from one loudspeaker which bleeds over to the far ear of the listener and interferes with the signal produced by the other loudspeaker. In order to present a binaural signal over loudspeakers, crosstalk cancellation is required. In crosstalk cancellation, a crosstalk component is computed using the interaural transfer function (ITF), which represents the transfer function from one ear of the listener to the other ear. This component is then added, inverted, to one loudspeaker signal in such a way as to cancel, at the ear of the listener, the crosstalk arriving from the opposite loudspeaker.
Spatialization of sources for presentation over loudspeakers is computationally very demanding, since both binaural processing and crosstalk cancellation must be performed for all sources. FIG. 2 shows a prior art implementation of a positional 3D audio presentation system using HRTF filtering (binaural processing block) and crosstalk cancellation. Based on given positional information, a lookup must be performed for the left and right ears to determine the appropriate coefficients to use for HRTF filtering. A mono input source M is then filtered using the left- and right-ear HRTF filters, which may be FIR or IIR, to produce a binaural signal with ipsilateral and contralateral components IB and CB. This binaural signal is then processed by a crosstalk cancellation module 2a to enable playback over loudspeakers. For many applications, this computational burden is too large to be practical for real-time operation. Furthermore, since a different set of HRTFs must be used for each desired source position, the number of filter coefficients which must be stored is large, and the use of time-varying filters (in the binaural processing block) is required in order to simulate moving sources.
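The position-dependent coefficient lookup feeding the binaural processing block can be sketched as below. The table contents and the nearest-position rule are hypothetical placeholders: a real system would store measured FIR (or IIR) coefficient sets at many positions, and the large size of such a table is precisely the storage burden noted above.

```python
import numpy as np

# Hypothetical HRTF coefficient table indexed by azimuth in degrees;
# a real table would hold measured filter coefficients at many positions.
HRTF_TABLE = {
    0:   (np.array([1.0]), np.array([1.0])),       # front: symmetric
    90:  (np.array([1.0]), np.array([0.0, 0.5])),  # side: contra delayed
    180: (np.array([0.8]), np.array([0.8])),       # rear: symmetric
    270: (np.array([1.0]), np.array([0.0, 0.5])),  # other side (mirrored)
}

def spatialize(mono, azimuth):
    """Look up the nearest stored HRTF pair for the requested azimuth
    and filter the mono source M into components IB and CB (the prior
    art pipeline of FIG. 2, before crosstalk cancellation)."""
    def circ_dist(a):
        d = abs(a - azimuth) % 360
        return min(d, 360 - d)
    h_ipsi, h_contra = HRTF_TABLE[min(HRTF_TABLE, key=circ_dist)]
    return np.convolve(mono, h_ipsi), np.convolve(mono, h_contra)
```

Moving sources would require interpolating or switching these coefficients over time, i.e., the time-varying filters mentioned above.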
A prior art approach (U.S. Pat. No. 5,521,981, Louis S. Gehring) to reducing the complexity requirements for 3D audio presentation systems is shown in FIG. 3. In this approach, binaural signals for several source positions are precomputed via HRTF filtering. Typically, these positions are chosen to be front, rear, left, and right. To place a source at a particular azimuth angle, direct interpolation is performed between the binaural signals of the nearest two positions. A disadvantage of this approach, particularly for large source files, is the increased storage required for the precomputed binaural signals. Assuming that the HRTFs are symmetric about the median plane (the plane through the center of the head which is normal to the line intersecting the two ears), the storage requirements for this approach are four times those of the original monophonic input signal: each of the front and rear positions requires storage equivalent to the one monophonic input, because its contralateral and ipsilateral responses are identical, while the left and right positions can together be represented by a single binaural pair, since their ipsilateral and contralateral responses are simply reversed. In addition, presenting the resulting signal over loudspeakers L and R, as opposed to headphones, requires additional computation for the crosstalk cancellation procedure.
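The interpolation step of this prior art approach can be sketched as a cross-fade between the two nearest precomputed positions. The function and the tiny placeholder signals are illustrative only; the stored signals in the actual approach are full HRTF-filtered versions of the source, which is where the fourfold storage cost arises.

```python
import numpy as np

def place_source(precomputed, azimuth):
    """Sketch of Gehring-style placement: linearly cross-fade between
    the precomputed binaural signals of the two nearest stored
    positions, assumed here at 0, 90, 180, and 270 degrees."""
    az = azimuth % 360
    lo = int(az // 90) * 90          # nearest stored position below
    hi = (lo + 90) % 360             # nearest stored position above
    frac = (az - lo) / 90.0          # cross-fade weight
    left = (1 - frac) * precomputed[lo][0] + frac * precomputed[hi][0]
    right = (1 - frac) * precomputed[lo][1] + frac * precomputed[hi][1]
    return left, right

# Illustrative precomputed binaural pairs (real ones would be
# HRTF-filtered source signals, hence the storage cost).
precomputed = {
    0:   (np.array([1.0]), np.array([1.0])),
    90:  (np.array([1.0]), np.array([0.2])),
    180: (np.array([0.8]), np.array([0.8])),
    270: (np.array([0.2]), np.array([1.0])),
}
left, right = place_source(precomputed, 45)
```

Note that this produces a binaural pair; crosstalk cancellation is still required before loudspeaker playback, as stated above.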
In accordance with one embodiment of the present invention, a method and apparatus for the placement of sound sources in three-dimensional space with two loudspeakers is provided by binaural signal processing and loudspeaker crosstalk cancellation, followed by panning into left and right speakers.
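The final panning stage of this embodiment can be sketched as a constant-power pan into the left and right speaker feeds. This is a minimal sketch of the panning operation only, applied here to a single channel for clarity; the function name, the angle parameterization, and the constant-power law are assumptions, not details drawn from the embodiment.

```python
import numpy as np

def pan(signal, position):
    """Constant-power pan of a signal into left and right speaker
    feeds; position in [0, 1], where 0 is full left and 1 is full
    right. Squared gains sum to 1, keeping perceived power constant."""
    theta = position * np.pi / 2.0
    return np.cos(theta) * signal, np.sin(theta) * signal

left, right = pan(np.array([1.0]), 0.5)  # centered source: equal gains
```

At the center position the two gains are equal and their squares sum to one, so the source does not appear to change loudness as it is panned.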