Sound is gaining increasing interest as an element of user interfaces in a variety of different environments. Examples of the various uses of sound include human/computer interfaces, auditory aids for the visually impaired, virtual reality systems, acoustic and auditory information displays, and teleconferencing. To date, sound is presented to the user in each of these different environments by means of headphones or a limited number of loudspeakers. In most of these situations, the sounds perceived by the user have limited spatial characteristics. Typically, the user is able to distinguish between two dipolar sources, e.g. left and right balance, but is otherwise unable to distinguish between different virtual sources of sounds that are theoretically located at a variety of different positions, relative to the user.
It is desirable to utilize the three-dimensional aspect of sound, to enhance the user experience in these various environments, as well as provide a greater amount of information. Unlike vision, the user's aural input is not limited to the direction in which he or she is looking at a given instant. Rather, the human auditory system permits individuals to identify and discriminate between sources of information from all surrounding locations. Consequently, efforts have been directed to the accurate synthesis of three-dimensional spatial sound which permits the user to distinguish between multiple different sources of information.
To accurately synthesize sound in a virtual three-dimensional environment, one factor which must be taken into account is the set of position-dependent changes that occur when a sound wave propagates from a sound source to the listener's eardrum. These changes result from diffraction of the sound wave by the torso, head and ears of the listener. Such diffractions are in turn influenced by the azimuth, elevation and range of the listener relative to the source. The changes that a sound undergoes as it travels from the source to the listener's eardrum can be quantified in a transfer function known as the head-related transfer function (HRTF). In general, the HRTF can be characterized as a table of finite impulse responses which is indexed according to azimuth and elevation, as well as range in some cases. The HRTF has become a valuable tool in the characterization of acoustic information, and is therefore widely employed in various types of research that are directed to sound localization in a three-dimensional environment.
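By way of illustration only, the characterization of an HRTF as a table of finite impulse responses indexed by azimuth and elevation can be sketched as follows. The table contents, the nearest-direction lookup, and the names `HRTF_TABLE`, `lookup_hrir` and `spatialize` are hypothetical and are assumed here purely for the example; a practical table would hold measured responses at many more directions and would typically interpolate between them.

```python
import math

# Hypothetical toy table: (azimuth_deg, elevation_deg) -> (left FIR, right FIR).
# The tap values are invented for illustration, not measured responses.
HRTF_TABLE = {
    (-90, 0): ([1.0, 0.5], [0.0, 0.3, 0.2]),
    (0, 0): ([0.8, 0.2], [0.8, 0.2]),
    (90, 0): ([0.0, 0.3, 0.2], [1.0, 0.5]),
    (0, 45): ([0.6, 0.3, 0.1], [0.6, 0.3, 0.1]),
}


def angular_distance(a, b):
    """Angle in radians between two (azimuth, elevation) directions given in degrees."""
    az1, el1 = (math.radians(x) for x in a)
    az2, el2 = (math.radians(x) for x in b)
    cos_angle = (math.sin(el1) * math.sin(el2)
                 + math.cos(el1) * math.cos(el2) * math.cos(az1 - az2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_angle)))


def lookup_hrir(azimuth, elevation, table=HRTF_TABLE):
    """Return the (left, right) impulse responses stored for the nearest direction."""
    nearest = min(table, key=lambda key: angular_distance(key, (azimuth, elevation)))
    return table[nearest]


def convolve(signal, fir):
    """Direct-form linear convolution of a signal with an FIR filter."""
    out = [0.0] * (len(signal) + len(fir) - 1)
    for i, sample in enumerate(signal):
        for j, tap in enumerate(fir):
            out[i + j] += sample * tap
    return out


def spatialize(mono, azimuth, elevation):
    """Filter a mono signal with the left and right responses for one direction."""
    left_fir, right_fir = lookup_hrir(azimuth, elevation)
    return convolve(mono, left_fir), convolve(mono, right_fir)


left, right = spatialize([1.0, 0.0], azimuth=80, elevation=5)
```

In this sketch a source at 80 degrees azimuth is served by the entry stored for 90 degrees, so the ear nearer the source receives the earlier, stronger response. Range is omitted from the index for brevity.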
Since the HRTF is highly dependent upon the physique of the listener, particularly the size of the head, neck and shoulders, and the shapes of the outer ears, or pinnae, it can vary significantly from one person to the next. As a result, the HRTF is sufficiently unique to an individual that appreciable errors can occur if one person listens to sound that is synthesized or filtered in accordance with a different person's HRTF. To provide truly accurate spatial sound for a given individual, therefore, it is necessary to employ an HRTF which is appropriate to that individual. In an environment which is confined to a limited number of listeners, it might be feasible to explicitly determine the HRTF for each potential user. Typically, this is carried out by measuring the response at the listener's eardrums to a number of different signals from sound sources at different locations, by means of probe microphones that are placed within the listener's ears, as close as possible to the eardrum. Using this technique, it is possible to obtain an HRTF that is specific to each individual. For further information regarding the measurement of an HRTF, see Blauert, J., Spatial Hearing, MIT Press, 1983, particularly at Section 2.2, the disclosure of which is incorporated herein by reference.
While this direct measurement approach may be feasible for a limited number of users, it will be appreciated that it is not practical for applications designed to be used by a large number of listeners. Accordingly, efforts have been undertaken to model the HRTF, and thereafter compute an HRTF for a given individual from the model. To date, much of the effort at modeling the HRTF has focused upon principal components analysis. For a detailed discussion of this approach, reference is made to Kistler et al., “A Model of Head-Related Transfer Functions Based On Principal Components Analysis and Minimum-Phase Reconstruction,” J. Acoust. Soc. Am. 91(3), March 1992, pages 1637–1647.
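The general idea of a principal components model can be sketched as follows: each response is approximated as a mean response plus a weighted basis function, so that a response is summarized by its weight. This is a minimal illustration under stated assumptions and not a reproduction of the cited model. The toy data `SPECTRA` and the helper names are hypothetical, only the leading component is extracted, and power iteration is used here simply to avoid depending on a linear algebra library.

```python
import math

# Hypothetical toy data: each row stands in for one log-magnitude response
# sampled at four frequencies. All values are invented for illustration.
BASE = [3.0, 1.0, 2.0, 0.0]
SHAPE = [1.0, -1.0, 0.0, 2.0]
SPECTRA = [[b + w * s for b, s in zip(BASE, SHAPE)]
           for w in (-2.0, -1.0, 0.0, 1.0, 2.0)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def first_principal_component(rows, iterations=100):
    """Return (mean, unit-norm leading component) using power iteration."""
    count = len(rows)
    mean = [sum(column) / count for column in zip(*rows)]
    centered = [[x - m for x, m in zip(row, mean)] for row in rows]
    dim = len(mean)
    # Deliberately asymmetric starting vector, normalized to unit length.
    start = [float(k + 1) for k in range(dim)]
    scale = math.sqrt(dot(start, start))
    vector = [x / scale for x in start]
    for _ in range(iterations):
        # Apply the (unnormalized) covariance implicitly: X^T (X v).
        projections = [dot(row, vector) for row in centered]
        product = [sum(p * row[k] for p, row in zip(projections, centered))
                   for k in range(dim)]
        norm = math.sqrt(dot(product, product))
        if norm == 0.0:
            break  # data has no variance along the current vector
        vector = [x / norm for x in product]
    return mean, vector


def encode(row, mean, component):
    """Weight of one response along the component, after removing the mean."""
    return dot([x - m for x, m in zip(row, mean)], component)


def decode(weight, mean, component):
    """Approximate a response from its weight."""
    return [m + weight * c for m, c in zip(mean, component)]


mean, component = first_principal_component(SPECTRA)
weights = [encode(row, mean, component) for row in SPECTRA]
```

Because the toy data varies along a single shape, one component reconstructs it exactly; measured responses would generally require several components and would be reconstructed only approximately.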
These attempts to characterize the HRTF have met with limited success, since they only provide a rough basis for an estimation model, but do not actually couple characteristics of the listener to his or her HRTF. Consequently, principal components analysis does not provide a mechanism to find the best HRTF for a given user. Other attempts have been made to model the HRTF on the basis of the physics of sound propagation. See, for example, C. P. Brown and R. O. Duda, “A Structural Model for Binaural Sound Synthesis,” IEEE Trans. Speech and Audio Processing, Vol. 6, No. 5, pp. 476–488 (September 1998). While this approach appears to provide more accurate results, the need to obtain the necessary physical measurements can be inconvenient and time-consuming, and therefore may not be practical in all situations. In addition, the physical principles that determine the HRTF are not all known, and therefore the model may not be truly representative. It is therefore desirable to provide an accurate technique for estimating the HRTF of an individual on the basis of a limited amount of input information, particularly where direct measurement of the individual is not always possible or feasible.