A head-related transfer function (HRTF) is a sound localization processing technique. The changes that sounds arriving from different azimuths undergo on their way to the human ears are measured, the measurements are analyzed statistically, and a model of human auditory perception is computed from them.
With two ears, humans can locate a sound in three-dimensional space because the auditory system analyzes the incoming sound signal. This analysis system extracts localization information from the filtering effect that the human body has on a sound wave. In theory, human perception and localization of sounds in the real three-dimensional world can be simulated precisely by measuring how the human body processes a sound (for example, how it filters and delays it) and then reproducing those effects when the sound is played through a playback system (earphones or speakers).
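The filtering and delaying described above are commonly modeled in the time domain as convolution of the source signal with a pair of head-related impulse responses (HRIRs), one per ear. The following Python sketch illustrates that idea under stated assumptions: `render_binaural` and the toy impulse responses are hypothetical, chosen only to show an interaural delay and attenuation, and are not measured HRTF data.

```python
from typing import List, Tuple


def convolve(signal: List[float], kernel: List[float]) -> List[float]:
    """Full linear convolution of two finite sequences (direct form)."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def render_binaural(
    mono: List[float], hrir_left: List[float], hrir_right: List[float]
) -> Tuple[List[float], List[float]]:
    """Filter one mono source through a left/right HRIR pair (hypothetical helper)."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy HRIR pair (assumption, not measured data): a source on the listener's
# left reaches the right ear 3 samples later and at half the amplitude.
hrir_l = [1.0, 0.0, 0.0, 0.0]
hrir_r = [0.0, 0.0, 0.0, 0.5]
click = [1.0, 0.0, 0.0]

left, right = render_binaural(click, hrir_l, hrir_r)
print(left)   # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(right)  # [0.0, 0.0, 0.0, 0.5, 0.0, 0.0]
```

Direct convolution is used here for readability; with HRIRs of realistic length, FFT-based convolution is the usual choice for performance.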
The measurement is currently performed with an artificial head: a dummy head that closely simulates the density and materials of the human body is used, the differences between the sounds received at its two prosthetic ears from a sound wave emitted at a fixed azimuth are recorded, and an HRTF is derived statistically from these recordings.
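The source does not say how the recordings are turned into a transfer function. One common approach, assumed here, is deconvolution: play a known excitation signal, record it at a prosthetic ear, and divide the recorded spectrum by the excitation spectrum. The sketch below uses a circular signal model, a naive DFT built on the standard `cmath` module, and a small regularization term `eps`; the function names and the toy signals are hypothetical.

```python
import cmath
from typing import List


def dft(x: List[complex]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum: List[complex]) -> List[complex]:
    """Inverse of dft()."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def circular_convolve(x: List[float], h: List[float]) -> List[float]:
    """Simulate a recording: excitation x circularly filtered by impulse response h."""
    n = len(x)
    return [sum(x[m] * h[(t - m) % n] for m in range(n)) for t in range(n)]


def estimate_impulse_response(
    excitation: List[float], recording: List[float], eps: float = 1e-9
) -> List[float]:
    """Regularized spectral division H = Y * conj(X) / (|X|^2 + eps), back to time domain."""
    X = dft([complex(v) for v in excitation])
    Y = dft([complex(v) for v in recording])
    H = [y * x.conjugate() / (abs(x) ** 2 + eps) for x, y in zip(X, Y)]
    return [v.real for v in idft(H)]


# Hypothetical excitation and "true" ear response (8 samples each).
excitation = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
true_h = [0.0, 0.0, 0.8, 0.3, 0.0, 0.0, 0.0, 0.0]
recording = circular_convolve(excitation, true_h)
estimate = estimate_impulse_response(excitation, recording)
print([round(v, 6) for v in estimate])
```

The excitation here has no zero in its spectrum, so the division is well conditioned; practical measurements typically use longer excitations such as swept sines or maximum-length sequences, and `eps` guards against frequency bins where the excitation carries little energy.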
The measured data include the interaural time delay (ITD), the interaural amplitude difference (IAD), the interaural intensity difference (IID), and spectral cues.
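Two of these cues can be estimated directly from a pair of ear signals. The sketch below is a minimal illustration, not the procedure used in the source: it takes the ITD as the cross-correlation lag between the ear signals and the intensity difference as the energy ratio in decibels. The function names, the `max_lag` search window, and the test signals are assumptions.

```python
import math
from typing import List


def estimate_itd(left: List[float], right: List[float],
                 sample_rate: float, max_lag: int) -> float:
    """Return the lag (in seconds) that maximizes the cross-correlation.

    A positive value means the right-ear signal lags the left-ear signal.
    """
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        val = sum(
            left[i] * right[i + lag]
            for i in range(len(left))
            if 0 <= i + lag < len(right)
        )
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag / sample_rate


def estimate_iid_db(left: List[float], right: List[float]) -> float:
    """Interaural intensity difference in dB (left-ear energy relative to right)."""
    energy_left = sum(v * v for v in left)
    energy_right = sum(v * v for v in right)
    return 10.0 * math.log10(energy_left / energy_right)


# Synthetic ear signals (hypothetical): the right ear receives the click
# 3 samples later and at half the amplitude.
left = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
right = [0.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]
itd = estimate_itd(left, right, sample_rate=48000.0, max_lag=5)
iid = estimate_iid_db(left, right)
print(f"ITD = {itd * 1e6:.1f} us, IID = {iid:.2f} dB")  # ITD = 62.5 us, IID = 6.02 dB
```

For a pure gain between the ears, as in this toy case, the amplitude difference in dB (20 log10 of the amplitude ratio 2) gives the same 6.02 dB.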
In existing 3D audio processing, if the sound card hardware has an HRTF computing chip, a three-dimensional engine uses the HRTF to perform three-dimensional localization processing when it modulates a sound source and plays it to the listener, so that the listener perceives a realistic three-dimensional localization effect.
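An engine of this kind has to map a source position to HRTF data before filtering. The simplest strategy, assumed here for illustration (real engines often interpolate between measured directions), is to pick the measured HRIR pair whose azimuth is nearest to the source azimuth and convolve the source with it. The database layout, the azimuth convention (90 degrees = listener's right), and all names are hypothetical.

```python
from typing import Dict, List, Tuple

HrirPair = Tuple[List[float], List[float]]  # (left-ear IR, right-ear IR)


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def select_hrir(database: Dict[float, HrirPair], azimuth: float) -> HrirPair:
    """Nearest-neighbour lookup of the measured direction closest to `azimuth`."""
    nearest = min(database, key=lambda az: angular_distance(az, azimuth))
    return database[nearest]


def convolve(signal: List[float], kernel: List[float]) -> List[float]:
    """Full linear convolution of two finite sequences (direct form)."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def localize(mono: List[float], database: Dict[float, HrirPair],
             azimuth: float) -> Tuple[List[float], List[float]]:
    """Render a mono source at `azimuth` using the nearest measured HRIR pair."""
    hrir_left, hrir_right = select_hrir(database, azimuth)
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy database (hypothetical, not measured): for lateral directions the
# far ear gets a one-sample delay and 0.4x amplitude.
database: Dict[float, HrirPair] = {
    0.0: ([1.0, 0.0], [1.0, 0.0]),    # straight ahead
    90.0: ([0.0, 0.4], [1.0, 0.0]),   # to the right
    270.0: ([1.0, 0.0], [0.0, 0.4]),  # to the left
}
left, right = localize([1.0, 0.5], database, azimuth=100.0)
print(left, right)  # [0.0, 0.4, 0.2] [1.0, 0.5, 0.0]
```

Nearest-neighbour selection keeps the sketch short but produces audible jumps when a moving source crosses between measured directions, which is why interpolation is common in practice.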
In existing 3D audio processing, HRTF computation can thus be performed by a computing chip that implements an HRTF, giving the user a sense of three-dimensional localization through hearing. However, because the HRTF databases in such hardware are largely derived from measurements in real environments, they are not suitable for sound processing in some virtual scenes. For example, in the virtual scene of a game, exaggeration is often used in the composition to achieve a particular visual effect, so the relative sizes of objects may differ from their actual proportions in the real world. As another example, in a first-person shooter, players rely on exaggerated footstep sounds to determine where other players are. Sound localization in such virtual scenes using the HRTFs supported by existing hardware therefore often produces unsatisfactory results.
Currently, there is no effective solution to the foregoing problem.