The present invention relates to binaural audio synthesis.
3D audio or binaural synthesis may refer to a technique used to process audio in such a way that a sound may be positioned anywhere in 3D space. The positioning of sounds in 3D space may give a user the effect of being able to hear a sound over a pair of headphones, or from another source, as if it came from any direction (for example, above, below or behind). 3D audio or binaural synthesis may be used in applications such as games, virtual reality or augmented reality to enhance the realism of computer-generated sound effects supplied to the user.
When a sound comes from a source far away from a listener, the sound received by each of the listener's ears may, for example, be affected by the listener's head, outer ears (pinnae), shoulders and/or torso before entering the listener's ear canals. For example, the sound may experience diffraction around the head and/or reflection from the shoulders.
If the source is to one side of the listener, the sound received from the source may be received at different times by the left and right ears. The time difference between the sound received at the left and right ears may be referred to as an Interaural Time Delay (ITD). The amplitude of the sound received by the left and right ears may also differ. The difference in amplitude may be referred to as an Interaural Level Difference (ILD).
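By way of illustration only, the two cues may be estimated numerically. The following minimal sketch assumes a rigid spherical head and a distant source, and uses the well-known Woodworth approximation for the ITD. The head radius, the speed of sound and the function names `woodworth_itd` and `level_difference_db` are assumptions made for this example and do not come from the description above.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C
HEAD_RADIUS = 0.0875    # m, assumed typical head radius (illustrative only)


def woodworth_itd(azimuth_rad, head_radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Approximate ITD in seconds for a distant source and a spherical head.

    azimuth_rad is measured from straight ahead and is expected to lie
    between -pi/2 and +pi/2. The sign of the result follows the sign of
    the azimuth.
    """
    theta = abs(azimuth_rad)
    itd = (head_radius / c) * (theta + math.sin(theta))
    return math.copysign(itd, azimuth_rad)


def level_difference_db(left_amplitude, right_amplitude):
    """ILD in decibels of the left amplitude relative to the right.

    Both amplitudes are assumed to be positive (for example RMS values).
    """
    return 20.0 * math.log10(left_amplitude / right_amplitude)
```

Under these assumptions, a source directly to one side (an azimuth of 90 degrees) may give an ITD of roughly 0.66 ms, and a source straight ahead gives an ITD of zero.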
Binaural synthesis may aim to process monaural sound (a single channel of sound) into binaural sound (a channel for each ear, for example a channel for each headphone of a set of headphones) such that it appears to a listener that sounds originate from sources at different positions relative to the listener, including sounds above, below and behind the listener.
A head-related transfer function (HRTF) is a transfer function that may capture the effect of the human head (and optionally other anatomical features) on sound received at each ear. The information of the HRTF may be expressed in the time domain through the head-related impulse response (HRIR). Binaural sound may be obtained by applying an HRIR to a monaural sound input.
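As a minimal sketch of what applying an HRIR may involve, the monaural input may be convolved with one impulse response per ear. The function names `convolve` and `binauralise` and the toy three-sample impulse responses are hypothetical and chosen only for illustration. A measured HRIR would typically be much longer.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def binauralise(mono, hrir_left, hrir_right):
    """Return (left, right) channels from one monaural input."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses (assumed): the right ear hears the sound two
# samples later and at half the amplitude, as for a source on the left.
toy_hrir_left = [1.0, 0.0, 0.0]
toy_hrir_right = [0.0, 0.0, 0.5]

left, right = binauralise([1.0, 2.0], toy_hrir_left, toy_hrir_right)
```

In this toy case the delay between the two outputs plays the role of the ITD and the gain of 0.5 plays the role of the ILD.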
It is known to obtain an HRTF (and/or an HRIR) by measuring sound using two microphones placed at ear positions of an acoustic manikin. The acoustic manikin may provide a representative head shape and ear spacing and, optionally, the shape of representative pinnae, shoulders and/or torso.
Methods are known in which finite impulse response (FIR) filter coefficients are generated from HRIR measurements. The HRIR-generated FIR coefficients are convolved with an input audio signal to synthesise binaural sound. An FIR filter generated from HRIR measurements may be a high-order filter, for example a filter of between 128 and 512 taps. An operation of convolving the FIR filter with an input audio signal may be computationally intensive, particularly when the relative positions of the source and the listener change over time.
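The computational load of direct-form FIR filtering may be estimated with simple arithmetic: each output sample of each ear needs one multiply-accumulate per tap. The sketch below assumes a sample rate of 44.1 kHz, two output channels and a single source. The sample rate and the function name `fir_macs_per_second` are assumptions for illustration, not values taken from the description.

```python
def fir_macs_per_second(taps, sample_rate_hz, channels=2):
    """Multiply-accumulate operations per second for direct-form FIR
    filtering of one source into the given number of output channels."""
    return taps * sample_rate_hz * channels


SAMPLE_RATE_HZ = 44100  # assumed sample rate

low_order_cost = fir_macs_per_second(128, SAMPLE_RATE_HZ)   # 11,289,600
high_order_cost = fir_macs_per_second(512, SAMPLE_RATE_HZ)  # 45,158,400
```

These figures are per source. If the filter coefficients also have to be updated whenever the source or the listener moves, the total cost may be higher still.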
It has been suggested to approximate an HRIR using a computational model, for example a structural model. A structural model may simulate the effect of a listener's body on sound received by the listener's ears. In one such structural model, effects of the head, pinnae and shoulders are modelled. The structural model combines an infinite impulse response (IIR) head-shadow model with an FIR pinna-echo model and an FIR shoulder-echo model.
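A structural model of this general kind may be sketched as follows for a single ear. This is not the specific model referred to above. It assumes a first-order head-shadow filter with unity gain at low frequencies and a high-frequency gain `alpha`, discretised with the bilinear transform, followed in cascade by sparse FIR echo stages. The cascade ordering, the parameter `alpha`, the default head radius and sample rate, and all function names are assumptions made for illustration.

```python
def head_shadow_coeffs(alpha, head_radius, sample_rate_hz, c=343.0):
    """Bilinear-transform coefficients of H(s) = (alpha*s + beta)/(s + beta),
    with beta = 2*c/head_radius. Returns (b0, b1, a1) for
    y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1]."""
    beta = 2.0 * c / head_radius
    k = 2.0 * sample_rate_hz
    a0 = beta + k
    b0 = (beta + alpha * k) / a0
    b1 = (beta - alpha * k) / a0
    a1 = (beta - k) / a0
    return b0, b1, a1


def iir_first_order(x, b0, b1, a1):
    """Run a first-order IIR filter over the samples in x."""
    y = []
    x_prev = 0.0
    y_prev = 0.0
    for sample in x:
        out = b0 * sample + b1 * x_prev - a1 * y_prev
        y.append(out)
        x_prev, y_prev = sample, out
    return y


def add_echoes(x, echoes):
    """Sparse FIR stage: y[n] = x[n] + sum(gain * x[n - delay]).
    echoes is a list of (delay_in_samples, gain) pairs."""
    y = list(x)
    for delay, gain in echoes:
        for n in range(delay, len(x)):
            y[n] += gain * x[n - delay]
    return y


def structural_ear(x, alpha, shoulder_echoes, pinna_echoes,
                   head_radius=0.0875, sample_rate_hz=44100):
    """Assumed cascade: IIR head shadow, then shoulder echo, then pinna echoes."""
    b0, b1, a1 = head_shadow_coeffs(alpha, head_radius, sample_rate_hz)
    shadowed = iir_first_order(x, b0, b1, a1)
    with_shoulder = add_echoes(shadowed, shoulder_echoes)
    return add_echoes(with_shoulder, pinna_echoes)
```

In such a sketch, a value of `alpha` below 1 may attenuate high frequencies, as for an ear on the far side of the head, and a value above 1 may boost them. Each echo stage costs only one multiply-accumulate per echo per sample, which suggests why a model of this kind may be cheaper than a high-order FIR filter.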