Binaural rendering technology is essential for providing immersive and interactive audio in a head mounted display (HMD) device. Binaural rendering refers to modeling 3D audio, which provides sound that gives a sense of presence in a three-dimensional space, as signals to be delivered to both ears of a human being. A listener may experience a sense of three-dimensionality from a binaural-rendered 2-channel audio output signal through headphones, earphones, or the like. The specific principle of binaural rendering is as follows. A human being listens to a sound through both ears and recognizes the position and direction of the sound source from that sound. Therefore, if 3D audio can be modeled as the audio signals delivered to both ears of a human being, the three-dimensionality of the 3D audio may be reproduced through a 2-channel audio output without a large number of speakers.
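The principle described above may be illustrated by the following minimal sketch, in which each mono source is convolved with a left-ear and a right-ear impulse response and the results are summed into a 2-channel output. The function names `toy_hrir_pair` and `binaural_render` are hypothetical, and the impulse responses are not measured data: the far-ear response is assumed to be only a delayed and attenuated impulse, standing in for the interaural time and level differences that a measured response would carry.

```python
import numpy as np


def toy_hrir_pair(itd_samples, ild_gain, length=64):
    """Hypothetical stand-in for a measured impulse-response pair (assumption).

    The near ear receives a unit impulse; the far ear receives the same
    impulse delayed by itd_samples and scaled by ild_gain.
    """
    near = np.zeros(length)
    far = np.zeros(length)
    near[0] = 1.0
    far[itd_samples] = ild_gain
    return near, far


def binaural_render(sources, hrir_pairs):
    """Convolve each mono source with its (left, right) pair and sum to 2 channels."""
    n_out = max(len(s) + len(h[0]) - 1 for s, h in zip(sources, hrir_pairs))
    out = np.zeros((2, n_out))
    for s, (h_left, h_right) in zip(sources, hrir_pairs):
        left = np.convolve(s, h_left)
        right = np.convolve(s, h_right)
        out[0, :len(left)] += left
        out[1, :len(right)] += right
    return out


fs = 48000                                   # sample rate in Hz (assumed)
t = np.arange(480) / fs                      # 10 ms of signal
source = np.sin(2 * np.pi * 1000 * t)        # 1 kHz test tone

# Source assumed to be on the listener's left: left ear is near, right ear is far.
h_left, h_right = toy_hrir_pair(itd_samples=30, ild_gain=0.5)
stereo = binaural_render([source], [(h_left, h_right)])
```

In this sketch the right channel is a copy of the left channel that arrives 30 samples later at half the amplitude, which is the kind of interaural cue a listener uses to localize the source. The loop also shows why the amount of calculation grows with the number of sources: each additional source adds two convolutions.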
Here, as the number of channels or objects included in an audio signal to be binaural rendered increases, the amount of calculation and the power consumption required for binaural rendering may increase. Therefore, a technology for efficiently performing binaural rendering on an input audio signal is required in a mobile device, in which the amount of calculation and the power consumption are limited.
Furthermore, the number of head related transfer functions (HRTFs) obtainable by the audio signal processing device may be limited due to limited memory capacity and constraints in the measurement process. This may degrade the sound localization performance of the audio signal processing device. Therefore, the audio signal processing device may be required to perform additional processing on the input HRTF in order to increase the spatial resolution of the audio signal reproduced in the three-dimensional space. In addition, a binaural-rendered audio signal in virtual reality may be combined with additional signals to improve reproducibility. In this case, when the audio signal processing device synthesizes the binaural-rendered audio signal and the additional signal in the time domain, the sound quality of the output audio signal may be degraded due to a comb-filtering effect. This is because the timbre may be distorted by the difference between the delay of the binaural-rendered audio signal and the delays of the additional signals. Further, when the audio signal processing device synthesizes the binaural-rendered audio signal and the additional signal in the frequency domain, an additional amount of computation is required as compared with the case of using only binaural rendering. There is thus a need for techniques that preserve the timbre of an input audio signal while reducing the amount of computation in such further processing and synthesis.
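The comb-filtering effect mentioned above may be illustrated by the following minimal sketch, which sums a signal with a delayed copy of itself in the time domain. The function name `comb_magnitude`, the 48 kHz sample rate, and the 1 ms delay are assumptions chosen for illustration and are not taken from any particular device. For a delay of D samples, the sum y[n] = x[n] + x[n - D] has the magnitude response |1 + e^{-jωD}|, which falls to zero at odd multiples of fs / (2D) and rises to 2 at multiples of fs / D.

```python
import numpy as np


def comb_magnitude(freq_hz, delay_samples, fs):
    """Magnitude response of y[n] = x[n] + x[n - delay_samples] at freq_hz."""
    omega = 2 * np.pi * np.asarray(freq_hz, dtype=float) / fs
    return np.abs(1.0 + np.exp(-1j * omega * delay_samples))


fs = 48000     # sample rate in Hz (assumed)
delay = 48     # relative delay of 1 ms between the two summed signals (assumed)

notch = comb_magnitude(500.0, delay, fs)    # first notch at fs / (2 * delay)
peak = comb_magnitude(1000.0, delay, fs)    # first peak at fs / delay

# Time-domain check: a 500 Hz tone summed with its delayed copy cancels out
# once the delayed copy has started.
n = np.arange(4800)
tone = np.sin(2 * np.pi * 500 * n / fs)
delayed = np.concatenate([np.zeros(delay), tone[:-delay]])
mixed = tone + delayed
residual = np.max(np.abs(mixed[delay:]))
```

With these assumed values, a 500 Hz component is cancelled while a 1000 Hz component is doubled, so the spectrum of the summed signal acquires regularly spaced notches and peaks that the input did not have. This is the timbre distortion that arises when signals with different delays are summed in the time domain.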