The present invention relates to an audio signal processing method and device. More specifically, the present invention relates to an audio signal processing method and device for processing an audio signal expressible as an ambisonic signal.
3D audio commonly refers to a series of signal processing, transmission, encoding, and playback techniques for providing a sound which gives a sense of presence in a three-dimensional space by providing an additional axis corresponding to a height direction to a sound scene on a horizontal plane (2D) provided by conventional surround audio. In particular, 3D audio requires a rendering technique for forming a sound image at a virtual position where a speaker does not exist even if a larger number of speakers or a smaller number of speakers than that for a conventional technique are used.
3D audio is expected to become an audio solution to an ultra high definition TV (UHDTV), and is expected to be applied to various fields of theater sound, personal 3D TV, tablet, wireless communication terminal, and cloud game in addition to sound in a vehicle evolving into a high-quality infotainment space.
Meanwhile, a sound source provided to the 3D audio may include a channel-based signal and an object-based signal. Furthermore, the sound source may be a mixture type of the channel-based signal and the object-based signal, and, through this configuration, a new type of listening experience may be provided to a user.
An ambisonic signal may be used to provide a scene-based immersive sound. In particular, an higher order ambisonics (HoA) signal may be used to give a vivid sense of presence. In the case where the HoA signal is used, a sound acquisition procedure is simplified. Furthermore, in the case where the HoA signal is used, an audio scene of an entire three-dimensional space may be efficiently reproduced. Accordingly, an HoA signal processing technology may be useful for virtual reality (VR) for which a sound that gives a sense of presence is important. However, according to the HoA signal processing technology, it is difficult to accurately represent a location of an individual sound object within an audio scene.