In recent years, systems that record, transmit, and reproduce spatial information from the entire surrounding environment have been developed and have spread in the field of sound. For example, Super Hi-Vision broadcasting is planned to use 22.2-channel three-dimensional multi-channel acoustics.
In the field of virtual reality as well, systems that reproduce sound signals surrounding the entire environment, in addition to pictures surrounding the entire environment, have started to spread.
Among these techniques, one called Ambisonics is attracting attention because it expresses three-dimensional audio information in a form that is flexibly adaptable to an arbitrary recording/reproducing system. In particular, Ambisonics of second order or higher is called higher order Ambisonics (HOA) (e.g., see Non-Patent Document 1).
In three-dimensional multi-channel acoustics, sound information spreads along the spatial axes in addition to the time axis. In Ambisonics, this information is held by performing a frequency transform, that is, a spherical harmonic transform, with respect to the angular directions of three-dimensional polar coordinates. The spherical harmonic transform can be considered to correspond, for the angular directions, to the time-frequency transform performed on an audio signal along the time axis.
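As a minimal sketch of what holding information in the spherical harmonic domain can look like, the following encodes one mono sample arriving from a given direction into first-order coefficients. The function name `encode_first_order`, the ACN channel order (W, Y, Z, X), and the SN3D normalization are assumptions chosen for illustration; the source does not specify a convention.

```python
import math


def encode_first_order(sample, azimuth, elevation):
    """Encode one mono sample into first-order Ambisonics coefficients.

    Assumed conventions (not taken from the source): angles in radians,
    azimuth measured counterclockwise from the front, ACN channel order
    (W, Y, Z, X), and SN3D normalization.
    """
    w = sample                                            # order 0: omnidirectional
    y = sample * math.sin(azimuth) * math.cos(elevation)  # order 1: left-right
    z = sample * math.sin(elevation)                      # order 1: up-down
    x = sample * math.cos(azimuth) * math.cos(elevation)  # order 1: front-back
    return [w, y, z, x]


if __name__ == "__main__":
    # A source straight ahead contributes only to W and X.
    print(encode_first_order(1.0, 0.0, 0.0))
```

Under these assumed conventions, the four coefficients do not depend on any particular microphone or speaker layout; they describe the sound field by direction only.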
An advantage of this method is that information can be encoded from an arbitrary microphone array and decoded to an arbitrary speaker array without limiting the number of microphones or the number of speakers.
On the other hand, factors that impede the spread of Ambisonics include the need for a speaker array with a large number of speakers in the reproduction environment and the narrowness of the range in which the sound space can be reproduced (the sweet spot).
For example, increasing the spatial resolution of sound requires a speaker array with more speakers, but building such a system at home or the like is unrealistic. In addition, in a space such as a movie theater, the area where the sound space can be reproduced is narrow, and it is difficult to give the desired effect to the entire audience.
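As a rough illustration of why higher spatial resolution calls for more speakers, a three-dimensional Ambisonics representation of order N carries (N + 1)^2 spherical harmonic coefficients. The sketch below tabulates that count. Treating the count as a lower bound on the number of speakers is a commonly cited rule of thumb and an assumption here, not a statement from the source; the function name `hoa_channel_count` is hypothetical.

```python
def hoa_channel_count(order):
    """Return the number of spherical harmonic coefficients for a given
    three-dimensional Ambisonics order: (order + 1) squared."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


if __name__ == "__main__":
    # Assumed rule of thumb: a decoder needs at least this many speakers.
    for order in range(1, 6):
        print(f"order {order}: {hoa_channel_count(order)} channels")
```

The count grows quadratically with the order, which is consistent with the difficulty of installing a sufficiently large speaker array at home.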