This section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present invention that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art, unless a source is expressly mentioned.
Accurate localisation is a key goal for any spatial audio reproduction system. Such reproduction systems are highly applicable to conference systems, games, and other virtual environments that benefit from 3D sound. Sound scenes in 3D can be synthesised or captured as a natural sound field. Soundfield signals such as Ambisonics carry a representation of a desired sound field. The Ambisonics format is based on a spherical harmonic decomposition of the soundfield. While the basic Ambisonics format, or B-format, uses spherical harmonics of order zero and one, so-called Higher Order Ambisonics (HOA) additionally uses spherical harmonics of at least second order. To synthesise audio scenes, panning functions that refer to the spatial loudspeaker arrangement are required to obtain a spatial localisation of the given sound source. If a natural sound field is to be recorded, microphone arrays are required to capture the spatial information, and the known Ambisonics approach is a very suitable tool for this. A decoding process is required to obtain the individual loudspeaker signals from such Ambisonics formatted signals. Since in this case, too, panning functions can be derived from the decoding functions, the panning functions are the key issue in describing the task of spatial localisation. The spatial arrangement of loudspeakers is referred to herein as the loudspeaker setup.
Commonly used loudspeaker setups are the stereo setup, which employs two loudspeakers, the standard surround setup using five loudspeakers, and extensions of the surround setup using more than five loudspeakers. These setups are well known. However, they are restricted to two dimensions (2D), i.e. no height information is reproduced.
Loudspeaker setups for three-dimensional (3D) playback are described, for example, in “Wide listening area with exceptional spatial sound quality of a 22.2 multichannel sound system”, K. Hamasaki, T. Nishiguchi, R. Okumura, and Y. Nakayama in Audio Engineering Society Preprints, Vienna, Austria, May 2007, which is a proposal for the NHK ultra high definition TV with 22.2 format, or the 2+2+2 arrangement of Dabringhaus (mdg-musikproduktion dabringhaus and grimm, www.mdg.de) and a 10.2 setup in “Sound for Film and Television”, T. Holman, 2nd ed. Boston: Focal Press, 2002. One of the few known systems referring to spatial playback and panning strategies is the vector base amplitude panning (VBAP) approach in “Virtual sound source positioning using vector base amplitude panning,” by V. Pulkki in Journal of the Audio Engineering Society, vol. 45, no. 6, pp. 456-466, June 1997, herein Pulkki. VBAP has been used by Pulkki to play back virtual acoustic sources with an arbitrary loudspeaker setup. To place a virtual source in a 2D plane, a pair of loudspeakers is required, while in the 3D case a loudspeaker triplet is required. For each virtual source, a monophonic signal is fed with different gains (dependent on the position of the virtual source) to the loudspeakers selected from the full setup. The loudspeaker signals for all virtual sources are then summed. VBAP applies a geometric approach to calculate the gains of the loudspeaker signals for the panning between the loudspeakers.
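The geometric gain calculation of VBAP for a loudspeaker triplet can be illustrated by the following minimal sketch. It is not taken from Pulkki or from the present disclosure: the function names, the coordinate convention (x forward, y left, z up), the unit-power normalisation of the gains, and the example triplet are assumptions made for illustration only. The sketch expresses the source direction as a weighted sum of the three loudspeaker direction vectors and solves for the weights with Cramer's rule.

```python
import math


def unit_vector(azimuth_deg, elevation_deg=0.0):
    """Cartesian unit vector for a direction (assumed: x forward, y left, z up)."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are the vectors a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - b[0] * (a[1] * c[2] - a[2] * c[1])
            + c[0] * (a[1] * b[2] - a[2] * b[1]))


def vbap_gains_3d(source, triplet):
    """Gains g with source ~ g1*l1 + g2*l2 + g3*l3, scaled to unit power."""
    l1, l2, l3 = triplet
    d = det3(l1, l2, l3)
    if abs(d) < 1e-12:
        raise ValueError("degenerate triplet: loudspeaker directions are coplanar")
    # Cramer's rule: replace one loudspeaker column at a time by the source.
    gains = [det3(source, l2, l3) / d,
             det3(l1, source, l3) / d,
             det3(l1, l2, source) / d]
    # Assumed normalisation: the squared gains sum to one.
    norm = math.sqrt(sum(g * g for g in gains))
    return [g / norm for g in gains]


# Hypothetical triplet: front-left, front-right and one elevated front loudspeaker.
triplet = [unit_vector(45), unit_vector(-45), unit_vector(0, 45)]
gains = vbap_gains_3d(unit_vector(10, 20), triplet)
print([round(g, 3) for g in gains])
```

In this sketch, a negative gain indicates that the source direction lies outside the chosen triplet, in which case a different triplet of the full setup would be selected. The 2D case with a loudspeaker pair follows the same pattern with 2x2 determinants.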
An exemplary 3D loudspeaker setup considered and newly proposed herein has 16 loudspeakers, which are positioned as shown in FIG. 2. The positioning was chosen due to practical considerations, having four columns with three loudspeakers each and additional loudspeakers between these columns. In more detail, eight of the loudspeakers are equally distributed on a circle around the listener's head, enclosing angles of 45 degrees. Four additional loudspeakers each are located at the top and at the bottom, enclosing azimuth angles of 90 degrees. With regard to Ambisonics, this setup is irregular and leads to problems in decoder design, as mentioned in “An ambisonics format for flexible playback layouts,” by H. Pomberger and F. Zotter in Proceedings of the 1st Ambisonics Symposium, Graz, Austria, July 2009.
Conventional Ambisonics decoding, as described in “Three-dimensional surround sound systems based on spherical harmonics” by M. Poletti in J. Audio Eng. Soc., vol. 53, no. 11, pp. 1004-1025, November 2005, employs the commonly known mode matching process. The modes are described by mode vectors that contain values of the spherical harmonics for a distinct direction of incidence. The combination of all directions given by the individual loudspeakers leads to the mode matrix of the loudspeaker setup, so that the mode matrix represents the loudspeaker positions. To reproduce the mode of a distinct source signal, the loudspeakers' modes are weighted such that the superimposed modes of the individual loudspeakers sum up to the desired mode. To obtain the necessary weights, an inverse matrix representation of the loudspeaker mode matrix needs to be calculated. In terms of signal decoding, the weights form the driving signals of the loudspeakers, and the inverse loudspeaker mode matrix is referred to as the “decoding matrix”, which is applied for decoding an Ambisonics formatted signal representation. For many loudspeaker setups, in particular irregular ones such as the setup shown in FIG. 2, it is difficult to obtain the inverse of the mode matrix.
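The mode matching process described above can be illustrated by the following minimal first-order sketch. It is not the decoder of Poletti or of the present disclosure. The following are assumptions made for illustration: real spherical harmonics of order zero and one with N3D scaling in ACN channel order, a hypothetical regular setup of eight loudspeakers at the corners of a cube (not the setup of FIG. 2), and the Moore-Penrose pseudo-inverse as the inverse matrix representation of the mode matrix. All function names are hypothetical.

```python
import math


def mode_vector(direction):
    """Order-0 and order-1 real spherical harmonics (assumed N3D, ACN order)."""
    x, y, z = direction
    s = math.sqrt(3.0)
    return [1.0, s * y, s * z, s * x]


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("mode matrix is rank deficient for this setup")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def mode_matching_weights(speaker_dirs, source_dir):
    """Minimum-norm weights w such that Psi @ w equals the source mode vector."""
    cols = [mode_vector(d) for d in speaker_dirs]  # columns of the mode matrix Psi
    n_modes = len(cols[0])
    # Gram matrix Psi @ Psi^T (n_modes x n_modes).
    gram = [[sum(c[i] * c[j] for c in cols) for j in range(n_modes)]
            for i in range(n_modes)]
    z = solve(gram, mode_vector(source_dir))
    # w = Psi^T @ z, i.e. the pseudo-inverse of Psi applied to the source mode.
    return [sum(c[i] * z[i] for i in range(n_modes)) for c in cols]


# Hypothetical setup: eight loudspeakers at the corners of a cube.
r = 1.0 / math.sqrt(3.0)
CUBE = [(sx * r, sy * r, sz * r)
        for sx in (1, -1) for sy in (1, -1) for sz in (1, -1)]

# Weights for a source straight ahead of the listener (x forward).
weights = mode_matching_weights(CUBE, (1.0, 0.0, 0.0))
print([round(w, 4) for w in weights])
```

For this regular cube the Gram matrix is a scaled identity, so the inversion is trivial. For irregular setups the same matrix can become ill-conditioned, which is one way the difficulty of inverting the mode matrix shows up numerically. Even in this simple sketch, the four loudspeakers behind the listener receive negative weights.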
As mentioned above, commonly used loudspeaker setups are restricted to 2D, i.e. no height information is reproduced. Decoding a soundfield representation to a loudspeaker setup with a mathematically non-regular spatial distribution leads to localisation and coloration problems with the commonly known techniques. For decoding an Ambisonics signal, a decoding matrix (i.e. a matrix of decoding coefficients) is used. In conventional decoding of Ambisonics signals, and particularly HOA signals, at least two problems occur. First, correct decoding requires knowledge of the signal source directions in order to obtain the decoding matrix. Second, the mapping to an existing loudspeaker setup is systematically wrong due to the following mathematical problem: a mathematically correct decoding results not only in positive, but also in some negative loudspeaker amplitudes. However, these are wrongly reproduced as positive signals, thus leading to the above-mentioned problems.