The invention relates to an apparatus and a method for generating a plurality of audio channels for a speaker setup.
Spatial audio coding and decoding hardware and software are well known in the art and are, for example, standardized in the MPEG-Surround Standard. Spatial audio systems comprise a number of loudspeakers and respective audio channels, for example a left channel, a center channel, a right channel, a left surround channel, a right surround channel and a low frequency enhancement channel. Each of the channels is usually reproduced by a respective loudspeaker. The placement of the loudspeakers in the output setup is typically fixed and is, for example, dependent on a 5.1 format, a 7.1 format or the like. Dependent on the respective format, a position of the loudspeaker is defined. Some setups define a loudspeaker position above a position of a listener. This loudspeaker is also referred to as a Voice-of-God (VoG). Some formats might also define a loudspeaker with a position below a listener. Respectively, this loudspeaker can be referred to as Voice-of-Hell (VoH).
For generating the audio channels defining the audio signals for the loudspeakers of the loudspeaker setup, a Vector Base Amplitude Panning (VBAP) method may be used. VBAP uses a set of N unit vectors $l_1, \ldots, l_N$ which point at the loudspeakers of the speaker set. In case the speaker set is configured to reproduce a 3-dimensional acoustic scene, the speaker set is denoted as a 3D speaker set. A panning direction given by a Cartesian unit vector $p$ is defined by a linear combination of those loudspeaker vectors:
$$p = [l_1, \ldots, l_N]\,[g_1, \ldots, g_N]^T \qquad (1)$$
where $g_n$ denotes the scaling factor that is applied to $l_n$. In $\mathbb{R}^3$, a vector space is spanned by three basis vectors. Hence, (1) can generally be solved by a matrix inversion if the number of active speakers, and thus the number of non-zero scaling factors, is limited to three. In practice, this is done by defining a mesh of triangles between the loudspeakers and by choosing, for the area in between, the loudspeaker triplet of the corresponding triangle. This leads to a solution for the scaling factors to be applied in terms of
$$[g_{n_1}, g_{n_2}, g_{n_3}]^T = [l_{n_1}, l_{n_2}, l_{n_3}]^{-1}\, p, \qquad (2)$$
where $\{n_1, n_2, n_3\}$ denotes the active loudspeaker triplet. Finally, a normalization that ensures power-normalized output signals results in the final panning gains $a_1, \ldots, a_N$:
$$a_n = \frac{g_n}{\left\| [g_1, \ldots, g_N]^T \right\|} \qquad (3)$$
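The three steps above (linear combination, solving for one loudspeaker triplet, power normalization) can be sketched in a few lines of Python. This is a minimal illustration and not the MPEG-H reference implementation: the function names, the coordinate convention (x front, y left, z up) and the example loudspeaker positions are assumptions, and Cramer's rule is used in place of an explicit matrix inversion, which gives the same result for a 3x3 system.

```python
import math

def unit_vector(azimuth_deg, elevation_deg):
    """Cartesian unit vector for a direction given in degrees.

    Assumed convention (not from the source): x front, y left, z up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))

def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - b[0] * (a[1] * c[2] - a[2] * c[1])
            + c[0] * (a[1] * b[2] - a[2] * b[1]))

def vbap_gains(p, l1, l2, l3):
    """Panning gains of one loudspeaker triplet for panning direction p.

    Solves p = g1*l1 + g2*l2 + g3*l3 (equation 2) by Cramer's rule and
    divides by the Euclidean norm of the gain vector (equation 3).
    """
    d = det3(l1, l2, l3)
    if abs(d) < 1e-12:
        raise ValueError("loudspeaker vectors are linearly dependent")
    g = (det3(p, l2, l3) / d,
         det3(l1, p, l3) / d,
         det3(l1, l2, p) / d)
    norm = math.sqrt(sum(x * x for x in g))
    return tuple(x / norm for x in g)

# Hypothetical triplet: front left, front right and an upper center speaker.
left = unit_vector(30.0, 0.0)
right = unit_vector(-30.0, 0.0)
upper = unit_vector(0.0, 35.0)

# Source straight ahead, slightly elevated, inside the triangle.
gains = vbap_gains(unit_vector(0.0, 10.0), left, right, upper)
print(gains)
```

A negative gain for any loudspeaker would indicate that the panning direction lies outside the chosen triangle, in which case a different triplet of the mesh has to be used.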
The object renderer included in the MPEG-H decoder uses VBAP to render audio objects for a given loudspeaker configuration. If a loudspeaker setup, such as a 9.1 speaker setup, does not include a TO (“Voice-of-God”) loudspeaker, then objects with an elevation greater than 35° with respect to a position of a listener are limited to an elevation of 35°, the default elevation angle of the upper loudspeakers. Although practical, this solution is clearly not optimal, as it may change the reproduced acoustic scene.
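The elevation limiting described above amounts to a simple clamp. The following sketch uses a hypothetical function name and takes the 35° limit from the text; it only illustrates why the reproduced scene may change, since every object above the limit is mapped to the same elevation.

```python
def limit_elevation(elevation_deg, max_elevation_deg=35.0):
    """Clamp an object's elevation to that of the upper loudspeakers.

    Hypothetical helper; the 35 degree default is the value named in the text.
    """
    return min(elevation_deg, max_elevation_deg)

# An object directly overhead and an object at 50 degrees elevation
# end up at the same rendered elevation, so their difference is lost.
overhead = limit_elevation(90.0)
halfway = limit_elevation(50.0)
unchanged = limit_elevation(20.0)
print(overhead, halfway, unchanged)
```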
In a 9.1 speaker setup, i.e., a speaker setup according to the 9.1 format, the alternative of dividing the upper hemisphere into two triangles would result in an asymmetry, and an object directly above the listener would then be reproduced by two opposing loudspeakers. As a consequence, an audio object that, for example, moves from the upper front right to the upper rear left would sound different than if it moved from the upper front left to the upper rear right, despite the symmetry of the speaker setup. A solution to this dilemma is to use N-wise panning, where all upper loudspeakers are involved for objects in the upper hemisphere. Extending the VBAP panning from three loudspeakers to N loudspeakers is called N-wise panning. A neighborhood relationship may be given by a graph which is specified by the edges of triangles which would be calculated, for example, by an MPEG decoder. The triangles can be obtained, for example, by forming one or more polyhedrons with N vertices. A vertex may be formed by a speaker. Triangles may be formed out of the outer surfaces of the polyhedrons.
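The asymmetry of the two-triangle split can be made concrete with a short worked example based on equation (2). It assumes a hypothetical, fully symmetric layout of four upper loudspeakers at elevation $\theta$ and azimuths ±45° and ±135° (chosen for illustration, not the exact 9.1 positions), coordinates with x front, y left and z up, and a split of the upper hemisphere along the diagonal from the front-left (FL) to the rear-right (RR) loudspeaker. The source $p$ is directly above the listener and is solved for the triangle FL, FR, RR.

```latex
% Assumed abbreviations: c = \cos\theta / \sqrt{2}, \; s = \sin\theta
\begin{align*}
l_{FL} &= [\,c,\; c,\; s\,]^T, \quad
l_{FR} = [\,c,\; -c,\; s\,]^T, \quad
l_{RR} = [\,-c,\; -c,\; s\,]^T, \quad
p = [\,0,\; 0,\; 1\,]^T \\
p &= g_{FL}\, l_{FL} + g_{FR}\, l_{FR} + g_{RR}\, l_{RR} \\
x:&\quad c\,(g_{FL} + g_{FR} - g_{RR}) = 0 \\
y:&\quad c\,(g_{FL} - g_{FR} - g_{RR}) = 0 \\
z:&\quad s\,(g_{FL} + g_{FR} + g_{RR}) = 1 \\
\Rightarrow&\quad g_{FR} = 0, \qquad g_{FL} = g_{RR} = \frac{1}{2s},
\qquad a_{FL} = a_{RR} = \frac{1}{\sqrt{2}}
\end{align*}
```

Under these assumptions only the two loudspeakers on the chosen diagonal are active. Splitting along the other diagonal would activate the other pair instead, so the result depends on an arbitrary choice even though the layout itself is symmetric.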
The VBAP panning method necessitates a proper triangulation for all solid angles. In the current MPEG-H 3D reference software, the triangulation is pre-calculated and given in tabulated form for a fixed number of speaker setups. This currently limits the supported speaker setups to the given setups or to setups which differ only by small displacements.
Audio formats defining loudspeaker positions lead the user, e.g. the listener, to place the loudspeakers at those defined positions. Such requirements may be difficult to fulfill, for example, in cases where the loudspeakers are defined to be arranged around a listener as a circle or on a circular path. Some users, especially users living in flats, need to adapt such setups, as a living room containing the loudspeaker setup is typically rectangular rather than circular, and users may place loudspeakers near walls rather than in the middle of the room.
Hence, for example, there is a need for audio decoding concepts that allow for a more flexible loudspeaker setup.