Accurate localization is a key goal for any spatial audio reproduction system. Such reproduction systems are highly applicable for conference systems, games, or other virtual environments that benefit from 3D sound. Sound scenes in 3D can be synthesized or captured as a natural sound field. Soundfield signals such as e.g. Ambisonics carry a representation of a desired sound field. A decoding process is required to obtain the individual loudspeaker signals from a sound field representation. Decoding an Ambisonics formatted signal is also referred to as “rendering”. In order to synthesize audio scenes, panning functions that refer to the spatial loudspeaker arrangement are required for obtaining a spatial localization of the given sound source. For recording a natural sound field, microphone arrays are required to capture the spatial information. The Ambisonics approach is a very suitable tool to accomplish this. Ambisonics formatted signals carry a representation of the desired sound field, based on spherical harmonic decomposition of the soundfield. While the basic Ambisonics format or B-format uses spherical harmonics of order zero and one, the so-called Higher Order Ambisonics (HOA) uses also further spherical harmonics of at least 2nd order. The spatial arrangement of loudspeakers is referred to as loudspeaker setup. For the decoding process, a decode matrix (also called rendering matrix) is required, which is specific for a given loudspeaker setup and which is generated using the known loudspeaker positions.
Commonly used loudspeaker setups are the stereo setup that employs two loudspeakers, the standard surround setup that uses five loudspeakers, and extensions of the surround setup that use more than five loudspeakers. However, these well-known setups are restricted to two dimensions (2D), e.g. no height information is reproduced. Rendering for known loudspeaker setups that can reproduce height information has disadvantages in sound localization and coloration: either spatial vertical pans are perceived with very uneven loudness, or loudspeaker signals have strong side lobes, which is disadvantageous especially for off-center listening positions. Therefore, a so-called energy-preserving rendering design is preferred when rendering a HOA sound field description to loudspeakers. This means that rendering of a single sound source results in loudspeaker signals of constant energy, independent of the direction of the source. In other words, the input energy carried by the Ambisonics representation is preserved by the loudspeaker renderer. The International patent publication WO2014/012945A1 [1] from the present inventors describes a HOA renderer design with good energy preserving and localization properties for 3D loudspeaker setups. However, while this approach works quite well for 3D loudspeaker setups that cover all directions, some source directions are attenuated for 2D loudspeaker setups (like e.g. 5.1 surround). This applies especially for directions where no loudspeakers are placed, e.g. from the top.
In F. Zotter and M. Frank, “All-Round Ambisonic Panning and Decoding” [2], an “imaginary” loudspeaker is added if there is a hole in the convex hull built by the loudspeakers. However, the resulting signal for that imaginary loudspeaker is omitted for playback on the real loudspeaker. Thus, a source signal from that direction (i.e. a direction where no real loudspeaker is positioned) will still be attenuated. Furthermore, that paper shows the use of the imaginary loudspeaker for use with VBAP (vector base amplitude panning) only.