Coding formats for which any inter-frame dependencies are absent or can be ignored at decoding, and for which the coded signals can be mixed directly in the transform domain, are known in the art. Direct mixing is typically possible when a fixed transform window is used. Use of a fixed transform window has the further advantage that a mixing operation imposes a reduced computational load and does not add algorithmic delay.
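By way of illustration only, the following minimal sketch shows why a fixed transform window permits direct mixing: a windowed transform is linear, so a weighted sum of coefficient frames equals the transform of the weighted sum of the time-domain frames, and no decode and re-encode step is needed. The choice of an MDCT with a sine window, the frame length, the gains, and all function names are assumptions made for this example and are not taken from any particular coding format.

```python
import numpy as np

def sine_window(frame_len):
    """Fixed sine window (assumed here; any window shared by all streams works)."""
    n = np.arange(frame_len)
    return np.sin(np.pi * (n + 0.5) / frame_len)

def mdct_frame(frame, window):
    """MDCT of one windowed frame of even length 2N, returning N coefficients."""
    frame_len = len(frame)
    half = frame_len // 2
    k = np.arange(half)[:, None]        # coefficient index
    n = np.arange(frame_len)[None, :]   # sample index
    basis = np.cos(np.pi / half * (n + 0.5 + half / 2) * (k + 0.5))
    return basis @ (window * frame)

def mix_in_transform_domain(coeff_sets, gains):
    """Weighted sum of coefficient frames; no inverse transform is involved."""
    return sum(g * c for g, c in zip(gains, coeff_sets))

# Two hypothetical input frames coded with the same fixed window.
rng = np.random.default_rng(0)
frame_len = 64
window = sine_window(frame_len)
a = rng.standard_normal(frame_len)
b = rng.standard_normal(frame_len)
gains = (0.7, 0.3)

# Mix directly on the transform coefficients.
mixed_coeffs = mix_in_transform_domain(
    [mdct_frame(a, window), mdct_frame(b, window)], gains
)

# Reference: mix in the time domain first, then transform.
reference = mdct_frame(gains[0] * a + gains[1] * b, window)

print(np.allclose(mixed_coeffs, reference))
```

The equality holds only because both frames share the same window and frame alignment. If the streams used different, signal-adaptive windows, the coefficient frames would no longer refer to a common basis, and a simple weighted sum would not in general correspond to the transform of the mixed signal.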
However, advantageous coding formats of this type are known only for single-channel (mono) audio signals. It would be desirable to extend their use to sound field signals, e.g., signals in a spatial sound field captured by an array of three or more microphones, artificially generated sound field signals, or signals converted into a sound field format, such as B-format, G-format, Ambisonics™ and the like. This would, for instance, enable a richer representation of the participants in a teleconference, including spatial properties such as direction of arrival and room reverberation. Straightforward approaches, e.g., using one encoder for each input signal to be encoded and letting these encoders operate independently in parallel, will not be competitive as far as coding efficiency is concerned.
Hence, it would be desirable to provide a surround audio format that allows lower coding rates in a multichannel system while maintaining spatial properties and high overall audio quality.
All the figures are schematic and generally only show parts which are necessary in order to elucidate the invention, whereas other parts may be omitted or merely suggested. Unless otherwise indicated, like reference numerals refer to like parts in different figures.