Teleconferencing systems that deliver a spatial audio scene typically have an advantage over monophonic systems. In particular, such systems provide a more compelling experience, since a spatial audio scene allows users to clearly identify who is speaking and what is being said, even in dynamic conversations comprising a plurality of partially concurrent talkers.
A technical problem that arises in the design of such teleconferencing systems is the provision of an efficient description of the spatial audio scene. Furthermore, in order to allow for efficient transmission of this description, efficient coding algorithms for the particular description are needed. The present document describes a particular class of descriptions of spatial audio scenes, which makes use of so-called soundfield signals (e.g., B-format signals, G-format signals, Ambisonics™ signals), and focuses on the efficient coding of such soundfield signals.
There are several constraints that are relevant to the design of a coding algorithm for a teleconferencing system. For example, it is typically required that the delay due to the coding is kept relatively low. As a result, coding is typically performed on a per-frame basis, where the frame duration is selected to fit the delay requirement (e.g., 20 ms). In addition, it is often desirable to devise a coding algorithm that allows frames to be coded independently of one another, as this simplifies decoding in the presence of transmission losses: the loss of one frame then does not prevent the decoding of subsequent frames.
A further aspect of the design of a coding algorithm is the trade-off between the operating bit-rate and the resulting perceptual quality. The design goal is usually to reduce (e.g., minimize) the bit-rate, while maintaining at least satisfactory perceptual quality.
The present document focuses on the coding of soundfield signals at low bit-rates (in the range of 24 kbit/s or less per channel of a soundfield signal). In this context, a parametric coding scheme for soundfield signals is described, which is a particularly efficient method, providing a reasonable trade-off between the operating bit-rate and the perceptual quality at relatively low operating bit-rates. Furthermore, the described parametric coding scheme allows for an improved layered decoding of the encoded soundfield signals, thereby enabling the integration of monophonic terminals into a soundfield teleconferencing system.
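To give a sense of scale for the constraints mentioned above, the following minimal sketch computes the bit budget available per frame when the example frame duration of 20 ms is combined with the upper end of the bit-rate range of 24 kbit/s per channel. The function name `frame_budget`, the channel count of three, and the sampling rate of 32 kHz are assumptions made purely for illustration and are not taken from the present document.

```python
def frame_budget(bitrate_bps, frame_ms, num_channels, sample_rate_hz):
    """Return per-frame figures for a frame-based coder.

    bitrate_bps    : operating bit-rate per channel, in bit/s
    frame_ms       : frame duration, in milliseconds
    num_channels   : number of soundfield channels (assumed value)
    sample_rate_hz : sampling rate (assumed value)
    """
    # Bits available to code one frame of one channel.
    bits_per_channel = bitrate_bps * frame_ms // 1000
    # Bits available to code one frame of the whole soundfield signal.
    bits_total = bits_per_channel * num_channels
    # Number of audio samples covered by one frame of one channel.
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    return {
        "bits_per_channel": bits_per_channel,
        "bits_total": bits_total,
        "samples_per_frame": samples_per_frame,
        "bits_per_sample": bits_per_channel / samples_per_frame,
    }


# 24 kbit/s per channel and 20 ms frames come from the text above;
# 3 channels and 32 kHz are illustrative assumptions only.
budget = frame_budget(bitrate_bps=24_000, frame_ms=20,
                      num_channels=3, sample_rate_hz=32_000)
print(budget)
```

Under these assumptions, each frame of each channel must be described with 480 bits, i.e. less than one bit per sample on average, which illustrates why a parametric description may be attractive at such bit-rates.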