Multichannel audio is widespread and has become popular for many different applications, including home cinema and multi-channel music systems. Audio encoding is often used to generate data streams that provide an efficient data representation of the audio signals, allowing efficient storage and distribution of these signals. Many different audio encoding standards have been developed for encoding and decoding of both traditional mono and stereo audio signals, as well as of multichannel audio signals. The term multichannel is henceforth used to refer to more than two channels. The use of dedicated audio standards allows for interworking and compatibility between many different systems, devices and applications, and it is therefore critical that efficient standards are adhered to. However, a significant problem arises when new standards are developed or existing standards are modified. In particular, modifications to standards may not only be time-consuming and cumbersome to carry out but may also result in existing equipment not being suitable for the new, or indeed for the existing, standards. In order to facilitate the introduction of new standards or standard modifications, it is desirable that these require as little modification to existing standards as possible. In some cases it is even possible to make modifications that are fully compatible with the existing standards, i.e. the modifications can be applied without any change to the existing standard specification. An example of this is bitstream watermarking, in which specific bitstream elements are modified in a compatible fashion such that the bitstream can still be decoded according to the standard specification. Although the decoded output changes, the difference in quality is generally not audible.
MPEG Surround is one of the major advances in multi-channel audio coding and was recently standardized by the Moving Picture Experts Group (MPEG) in ISO/IEC 23003-1. MPEG Surround is a multi-channel audio coding tool that allows existing mono- or stereo-based services to be extended to multi-channel applications. FIG. 1 shows a block diagram of a stereo core coder extended with MPEG Surround. First, the MPEG Surround encoder creates a stereo downmix from the multi-channel input signal. Next, spatial parameters are estimated from the multi-channel input signal. These parameters are encoded into the MPEG Surround bit-stream. The stereo downmix is coded into a bit-stream using a core encoder, e.g. HE-AAC. The resulting core coder bit-stream and the spatial bit-stream are merged to create the overall bit-stream. Typically, the spatial bit-stream is contained in the ancillary data or user data portion of the core coder bit-stream. At the decoder side, the core and spatial bit-streams are separated. The stereo core bit-stream is decoded in order to reproduce the stereo downmix. This downmix, together with the spatial bit-stream, is input to the MPEG Surround decoder. The spatial bit-stream is decoded to provide the spatial parameters. The spatial parameters are then used to upmix the stereo downmix in order to obtain the multi-channel output signal.
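The principle of transmitting a downmix together with spatial parameters can be illustrated by a deliberately simplified sketch. It is not the MPEG Surround algorithm: it reduces the idea to a single channel pair, a mono downmix and one broadband level-difference parameter, whereas the system described above uses a stereo downmix and a standardized parameter set. The function names, the downmix weights and the gain normalisation are assumptions made for this illustration only.

```python
import math

EPS = 1e-12  # guards against division by zero and log of zero for silent channels


def energy(samples):
    """Sum of squared samples."""
    return sum(s * s for s in samples)


def encode_pair(left, right):
    """Toy encoder: return (mono downmix, level difference in dB) for one channel pair."""
    downmix = [0.5 * (l + r) for l, r in zip(left, right)]
    cld_db = 10.0 * math.log10((energy(left) + EPS) / (energy(right) + EPS))
    return downmix, cld_db


def decode_pair(downmix, cld_db):
    """Toy decoder: scale the downmix so the output energies have the coded ratio."""
    ratio = 10.0 ** (cld_db / 10.0)                    # left/right power ratio
    g_left = math.sqrt(2.0 * ratio / (1.0 + ratio))    # assumed normalisation: g_l^2 + g_r^2 = 2
    g_right = math.sqrt(2.0 / (1.0 + ratio))
    return [g_left * d for d in downmix], [g_right * d for d in downmix]


# Hypothetical input: the left channel carries four times the energy of the right.
left_in = [1.0, -1.0, 1.0, -1.0]
right_in = [0.5, -0.5, 0.5, -0.5]
mono, cld = encode_pair(left_in, right_in)
left_out, right_out = decode_pair(mono, cld)
```

The decoder in this sketch recovers only the energy ratio between the two channels, not their waveforms, which is the sense in which the spatial image is parameterized.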
Since the spatial image of the multi-channel input signal is parameterized, MPEG Surround allows for decoding of the same multi-channel bit-stream onto rendering devices other than a multichannel speaker setup. An example is virtual surround reproduction on headphones, which is referred to as the MPEG Surround binaural decoding process. In this mode, a realistic surround experience can be provided using regular headphones. FIG. 2 shows a block diagram of the stereo core codec extended with MPEG Surround where the output is decoded to binaural. The encoder process is identical to that of FIG. 1. In the system, the spatial parameters are combined with the Head Related Transfer Function (HRTF) and the result is used to produce the so-called binaural output.
Building upon the concept of MPEG Surround, MPEG has standardized a system for encoding of individual audio objects. This standard is known as ‘Spatial Audio Object Coding’ (MPEG-D SAOC) ISO/IEC 23003-2. From a high level perspective, SAOC efficiently encodes sound objects instead of audio channels, where each sound object may typically correspond to a single sound source in the sound image. In MPEG Surround, each speaker channel can be considered to originate from a different mix of sound objects, whereas in SAOC data is provided for the individual sound objects. Similarly to MPEG Surround, SAOC also generates a mono or stereo downmix, which is coded using a standard downmix coder such as HE-AAC. In this way, legacy playback devices will disregard the parametric data and play the mono or stereo downmix, whereas SAOC decoders can upmix the signal to retrieve the original sound objects or to allow them to be rendered in a desired output configuration. Object and downmix parameters are embedded in the ancillary data portion of the downmix coded bitstream to provide relative level and gain information for the individual SAOC objects, typically reflecting the downmix of these into the stereo/mono downmix. At the decoder side, the user can control various features of the individual objects (such as spatial position, amplification, and equalization) by manipulating these parameters, or the user can apply effects, such as reverb, to individual objects.
FIG. 3 shows a block diagram for regular SAOC encoding. The SAOC encoder can be considered to be a preprocessing module situated before a conventional mono- or stereo encoder. The preprocessing consists of generating a stereo (or mono) downmix from a number N of object signals. Additionally, object parameters are extracted and stored in an SAOC bitstream together with information on the downmix matrix M. The SAOC downmix information is encoded in two types of parameters. First, the DMG (downmix gain) parameter indicates the gain applied to the object. Second, the DCLD (downmix channel level difference) parameter signals the distribution of the object over the two channels in a stereo downmix. Both parameters are defined per object.
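The relationship between the per-object DMG and DCLD parameters and the downmix matrix M may be sketched as follows. The sketch assumes one commonly used convention, namely that DMG is an overall gain in dB and DCLD is the left-to-right power ratio in dB; the exact mapping and quantization are defined by the standard and are not reproduced here. The function names and example values are hypothetical.

```python
import math


def downmix_gains(dmg_db, dcld_db):
    """Return (left_gain, right_gain) for one object in a stereo downmix.

    Assumed convention: DMG is the overall gain in dB, DCLD is the
    left/right power ratio in dB.
    """
    overall = 10.0 ** (dmg_db / 20.0)   # dB -> linear amplitude
    ratio = 10.0 ** (dcld_db / 10.0)    # dB -> linear power ratio
    left = overall * math.sqrt(ratio / (1.0 + ratio))
    right = overall * math.sqrt(1.0 / (1.0 + ratio))
    return left, right


def stereo_downmix(objects, params):
    """Mix N object signals into a stereo pair.

    objects: list of N equal-length sample lists.
    params:  list of N (dmg_db, dcld_db) tuples, one per object.
    """
    n_samples = len(objects[0])
    left = [0.0] * n_samples
    right = [0.0] * n_samples
    for samples, (dmg_db, dcld_db) in zip(objects, params):
        g_left, g_right = downmix_gains(dmg_db, dcld_db)
        for t, s in enumerate(samples):
            left[t] += g_left * s
            right[t] += g_right * s
    return left, right


# Hypothetical example: two objects, both centred with unity overall gain.
mix_left, mix_right = stereo_downmix(
    [[1.0, 2.0], [0.5, 0.5]],
    [(0.0, 0.0), (0.0, 0.0)],
)
```

Under this convention the gain pair of each object forms one column of the 2×N downmix matrix M, and the squared gains of an object sum to the squared overall gain.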
An SAOC decoder may perform the opposite operation. The received mono- or stereo downmix may be decoded and upmixed to a desired output configuration. Conceptually, as illustrated in FIG. 4, the upmix operation combines an upmixing of the mono- or stereo downmix to generate the audio objects with a subsequent mapping of these to the desired output configuration: the mono or stereo input downmix is first upmixed to N audio objects based on the SAOC parameters, and the resulting N audio objects are then mapped to P output channels using a rendering matrix defining where the individual objects are positioned. However, typically the upmix matrix and the rendering matrix are combined into a single matrix, and the generation of the output channels from the mono- or stereo downmix is performed as a single operation. An example thereof is shown in FIG. 5, which shows a specific example wherein P equals one or two, and where specifically for P=2 the output may be a binaural spatial output channel. Thus, the two output channels are generated using HRTF parameters applied to the individual objects to generate the desired binaural spatial image. FIG. 6 illustrates an example where P>2 and an MPEG Surround (MPS) decoding/processing is used to generate the P output channels.
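The equivalence between the conceptual two-step decoding and the single combined operation follows from the associativity of matrix multiplication, as the following minimal sketch shows. All matrix values are hypothetical and chosen for illustration only; in practice such matrices would be derived from the SAOC parameters and the rendering information, typically per time and frequency interval, which the sketch does not model.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# Hypothetical upmix matrix: N=3 objects from a stereo downmix (N x 2).
upmix = [[0.9, 0.1],
         [0.5, 0.5],
         [0.1, 0.9]]

# Hypothetical rendering matrix: P=2 output channels from N=3 objects (P x N).
render = [[1.0, 0.7, 0.0],
          [0.0, 0.7, 1.0]]

# Hypothetical stereo downmix: 2 channels x 3 samples.
downmix = [[0.2, -0.4, 0.6],
           [0.1, 0.3, -0.5]]

# Conceptual two-step decoding: upmix to objects, then render.
objects = matmul(upmix, downmix)      # N x samples
two_step = matmul(render, objects)    # P x samples

# Combined decoding: one P x 2 matrix applied directly to the downmix.
combined = matmul(render, upmix)      # P x 2
one_step = matmul(combined, downmix)  # P x samples
```

The combined form avoids ever generating the N intermediate object signals, which is why it is the more economical choice when N is large compared with the number of downmix and output channels.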
However, an issue associated with SAOC is that the specification only supports mono- and stereo downmixes, whereas there are a number of applications and use-cases in which multi-channel mixes are used or even sometimes required, for instance DVD and Blu-ray. It would therefore be desirable for SAOC to support such multi-channel applications, i.e. a multichannel downmix, but this would require substantial amendments to the SAOC standard specification, which would be cumbersome and impractical, increase complexity, and reduce backwards compatibility.
In particular, it would be advantageous if existing algorithms, functional units, dedicated hardware etc. developed for SAOC encoding and decoding could be reused while allowing improved support for multichannel audio.
Hence, an improved approach for object encoding and/or decoding (such as SAOC encoding/decoding) would be advantageous, and in particular an approach allowing increased flexibility, reduced impact on standardised approaches, increased or facilitated backwards compatibility, increased reuse of encoding and/or decoding functionality, facilitated implementation, multichannel support in object encoding, and/or improved performance would be advantageous.