New professional and consumer-level audio-visual (AV) systems (such as the Dolby® Atmos™ system) have been developed to render hybrid audio content using a format that includes both audio beds (channels) and audio objects. Audio beds refer to audio channels that are meant to be reproduced in predefined, fixed speaker locations (e.g., 5.1 or 7.1 surround), while audio objects refer to individual audio elements that exist for a defined duration in time and have spatial information describing the position, velocity, and size (as examples) of each object. During transmission, beds and objects can be sent separately and then used by a spatial reproduction system to recreate the artistic intent using a variable number of speakers in known physical locations. Depending on the capabilities of the authoring system, there may be tens or even hundreds of individual audio objects (static and/or time-varying) that are combined during rendering to create a spatially diverse and immersive audio experience. In an embodiment, the audio processed by the system may comprise channel-based audio, object-based audio, or object and channel-based audio. The audio comprises or is associated with metadata that dictates how the audio is rendered for playback on specific devices and listening environments. In general, the terms “hybrid audio” or “adaptive audio” are used to mean channel-based and/or object-based audio signals plus metadata that governs how the audio signals are rendered, using an audio stream plus metadata in which the object positions are coded as three-dimensional (3D) positions in space.
Adaptive audio systems thus represent the sound scene as a set of audio objects in which each object comprises an audio signal (waveform) and time-varying metadata indicating the position of the sound source. Playback over a traditional speaker set-up such as a 7.1 arrangement (or other surround sound format) is achieved by rendering the objects to a set of speaker feeds. The process of rendering comprises in large part (or solely) a conversion of the spatial metadata at each time instant into a corresponding gain matrix, which represents how much of each object feeds into each speaker. Thus, rendering “N” audio objects to “M” speakers at time “t” can be represented by the multiplication of a vector x(t) of length “N”, composed of the audio sample at time t from each object, by an “M-by-N” matrix A(t) constructed by appropriately interpreting the associated position metadata (and any other metadata such as object gains) at time t. The resultant samples of the speaker feeds at time t are represented by the vector y(t). This is shown below in Eq. 1:
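\[
\underbrace{\begin{bmatrix}
y_{0}(t) \\ y_{1}(t) \\ \vdots \\ y_{M-1}(t)
\end{bmatrix}}_{y(t)}
=
\underbrace{\begin{bmatrix}
a_{00}(t) & a_{01}(t) & a_{02}(t) & \cdots & a_{0,N-1}(t) \\
a_{10}(t) & \vdots & \vdots & \vdots & \vdots \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
a_{M-1,0}(t) & \vdots & \vdots & \vdots & a_{M-1,N-1}(t)
\end{bmatrix}}_{A(t)}
\underbrace{\begin{bmatrix}
x_{0}(t) \\ x_{1}(t) \\ x_{2}(t) \\ \vdots \\ \vdots \\ x_{N-1}(t)
\end{bmatrix}}_{x(t)}
\qquad \text{(Eq. 1)}
\]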
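As a minimal sketch of Eq. 1, the following Python computes the speaker-feed vector y(t) from a gain matrix A(t) and an object-sample vector x(t) at a single time instant. The names `render`, `pan_gains`, and `gain_matrix_at`, the two-speaker layout, and the linear panning law are illustrative assumptions and are not taken from any actual renderer; a real system would derive A(t) from the full position metadata.

```python
def render(gain_matrix, object_samples):
    """Return y(t) = A(t) x(t) for one time instant (Eq. 1)."""
    return [sum(a * x for a, x in zip(row, object_samples))
            for row in gain_matrix]


def pan_gains(position):
    """Hypothetical linear pan: 0.0 = left speaker only, 1.0 = right only."""
    return [1.0 - position, position]


def gain_matrix_at(positions):
    """Build the M-by-N matrix A(t); column n holds the gains of object n."""
    columns = [pan_gains(p) for p in positions]
    return [list(row) for row in zip(*columns)]


# N = 2 objects rendered to M = 2 speakers at one time instant t.
positions_t = [0.25, 1.0]   # assumed position metadata at time t
x_t = [0.8, -0.4]           # one audio sample per object at time t
A_t = gain_matrix_at(positions_t)
y_t = render(A_t, x_t)
```

Because the position metadata changes over time, `gain_matrix_at` would be re-evaluated at each time instant, which is what makes A(t) time-varying.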
The matrix equation of Eq. 1 above represents an adaptive audio (e.g., Atmos) rendering perspective, but it can also represent a generic set of scenarios where one set of audio samples is converted to another set by linear operations. In an extreme case, A(t) is a static matrix and may represent a conventional downmix of a set of audio channels x(t) to a smaller set of channels y(t). For instance, x(t) could be a set of audio channels that describe a spatial scene in an Ambisonics format, and the conversion to speaker feeds y(t) may be prescribed as multiplication by a static downmix matrix. Alternatively, x(t) could be a set of speaker feeds for a 7.1 channel layout, and the conversion to a 5.1 channel layout may be prescribed as multiplication by a static downmix matrix.
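The static case can be sketched as follows for a 7.1-to-5.1 downmix. The channel orders, the fold-down gain `G`, and the choice to fold each rear surround into the side surround on the same side are assumptions made for illustration only; the text does not prescribe particular coefficients.

```python
import math

# Assumed channel orders (not specified in the text).
CH_71 = ["L", "R", "C", "LFE", "Ls", "Rs", "Lrs", "Rrs"]
CH_51 = ["L", "R", "C", "LFE", "Ls", "Rs"]

G = 1.0 / math.sqrt(2.0)  # hypothetical fold-down gain (about -3 dB)

# Static 6-by-8 matrix A: front channels and LFE pass through unchanged,
# and each rear surround is folded into the side surround on its side.
A = [
    [1, 0, 0, 0, 0, 0, 0, 0],  # L
    [0, 1, 0, 0, 0, 0, 0, 0],  # R
    [0, 0, 1, 0, 0, 0, 0, 0],  # C
    [0, 0, 0, 1, 0, 0, 0, 0],  # LFE
    [0, 0, 0, 0, G, 0, G, 0],  # Ls = G*Ls + G*Lrs
    [0, 0, 0, 0, 0, G, 0, G],  # Rs = G*Rs + G*Rrs
]


def downmix(matrix, frame):
    """Apply y = A x to one frame (one sample per input channel)."""
    return [sum(a * x for a, x in zip(row, frame)) for row in matrix]


x = [0.1, 0.2, 0.3, 0.0, 0.4, 0.0, 0.4, -0.2]  # one 7.1 frame, in CH_71 order
y = dict(zip(CH_51, downmix(A, x)))            # 5.1 frame keyed by channel name
```

The same matrix is applied to every frame of the program, which is what distinguishes this case from the time-varying A(t) used for object rendering.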
To provide audio reproduction that is as accurate as possible, adaptive audio systems are often used with high-definition audio codec (coder-decoder) systems, such as Dolby TrueHD. As an example of such codecs, Dolby TrueHD is an audio codec that supports lossless and scalable transmission of audio signals. The source audio is encoded into a hierarchy of substreams in which only a subset of the substreams needs to be retrieved from the bitstream and decoded in order to obtain a lower-dimensional (or downmix) presentation of the spatial scene; when all the substreams are decoded, the resultant audio is identical to the source audio. Although embodiments may be described and illustrated with respect to TrueHD systems, it should be noted that any other similar HD audio codec system may also be used. The term “TrueHD” is thus meant to include all possible HD-type codecs. Technical details of Dolby TrueHD, and the Meridian Lossless Packing (MLP) technology on which it is based, are well known. Aspects of TrueHD and MLP technology are described in U.S. Pat. No. 6,611,212, issued Aug. 26, 2003, and assigned to Dolby Laboratories Licensing Corp., and the paper by Gerzon, et al., entitled “The MLP Lossless Compression System for PCM Audio,” J. AES, Vol. 52, No. 3, pp. 243-260 (March 2004).
The TrueHD format supports specification of downmix matrices. In typical use, the content creator of a 7.1 channel audio program specifies a static matrix to downmix the 7.1 channel program to a 5.1 channel mix, and another static matrix to downmix the 5.1 channel downmix to a 2 channel (stereo) downmix. Each static downmix matrix may be converted to a sequence of downmix matrices (each matrix in the sequence for downmixing a different interval in the program) in order to achieve clip-protection. However, each matrix in the sequence (or metadata determining each matrix in the sequence) is transmitted to the decoder, and the decoder does not perform interpolation on any previously specified downmix matrix to determine a subsequent matrix in a sequence of downmix matrices for a program.
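The text does not say how the per-interval matrices are derived. The sketch below shows one plausible scheme, stated purely as an assumption: for each interval, the static matrix is scaled down just enough that the downmixed peak does not exceed full scale. The function names, the 2-by-3 example matrix, and the sample values are all hypothetical.

```python
def downmix(matrix, frame):
    """Apply y = A x to one frame (one sample per input channel)."""
    return [sum(a * x for a, x in zip(row, frame)) for row in matrix]


def clip_protected_matrices(static_matrix, intervals, full_scale=1.0):
    """Return one (possibly attenuated) copy of static_matrix per interval."""
    matrices = []
    for frames in intervals:
        peak = max((abs(v) for frame in frames
                    for v in downmix(static_matrix, frame)), default=0.0)
        # Attenuate only when the unscaled downmix would exceed full scale.
        scale = full_scale / peak if peak > full_scale else 1.0
        matrices.append([[a * scale for a in row] for row in static_matrix])
    return matrices


# Hypothetical 2-by-3 static downmix (three input channels to stereo).
static = [[1.0, 0.0, 0.7071],
          [0.0, 1.0, 0.7071]]
intervals = [
    [[0.5, 0.5, 0.2]],   # quiet interval: no scaling needed
    [[0.9, 0.9, 0.9]],   # loud interval: the unscaled downmix would clip
]
sequence = clip_protected_matrices(static, intervals)
```

Consistent with the paragraph above, every matrix in `sequence` would have to be transmitted to the decoder, since the decoder does not derive one matrix from another by interpolation.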
Given a downmix matrix specification (e.g., a static specification A that is 2×3 in dimension), the objective of the encoder is to design the output matrices (and hence the input matrices) and the output channel assignments (and hence the input channel assignments) so that the resultant internal audio is hierarchical, i.e., the first two internal channels are sufficient to derive the 2-channel presentation, and so on; and so that the matrices of the topmost substream are exactly invertible, making the input audio exactly retrievable. However, it should be noted that computing systems work with finite precision, and inverting an arbitrary invertible matrix exactly often requires very high-precision calculations. Thus, downmix operations using TrueHD codec systems generally require a large number of bits to represent matrix coefficients.
What is needed, therefore, is an HD codec system that performs down- and up-mixing operations without requiring high-precision calculations, so that large numbers of bits are not needed to represent matrix coefficients when rendering adaptive audio content.
What is further needed is a system that enables the transmission of adaptive audio content (e.g., Dolby Atmos) via high-definition codec formats (e.g., Dolby TrueHD), with a substream structure that supports decoding some standard downmixes (e.g., 2 ch, 5.1 ch, 7.1 ch) by legacy devices, while support for decoding lossless adaptive audio may be available only in new decoding devices.
Certain high-definition audio formats, such as TrueHD, may address the problem of requiring high-precision calculations by constraining the output matrices (and input matrices) to be of the type denoted “primitive matrices.” What is yet further needed, however, is a method of decomposing downmix specification matrices into primitive matrices with coefficient values that do not exceed the syntax constraints of the audio processing system.
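This section does not define primitive matrices. The sketch below assumes the common definition: a matrix that differs from the identity in a single row, here with a unit diagonal entry in that row. Under that assumption, applying the matrix changes only one channel, so the quantized mix that was added can be recomputed from the unchanged channels and subtracted, which recovers the input exactly in integer arithmetic. The 14-bit fractional coefficient format and all names are illustrative assumptions, not the TrueHD syntax.

```python
FRAC_BITS = 14  # assumed fixed-point precision of the coefficients


def to_fixed(value):
    """Convert a real coefficient to an integer with FRAC_BITS fraction bits."""
    return round(value * (1 << FRAC_BITS))


def _quantized_mix(samples, row, coeffs):
    """Fixed-point mix of every channel except `row`, floored to an integer."""
    acc = sum(c * s for i, (c, s) in enumerate(zip(coeffs, samples))
              if i != row)
    return acc >> FRAC_BITS


def apply_primitive(samples, row, coeffs):
    """Modify only channel `row`; all other channels pass through unchanged."""
    out = list(samples)
    out[row] += _quantized_mix(samples, row, coeffs)
    return out


def invert_primitive(samples, row, coeffs):
    """Exact inverse: the other channels are unchanged, so the mix is identical."""
    out = list(samples)
    out[row] -= _quantized_mix(samples, row, coeffs)
    return out


x = [12345, -6789, 4242]                 # integer PCM samples, one per channel
coeffs = [to_fixed(0.0), to_fixed(0.3), to_fixed(-0.71)]  # entry 0 is unused
y = apply_primitive(x, 0, coeffs)
x_recovered = invert_primitive(y, 0, coeffs)
```

The round trip is exact regardless of how coarsely the mix is quantized, because the forward and inverse steps compute the same quantized value from the same untouched channels. The coefficient word length therefore affects how closely a product of such matrices approximates the specified downmix, not whether the input can be recovered.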
The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions. Dolby, Dolby TrueHD, and Atmos are trademarks of Dolby Laboratories Licensing Corporation.