Dolby and Dolby TrueHD are trademarks of Dolby Laboratories Licensing Corporation.
The complexity, and financial and computational cost, of rendering audio programs increases with the number of channels to be rendered. During rendering and playback of object based audio programs, the audio content has a number of channels (e.g., object channels and speaker channels) which is typically much larger (e.g., by an order of magnitude) than the number occurring during rendering and playback of conventional speaker-channel based programs. Typically also, the speaker system used for playback includes a much larger number of speakers than the number employed for playback of conventional speaker-channel based programs.
Although embodiments of the invention are useful for rendering channels of any multichannel audio program, many embodiments of the invention are especially useful for rendering channels of object-based audio programs having a large number of channels.
It is known to employ playback systems (e.g., in movie theaters) to render object based audio programs. Object based audio programs may be indicative of many different audio objects corresponding to images on a screen, dialog, noises, and sound effects that emanate from different places on (or relative to) the screen, as well as background music and ambient effects (which may be indicated by speaker channels of the program) to create the intended overall auditory experience. Accurate playback of such programs requires that sounds be reproduced in a way that corresponds as closely as possible to what is intended by the content creator with respect to audio object size, position, intensity, movement, and depth.
During generation of object based audio programs, it is typically assumed that the loudspeakers to be employed for rendering are located in arbitrary locations in the playback environment; not necessarily in a predetermined arrangement in a (nominally) horizontal plane or in any other predetermined arrangement known at the time of program generation. Typically, metadata included in the program indicates rendering parameters for rendering at least one object of the program at an apparent spatial location or along a trajectory (in a three dimensional volume), e.g., using a three-dimensional array of speakers. For example, an object channel of the program may have corresponding metadata indicating a three-dimensional trajectory of apparent spatial positions at which the object (indicated by the object channel) is to be rendered. The trajectory may include a sequence of “floor” locations (in the plane of a subset of speakers which are assumed to be located on the floor, or in another horizontal plane, of the playback environment), and a sequence of “above-floor” locations (each determined by driving a subset of the speakers which are assumed to be located in at least one other horizontal plane of the playback environment).
Object based audio programs represent a significant improvement in many respects over traditional speaker channel-based audio programs, since speaker-channel based audio is more limited with respect to spatial playback of specific audio objects than is object channel based audio. Speaker channel-based audio programs consist of speaker channels only (not object channels), and each speaker channel typically determines a speaker feed for a specific, individual speaker in a listening environment.
Various methods and systems for generating and rendering object based audio programs have been proposed. During generation of an object based audio program, it is typically assumed that an arbitrary number of loudspeakers will be employed for playback of the program, and that the loudspeakers to be employed for playback will be located in arbitrary locations in the playback environment; not necessarily in a (nominally) horizontal plane or in any other predetermined arrangement known at the time of program generation. Typically, object-related metadata included in the program indicates rendering parameters for rendering at least one object of the program at an apparent spatial location or along a trajectory (in a three dimensional volume), e.g., using a three-dimensional array of speakers. For example, an object channel of the program may have corresponding metadata indicating a three-dimensional trajectory of apparent spatial positions at which the object (indicated by the object channel) is to be rendered. The trajectory may include a sequence of “floor” locations (in the plane of a subset of speakers which are assumed to be located on the floor, or in another horizontal plane, of the playback environment), and a sequence of “above-floor” locations (each determined by driving a subset of the speakers which are assumed to be located in at least one other horizontal plane of the playback environment). Examples of rendering of object based audio programs are described, for example, in PCT International Application No. PCT/US2001/028783, published under International Publication No. WO 2011/119401 A2 on Sep. 29, 2011, and assigned to the assignee of the present application.
An object-based audio program may include “bed” channels. A bed channel may be an object channel indicative of an object whose position does not change over the relevant time interval (and so is typically rendered using a set of playback system speakers having static speaker locations), or it may be a speaker channel (to be rendered by a specific speaker of a playback system). Bed channels do not have corresponding time varying position metadata (though they may be considered to have time-invariant position metadata). They may be indicative of audio elements that are dispersed in space, for instance, audio indicative of ambience.
Playback of an object-based audio program over a traditional speaker set-up (e.g., a 7.1 playback system) is achieved by rendering channels of the program (including object channels) to a set of speaker feeds. In typical embodiments of the invention, the process of rendering object channels (sometimes referred to herein as objects) and other channels of an object-based audio program (or channels of an audio program of another type) comprises in large part (or solely) a conversion of spatial metadata (for the channels to be rendered) at each time instant into a corresponding gain matrix (referred to herein as a “rendering matrix”) which represents how much each of the channels (e.g., object channels and speaker channels) contributes to a mix of audio content (at the instant) indicated by the speaker feed for a particular speaker (i.e., the relative weight of each of the channels of the program in the mix indicated by the speaker feed).
An “object channel” of an object-based audio program is indicative of a sequence of samples indicative of an audio object, and the program typically includes a sequence of spatial position metadata values indicative of object position or trajectory for each object channel. In typical embodiments of the invention, sequences of position metadata values corresponding to object channels of a program are used to determine an M×N matrix A(t) indicative of a time-varying gain specification for the program.
Rendering of “N” channels (e.g., object channels, or object channels and speaker channels) of an audio program to “M” speakers (speaker feeds) at time “t” of the program can be represented by multiplication of a vector x(t) of length “N”, consisting of an audio sample at time “t” from each channel, by an M×N matrix A(t) determined from associated position metadata (and optionally other metadata corresponding to the audio content to be rendered, e.g., object gains) at time “t”. The resultant values (e.g., gains or levels) of the speaker feeds at time “t” can be represented as a vector y(t), as in the following equation (1):
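$$
\underbrace{\begin{bmatrix} y_0(t) \\ y_1(t) \\ \vdots \\ y_{M-1}(t) \end{bmatrix}}_{y(t)}
=
\underbrace{\begin{bmatrix}
a_{00}(t) & a_{01}(t) & a_{02}(t) & \cdots & a_{0,N-1}(t) \\
a_{10}(t) & \ddots & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
a_{M-1,0}(t) & \cdots & \cdots & \cdots & a_{M-1,N-1}(t)
\end{bmatrix}}_{A(t)}
\,
\underbrace{\begin{bmatrix} x_0(t) \\ x_1(t) \\ x_2(t) \\ \vdots \\ x_{N-1}(t) \end{bmatrix}}_{x(t)}
\qquad (1)
$$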
Although equation (1) describes the rendering of N channels of an audio program (e.g., an object-based audio program, or an encoded version of an object-based audio program) into M output channels (e.g., M speaker feeds), it also represents a generic set of scenarios in which a set of N audio samples is converted to a set of M values (e.g., M samples) by linear operations. For example, A(t) could be a static matrix, “A”, whose coefficients do not vary with different values of time “t”. For another example, A(t) (which could be a static matrix, A) could represent a conventional downmix of a set of speaker channels x(t) to a smaller set of speaker channels y(t) (or x(t) could be a set of audio channels that describe a spatial scene in an Ambisonics format), and the conversion to speaker feeds y(t) could be prescribed as multiplication by the downmix matrix A. Even in an application employing a nominally static downmix matrix, the actual linear transformation (matrix multiplication) applied may be dynamic in order to ensure clip-protection of the downmix (i.e., a static transformation A may be converted to a time-varying transformation A(t), to ensure clip-protection).
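As a minimal sketch of equation (1), the following Python fragment multiplies one vector of input samples x(t) by a rendering matrix A(t) to obtain the vector of speaker-feed samples y(t). The function name `render`, the matrix coefficients, and the sample values are illustrative assumptions and are not taken from any particular renderer.

```python
def render(A, x):
    """Return y = A x for an M x N matrix A (a list of M rows) and a length-N vector x."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("each row of A needs one gain per input channel")
    return [sum(a * s for a, s in zip(row, x)) for row in A]

# Hypothetical 2 x 3 rendering matrix: N = 3 input channels, M = 2 speaker feeds.
A_t = [[1.0, 0.5, 0.0],
       [0.0, 0.5, 1.0]]
x_t = [0.2, 0.4, -0.1]   # one sample from each input channel at time t
y_t = render(A_t, x_t)   # one sample for each speaker feed at time t
```

Each row of `A_t` holds the gains with which the input channels contribute to one speaker feed, so a static downmix corresponds to reusing the same matrix for every value of t.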
An audio program rendering system (e.g., a decoder implementing such a system) may receive metadata which determine rendering matrices A(t) (or it may receive the matrices themselves) only intermittently and not at every instant “t” during a program. This could be for any of a variety of reasons, e.g., low time resolution of the system that actually outputs the metadata or the need to limit the bit rate of transmission of the program. The inventors have recognized that it may be desirable for a rendering system to interpolate between rendering matrices A(t1) and A(t2), at time instants “t1” and “t2” during a program, respectively, to obtain a rendering matrix A(t3) for an intermediate time instant “t3.” Interpolation ensures that the perceived position of objects in the rendered speaker feeds varies smoothly over time, and may eliminate undesirable artifacts such as zipper noise that stem from discontinuous (piece-wise constant) matrix updates. The interpolation may be linear (or nonlinear), and typically should ensure a continuous path in time from A(t1) to A(t2).
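The linear case of such interpolation can be sketched as follows. This is only one possible scheme, shown under the assumption that each coefficient is interpolated independently; the function name `interpolate_matrix` and the example matrices are hypothetical.

```python
def interpolate_matrix(A1, A2, t1, t2, t3):
    """Element-wise linear interpolation between A1 (valid at t1) and A2 (valid at t2)."""
    if t1 == t2 or not (t1 <= t3 <= t2):
        raise ValueError("require t1 < t2 and t1 <= t3 <= t2")
    w = (t3 - t1) / (t2 - t1)   # weight is 0 at t1 and 1 at t2
    return [[(1.0 - w) * a + w * b for a, b in zip(row1, row2)]
            for row1, row2 in zip(A1, A2)]

# Hypothetical rendering matrices received at t1 = 0 and t2 = 4.
A_t1 = [[1.0, 0.0], [0.0, 1.0]]
A_t2 = [[0.0, 1.0], [1.0, 0.0]]
A_t3 = interpolate_matrix(A_t1, A_t2, t1=0, t2=4, t3=1)
```

Because the weight varies continuously with t3, the interpolated matrix traces a continuous path from A(t1) to A(t2), in contrast to a piece-wise constant update.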
Dolby TrueHD is a conventional audio codec format that supports lossless and scalable transmission of audio signals. The source audio is encoded into a hierarchy of substreams of channels, and a selected subset of the substreams (rather than all of the substreams) may be retrieved from the bitstream and decoded, in order to obtain a lower dimensional (downmix) presentation of the spatial scene. When all the substreams are decoded, the resultant audio is identical to the source audio (the encoding, followed by the decoding, is lossless).
In a commercially available version of TrueHD, the source audio is typically a 7.1 channel mix which is encoded into a sequence of three substreams, including a first substream which can be decoded to determine a two channel downmix of the 7.1 channel original audio. The first two substreams may be decoded to determine a 5.1 channel downmix of the original audio. All three substreams may be decoded to determine the original 7.1 channel audio. Technical details of Dolby TrueHD, and the Meridian Lossless Packing (MLP) technology on which it is based, are well known. Aspects of TrueHD and MLP technology are described in U.S. Pat. No. 6,611,212, issued Aug. 26, 2003, and assigned to Dolby Laboratories Licensing Corp., and the paper by Gerzon, et al., entitled “The MLP Lossless Compression System for PCM Audio,” J. AES, Vol. 52, No. 3, pp. 243-260 (March 2004).
TrueHD supports specification of downmix matrices. In typical use, the content creator of a 7.1 channel audio program specifies a static matrix to downmix the 7.1 channel program to a 5.1 channel mix, and another static matrix to downmix the 5.1 channel downmix to a 2 channel downmix. Each static downmix matrix may be converted to a sequence of downmix matrices (each matrix in the sequence for downmixing a different interval in the program) in order to achieve clip-protection. However, each matrix in the sequence is transmitted (or metadata determining each matrix in the sequence is transmitted) to the decoder, and the decoder does not perform interpolation on any previously specified downmix matrix to determine a subsequent matrix in a sequence of downmix matrices for a program.
FIG. 1 is a schematic diagram of elements of a conventional TrueHD system, in which the encoder (30) and decoder (32) are configured to implement matrixing operations on audio samples. In the FIG. 1 system, encoder 30 is configured to encode an 8-channel audio program (e.g., a traditional set of 7.1 speaker feeds) as an encoded bitstream including two substreams, and decoder 32 is configured to decode the encoded bitstream to render either the original 8-channel program (losslessly) or a 2-channel downmix of the original 8-channel program. Encoder 30 is coupled and configured to generate the encoded bitstream and to assert the encoded bitstream to delivery system 31.
Delivery system 31 is coupled and configured to deliver (e.g., by storing and/or transmitting) the encoded bitstream to decoder 32. In some embodiments, system 31 implements delivery of (e.g., transmits) an encoded multichannel audio program over a broadcast system or a network (e.g., the internet) to decoder 32. In some embodiments, system 31 stores an encoded multichannel audio program in a storage medium (e.g., a disk or set of disks), and decoder 32 is configured to read the program from the storage medium.
The block labeled “InvChAssign1” in encoder 30 is configured to perform channel permutation (equivalent to multiplication by a permutation matrix) on the channels of the input program. The permutated channels then undergo encoding in stage 33, which outputs eight encoded signal channels. The encoded signal channels may (but need not) correspond to playback speaker channels. The encoded signal channels are sometimes referred to as “internal” channels since a decoder (and/or rendering system) typically decodes and renders the content of the encoded signal channels to recover the input audio, so that the encoded signal channels are “internal” to the encoding/decoding system. The encoding performed in stage 33 is equivalent to multiplication of each set of samples of the permutated channels by an encoding matrix (implemented as a cascade of n+1 matrix multiplications, identified as $P_n^{-1}, \ldots, P_1^{-1}, P_0^{-1}$, to be described below in greater detail).
Matrix determination subsystem 34 is configured to generate data indicative of the coefficients of two sets of output matrices (one set corresponding to each of two substreams of the encoded channels). One set of output matrices consists of two matrices, $P_0^2, P_1^2$, each of which is a primitive matrix (defined below) of dimension 2×2, and is for rendering a first substream (a downmix substream) comprising two of the encoded audio channels of the encoded bitstream (to render a two-channel downmix of the eight-channel input audio). The other set of output matrices consists of rendering matrices, $P_0, P_1, \ldots, P_n$, each of which is a primitive matrix, and is for rendering a second substream comprising all eight of the encoded audio channels of the encoded bitstream (for lossless recovery of the eight-channel input audio program). A cascade of the matrices, $P_0^2, P_1^2$, along with the matrices $P_0^{-1}, P_1^{-1}, \ldots, P_n^{-1}$, applied to the audio at the encoder, is equal to the downmix matrix specification that transforms the 8 input audio channels to the 2-channel downmix, and a cascade of the matrices, $P_0, P_1, \ldots, P_n$, renders the 8 encoded channels of the encoded bitstream back into the original 8 input channels.
The coefficients (of each matrix) that are output from subsystem 34 to packing subsystem 35 are metadata indicating relative or absolute gain of each channel to be included in a corresponding mix of channels of the program. The coefficients of each rendering matrix (for an instant of time during the program) represent how much each of the channels of a mix should contribute to the mix of audio content (at the corresponding instant of the rendered mix) indicated by the speaker feed for a particular playback system speaker.
The eight encoded audio channels (output from encoding stage 33), the output matrix coefficients (generated by subsystem 34), and typically also additional data are asserted to packing subsystem 35, which assembles them into the encoded bitstream which is then asserted to delivery system 31.
The encoded bitstream includes data indicative of the eight encoded audio channels, the two sets of output matrices (one set corresponding to each of two substreams of the encoded channels), and typically also additional data (e.g., metadata regarding the audio content).
Parsing subsystem 36 of decoder 32 is configured to accept (read or receive) the encoded bitstream from delivery system 31 and to parse the encoded bitstream. Subsystem 36 is operable to assert the substreams of the encoded bitstream, including a “first” substream comprising only two of the encoded channels of the encoded bitstream, and output matrices ($P_0^2, P_1^2$) corresponding to the first substream, to matrix multiplication stage 38 (for processing which results in a 2-channel downmix presentation of content of the original 8-channel input program). Subsystem 36 is also operable to assert the substreams of the encoded bitstream (the “second” substream comprising all eight encoded channels of the encoded bitstream) and corresponding output matrices ($P_0, P_1, \ldots, P_n$) to matrix multiplication stage 37 for processing which results in losslessly rendering the original 8-channel program.
More specifically, stage 38 multiplies each vector of two audio samples (one from each of the two channels of the first substream) by a cascade of the matrices $P_0^2, P_1^2$, and each resulting set of two linearly transformed samples undergoes channel permutation (equivalent to multiplication by a permutation matrix) represented by the block titled “ChAssign0” to yield each pair of samples of the required 2 channel downmix of the 8 original audio channels. The cascade of matrixing operations performed in encoder 30 and decoder 32 is equivalent to application of a downmix matrix specification that transforms the 8 input audio channels to the 2-channel downmix.
Stage 37 multiplies each vector of eight audio samples (one from each of the full set of eight channels of the encoded bitstream) by a cascade of the matrices $P_0, P_1, \ldots, P_n$, and each resulting set of eight linearly transformed samples undergoes channel permutation (equivalent to multiplication by a permutation matrix) represented by the block titled “ChAssign1” to yield each set of eight samples of the losslessly recovered original 8-channel program. In order that the output 8 channel audio is exactly the same as the input 8 channel audio (to achieve the “lossless” characteristic of the system), the matrixing operations performed in encoder 30 should be exactly (including quantization effects) the inverse of the matrixing operations performed in decoder 32 on the lossless (second) substream of the encoded bitstream (i.e., multiplication by the cascade of matrices $P_0, P_1, \ldots, P_n$). Thus, in FIG. 1, the matrixing operations in stage 33 of encoder 30 are identified as a cascade of the inverse matrices of the matrices $P_0, P_1, \ldots, P_n$, in the opposite sequence applied in stage 37 of decoder 32, namely: $P_n^{-1}, \ldots, P_1^{-1}, P_0^{-1}$.
Decoder 32 applies the inverse of the channel permutation applied by encoder 30 (i.e., the permutation matrix represented by element “ChAssign1” of decoder 32 is the inverse of that represented by element “InvChAssign1” of encoder 30).
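The relationship between the two channel assignments can be illustrated with a small sketch. The representation of a permutation as an index list, the helper names `permute` and `inverse_order`, and the four-channel example are assumptions made for illustration; they do not reflect how any particular encoder or decoder stores channel assignments.

```python
def permute(channels, order):
    """Reorder channels so that output slot i carries channels[order[i]]."""
    return [channels[i] for i in order]

def inverse_order(order):
    """Return the index list that undoes permute(..., order)."""
    inv = [0] * len(order)
    for out_slot, in_slot in enumerate(order):
        inv[in_slot] = out_slot
    return inv

inv_ch_assign = [2, 0, 3, 1]                # hypothetical encoder-side permutation
ch_assign = inverse_order(inv_ch_assign)    # matching decoder-side permutation

samples = ["a", "b", "c", "d"]              # one placeholder sample per channel
shuffled = permute(samples, inv_ch_assign)
restored = permute(shuffled, ch_assign)
```

Applying the decoder-side permutation after the encoder-side permutation returns every channel to its original slot, which corresponds to multiplying a permutation matrix by its inverse.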
Given a downmix matrix specification (e.g., specification of a static matrix A that is 2×8 in dimension), an objective of a conventional TrueHD encoder implementation of encoder 30 is to design output matrices (e.g., $P_0, P_1, \ldots, P_n$ and $P_0^2, P_1^2$ of FIG. 1), and input matrices ($P_n^{-1}, \ldots, P_1^{-1}, P_0^{-1}$) and output (and input) channel assignments so that:
1. the encoded bitstream is hierarchical (i.e., in the example, the first two encoded channels are sufficient to derive the 2 channel downmix presentation, and the full set of eight encoded channels is sufficient to recover the original 8 channel program); and
2. the matrices for the topmost stream ($P_0, P_1, \ldots, P_n$ in the example) are exactly invertible so that the input audio is exactly retrievable by the decoder.
Typical computing systems work with finite precision and inverting an arbitrary invertible matrix exactly could require very large precision. TrueHD solves this problem by constraining the output matrices and input matrices (i.e., $P_0, P_1, \ldots, P_n$ and $P_n^{-1}, \ldots, P_1^{-1}, P_0^{-1}$) to be square matrices of the type known as “primitive matrices”.
A primitive matrix P of dimension N×N is of the form:
$$
P = \begin{bmatrix}
1 & 0 & \cdots & \cdots & 0 \\
0 & 1 & 0 & \cdots & \cdots \\
\alpha_0 & \alpha_1 & \alpha_2 & \cdots & \alpha_{N-1} \\
\vdots & \ddots & \ddots & \ddots & \ddots \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}.
$$
A primitive matrix is always a square matrix. A primitive matrix of dimension N×N is identical to the identity matrix of dimension N×N except for one (non-trivial) row (i.e., the row comprising elements $\alpha_0, \alpha_1, \alpha_2, \ldots, \alpha_{N-1}$ in the example). In all other rows, the off-diagonal elements are zeros and the element shared with the diagonal has an absolute value of 1 (i.e., either +1 or −1). To simplify language in this disclosure, the drawings and descriptions will always assume that a primitive matrix has diagonal elements that are equal to +1 with the possible exception of the diagonal element in the non-trivial row. However, we note that this is without loss of generality, and ideas presented in this disclosure pertain to the general class of primitive matrices where diagonal elements may be +1 or −1.
When a primitive matrix, P, operates on (i.e., multiplies) a vector x(t), the result is the product Px(t), which is another N-dimensional vector that is exactly the same as x(t) in all elements except one. Thus each primitive matrix can be associated with a unique channel which it manipulates (or on which it operates).
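The following sketch illustrates this property for a small case. The helper names `primitive_matrix` and `apply_matrix`, the choice of row, and the coefficient values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def primitive_matrix(n, row, alphas):
    """Build an n x n identity matrix whose row `row` is replaced by `alphas`."""
    if len(alphas) != n:
        raise ValueError("the non-trivial row needs n coefficients")
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    P[row] = list(alphas)
    return P

def apply_matrix(P, x):
    """Return the matrix-vector product P x."""
    return [sum(p * s for p, s in zip(r, x)) for r in P]

# Hypothetical 4 x 4 primitive matrix whose non-trivial row is row 2.
P = primitive_matrix(4, row=2, alphas=[0.5, -0.25, 2.0, 0.125])
x = [8.0, 4.0, 1.0, 16.0]
y = apply_matrix(P, x)

# Indices of the elements that differ between x and P x.
changed = [i for i, (a, b) in enumerate(zip(x, y)) if a != b]
```

Only the element whose index matches the non-trivial row is altered; every other element of the vector passes through unchanged, which is why each primitive matrix can be associated with a single channel.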
We will use the term “unit primitive matrix” herein to denote a primitive matrix in which the element shared with the diagonal (by the non-trivial row of the primitive matrix) has an absolute value of 1 (i.e., either +1 or −1). Thus, the diagonal of a unit primitive matrix consists of all positive ones, +1, or all negative ones, −1, or some positive ones and some negative ones. A primitive matrix only alters one channel of a set (vector) of samples of audio program channels, and a unit primitive matrix is also losslessly invertible due to the unit values on the diagonal. Again, to simplify the discussion herein, we will use the term unit primitive matrix to refer to a primitive matrix whose non-trivial row has a diagonal element of +1. However, all references to unit primitive matrices herein, including in the claims, are intended to cover the more generic case where a unit primitive matrix can have a non-trivial row whose shared element with the diagonal is +1 or −1.
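If $\alpha_2 = 1$ (resulting in a unit primitive matrix having a diagonal consisting of positive ones) in the above example of primitive matrix, P, it is seen that the inverse of P is exactly:
$$
P^{-1} = \begin{bmatrix}
1 & 0 & \cdots & \cdots & 0 \\
0 & 1 & 0 & \cdots & \cdots \\
-\alpha_0 & -\alpha_1 & 1 & \cdots & -\alpha_{N-1} \\
\vdots & \ddots & \ddots & \ddots & \ddots \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}.
$$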
It is true in general that the inverse of a unit primitive matrix is simply determined by inverting (multiplying by −1) each of its non-trivial α coefficients which does not lie along the diagonal.
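The inversion rule can be checked numerically. The following is a minimal sketch, not taken from the source: the helper names (`primitive_matrix`, `invert_unit_primitive`, `matmul`) and the coefficient values are hypothetical, and the matrix is assumed to have +1 on the diagonal of its non-trivial row.

```python
def primitive_matrix(n, row, coeffs):
    """Return an n x n identity matrix whose row `row` is replaced by `coeffs`."""
    m = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    m[row] = list(coeffs)
    return m


def invert_unit_primitive(p, row):
    """Invert a unit primitive matrix by negating the off-diagonal
    coefficients of its non-trivial row (assumes p[row][row] == 1)."""
    assert p[row][row] == 1
    inv = [r[:] for r in p]
    inv[row] = [c if j == row else -c for j, c in enumerate(p[row])]
    return inv


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical 4 x 4 example; the non-trivial row is row 2 (alpha_2 = 1).
P = primitive_matrix(4, 2, [0.5, -0.25, 1, 0.75])
P_inv = invert_unit_primitive(P, 2)
identity = primitive_matrix(4, 2, [0, 0, 1, 0])
```

Multiplying `P` by `P_inv` in either order gives the identity, because every row other than the non-trivial one is already a row of the identity matrix.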
If the matrices P0, P1, . . . , Pn employed in decoder 32 of FIG. 1 are unit primitive matrices (having unit diagonals), the sequence of matrixing operations Pn⁻¹, . . . , P1⁻¹, P0⁻¹ in encoder 30 and P0, P1, . . . , Pn in decoder 32 can be implemented by finite precision circuits of the type shown in FIGS. 2A and 2B. FIG. 2A is conventional circuitry of an encoder for performing lossless matrixing via primitive matrices implemented with finite precision arithmetic. FIG. 2B is conventional circuitry of a decoder for performing lossless matrixing via primitive matrices implemented with finite precision arithmetic. Details of typical implementations of the FIG. 2A and FIG. 2B circuitry (and variations thereon) are described in above-cited U.S. Pat. No. 6,611,212, issued Aug. 26, 2003.
In FIG. 2A (representing circuitry for encoding a four channel audio program comprising channels S1, S2, S3, and S4), a first primitive matrix P0⁻¹ (having one row of four non-zero α coefficients) operates on each sample of channel S1 (to generate a corresponding sample of encoded channel S1′) by mixing the relevant sample of channel S1 with corresponding samples (occurring at the same time, t) of channels S2, S3, and S4. A second primitive matrix P1⁻¹ (also having one row of four non-zero α coefficients) operates on each sample of channel S2 (to generate a corresponding sample of encoded channel S2′) by mixing the relevant sample of channel S2 with corresponding samples of channels S1′, S3, and S4. More specifically, the sample of channel S2 is multiplied by the inverse of a coefficient α1 (identified as “coeff[1,2]”) of matrix P0⁻¹, the sample of channel S3 is multiplied by the inverse of a coefficient α2 (identified as “coeff[1,3]”) of matrix P0⁻¹, and the sample of channel S4 is multiplied by the inverse of a coefficient α3 (identified as “coeff[1,4]”) of matrix P0⁻¹, the products are summed and then quantized, and the quantized sum is then subtracted from the corresponding sample of channel S1. Similarly, the sample of channel S1′ is multiplied by the inverse of a coefficient α0 (identified as “coeff[2,1]”) of matrix P1⁻¹, the sample of channel S3 is multiplied by the inverse of a coefficient α2 (identified as “coeff[2,3]”) of matrix P1⁻¹, and the sample of channel S4 is multiplied by the inverse of a coefficient α3 (identified as “coeff[2,4]”) of matrix P1⁻¹, the products are summed and then quantized, and the quantized sum is then subtracted from the corresponding sample of channel S2.
Quantization stage Q1 of matrix P0⁻¹ quantizes the output of the summation element which sums the products of the multiplications (by non-zero α coefficients of the matrix P0⁻¹, which are typically fractional values) to generate the quantized value which is subtracted from the sample of channel S1 to generate the corresponding sample of encoded channel S1′. Quantization stage Q2 of matrix P1⁻¹ quantizes the output of the summation element which sums the products of the multiplications (by non-zero α coefficients of the matrix P1⁻¹, which are typically fractional values) to generate the quantized value which is subtracted from the sample of channel S2 to generate the corresponding sample of encoded channel S2′. In a typical implementation (e.g., for performing TrueHD encoding), each sample of each of channels S1, S2, S3, and S4 comprises 24 bits (as indicated in FIG. 2A), and the output of each multiplication element comprises 38 bits (as also indicated in FIG. 2A), and each of quantization stages Q1 and Q2 outputs a 24-bit quantized value in response to each 38-bit value which is input thereto.
Of course, to encode channels S3 and S4, two additional primitive matrices could be cascaded with the two primitive matrices (P0⁻¹ and P1⁻¹) indicated in FIG. 2A.
In FIG. 2B (representing circuitry for decoding of the four-channel encoded program generated by the encoder of FIG. 2A), a primitive matrix P1 (having one row of four non-zero α coefficients, and which is the inverse of the matrix P1⁻¹) operates on each sample of encoded channel S2′ (to generate a corresponding sample of decoded channel S2) by mixing samples of channels S1′, S3, and S4 with the relevant sample of channel S2′. A second primitive matrix P0 (also having one row of four non-zero α coefficients, and which is the inverse of the matrix P0⁻¹) operates on each sample of encoded channel S1′ (to generate a corresponding sample of decoded channel S1) by mixing samples of channels S2, S3, and S4 with the relevant sample of channel S1′. More specifically, the sample of channel S1′ is multiplied by a coefficient α0 (identified as “coeff[2,1]”) of matrix P1, the sample of channel S3 is multiplied by a coefficient α2 (identified as “coeff[2,3]”) of matrix P1, and the sample of channel S4 is multiplied by a coefficient α3 (identified as “coeff[2,4]”) of matrix P1, the products are summed and then quantized, and the quantized sum is then added to the corresponding sample of channel S2′. Similarly, the sample of decoded channel S2 is multiplied by a coefficient α1 (identified as “coeff[1,2]”) of matrix P0, the sample of channel S3 is multiplied by a coefficient α2 (identified as “coeff[1,3]”) of matrix P0, and the sample of channel S4 is multiplied by a coefficient α3 (identified as “coeff[1,4]”) of matrix P0, the products are summed and then quantized, and the quantized sum is then added to the corresponding sample of channel S1′.
Quantization stage Q2 of matrix P1 quantizes the output of the summation element which sums the products of the multiplications (by non-zero α coefficients of the matrix P1, which are typically fractional values) to generate the quantized value which is added to the sample of channel S2′ to generate the corresponding sample of decoded channel S2. Quantization stage Q1 of matrix P0 quantizes the output of the summation element which sums the products of the multiplications (by non-zero α coefficients of the matrix P0, which are typically fractional values) to generate the quantized value which is added to the sample of channel S1′ to generate the corresponding sample of decoded channel S1. In a typical implementation (e.g., for performing TrueHD decoding), each sample of each of channels S1′, S2′, S3, and S4 comprises 24 bits (as indicated in FIG. 2B), and the output of each multiplication element comprises 38 bits (as also indicated in FIG. 2B), and each of quantization stages Q1 and Q2 outputs a 24-bit quantized value in response to each 38-bit value which is input thereto.
Of course, to decode channels S3 and S4, two additional primitive matrices could be cascaded with the two primitive matrices (P0 and P1) indicated in FIG. 2B.
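The reason such a cascade is lossless despite quantization can be sketched in a few lines: the decoder recomputes exactly the same quantized sum that the encoder subtracted, from channels that are bit-identical at that point, and adds it back. The sketch below is illustrative only and is not the circuitry of FIGS. 2A and 2B. The fixed-point format (`FRAC_BITS`), the coefficient values (`C0`, `C1`), and the truncating quantizer are assumptions, and wraparound to a 24-bit word length is not modeled.

```python
FRAC_BITS = 14  # assumed number of fractional bits in each coefficient


def quantize(acc):
    """Q stage: discard the fractional bits (floor). Any deterministic
    rule works, provided encoder and decoder use the same one."""
    return acc >> FRAC_BITS


def mix(samples, target, coeffs):
    """Quantized weighted sum of every channel except `target`."""
    acc = sum(c * s for i, (c, s) in enumerate(zip(coeffs, samples))
              if i != target)
    return quantize(acc)


# Hypothetical fixed-point coefficients (real value = integer / 2**FRAC_BITS).
C0 = [0, 5000, -3000, 7001]   # row used for channel index 0 (S1)
C1 = [-6000, 0, 2500, 1234]   # row used for channel index 1 (S2)


def encode(samples):
    s = list(samples)
    s[0] -= mix(s, 0, C0)     # first stage: S1 -> S1'
    s[1] -= mix(s, 1, C1)     # second stage: S2 -> S2', using S1'
    return s


def decode(encoded):
    s = list(encoded)
    s[1] += mix(s, 1, C1)     # undo second stage first: S2' -> S2
    s[0] += mix(s, 0, C0)     # then undo first stage: S1' -> S1
    return s


original = [1000, -2000, 3000, -4000]
encoded = encode(original)
decoded = decode(encoded)
```

Here `decoded` equals `original` exactly, even though each stage rounds its weighted sum, because the decoder undoes the stages in reverse order and each stage leaves its own inputs untouched.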
A sequence of primitive matrices, e.g., the sequence of primitive N×N matrices P0, P1, . . . , Pn implemented by the decoder of FIG. 1, operating on a vector (N samples, each of which is a sample of a different channel of a first set of N channels) can implement any linear transformation of the N samples into a new set of N samples (e.g., it can implement the linear transformation performed at a time t by multiplying samples of N channels of an object-based audio program by any N×N implementation of matrix A(t) of equation (1) during rendering of the channels into N speaker feeds, where the transformation is achieved by manipulating one channel at a time). Thus, multiplication of a set of N audio samples by a sequence of N×N primitive matrices represents a generic set of scenarios in which the set of N samples is converted to another set (of N samples) by linear operations.
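As a small illustration of building a linear transformation by manipulating one channel at a time, a two-channel rotation can be written as a cascade of three unit primitive matrices. This factorization is a standard result from the lifting literature rather than something stated in the source, and the function names and test values below are hypothetical.

```python
import math


def apply_primitive(samples, target, coeffs):
    """Apply one primitive matrix: only samples[target] changes, and it
    becomes the dot product of `coeffs` with the current samples."""
    out = list(samples)
    out[target] = sum(c * s for c, s in zip(coeffs, samples))
    return out


def rotate_by_cascade(x, theta):
    """Rotate a 2-channel sample vector using three unit primitive matrices."""
    t = -math.tan(theta / 2)
    s = math.sin(theta)
    x = apply_primitive(x, 0, [1, t])   # alters channel 0 only
    x = apply_primitive(x, 1, [s, 1])   # alters channel 1 only
    x = apply_primitive(x, 0, [1, t])   # alters channel 0 again
    return x


def rotate_directly(x, theta):
    """Reference: multiply by the full 2 x 2 rotation matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * x[0] - s * x[1], s * x[0] + c * x[1]]


cascade_out = rotate_by_cascade([0.3, -1.2], 0.7)
direct_out = rotate_directly([0.3, -1.2], 0.7)
```

The two results agree to within floating-point rounding. A transformation whose determinant is not ±1 cannot be built from unit primitive matrices alone and would need at least one primitive matrix with a non-unit diagonal element.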
With reference again to a TrueHD implementation of decoder 32 of FIG. 1, in order to maintain uniformity of decoder architecture in TrueHD, the output matrices of the downmix substream (P02, P12 in FIG. 1) are also implemented as primitive matrices although they need not be invertible (or have a unit diagonal) since they are not associated with achieving losslessness.
The input and output primitive matrices employed in a TrueHD encoder and decoder depend on each particular downmix specification to be implemented. The function of a TrueHD decoder is to apply the appropriate cascade of primitive matrices to the received encoded audio bitstream. Thus, the TrueHD decoder of FIG. 1 decodes the 8 channels of the encoded bitstream (delivered by system D), and generates a 2-channel downmix by applying a cascade of two output primitive matrices P02, P12 to a subset of the channels of the decoded bitstream. A TrueHD implementation of decoder 32 of FIG. 1 is also operable to decode the 8 channels of the encoded bitstream (delivered by system D) to recover losslessly the original 8-channel program by applying a cascade of eight output primitive matrices P0, P1, . . . , Pn to the channels of the encoded bitstream.
A TrueHD decoder does not have the original audio (which was input to the encoder) to check against to determine whether its reproduction is lossless (or as otherwise desired by the encoder in the case of a downmix). However, the encoded bitstream contains a “check word” (or lossless check) which is compared against a similar word derived at the decoder from the reproduced audio to determine whether the reproduction is faithful.
If an object-based audio program (e.g., comprising more than eight channels) were encoded by a conventional TrueHD encoder, the encoder might generate downmix substreams which carry presentations compatible with legacy playback devices (e.g., presentations which could be decoded to downmixed speaker feeds for playback on a traditional 7.1 channel or 5.1 channel or other traditional speaker set up) and a top substream (indicative of all channels of the input program). A TrueHD decoder might recover the original object-based audio program losslessly for rendering by a playback system. Each rendering matrix specification employed by the encoder in this case (i.e., for generating the top substream and each downmix substream), and thus each output matrix determined by the encoder, might be a time-varying rendering matrix, A(t), which linearly transforms samples of channels of the program (e.g., to generate a 7.1 channel or 5.1 channel downmix). However, such a matrix A(t) would typically vary rapidly in time as objects move around in the spatial scene, and bit-rate and processing limitations of a conventional TrueHD system (or other conventional decoding system) would typically constrain the system to be able, at most, to accommodate a piece-wise constant approximation to such a continuously (and rapidly) varying matrix specification (with a higher matrix update rate achieved at the cost of increased bit-rate for transmission of the encoded program). In order to support rendering of object-based multichannel audio programs (and other multichannel audio programs) with speaker feeds indicative of a rapidly varying mix of content from channels of the programs, the inventors have recognized that it is desirable to enhance conventional systems to accommodate interpolated matrixing, where rendering matrix updates are infrequent and a desired trajectory (i.e., a desired sequence of mixes of content of channels of the program) between updates is specified parametrically.
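The difference between a piece-wise constant approximation and interpolated matrixing can be sketched as follows. This is a toy illustration under stated assumptions, not the method of any particular embodiment: the trajectory between two matrix updates is assumed to be linear, and the function names and the one-object, two-speaker matrices are hypothetical.

```python
def lerp_matrix(a0, a1, frac):
    """Element-wise linear interpolation between matrices a0 and a1."""
    return [[x + frac * (y - x) for x, y in zip(r0, r1)]
            for r0, r1 in zip(a0, a1)]


def render(matrix, samples):
    """Multiply one vector of channel samples by a rendering matrix."""
    return [sum(m * s for m, s in zip(row, samples)) for row in matrix]


def render_block(a0, a1, block, interpolate):
    """Render the sample vectors lying between two matrix updates.

    interpolate=False holds a0 for the whole block (piece-wise constant);
    interpolate=True moves linearly from a0 toward a1 across the block."""
    n = len(block)
    out = []
    for k, samples in enumerate(block):
        a = lerp_matrix(a0, a1, k / n) if interpolate else a0
        out.append(render(a, samples))
    return out


# One object panned from the first speaker to the second between updates.
A0 = [[1.0], [0.0]]
A1 = [[0.0], [1.0]]
block = [[1.0]] * 4

held = render_block(A0, A1, block, interpolate=False)
interpolated = render_block(A0, A1, block, interpolate=True)
```

With the held matrix, the object stays in the first speaker for the whole block and would jump at the next update. With interpolation, the same two matrices yield a gradual pan, so no additional matrix updates need to be transmitted to approximate the motion.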