Recently, a technique referred to as the Spatial Audio Codec has been undergoing standardization in the MPEG audio standard. It aims to compress and code a multi-channel signal with a very small amount of information while still conveying a lively scene. For example, the AAC (Advanced Audio Coding) scheme, which is already widely used as an audio scheme for digital TVs, requires bit rates of 512 kbps or 384 kbps for 5.1 channels. In contrast, the Spatial Audio Codec aims to compress and code a multi-channel audio signal at very low bit rates, such as 128 kbps, 64 kbps, and even 48 kbps (see Non-patent Reference 1, for example).
FIG. 1 is a block diagram showing an overall structure of an audio apparatus utilizing a basic principle of the Spatial Audio Codec.
An audio apparatus 1 includes an audio encoder 10 which performs spatial-audio-coding on a set of audio signals to output the coded signals, and an audio decoder 20 which decodes the coded signals.
The audio encoder 10 processes a multi-channel audio signal (for example, an audio signal with two channels of L and R) on a frame-by-frame basis, each frame consisting of, for example, 1024 samples or 2048 samples, and includes a downmixing unit 11, a binaural cue extracting unit 12, an encoder 13, and a multiplexing unit 14.
The downmixing unit 11 generates a downmix signal M into which the audio signals L and R are downmixed by, for example, calculating an average of the spectrally represented audio signals of the two channels, left L and right R; in other words, by applying M=(L+R)/2.
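As an illustration of the averaging downmix M=(L+R)/2, the following minimal sketch operates on two lists of spectral coefficients. The function name `downmix` and the sample values are hypothetical and are not part of the apparatus described here.

```python
def downmix(left, right):
    """Average two equally long channel spectra: M = (L + R) / 2."""
    if len(left) != len(right):
        raise ValueError("channels must have the same number of coefficients")
    return [(l + r) / 2 for l, r in zip(left, right)]


# Hypothetical spectral coefficients for one frame (illustrative values only).
L = [1.0, 0.5, -0.25, 0.0]
R = [0.0, 0.5, 0.25, -1.0]
M = downmix(L, R)
print(M)  # [0.5, 0.5, 0.0, -0.5]
```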
The binaural cue extracting unit 12 generates BC information (binaural cue) for recovering the original audio signals L and R from the downmix signal M, by comparing the audio signals L and R and the downmix signal M on a spectral band-by-spectral band basis.
The BC information includes level information IID which indicates inter-channel level/intensity difference, correlation information ICC which indicates inter-channel coherence/correlation, and phase information IPD which indicates inter-channel phase/delay difference.
Here, the correlation information ICC indicates similarity of the audio signals L and R. Meanwhile, the level information IID indicates relative intensity of the audio signals L and R. In general, the level information IID is information for controlling balance and localization of a sound, and the correlation information ICC is information for controlling width and diffusiveness of the sound image. Both of these are spatial parameters for helping a listener mentally compose an auditory scene.
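The text does not give formulas for IID and ICC. The sketch below assumes commonly used definitions, namely a level difference in decibels and a normalized cross-correlation, computed over the real-valued coefficients of one spectral band. The function name `band_cues` and the small constant `eps` are assumptions made for illustration.

```python
import math


def band_cues(left, right, eps=1e-12):
    """Return (IID in dB, ICC) for one band of real-valued coefficients.

    Assumed definitions (not taken from the text):
      IID = 10 * log10(E_L / E_R)
      ICC = sum(L * R) / sqrt(E_L * E_R)
    eps guards against division by zero for silent bands.
    """
    e_l = sum(x * x for x in left)
    e_r = sum(x * x for x in right)
    cross = sum(a * b for a, b in zip(left, right))
    iid_db = 10.0 * math.log10((e_l + eps) / (e_r + eps))
    icc = cross / math.sqrt((e_l + eps) * (e_r + eps))
    return iid_db, icc


# Left channel twice as loud as an otherwise identical right channel.
iid, icc = band_cues([2.0, 0.0], [1.0, 0.0])
print(round(iid, 2), round(icc, 2))
```

Under these assumed definitions, identical channels give an IID of 0 dB and an ICC close to 1, while channels with no common component give an ICC close to 0.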
In the latest spatial codecs, the spectrally represented audio signals L and R and the downmix signal M are usually divided into plural groups of “parameter bands.” Thus, the BC information is computed for each parameter band. Note that the terms “BC information (binaural cue)” and “spatial parameter” are often used synonymously and interchangeably.
The encoder 13 performs compression coding on the downmix signal M, using, for example, MPEG Audio Layer-3 (MP3) or Advanced Audio Coding (AAC). In other words, the encoder 13 encodes the downmix signal M to generate a compressed coded stream.
In addition to performing quantization on the BC information, the multiplexing unit 14 generates a bit stream by multiplexing the compressed downmix signal M and the quantized BC information, and outputs the bit stream as the coded signal.
The audio decoder 20 includes a demultiplexing unit 21, a decoder 22, and a multi-channel synthesizing unit 23.
The demultiplexing unit 21: obtains the bit stream; separates the bit stream into the quantized BC information and the encoded downmix signal M; and outputs the BC information and the downmix signal M. Note that the demultiplexing unit 21 performs inverse quantization on the quantized BC information and outputs the inversely-quantized BC information.
The decoder 22 decodes the coded downmix signal M, and outputs the downmix signal M to the multi-channel synthesizing unit 23.
The multi-channel synthesizing unit 23 obtains the downmix signal M which is outputted from the decoder 22 and the BC information which is outputted from the demultiplexing unit 21. Then, the multi-channel synthesizing unit 23 recovers the audio signals L and R from the downmix signal M using the BC information. These processes for recovering the original two signals from the downmix signal involve a later-described “channel separation technique.”
Note that the above example only describes how two signals can be represented as one downmix signal and a set of spatial parameters in an encoder, and how a downmix signal can be separated into two signals in a decoder by processing the downmix signal and the spatial parameters. With the technology, 2 or more channels of audio (for example, 6 channels from a 5.1 audio source) can be compressed into 1 or 2 downmix channels in a coding process and recovered in a decoding process.
In other words, the audio apparatus 1 is described above using the example in which a 2-channel audio signal is coded and decoded; however, the audio apparatus 1 can also code and decode a signal with more than 2 channels (for example, a 6-channel audio signal which composes a 5.1-channel audio source).
FIG. 2 is a block diagram showing a functional structure of the multi-channel synthesizing unit 23 in the case of the 6 channels.
In the case where the downmix signal M is separated into the 6-channel audio signals, for example, the multi-channel synthesizing unit 23 includes a first channel separating unit 241, a second channel separating unit 242, a third channel separating unit 243, a fourth channel separating unit 244, and a fifth channel separating unit 245. Note that a center audio signal C with respect to a speaker placed in front of a listener, a left-front audio signal Lf with respect to a speaker placed ahead of the listener on the left, a right-front audio signal Rf with respect to a speaker placed ahead of the listener on the right, a left-back audio signal Ls with respect to a speaker placed behind the listener on the left, a right-back audio signal Rs with respect to a speaker placed behind the listener on the right, and a low-frequency audio signal LFE with respect to a subwoofer speaker for bass output are downmixed to form the downmix signal M.
The first channel separating unit 241 separates the downmix signal M into an intermediate first downmix signal M1 and an intermediate fourth downmix signal M4 and outputs the intermediate first downmix signal M1 and the intermediate fourth downmix signal M4. The center audio signal C, the left-front audio signal Lf, the right-front audio signal Rf, and the low-frequency audio signal LFE are downmixed to form the first downmix signal M1. The left-back audio signal Ls and the right-back audio signal Rs are downmixed to form the fourth downmix signal M4.
The second channel separating unit 242 separates the first downmix signal M1 into an intermediate second downmix signal M2 and an intermediate third downmix signal M3 and outputs the intermediate second downmix signal M2 and the intermediate third downmix signal M3. The left-front audio signal Lf and the right-front audio signal Rf are downmixed to form the second downmix signal M2. The center audio signal C and the low-frequency audio signal LFE are downmixed to form the third downmix signal M3.
The third channel separating unit 243 separates the second downmix signal M2 into the left-front audio signal Lf and the right-front audio signal Rf and outputs the left-front audio signal Lf and the right-front audio signal Rf.
The fourth channel separating unit 244 separates the third downmix signal M3 into the center audio signal C and the low-frequency audio signal LFE and outputs the center audio signal C and the low-frequency audio signal LFE.
The fifth channel separating unit 245 separates the fourth downmix signal M4 into the left-back audio signal Ls and the right-back audio signal Rs and outputs the left-back audio signal Ls and the right-back audio signal Rs.
As described above, the multi-channel synthesizing unit 23 performs the same separation processing in each channel separating unit, in which a single downmix signal is separated into two signals, and repeats this separation recursively in a multistage manner until every resulting signal has a single channel.
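The multistage structure of FIG. 2 can be pictured as a binary tree. The sketch below models only the tree topology, not the signal processing inside each channel separating unit; the names `TREE` and `leaves` are hypothetical.

```python
# Topology of FIG. 2: each key is an input signal, each value the two
# signals that the corresponding channel separating unit outputs.
TREE = {
    "M": ("M1", "M4"),    # first channel separating unit 241
    "M1": ("M2", "M3"),   # second channel separating unit 242
    "M2": ("Lf", "Rf"),   # third channel separating unit 243
    "M3": ("C", "LFE"),   # fourth channel separating unit 244
    "M4": ("Ls", "Rs"),   # fifth channel separating unit 245
}


def leaves(name, tree=TREE):
    """Recursively expand a signal until only single-channel signals remain."""
    if name not in tree:
        return [name]  # a single-channel signal: nothing left to separate
    first, second = tree[name]
    return leaves(first, tree) + leaves(second, tree)


print(leaves("M"))  # ['Lf', 'Rf', 'C', 'LFE', 'Ls', 'Rs']
```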
FIG. 3 is another functional block diagram showing a functional structure for describing a principle of the multi-channel synthesizing unit 23.
The multi-channel synthesizing unit 23 includes an all-pass filter 261, a BCC processing unit 262, and a calculating unit 263.
The all-pass filter 261 obtains the downmix signal M, and generates and outputs a decorrelated signal Mrev which has no correlation to the downmix signal M. The downmix signal M and the decorrelated signal Mrev are considered to be “mutually incoherent” when auditorily compared with each other. The decorrelated signal Mrev also has the same energy as the downmix signal M, and includes reverberating components of a finite duration which create the illusion of a sound that surrounds the listener.
The BCC processing unit 262 obtains the BC information, and generates and outputs a mixing factor Hij for maintaining the degree of correlation between L and R and the directionality of L and R, based on the level information IID and the correlation information ICC included in the BC information.
The calculating unit 263: obtains the downmix signal M, the decorrelated signal Mrev, and the mixing factor Hij; performs the calculation shown in Expression (1) below, using these; and outputs the audio signals L and R. As described above, by using the mixing factor Hij, the degree of correlation between the audio signals L and R and the directionality of the signals can be set to an intended condition.
[Expression 1]
$$\begin{aligned} L &= H_{11}\,M + H_{12}\,M_{rev} \\ R &= H_{21}\,M + H_{22}\,M_{rev} \end{aligned} \tag{1}$$
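A minimal sketch of Expression (1) follows, applied sample by sample to a downmix signal and its decorrelated counterpart. The function name `upmix` and all numeric values, including the mixing factors, are hypothetical; in the apparatus the factors Hij come from the BCC processing unit 262.

```python
def upmix(m, m_rev, h):
    """Apply Expression (1): L = H11*M + H12*Mrev, R = H21*M + H22*Mrev.

    h is a 2x2 nested sequence ((H11, H12), (H21, H22)).
    """
    (h11, h12), (h21, h22) = h
    left = [h11 * a + h12 * b for a, b in zip(m, m_rev)]
    right = [h21 * a + h22 * b for a, b in zip(m, m_rev)]
    return left, right


# Hypothetical mixing factors: the wet part enters L and R with opposite sign.
H = ((1.0, 0.5), (1.0, -0.5))
left, right = upmix([1.0, 2.0], [0.5, -1.0], H)
print(left, right)  # [1.25, 1.5] [0.75, 2.5]
```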
FIG. 4 is a block diagram showing a detailed structure of the multi-channel synthesizing unit 23. Note that the decoder 22 is illustrated, as well.
The decoder 22 decodes a coded downmix signal into the downmix signal M in a time domain, and outputs the decoded downmix signal M to the multi-channel synthesizing unit 23. The multi-channel synthesizing unit 23 includes an analysis filter bank 231, a channel expanding unit 232, and a temporal processing apparatus (energy shaping apparatus) 900. The channel expanding unit 232 includes a pre-matrix processing unit 2321, a post-matrix processing unit 2322, a first calculating unit 2323, a decorrelation processing unit 2324, and a second calculating unit 2325.
The analysis filter bank 231 obtains the downmix signal M which is outputted from the decoder 22, transforms the representation of the downmix signal M into a time-frequency hybrid representation, and outputs the result as first frequency band signals x, written collectively as a vector x. Note that the analysis filter bank 231 includes a first stage and a second stage. For example, the first stage is a QMF filter bank and the second stage is a Nyquist filter bank. In these stages, the spectral resolution of the low-frequency sub-bands is enhanced by first dividing the frequency band into plural sub-bands using the QMF filter (first stage), and then dividing the sub-bands on the low-frequency side into finer sub-bands using the Nyquist filter (second stage).
The pre-matrix processing unit 2321 in the channel expanding unit 232 generates a matrix R1; namely, a scaling factor showing allocation (scaling) of a signal intensity level to each channel, using the BC information.
For example, the pre-matrix processing unit 2321 generates the matrix R1, using the level information IID which shows ratios between a signal intensity level of the downmix signal M and each of the signal intensity levels of the first downmix signal M1, the second downmix signal M2, the third downmix signal M3, and the fourth downmix signal M4.
In other words, using the ILD spatial parameter, the pre-matrix processing unit 2321 computes a scaling factor in the form of a vector R1 having vector elements R1[0] through R1[4]. These elements scale the energy level of the input downmix signal M in order to generate the intermediate signals, corresponding to M and the synthetic signals M1 through M4, which the first through the fifth channel separating units 241 to 245 shown in FIG. 2 can use to generate the decorrelated signals.
The first calculating unit 2323 obtains the first frequency band signals x, in the time-frequency hybrid representation, which are outputted from the analysis filter bank 231, and, as shown in Expression (2) and Expression (3) below, computes a product of the first frequency band signals x and the matrix R1. Then, the first calculating unit 2323 outputs an intermediate signal v which shows the result of the matrix calculation.
[Expression 2]
$$v = \begin{bmatrix} M \\ M_1 \\ M_2 \\ M_3 \\ M_4 \end{bmatrix} = R_1\,x \tag{2}$$
Here, M1 through M4 are shown in the following expressions (3).
[Expression 3]
$$\begin{aligned} M_1 &= L_f + R_f + C + LFE \\ M_2 &= L_f + R_f \\ M_3 &= C + LFE \\ M_4 &= L_s + R_s \end{aligned} \tag{3}$$
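The product v = R1 x of Expression (2) can be sketched for a single time-frequency sample as follows. Treating R1 as a list of five scale factors applied to one scalar sample is a simplifying assumption, and the values of `r1` are hypothetical; in the apparatus they are derived from the BC information.

```python
def pre_matrix(r1, x):
    """Scale one downmix sample x by each element of R1, giving v = R1 * x.

    The five results correspond to M, M1, M2, M3 and M4 in Expression (2).
    """
    return [gain * x for gain in r1]


# Hypothetical scale factors R1[0]..R1[4] for one parameter band.
r1 = [1.0, 0.8, 0.6, 0.5, 0.6]
v = pre_matrix(r1, 2.0)
print(len(v))  # 5
```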
The decorrelation processing unit 2324 functions as the all-pass filter 261 shown in FIG. 3; it generates and outputs a decorrelated signal w by applying all-pass filter processing to the intermediate signal v, as shown in Expression (4) below. Note that the structural elements Mrev and Mi,rev of the decorrelated signal w are signals obtained by performing decorrelation processing on the downmix signals M and Mi, respectively.
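[Expression 4]
$$w = \begin{bmatrix} M \\ \mathrm{decorr}(v) \end{bmatrix} = \begin{bmatrix} M \\ M_{rev} \\ M_{1,rev} \\ M_{2,rev} \\ M_{3,rev} \\ M_{4,rev} \end{bmatrix} = \begin{bmatrix} M \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ M_{rev} \\ M_{1,rev} \\ M_{2,rev} \\ M_{3,rev} \\ M_{4,rev} \end{bmatrix} = w_{Dry} + w_{Wet} \tag{4}$$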
Note that wDry in the above Expression (4) is formed from the original downmix signal (hereinafter also referred to as the “dry” signal), and wWet is formed from the group of decorrelated signals (hereinafter also referred to as the “wet” signal).
The post-matrix processing unit 2322 generates a matrix R2, which shows distribution of reverberation to each channel, using the BC information. In other words, the post-matrix processing unit 2322 computes a mixing factor, namely the matrix R2 for mixing M and Mi,rev, in order to derive each signal. For example, the post-matrix processing unit 2322 derives the mixing factor Hij from the correlation information ICC which shows the width and diffusiveness of the sound image, and generates the matrix R2 which is formed from the mixing factor Hij.
The second calculating unit 2325 computes a product of the decorrelated signals w and the matrix R2, and outputs output signals y which show the result of the matrix calculation. In other words, the second calculating unit 2325 separates the decorrelated signals w into six audio signals Lf, Rf, Ls, Rs, C, and LFE.
For example, as shown in FIG. 2, the left-front audio signal Lf is separated from the second downmix signal M2; thus, for the separation of the left-front audio signal Lf, the second downmix signal M2 and the corresponding structural element M2,rev of the decorrelated signals w are used. Likewise, the second downmix signal M2 is separated from the first downmix signal M1; thus, for computation of the second downmix signal M2, the first downmix signal M1 and the corresponding structural element M1,rev of the decorrelated signals w are used.
Thus, the left-front audio signal Lf is described in the expressions (5) below.
[Expression 5]
$$\begin{aligned} L_f &= H_{11,A}\,M_2 + H_{12,A}\,M_{2,rev} \\ M_2 &= H_{11,D}\,M_1 + H_{12,D}\,M_{1,rev} \\ M_1 &= H_{11,E}\,M + H_{12,E}\,M_{rev} \end{aligned} \tag{5}$$
Here, Hij,A in the expressions (5) are mixing factors at the third channel separating unit 243, Hij,D are mixing factors at the second channel separating unit 242, and Hij,E are mixing factors at the first channel separating unit 241. The three expressions described in the expressions (5) can be compiled into one multiplication expression described in the following Expression (6).
[Expression 6]
$$\begin{aligned} L_f &= \begin{bmatrix} H_{11,A}H_{11,D}H_{11,E} & H_{11,A}H_{11,D}H_{12,E} & H_{11,A}H_{12,D} & H_{12,A} & 0 & 0 \end{bmatrix} w \\ &= R_{2,Lf}\,w \end{aligned} \tag{6}$$
The audio signals other than the left-front audio signal Lf, namely Rf, C, LFE, Ls, and Rs, are computed in the same manner, each as a product of a corresponding matrix of mixing factors and the decorrelated signal w.
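The equivalence between the three cascaded expressions (5) and the single product of Expression (6) can be checked numerically. In the sketch below, each pair `(H11, H12)` and the contents of `w` are hypothetical values, and the helper names are assumptions introduced for illustration.

```python
def lf_cascaded(w, h_a, h_d, h_e):
    """Evaluate Lf stage by stage, following the expressions (5)."""
    m, m_rev, m1_rev, m2_rev, _m3_rev, _m4_rev = w
    m1 = h_e[0] * m + h_e[1] * m_rev      # first channel separating unit 241
    m2 = h_d[0] * m1 + h_d[1] * m1_rev    # second channel separating unit 242
    return h_a[0] * m2 + h_a[1] * m2_rev  # third channel separating unit 243


def r2_lf(h_a, h_d, h_e):
    """Build the row vector R2,Lf of Expression (6)."""
    return [
        h_a[0] * h_d[0] * h_e[0],  # coefficient of M
        h_a[0] * h_d[0] * h_e[1],  # coefficient of Mrev
        h_a[0] * h_d[1],           # coefficient of M1,rev
        h_a[1],                    # coefficient of M2,rev
        0.0,                       # M3,rev does not contribute to Lf
        0.0,                       # M4,rev does not contribute to Lf
    ]


def dot(row, col):
    return sum(a * b for a, b in zip(row, col))


# Hypothetical (H11, H12) pairs and one sample of w = [M, Mrev, M1rev..M4rev].
h_a, h_d, h_e = (0.9, 0.3), (0.8, 0.4), (0.7, 0.5)
w = [1.0, 0.2, -0.3, 0.4, 0.5, -0.6]
difference = lf_cascaded(w, h_a, h_d, h_e) - dot(r2_lf(h_a, h_d, h_e), w)
print(abs(difference) < 1e-12)  # True
```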
In other words, the output signals y are described in Expression (7) below.
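[Expression 7]
$$\begin{aligned} y &= \begin{bmatrix} L_f \\ R_f \\ L_s \\ R_s \\ C \\ LFE \end{bmatrix} = \begin{bmatrix} R_{2,Lf} \\ R_{2,Rf} \\ R_{2,Ls} \\ R_{2,Rs} \\ R_{2,C} \\ R_{2,LFE} \end{bmatrix} w \\ &= R_2\,w = R_2\,w_{Dry} + R_2\,w_{Wet} = y_{Dry} + y_{Wet} \end{aligned} \tag{7}$$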
The matrix R2 is assembled from products of the mixing factors of the first to fifth channel separating units 241 to 245; consequently, each of the generated multi-channel signals is a linear combination of M, Mrev, M1,rev, . . . M4,rev. For the subsequent energy shaping processing, yDry and yWet are stored separately.
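Because the product with R2 is linear, the dry and wet contributions can be computed separately and summed afterwards, as Expression (7) states. The sketch below uses a hypothetical two-row slice of R2 (two output channels only) and hypothetical values for w; the helper names are assumptions.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def split_dry_wet(w):
    """Split w = [M, Mrev, M1rev, ..., M4rev] as in Expression (4)."""
    w_dry = [w[0]] + [0.0] * (len(w) - 1)  # only the original downmix M
    w_wet = [0.0] + list(w[1:])            # only the decorrelated signals
    return w_dry, w_wet


def dry_wet_outputs(r2, w):
    """Return (yDry, yWet), kept separate for later energy shaping."""
    w_dry, w_wet = split_dry_wet(w)
    return matvec(r2, w_dry), matvec(r2, w_wet)


# Hypothetical slice of R2 and one sample of w.
r2 = [
    [0.5, 0.1, 0.2, 0.3, 0.0, 0.0],
    [0.5, -0.1, -0.2, -0.3, 0.0, 0.0],
]
w = [1.0, 0.4, -0.2, 0.6, 0.1, -0.5]
y_dry, y_wet = dry_wet_outputs(r2, w)
y = matvec(r2, w)
print(all(abs(d + t - f) < 1e-12 for d, t, f in zip(y_dry, y_wet, y)))  # True
```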
The temporal processing apparatus 900 transforms the restored expression form of each audio signal from the time-frequency hybrid expression to a time expression, and outputs the plural audio signals in the time expression as a multi-channel signal. Note that the temporal processing apparatus 900 includes, for example, two stages, so as to match the analysis filter bank 231. Furthermore, the matrices R1 and R2 are generated as matrices R1(b) and R2(b) for each parameter band b described above.
Here, before a wet signal and a dry signal are merged, the wet signal is shaped according to a temporal envelope of the dry signal. This module, the temporal processing apparatus 900, is essential for signals having rapidly time-varying characteristics, such as an attack sound.
In other words, in order to prevent the sound from being blunted in the case of a signal which changes drastically in time, such as an attack sound, the temporal processing apparatus 900 shapes the time envelope of the diffuse signals so as to match the time envelope of the direct signals, adds the shaped diffuse signals to the direct signals, and outputs the added signal, thereby maintaining the original sound quality.
FIG. 5 is a block diagram showing a detailed structure of the temporal processing apparatus 900 shown in FIG. 4.
As shown in FIG. 5, the temporal processing apparatus 900 includes a splitter 901, synthesis filter banks 902 and 903, a downmix unit 904, bandpass filters (BPF) 905 and 906, normalization processing units 907 and 908, a scale computation processing unit 909, a smoothing processing unit 910, a calculating unit 911, a high-pass filter (HPF) 912, and an adding unit 913.
The splitter 901 splits a recovered signal y into direct signals y-direct and diffuse signals y-diffuse as shown in the following Expression (8) and Expression (9).
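[Expression 8]
$$y_{\mathrm{direct}} = \begin{bmatrix} y_{1,\mathrm{direct}} \\ y_{2,\mathrm{direct}} \\ y_{3,\mathrm{direct}} \\ y_{4,\mathrm{direct}} \\ y_{5,\mathrm{direct}} \\ y_{6,\mathrm{direct}} \end{bmatrix} = \begin{cases} y_{\mathrm{Dry}} + y_{\mathrm{Wet}} & \text{for low frequency region} \\ y_{\mathrm{Dry}} & \text{for high frequency region} \end{cases} \qquad (8)$$
[Expression 9]
$$y_{\mathrm{diffuse}} = \begin{bmatrix} y_{1,\mathrm{diffuse}} \\ y_{2,\mathrm{diffuse}} \\ y_{3,\mathrm{diffuse}} \\ y_{4,\mathrm{diffuse}} \\ y_{5,\mathrm{diffuse}} \\ y_{6,\mathrm{diffuse}} \end{bmatrix} = \begin{cases} 0 & \text{for low frequency region} \\ y_{\mathrm{Wet}} & \text{for high frequency region} \end{cases} \qquad (9)$$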
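The splitting rule of Expressions (8) and (9) can be sketched for a single channel as follows. The function name and the `crossover_band` parameter (the index of the first band treated as the high frequency region) are hypothetical; the text above does not state where the boundary between the two regions lies.

```python
def split_direct_diffuse(y_dry, y_wet, crossover_band):
    """Split one channel into direct and diffuse parts, band by band.

    y_dry, y_wet: lists indexed by frequency band, ordered low to high.
    crossover_band: hypothetical index of the first "high frequency" band.
    """
    direct, diffuse = [], []
    for band, (dry, wet) in enumerate(zip(y_dry, y_wet)):
        if band < crossover_band:
            # Low frequency region: everything is treated as direct.
            direct.append(dry + wet)
            diffuse.append(0.0)
        else:
            # High frequency region: dry is direct, wet is diffuse.
            direct.append(dry)
            diffuse.append(wet)
    return direct, diffuse

direct, diffuse = split_direct_diffuse(
    [1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5], crossover_band=2
)
```

In every band the direct and diffuse parts add up to the sum of the dry and wet signals, so the split itself loses no signal content.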
The synthesis filter bank 902 transforms the six direct signals into the time domain. Likewise, the synthesis filter bank 903 transforms the six diffuse signals into the time domain.
The downmix unit 904 adds up the six direct signals in the time domain to form one direct downmix signal M-direct, according to Expression (10) below.
[Expression 10]
$$M_{\mathrm{direct}} = \sum_{i=1}^{6} y_{i,\mathrm{direct}} \qquad (10)$$
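As an illustration of Expression (10), the sample-wise sum of the direct signals can be sketched as below. The function name and the constant test signals are hypothetical.

```python
def downmix_direct(direct_channels):
    """Add time-domain direct signals sample by sample into one downmix."""
    return [sum(samples) for samples in zip(*direct_channels)]

# Six hypothetical channels of four samples each; channel i holds the value i.
channels = [[float(i)] * 4 for i in range(1, 7)]
m_direct = downmix_direct(channels)
```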
The BPF 905 performs bandpass processing on the one direct downmix signal. Likewise, the BPF 906 performs bandpass processing on all of the six diffuse signals. The bandpassed direct downmix signal and diffuse signals are given by Expression (11) below.
[Expression 11]
$$M_{\mathrm{direct,BP}} = \mathrm{Bandpass}(M_{\mathrm{direct}}), \qquad y_{i,\mathrm{diffuse,BP}} = \mathrm{Bandpass}(y_{i,\mathrm{diffuse}}) \qquad (11)$$
The normalization processing unit 907 normalizes the direct downmix signal so that the direct downmix signal has unit energy over one processing frame, according to Expression (12) shown below.
[Expression 12]
$$M_{\mathrm{direct,norm}}(t) = \frac{M_{\mathrm{direct,BP}}(t)}{\sqrt{\sum_{t} M_{\mathrm{direct,BP}}(t) \cdot M_{\mathrm{direct,BP}}(t)}} \qquad (12)$$
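The normalization of Expression (12) can be sketched as follows. The sketch assumes that the normalized frame is meant to have a sum of squared samples equal to 1, which requires dividing by the square root of the frame energy. The function name and the handling of a silent frame are hypothetical.

```python
import math

def normalize_frame(frame):
    """Scale one processing frame so that its sum of squares becomes 1."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        # Assumption: a silent frame is passed through unchanged;
        # the source does not describe this case.
        return list(frame)
    gain = 1.0 / math.sqrt(energy)
    return [x * gain for x in frame]

m_norm = normalize_frame([3.0, 4.0])
unit_energy = sum(x * x for x in m_norm)  # close to 1.0
```

The same function could be applied to each of the six bandpassed diffuse signals.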
In the same manner as the normalization processing unit 907, the normalization processing unit 908 normalizes the six diffuse signals, according to Expression (13) shown below.
[Expression 13]. . .   (13)
The scale computation processing unit 909 divides the normalized signals into time blocks, and then computes a scale factor for each time block, according to Expression (14) shown below.
[Expression 14]
$$\mathrm{scale}_i(b) = \sqrt{\frac{\sum_{t \in b} M_{\mathrm{direct,norm}}(t) \cdot M_{\mathrm{direct,norm}}(t)}{\sum_{t \in b} y_{i,\mathrm{diffuse,norm}}(t) \cdot y_{i,\mathrm{diffuse,norm}}(t)}} \qquad (14)$$
Note that FIG. 6 is a drawing showing the above dividing processing, where the time block b in the above Expression (14) denotes a “block index.”
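The per-block scale factor of Expression (14) can be sketched for one diffuse channel as follows. The sketch assumes non-overlapping blocks of a fixed length and assumes that the scale factor is the square root of the energy ratio, so that it can be applied directly to sample amplitudes. The function name, the `block_size` parameter, and the fallback value for a silent diffuse block are hypothetical.

```python
import math

def block_scale_factors(m_direct_norm, y_diffuse_norm, block_size):
    """Compute one scale factor per time block from the block energies."""
    scales = []
    for start in range(0, len(m_direct_norm), block_size):
        stop = start + block_size
        direct_energy = sum(x * x for x in m_direct_norm[start:stop])
        diffuse_energy = sum(x * x for x in y_diffuse_norm[start:stop])
        if diffuse_energy == 0.0:
            # Assumption: leave a silent diffuse block unscaled.
            scales.append(1.0)
        else:
            scales.append(math.sqrt(direct_energy / diffuse_energy))
    return scales

scales = block_scale_factors(
    [2.0, 2.0, 1.0, 1.0], [1.0, 1.0, 2.0, 2.0], block_size=2
)
```

A scale factor above 1 raises the diffuse signal in a block where the direct signal carries more energy, and a factor below 1 attenuates it, which is how the diffuse envelope is made to follow the direct envelope.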
Finally, the calculating unit 911 scales the diffuse signals, the HPF 912 high-pass filters the scaled diffuse signals, and the adding unit 913 combines the filtered signals with the direct signals, according to Expression (15) below.
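[Expression 15]
$$y_{i,\mathrm{diffuse,scaled,HP}} = \mathrm{Highpass}(y_{i,\mathrm{diffuse}} \cdot \mathrm{scale}_i), \qquad y_i = y_{i,\mathrm{direct}} + y_{i,\mathrm{diffuse,scaled,HP}} \qquad (15)$$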
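The scaling, high-pass filtering, and adding steps of Expression (15) can be sketched for one channel as follows. The source does not specify the high-pass filter, so the first-order filter below and its coefficient `alpha` are placeholders, and the function names are hypothetical. The sketch also assumes non-overlapping blocks of a fixed length.

```python
def one_pole_highpass(signal, alpha=0.9):
    """Hypothetical first-order high-pass filter (placeholder design)."""
    out, prev_in, prev_out = [], 0.0, 0.0
    for x in signal:
        prev_out = alpha * (prev_out + x - prev_in)
        prev_in = x
        out.append(prev_out)
    return out

def shape_and_combine(direct, diffuse, scales, block_size,
                      highpass=one_pole_highpass):
    """Scale the diffuse signal per block, high-pass it, add the direct signal."""
    scaled = [x * scales[n // block_size] for n, x in enumerate(diffuse)]
    filtered = highpass(scaled)
    return [d + f for d, f in zip(direct, filtered)]

# With an identity filter the result is simply direct + diffuse * scale.
y = shape_and_combine([1.0] * 4, [1.0] * 4, [2.0, 0.5],
                      block_size=2, highpass=lambda s: s)
```

Passing the filter as a parameter keeps the scaling and adding logic independent of whichever high-pass design is actually used.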
Note that the smoothing processing unit 910 is optional; it improves the smoothness of the scale factor across consecutive time blocks. For example, the consecutive time blocks may be overlapped with each other as shown in a in FIG. 6, and the “weighted” scale factor in the overlapped area is calculated using a window function.
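One possible reading of this weighting is sketched below: each block contributes its scale factor weighted by a window, and the per-sample gain is the weighted average over all blocks covering that sample. The triangular window, the `hop_size` parameter, and the function name are assumptions; the source names neither the window function nor the amount of overlap.

```python
def smoothed_gain(scales, block_size, hop_size, num_samples):
    """Blend per-block scale factors into a per-sample gain curve."""
    # Assumed window: triangular and strictly positive inside the block.
    window = [1.0 - abs((n + 0.5) / block_size * 2.0 - 1.0)
              for n in range(block_size)]
    weighted = [0.0] * num_samples
    weight_sum = [0.0] * num_samples
    for b, scale in enumerate(scales):
        for n, w in enumerate(window):
            t = b * hop_size + n
            if t < num_samples:
                weighted[t] += w * scale
                weight_sum[t] += w
    # Samples not covered by any block keep a gain of 1 (assumption).
    return [v / s if s > 0.0 else 1.0 for v, s in zip(weighted, weight_sum)]

gains = smoothed_gain([2.0, 0.5], block_size=4, hop_size=2, num_samples=6)
```

In the overlapped samples the gain moves gradually from the first block's scale factor to the second block's, instead of jumping at the block boundary.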
In the scaling processing of the calculating unit 911 as well, a person skilled in the art can use such a conventionally known overlap-add technique.
As mentioned above, the conventional temporal processing apparatus 900 performs the above energy shaping by shaping each decorrelated signal in the time domain for each of the original signals.
Non-patent Reference 1: J. Herre, et al., “The Reference Model Architecture for MPEG Spatial Audio Coding”, 118th AES Convention, Barcelona.