Conventionally, multi-channel acoustic signal processing devices have been provided which down-mix a plurality of audio signals into a down-mixed signal and divide the down-mixed signal into the original plurality of signals.
FIG. 1 is a block diagram showing a structure of such a multi-channel acoustic signal processing device.
The multi-channel acoustic signal processing device 1000 has: a multi-channel acoustic coding unit 1100 which performs spatial acoustic coding on a group of audio signals and outputs the resulting acoustic coded signals; and a multi-channel acoustic decoding unit 1200 which decodes the acoustic coded signals.
The multi-channel acoustic coding unit 1100 processes audio signals (audio signals L and R of two channels, for example) in units of frames, each frame consisting of 1024 samples, 2048 samples, or the like. The multi-channel acoustic coding unit 1100 includes a down-mix unit 1110, a binaural cue calculation unit 1120, an audio encoder unit 1150, and a multiplexing unit 1190.
The down-mix unit 1110 generates a down-mixed signal M in which audio signals L and R of two channels that are expressed as spectrums are down-mixed, by calculating an average of the audio signals L and R, in other words, by calculating M=(L+R)/2.
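The averaging down-mix can be sketched as follows. This is a minimal illustration only, assuming that each channel is held as a list of complex spectral coefficients of equal length; the function name `downmix_average` and the sample values are hypothetical and do not come from the device described here.

```python
def downmix_average(left, right):
    """Down-mix two spectra bin by bin as M = (L + R) / 2."""
    if len(left) != len(right):
        raise ValueError("both channels must have the same number of bins")
    return [(l + r) / 2 for l, r in zip(left, right)]


# Toy spectra with three bins each (made-up values).
L = [1 + 0j, 0.5 + 0.5j, -2 + 0j]
R = [1 + 0j, 0.5 - 0.5j, 2 + 0j]
M = downmix_average(L, R)
```

In the third bin the two channels are in opposite phase and cancel in M, which illustrates why side information is needed to reproduce L and R from the down-mixed signal.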
The binaural cue calculation unit 1120 generates binaural cue information by comparing the down-mixed signal M and the audio signals L and R for each spectrum band. The binaural cue information is used to reproduce the audio signals L and R from the down-mixed signal.
The binaural cue information indicates: inter-channel level/intensity difference (IID); inter-channel coherence/correlation (ICC); inter-channel phase/delay difference (IPD); and channel prediction coefficients (CPC).
In general, the inter-channel level/intensity difference (IID) is information for controlling balance and localization of audio, and the inter-channel coherence/correlation (ICC) is information for controlling width and diffusion of audio. Both are spatial parameters that help listeners to imagine auditory scenes.
The audio signals L and R that are expressed as spectrums, and the down-mixed signal M are generally sectionalized into a plurality of groups including “parameter bands”. Therefore, the binaural cue information is calculated for each of the parameter bands. Note that hereinafter the “binaural cue information” and “spatial parameter” are often used synonymously with each other.
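A per-band calculation of two of these cues can be sketched as follows. The formulas are assumptions, since no formula is given here: IID is taken as the energy ratio of the two channels in decibels, and ICC as the real part of the normalized cross-spectrum. The function name `band_cues`, the band-edge layout, and the sample values are likewise hypothetical.

```python
import math


def band_cues(left, right, band_edges):
    """Return a list of (IID in dB, ICC), one pair per parameter band.

    Band i covers bins band_edges[i] up to, but not including,
    band_edges[i + 1].
    """
    eps = 1e-12  # guards against division by zero in silent bands
    cues = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        l_band, r_band = left[lo:hi], right[lo:hi]
        e_l = sum(abs(x) ** 2 for x in l_band)
        e_r = sum(abs(x) ** 2 for x in r_band)
        cross = sum(l * r.conjugate() for l, r in zip(l_band, r_band))
        iid_db = 10.0 * math.log10((e_l + eps) / (e_r + eps))
        icc = cross.real / math.sqrt((e_l + eps) * (e_r + eps))
        cues.append((iid_db, icc))
    return cues


# Four bins grouped into two parameter bands (made-up values).
left = [2 + 0j, 2 + 0j, 1 + 0j, 0 + 1j]
right = [1 + 0j, 1 + 0j, 1 + 0j, 0 - 1j]
cues = band_cues(left, right, [0, 2, 4])
```

In this toy input the first band is fully coherent with the left channel about 6 dB louder, while the second band has equal levels and zero correlation.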
The audio encoder unit 1150 compresses and codes the down-mixed signal M, according to, for example, MPEG Audio Layer-3 (MP3), Advanced Audio Coding (AAC), or the like.
The multiplexing unit 1190 multiplexes the down-mixed signal M coded by the audio encoder unit 1150 and the quantized binaural cue information to generate a bitstream, and outputs the bitstream as the above-mentioned acoustic coded signals.
The multi-channel acoustic decoding unit 1200 includes an inverse-multiplexing unit 1210, an audio decoder unit 1220, an analysis filter unit 1230, a multi-channel synthesis unit 1240, and a synthesis filter unit 1290.
The inverse-multiplexing unit 1210 obtains the above-mentioned bitstream and divides the bitstream into the quantized binaural cue information and the coded down-mixed signal M. The inverse-multiplexing unit 1210 then inversely quantizes the quantized binaural cue information, and outputs the resulting binaural cue information together with the coded down-mixed signal M.
The audio decoder unit 1220 decodes the coded down-mixed signal M and outputs the decoded down-mixed signal M to the analysis filter unit 1230.
The analysis filter unit 1230 converts the expression format of the down-mixed signal M into a time/frequency hybrid expression and outputs the converted signal.
The multi-channel synthesis unit 1240 obtains the down-mixed signal M from the analysis filter unit 1230, and the binaural cue information from the inverse-multiplexing unit 1210. Then, using the binaural cue information, the multi-channel synthesis unit 1240 reproduces, from the down-mixed signal M, two audio signals L and R in the time/frequency hybrid expression.
The synthesis filter unit 1290 converts the expression format of the reproduced audio signals from the time/frequency hybrid expression into a time expression, thereby outputting audio signals L and R in the time expression.
Although it has been described that the multi-channel acoustic signal processing device 1000 codes and decodes audio signals of two channels as one example, the multi-channel acoustic signal processing device 1000 is able to code and decode audio signals of more than two channels (audio signals of six channels forming 5.1-channel sound source, for example).
FIG. 2 is a block diagram showing a functional structure of the multi-channel synthesis unit 1240.
For example, in the case where the multi-channel synthesis unit 1240 divides the down-mixed signal M into audio signals of six channels, the multi-channel synthesis unit 1240 includes the first dividing unit 1241, the second dividing unit 1242, the third dividing unit 1243, the fourth dividing unit 1244, and the fifth dividing unit 1245. Note that, in the down-mixed signal M, a center audio signal C, a left-front audio signal Lf, a right-front audio signal Rf, a left-side audio signal Ls, a right-side audio signal Rs, and a low frequency audio signal LFE are down-mixed. The center audio signal C is for a loudspeaker positioned on the center front of a listener. The left-front audio signal Lf is for a loudspeaker positioned on the left front of the listener. The right-front audio signal Rf is for a loudspeaker positioned on the right front of the listener. The left-side audio signal Ls is for a loudspeaker positioned on the left side of the listener. The right-side audio signal Rs is for a loudspeaker positioned on the right side of the listener. The low frequency audio signal LFE is for a sub-woofer loudspeaker for low sound outputting.
The first dividing unit 1241 divides the down-mixed signal M into the first down-mixed signal M1 and the fourth down-mixed signal M4, and outputs them. In the first down-mixed signal M1, the center audio signal C, the left-front audio signal Lf, the right-front audio signal Rf, and the low frequency audio signal LFE are down-mixed. In the fourth down-mixed signal M4, the left-side audio signal Ls and the right-side audio signal Rs are down-mixed.
The second dividing unit 1242 divides the first down-mixed signal M1 into the second down-mixed signal M2 and the third down-mixed signal M3, and outputs them. In the second down-mixed signal M2, the left-front audio signal Lf and the right-front audio signal Rf are down-mixed. In the third down-mixed signal M3, the center audio signal C and the low frequency audio signal LFE are down-mixed.
The third dividing unit 1243 divides the second down-mixed signal M2 into the left-front audio signal Lf and the right-front audio signal Rf, and outputs them.
The fourth dividing unit 1244 divides the third down-mixed signal M3 into the center audio signal C and the low frequency audio signal LFE, and outputs them.
The fifth dividing unit 1245 divides the fourth down-mixed signal M4 into the left-side audio signal Ls and the right-side audio signal Rs, and outputs them.
As described above, in the multi-channel synthesis unit 1240, each of the dividing units divides one signal into two signals using a multiple-stage method, and the multi-channel synthesis unit 1240 recursively repeats the signal dividing until the signals are eventually divided into a plurality of single audio signals.
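The multiple-stage dividing of FIG. 2 can be written down as a small binary tree. The sketch below only models which signal is divided into which; it performs no audio processing, and the names `SPLIT_TREE`, `leaves`, and `path_to` are hypothetical.

```python
# Each entry maps a down-mixed signal to the two signals it is divided into.
SPLIT_TREE = {
    "M": ("M1", "M4"),    # first dividing unit 1241
    "M1": ("M2", "M3"),   # second dividing unit 1242
    "M2": ("Lf", "Rf"),   # third dividing unit 1243
    "M3": ("C", "LFE"),   # fourth dividing unit 1244
    "M4": ("Ls", "Rs"),   # fifth dividing unit 1245
}


def leaves(signal):
    """Recursively divide `signal` until only single audio signals remain."""
    if signal not in SPLIT_TREE:
        return [signal]
    first, second = SPLIT_TREE[signal]
    return leaves(first) + leaves(second)


def path_to(target, signal="M"):
    """Return the chain of signals from `signal` down to `target`, or None."""
    if signal == target:
        return [signal]
    for child in SPLIT_TREE.get(signal, ()):
        sub = path_to(target, child)
        if sub:
            return [signal] + sub
    return None


channels = leaves("M")
lf_path = path_to("Lf")
```

The path from M to Lf passes through M1 and M2, so reproducing Lf involves three successive dividing stages.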
FIG. 3 is a block diagram showing a structure of the binaural cue calculation unit 1120.
The binaural cue calculation unit 1120 includes a first level difference calculation unit 1121, a first phase difference calculation unit 1122, a first correlation calculation unit 1123, a second level difference calculation unit 1124, a second phase difference calculation unit 1125, a second correlation calculation unit 1126, a third level difference calculation unit 1127, a third phase difference calculation unit 1128, a third correlation calculation unit 1129, a fourth level difference calculation unit 1130, a fourth phase difference calculation unit 1131, a fourth correlation calculation unit 1132, a fifth level difference calculation unit 1133, a fifth phase difference calculation unit 1134, a fifth correlation calculation unit 1135, and adders 1136, 1137, 1138, and 1139.
The first level difference calculation unit 1121 calculates a level difference between the left-front audio signal Lf and the right-front audio signal Rf, and outputs the signal indicating the inter-channel level/intensity difference (IID) as the calculation result. The first phase difference calculation unit 1122 calculates a phase difference between the left-front audio signal Lf and the right-front audio signal Rf, and outputs the signal indicating the inter-channel phase/delay difference (IPD) as the calculation result. The first correlation calculation unit 1123 calculates a correlation between the left-front audio signal Lf and the right-front audio signal Rf, and outputs the signal indicating the inter-channel coherence/correlation (ICC) as the calculation result. The adder 1136 adds the left-front audio signal Lf and the right-front audio signal Rf and multiplies the resulting added value by a predetermined coefficient, thereby generating and outputting the second down-mixed signal M2.
In the same manner as described above, the second level difference calculation unit 1124, the second phase difference calculation unit 1125, and the second correlation calculation unit 1126 output signals indicating inter-channel level/intensity difference (IID), inter-channel phase/delay difference (IPD), and inter-channel coherence/correlation (ICC), respectively, for the left-side audio signal Ls and the right-side audio signal Rs. The adder 1137 adds the left-side audio signal Ls and the right-side audio signal Rs and multiplies the resulting added value by a predetermined coefficient, thereby generating and outputting the fourth down-mixed signal M4.
In the same manner as described above, the third level difference calculation unit 1127, the third phase difference calculation unit 1128, and the third correlation calculation unit 1129 output signals indicating inter-channel level/intensity difference (IID), inter-channel phase/delay difference (IPD), and inter-channel coherence/correlation (ICC), respectively, for the center audio signal C and the low frequency audio signal LFE. The adder 1138 adds the center audio signal C and the low frequency audio signal LFE and multiplies the resulting added value by a predetermined coefficient, thereby generating and outputting the third down-mixed signal M3.
In the same manner as described above, the fourth level difference calculation unit 1130, the fourth phase difference calculation unit 1131, and the fourth correlation calculation unit 1132 output signals indicating inter-channel level/intensity difference (IID), inter-channel phase/delay difference (IPD), and inter-channel coherence/correlation (ICC), respectively, for the second down-mixed signal M2 and the third down-mixed signal M3. The adder 1139 adds the second down-mixed signal M2 and the third down-mixed signal M3 and multiplies the resulting added value by a predetermined coefficient, thereby generating and outputting the first down-mixed signal M1.
In the same manner as described above, the fifth level difference calculation unit 1133, the fifth phase difference calculation unit 1134, and the fifth correlation calculation unit 1135 output signals indicating inter-channel level/intensity difference (IID), inter-channel phase/delay difference (IPD), and inter-channel coherence/correlation (ICC), respectively, for the first down-mixed signal M1 and the fourth down-mixed signal M4.
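The chain of adders can be sketched as follows, using the signal naming of FIG. 2 (M2 from Lf and Rf, M3 from C and LFE, M4 from Ls and Rs, M1 from M2 and M3). The value 0.5 for the predetermined coefficient is an assumption made only for illustration, and the function names and sample values are hypothetical.

```python
def add_and_scale(a, b, coeff=0.5):
    """Model one adder: add two signals, then scale by a fixed coefficient."""
    return [coeff * (x + y) for x, y in zip(a, b)]


def intermediate_downmixes(ch, coeff=0.5):
    """Build M1 to M4 from the six channel signals in the dict `ch`."""
    m2 = add_and_scale(ch["Lf"], ch["Rf"], coeff)
    m3 = add_and_scale(ch["C"], ch["LFE"], coeff)
    m4 = add_and_scale(ch["Ls"], ch["Rs"], coeff)
    m1 = add_and_scale(m2, m3, coeff)
    return {"M1": m1, "M2": m2, "M3": m3, "M4": m4}


# One sample per channel (made-up values).
channels = {
    "Lf": [1.0], "Rf": [3.0],
    "C": [2.0], "LFE": [0.0],
    "Ls": [4.0], "Rs": [0.0],
}
mixes = intermediate_downmixes(channels)
```

Each pair of signals that enters an adder is also the pair that is compared by the corresponding level difference, phase difference, and correlation calculation units.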
FIG. 4 is a block diagram showing a structure of the multi-channel synthesis unit 1240.
The multi-channel synthesis unit 1240 includes a pre-matrix processing unit 1251, a post-matrix processing unit 1252, a first arithmetic unit 1253, a second arithmetic unit 1255, and a decorrelated signal generation unit 1254.
Using the binaural cue information, the pre-matrix processing unit 1251 generates a matrix R1 which indicates distribution of signal intensity level for each channel.
For example, using inter-channel level/intensity difference (IID) representing a ratio of a signal intensity level of the down-mixed signal M to respective signal intensity levels of the first down-mixed signal M1, the second down-mixed signal M2, the third down-mixed signal M3, and the fourth down-mixed signal M4, the pre-matrix processing unit 1251 generates a matrix R1 including vector elements R1[0] to R1[4].
The first arithmetic unit 1253 obtains, from the analysis filter unit 1230, the down-mixed signal M in the time/frequency hybrid expression as an input signal x, and multiplies the input signal x by the matrix R1 according to the following equations 1 and 2, for example. Then, the first arithmetic unit 1253 outputs an intermediate signal v that represents the result of the above matrix operation. In other words, the first arithmetic unit 1253 separates four down-mixed signals M1 to M4 from the down-mixed signal M in the time/frequency hybrid expression outputted from the analysis filter unit 1230.
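$$v = \begin{bmatrix} M \\ M_1 \\ M_2 \\ M_3 \\ M_4 \end{bmatrix} = \begin{bmatrix} R_1[0] \\ R_1[1] \\ R_1[2] \\ R_1[3] \\ R_1[4] \end{bmatrix} \begin{bmatrix} M \end{bmatrix} = R_1 x \qquad \text{[Equation 1]}$$

$$\begin{aligned} M_1 &= L_f + R_f + C + LFE \\ M_2 &= L_f + R_f \\ M_3 &= C + LFE \\ M_4 &= L_s + R_s \end{aligned} \qquad \text{[Equation 2]}$$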
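For a single time/frequency sample, Equation 1 reduces to scaling the one input value M by each of the five elements of R1. The sketch below assumes real-valued gains and an element R1[0] equal to 1 so that M passes through unchanged; the gain values and the name `pre_matrix` are hypothetical.

```python
def pre_matrix(r1, m):
    """Apply Equation 1 to one sample: v = R1 * x, where x = [M]."""
    return [gain * m for gain in r1]


# R1[0] to R1[4]; made-up values, not derived from real IID data.
r1 = [1.0, 0.8, 0.6, 0.5, 0.6]
v = pre_matrix(r1, 2.0)  # entries correspond to M, M1, M2, M3, M4
```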
The decorrelated signal generation unit 1254 performs all-pass filter processing on the intermediate signal v, thereby generating and outputting a decorrelated signal w according to the following equation 3. Note that the factors Mrev and Mi,rev in the decorrelated signal w are signals generated by performing decorrelation processing on the down-mixed signals M and Mi, respectively. Note also that the signals Mrev and Mi,rev have the same energy as the down-mixed signals M and Mi, respectively, and include reverberation that gives the impression that sounds are spread.
$$w = \begin{bmatrix} M \\ \operatorname{decorr}(v) \end{bmatrix} = \begin{bmatrix} M \\ M_{rev} \\ M_{1,rev} \\ M_{2,rev} \\ M_{3,rev} \\ M_{4,rev} \end{bmatrix} \qquad \text{[Equation 3]}$$
FIG. 5 is a block diagram showing a structure of the decorrelated signal generation unit 1254.
The decorrelated signal generation unit 1254 includes an initial delay unit D100 and an all-pass filter D200.
Upon obtaining the intermediate signal v, the initial delay unit D100 delays the intermediate signal v by a predetermined time period, in other words, delays its phase, and outputs the delayed intermediate signal v to the all-pass filter D200.
The all-pass filter D200 has all-pass characteristics, in which frequency-amplitude characteristics are not varied but only frequency-phase characteristics are varied, and is implemented as an Infinite Impulse Response (IIR) filter.
This all-pass filter D200 includes multipliers D201 to D207, delayers D221 to D223, and adder-subtractors D211 to D223.
FIG. 6 is a graph of an impulse response of the decorrelated signal generation unit 1254.
As shown in FIG. 6, even if an impulse signal is obtained at a timing 0, the decorrelated signal generation unit 1254 delays the impulse signal so that nothing is outputted until a timing t10, and then outputs a signal as reverberation up to a timing t11, with the amplitude of the signal gradually decreasing from the timing t10. In other words, the signals Mrev and Mi,rev outputted from the decorrelated signal generation unit 1254 represent sounds in which the reverberation is added to the sounds of the down-mixed signals M and Mi.
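The behavior of FIG. 6 can be approximated with an initial delay followed by a single Schroeder all-pass section. This is a simplified stand-in and not the structure of the all-pass filter D200, which has several multipliers and delayers; the delay lengths, the gain g, and the function name `decorrelate` are assumptions chosen for illustration.

```python
def decorrelate(x, initial_delay=3, d=2, g=0.75):
    """Initial delay followed by one Schroeder all-pass section.

    All-pass difference equation, with u the input after the initial delay:
        y[n] = -g * u[n] + u[n - d] + g * y[n - d]
    The output has the same length as the input.
    """
    u = ([0.0] * initial_delay + list(x))[: len(x)]
    y = []
    for n in range(len(u)):
        u_d = u[n - d] if n >= d else 0.0  # delayed input sample
        y_d = y[n - d] if n >= d else 0.0  # delayed output sample (feedback)
        y.append(-g * u[n] + u_d + g * y_d)
    return y


impulse = [1.0] + [0.0] * 11
response = decorrelate(impulse)
```

With these values the response is zero for the first three samples, starts at sample 3, and then decays in magnitude, which mirrors the interval from t10 to t11. Because the section is all-pass, the total energy of the full impulse response equals that of the input impulse, matching the statement that the decorrelated signals keep the energy of the signals they are derived from.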
Using the binaural cue information, the post-matrix processing unit 1252 generates a matrix R2 which indicates distribution of reverberation for each channel.
For example, the post-matrix processing unit 1252 derives a mixing coefficient Hij from the inter-channel coherence/correlation ICC which represents width and diffusion of sound, and then generates the matrix R2 including the mixing coefficient Hij.
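How a correlation value can be turned into mixing coefficients is not spelled out here, so the following is only one plausible construction, stated under explicit assumptions: the dry signal and its decorrelated counterpart are uncorrelated and of equal energy, the level difference between the two outputs is ignored, and the coefficients are set from a rotation angle θ = arccos(ICC)/2. Under those assumptions the two outputs have a normalized correlation of cos(2θ) = ICC. All names and sample values are hypothetical.

```python
import math


def mixing_coefficients(icc):
    """Return ((h11, h12), (h21, h22)) for a level-balanced two-way split."""
    icc = max(-1.0, min(1.0, icc))  # clamp to the valid correlation range
    theta = 0.5 * math.acos(icc)
    return ((math.cos(theta), math.sin(theta)),
            (math.cos(theta), -math.sin(theta)))


def correlation(a, b):
    """Normalized correlation of two real-valued sequences."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den


dry = [1.0, 0.0, -1.0, 0.0]  # toy down-mixed signal
wet = [0.0, 1.0, 0.0, -1.0]  # toy decorrelated signal, orthogonal to dry
(h11, h12), (h21, h22) = mixing_coefficients(0.6)
out1 = [h11 * m + h12 * r for m, r in zip(dry, wet)]
out2 = [h21 * m + h22 * r for m, r in zip(dry, wet)]
measured = correlation(out1, out2)
```

An ICC of 1 gives θ = 0, so no decorrelated signal is mixed in and both outputs are identical; lower ICC values mix in more of the decorrelated signal with opposite signs, which widens the sound image.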
The second arithmetic unit 1255 multiplies the decorrelated signal w by the matrix R2, and outputs an output signal y which represents the result of the matrix operation. In other words, the second arithmetic unit 1255 separates six audio signals Lf, Rf, Ls, Rs, C, and LFE from the decorrelated signal w.
For example, as shown in FIG. 2, since the left-front audio signal Lf is divided from the second down-mixed signal M2, the dividing of the left-front audio signal Lf needs the second down-mixed signal M2 and a factor M2,rev of a decorrelated signal w corresponding to the second down-mixed signal M2. Likewise, since the second down-mixed signal M2 is divided from the first down-mixed signal M1, the dividing of the second down-mixed signal M2 needs the first down-mixed signal M1 and a factor M1,rev of a decorrelated signal w corresponding to the first down-mixed signal M1.
Therefore, the left-front audio signal Lf is expressed by the following equation 4.

$$\begin{aligned} L_f &= H_{11,A} \times M_2 + H_{12,A} \times M_{2,rev} \\ M_2 &= H_{11,D} \times M_1 + H_{12,D} \times M_{1,rev} \\ M_1 &= H_{11,E} \times M + H_{12,E} \times M_{rev} \end{aligned} \qquad \text{[Equation 4]}$$

Here, in the equation 4, Hij,A is a mixing coefficient in the third dividing unit 1243, Hij,D is a mixing coefficient in the second dividing unit 1242, and Hij,E is a mixing coefficient in the first dividing unit 1241. The three equations in the equation 4 are expressed together by a vector multiplication equation of the following equation 5.
$$L_f = \begin{bmatrix} H_{11,A} H_{11,D} H_{11,E} & H_{11,A} H_{11,D} H_{12,E} & H_{11,A} H_{12,D} & H_{12,A} & 0 & 0 \end{bmatrix} \begin{bmatrix} M \\ M_{rev} \\ M_{1,rev} \\ M_{2,rev} \\ M_{3,rev} \\ M_{4,rev} \end{bmatrix} \qquad \text{[Equation 5]}$$
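The equivalence between the three-stage form of Equation 4 and the single row vector of Equation 5 can be checked numerically. In the sketch below each dividing unit is represented by a pair (H11, H12); the coefficient values, the signal values, and the function names are made up for illustration.

```python
def lf_cascade(h_a, h_d, h_e, m, m_rev, m1_rev, m2_rev):
    """Evaluate the three lines of Equation 4 from the bottom up."""
    m1 = h_e[0] * m + h_e[1] * m_rev       # first dividing unit 1241
    m2 = h_d[0] * m1 + h_d[1] * m1_rev     # second dividing unit 1242
    return h_a[0] * m2 + h_a[1] * m2_rev   # third dividing unit 1243


def lf_row(h_a, h_d, h_e):
    """Row vector of Equation 5 that multiplies the decorrelated signal w."""
    return [
        h_a[0] * h_d[0] * h_e[0],  # coefficient of M
        h_a[0] * h_d[0] * h_e[1],  # coefficient of M_rev
        h_a[0] * h_d[1],           # coefficient of M_1,rev
        h_a[1],                    # coefficient of M_2,rev
        0.0,                       # M_3,rev does not contribute to Lf
        0.0,                       # M_4,rev does not contribute to Lf
    ]


h_a, h_d, h_e = (0.9, 0.4), (0.8, 0.6), (0.7, 0.7)
w = [1.0, 0.5, -0.25, 0.125, 0.3, -0.2]  # M, M_rev, M_1,rev, ..., M_4,rev
direct = lf_cascade(h_a, h_d, h_e, w[0], w[1], w[2], w[3])
flattened = sum(c * s for c, s in zip(lf_row(h_a, h_d, h_e), w))
```

Both evaluations give the same value up to rounding, and the two zero entries reflect that the branches for M3 and M4 are not on the path from M to Lf.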
Each of the audio signals Rf, C, LFE, Ls, and Rs other than the left-front audio signal Lf is likewise calculated by multiplying a corresponding row of mixing coefficients by the matrix of the decorrelated signal w. That is, an output signal y is expressed by the following equation 6.
$$y = \begin{bmatrix} L_f \\ R_f \\ L_s \\ R_s \\ C \\ LFE \end{bmatrix} = \begin{bmatrix} R_{2,LF} \\ R_{2,RF} \\ R_{2,LS} \\ R_{2,RS} \\ R_{2,C} \\ R_{2,LFE} \end{bmatrix} w = R_2 w \qquad \text{[Equation 6]}$$
FIG. 7 is an explanatory diagram for explaining the down-mixed signal.
The down-mixed signal is generally expressed by a time/frequency hybrid expression as shown in FIG. 7. This means that the down-mixed signal is expressed by being divided along a time axis direction into parameter sets ps which are temporal units, and further divided along a frequency axis direction into parameter bands pb which are sub-band units. Therefore, the binaural cue information is calculated for each band (ps, pb). Moreover, the pre-matrix processing unit 1251 and the post-matrix processing unit 1252 calculate a matrix R1 (ps, pb) and a matrix R2 (ps, pb), respectively, for each band (ps, pb).
FIG. 8 is a block diagram showing detailed structures of the pre-matrix processing unit 1251 and the post-matrix processing unit 1252.
The pre-matrix processing unit 1251 includes the matrix equation generation unit 1251a and the interpolation unit 1251b. 
The matrix equation generation unit 1251a generates a matrix R1 (ps, pb) for each band (ps, pb), from binaural cue information for each band (ps, pb).
The interpolation unit 1251b maps, in other words, interpolates, the matrix R1 (ps, pb) for each band (ps, pb) according to (i) a frequency high resolution time index n and (ii) a sub-sub-band index sb of the input signal x of a hybrid expression. As a result, the interpolation unit 1251b generates a matrix R1 (n, sb) for each band (n, sb). In this way, the interpolation unit 1251b ensures that transition of the matrix R1 over a boundary of a plurality of bands is smooth.
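The mapping from the coarse (ps, pb) grid to the fine (n, sb) grid can be sketched as follows for a single matrix element. The interpolation rule is not specified here, so the sketch assumes linear interpolation over the time index n between neighboring parameter sets and a plain table lookup from each sub-sub-band sb to the parameter band pb that contains it. The names `expand`, `ps_times`, and `sb_to_pb` are hypothetical.

```python
def expand(r_ps_pb, ps_times, sb_to_pb, num_slots):
    """Map one matrix element R(ps, pb) onto the finer (n, sb) grid.

    r_ps_pb[ps][pb]: element value per parameter set and parameter band.
    ps_times[ps]:    time index n at which parameter set ps applies
                     (strictly increasing).
    sb_to_pb[sb]:    parameter band that contains sub-sub-band sb.
    Returns grid[n][sb].
    """
    grid = []
    for n in range(num_slots):
        row = []
        for pb in sb_to_pb:
            if n <= ps_times[0]:
                value = r_ps_pb[0][pb]    # hold the first parameter set
            elif n >= ps_times[-1]:
                value = r_ps_pb[-1][pb]   # hold the last parameter set
            else:
                # Index of the latest parameter set at or before n.
                k = max(i for i, t in enumerate(ps_times) if t <= n)
                t = (n - ps_times[k]) / (ps_times[k + 1] - ps_times[k])
                value = (1.0 - t) * r_ps_pb[k][pb] + t * r_ps_pb[k + 1][pb]
            row.append(value)
        grid.append(row)
    return grid


# Two parameter sets at n = 0 and n = 4, two parameter bands, three sub-sub-bands.
r = [[1.0, 0.0], [0.0, 1.0]]
grid = expand(r, ps_times=[0, 4], sb_to_pb=[0, 0, 1], num_slots=5)
```

Between the two parameter sets the element changes in equal steps rather than jumping at the boundary, which is the smoothing effect attributed to the interpolation unit. In this sketch only the time direction is smoothed; the frequency direction is a direct lookup.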
The post-matrix processing unit 1252 includes a matrix equation generation unit 1252a and an interpolation unit 1252b. 
The matrix equation generation unit 1252a generates a matrix R2 (ps, pb) for each band (ps, pb), from binaural cue information for each band (ps, pb).
The interpolation unit 1252b maps, in other words, interpolates, the matrix R2 (ps, pb) for each band (ps, pb) according to (i) a frequency high resolution time index n and (ii) a sub-sub-band index sb of the input signal x of a hybrid expression. As a result, the interpolation unit 1252b generates a matrix R2 (n, sb) for each band (n, sb). In this way, the interpolation unit 1252b ensures that transition of the matrix R2 over a boundary of a plurality of bands is smooth.
[Non-Patent Document 1] J. Herre, et al., “The Reference Model Architecture for MPEG Spatial Audio Coding”, 118th AES Convention, Barcelona