In recent years, a technology called Spatial Audio Codec has been standardized in the MPEG audio standards. This technology aims to compress and code multiple-channel signals that provide realistic sound, using a very small amount of data. For example, the Advanced Audio Coding (AAC) method, a multiple-channel codec widely used as the audio method for digital televisions, requires a bit-rate of 512 kbps or 384 kbps for 5.1 channels, whereas the Spatial Audio Codec aims to compress and code the multiple-channel signals at a much lower bit-rate of 128 kbps, 64 kbps, or even 48 kbps (see Non-Patent Reference 1, for example).
FIG. 1 is a block diagram showing a structure of the conventional audio apparatus.
The audio apparatus 1000 includes an audio encoder 1100 and an audio decoder 1200. The audio encoder 1100 performs spatial audio coding for a group of audio signals and outputs the coded signals. The audio decoder 1200 decodes the coded signals.
The audio encoder 1100 processes audio signals (audio signals L and R of two channels, for example) in units of frames, each consisting of, for example, 1024 samples or 2048 samples. The audio encoder 1100 includes a down-mix unit 1110, a binaural cue detection unit 1120, an encoder 1150, and a multiplexing unit 1190.
The down-mix unit 1110 generates a down-mixed signal M by averaging the audio signals L and R of two channels, which are expressed as spectrums; in other words, by calculating M=(L+R)/2.
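As a minimal sketch of the averaging described above, assuming each channel is available as an equal-length list of spectral coefficients (the function name `downmix` and the sample values are illustrative and do not come from the apparatus itself):

```python
def downmix(left, right):
    """Average two equal-length channel spectra: M = (L + R) / 2."""
    if len(left) != len(right):
        raise ValueError("channels must have the same number of coefficients")
    return [(l + r) / 2 for l, r in zip(left, right)]


# Illustrative spectral coefficients (not taken from a real signal).
L = [1.0, 0.5, -0.25, 0.0]
R = [0.0, 0.5, 0.25, -1.0]
M = downmix(L, R)
```

Note that coefficients of opposite sign cancel in M (the third coefficient here), which is one reason the decoder needs side information to reproduce L and R.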
The binaural cue detection unit 1120 generates binaural cue (BC) information by comparing the down-mixed signal M and the audio signals L and R for each spectrum band. The BC information is used to reproduce the audio signals L and R from the down-mixed signal M.
The BC information includes: level information IID representing inter-channel level/intensity difference; correlation information ICC representing inter-channel coherence/correlation; and phase information IPD representing inter-channel phase/delay difference.
Here, the correlation information ICC represents similarity between the two audio signals L and R. On the other hand, the level information IID represents relative intensity of the audio signals L and R. In general, the level information IID is information for controlling balance and localization of audio, and the correlation information ICC is information for controlling width and diffusion of audio. Both are spatial parameters that help listeners to imagine auditory scenes.
The audio signals L and R and the down-mixed signal M, which are expressed as spectrums, are generally partitioned into a plurality of regions including “parameter bands”, and the BC information is calculated for each of the parameter bands. Note that hereinafter the terms “BC information” and “spatial parameter” are often used synonymously.
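The per-band calculation can be illustrated with a short sketch. The text above does not give formulas for the cues, so the definitions used here are assumptions: IID is taken as the L/R energy ratio in dB and ICC as the normalized cross-correlation, both computed on real-valued coefficients. The phase information IPD is omitted, and the function name `band_cues` and the band edges are hypothetical.

```python
import math


def band_cues(left, right, band_edges):
    """Return one (IID in dB, ICC) pair per parameter band.

    ASSUMPTION: IID = 10*log10(E_L / E_R) and ICC = normalized
    cross-correlation; the source text does not specify these formulas.
    """
    eps = 1e-12  # guards against division by zero in silent bands
    cues = []
    for start, stop in zip(band_edges[:-1], band_edges[1:]):
        l = left[start:stop]
        r = right[start:stop]
        energy_l = sum(x * x for x in l)
        energy_r = sum(x * x for x in r)
        cross = sum(x * y for x, y in zip(l, r))
        iid_db = 10.0 * math.log10((energy_l + eps) / (energy_r + eps))
        icc = cross / math.sqrt((energy_l + eps) * (energy_r + eps))
        cues.append((iid_db, icc))
    return cues


# Two parameter bands of two coefficients each (illustrative values).
L = [2.0, 4.0, 1.0, -1.0]
R = [1.0, 2.0, 1.0, 1.0]
cues = band_cues(L, R, band_edges=[0, 2, 4])
```

In the first band L is a scaled copy of R, so the sketch reports a level difference of about 6 dB with a correlation near 1; in the second band the two channels have equal energy but are uncorrelated.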
The encoder 1150 compresses and codes the down-mixed signal M, according to, for example, MPEG Audio Layer-3 (MP3), Advanced Audio Coding (AAC), or the like.
The multiplexing unit 1190 multiplexes the coded down-mixed signal M and quantized BC information to generate a bitstream, and outputs the bitstream as the above-mentioned coded signals.
The audio decoder 1200 includes an inverse-multiplexing unit 1210, a decoder 1220, and a multiple-channel synthesis unit 1240.
The inverse-multiplexing unit 1210 obtains the above-mentioned bitstream, divides the bitstream into the quantized BC information and the coded down-mixed signal M, and outputs them. Note that the inverse-multiplexing unit 1210 inversely quantizes the quantized BC information before outputting the resulting BC information.
The decoder 1220 decodes the coded down-mixed signal M, and outputs the decoded down-mixed signal M to the multiple-channel synthesis unit 1240.
The multiple-channel synthesis unit 1240 obtains the down-mixed signal M from the decoder 1220, and the BC information from the inverse-multiplexing unit 1210. Then, the multiple-channel synthesis unit 1240 reproduces two audio signals L and R from the down-mixed signal M, using the BC information.
Although the audio apparatus 1000 has been described, as one example, as coding and decoding audio signals of two channels, the audio apparatus 1000 is also able to code and decode audio signals of more than two channels (audio signals of six channels forming a 5.1-channel sound source, for example).
FIG. 2 is a block diagram showing a functional structure of the multiple-channel synthesis unit 1240.
For example, in the case where the multiple-channel synthesis unit 1240 divides the down-mixed signal M into audio signals of six channels, the multiple-channel synthesis unit 1240 includes the first dividing unit 1241, the second dividing unit 1242, the third dividing unit 1243, the fourth dividing unit 1244, and the fifth dividing unit 1245. Note that in the down-mixed signal M, a center audio signal C, a left-front audio signal Lf, a right-front audio signal Rf, a left-side audio signal Ls, a right-side audio signal Rs, and a low frequency audio signal LFE are down-mixed. The center audio signal C is for a loudspeaker positioned at the center front of a listener. The left-front audio signal Lf is for a loudspeaker positioned at the left front of the listener. The right-front audio signal Rf is for a loudspeaker positioned at the right front of the listener. The left-side audio signal Ls is for a loudspeaker positioned on the left side of the listener. The right-side audio signal Rs is for a loudspeaker positioned on the right side of the listener. The low frequency audio signal LFE is for a sub-woofer loudspeaker for outputting low-frequency sound.
The first dividing unit 1241 divides the down-mixed signal M into the first down-mixed signal M1 and the fourth down-mixed signal M4, and outputs them. In the first down-mixed signal M1, the center audio signal C, the left-front audio signal Lf, the right-front audio signal Rf, and the low frequency audio signal LFE are down-mixed. In the fourth down-mixed signal M4, the left-side audio signal Ls and the right-side audio signal Rs are down-mixed.
The second dividing unit 1242 divides the first down-mixed signal M1 into the second down-mixed signal M2 and the third down-mixed signal M3, and outputs them. In the second down-mixed signal M2, the left-front audio signal Lf and the right-front audio signal Rf are down-mixed. In the third down-mixed signal M3, the center audio signal C and the low frequency audio signal LFE are down-mixed.
The third dividing unit 1243 divides the second down-mixed signal M2 into the left-front audio signal Lf and the right-front audio signal Rf, and outputs them.
The fourth dividing unit 1244 divides the third down-mixed signal M3 into the center audio signal C and the low frequency audio signal LFE, and outputs them.
The fifth dividing unit 1245 divides the fourth down-mixed signal M4 into the left-side audio signal Ls and the right-side audio signal Rs, and outputs them.
As described above, in the multiple-channel synthesis unit 1240, each of the dividing units divides one signal into two signals using a multiple-stage method, and the multiple-channel synthesis unit 1240 recursively repeats the signal dividing until the signals are eventually divided into a plurality of single audio signals.
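The multiple-stage division can be pictured as a binary tree whose leaves are the six single audio signals. The following sketch only models which signal is divided into which; it performs no signal processing, and the nested-tuple layout and the helper names `leaves` and `path_to` are illustrative assumptions.

```python
# Each tuple is (signal name, [two outputs of the dividing unit]).
# Plain strings are single audio signals (leaves of the tree).
TREE = ("M", [                      # first dividing unit 1241
    ("M1", [                        # second dividing unit 1242
        ("M2", ["Lf", "Rf"]),       # third dividing unit 1243
        ("M3", ["C", "LFE"]),       # fourth dividing unit 1244
    ]),
    ("M4", ["Ls", "Rs"]),           # fifth dividing unit 1245
])


def leaves(node):
    """Recursively collect the single audio signals below a node."""
    if isinstance(node, str):
        return [node]
    _, children = node
    found = []
    for child in children:
        found.extend(leaves(child))
    return found


def path_to(node, target):
    """Return the chain of signals from this node down to target, or None."""
    if isinstance(node, str):
        return [node] if node == target else None
    name, children = node
    for child in children:
        sub_path = path_to(child, target)
        if sub_path is not None:
            return [name] + sub_path
    return None


channels = leaves(TREE)
lf_path = path_to(TREE, "Lf")
```

The path for Lf passes through M, M1, and M2, which means that reproducing Lf depends on the outputs of three dividing units in sequence.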
FIG. 3 is a block diagram showing another functional structure of the multiple-channel synthesis unit 1240.
The multiple-channel synthesis unit 1240 includes an all-pass filter 1261, an arithmetic unit 1262, and a Binaural Cue Coding (BCC) processing unit 1263.
The all-pass filter 1261 obtains the down-mixed signal M, generates a decorrelated signal Mrev which is not correlated with the down-mixed signal M, and outputs the decorrelated signal Mrev. Note that the down-mixed signal M and the decorrelated signal Mrev are considered to be “incoherent with each other” when they are compared auditorily. Note also that the decorrelated signal Mrev has the same energy as the down-mixed signal M, and includes finite-time reverberation components that create an auditory illusion as if the sound were spread out.
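The energy-preserving property can be illustrated with a generic filter. The sketch below uses a Schroeder all-pass section as a stand-in; the text above does not specify the design of the all-pass filter 1261, so the structure, the delay, and the gain are all assumptions chosen for illustration.

```python
def allpass(x, delay=3, gain=0.5):
    """Schroeder all-pass section: y[n] = -g*x[n] + x[n-D] + g*y[n-D].

    ASSUMPTION: a stand-in for the all-pass filter 1261, whose actual
    design is not given in the source text.
    """
    y = []
    for n, sample in enumerate(x):
        x_delayed = x[n - delay] if n >= delay else 0.0
        y_delayed = y[n - delay] if n >= delay else 0.0
        y.append(-gain * sample + x_delayed + gain * y_delayed)
    return y


# Feed a unit impulse and compare input and output energy.
impulse = [1.0] + [0.0] * 63
response = allpass(impulse)
energy_in = sum(s * s for s in impulse)
energy_out = sum(s * s for s in response)
```

The impulse is smeared into a decaying train of echoes (a short reverberation tail) while the total energy stays essentially unchanged, which matches the two properties attributed to Mrev above.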
The BCC processing unit 1263 obtains the BC information, and generates a mixing coefficient Hij based on the level information IID, the correlation information ICC, and the like which are included in the BC information, and then outputs the generated mixing coefficient Hij.
The arithmetic unit 1262 obtains the down-mixed signal M, the decorrelated signal Mrev, and the mixing coefficient Hij, then performs arithmetic operation using them according to the following equation 1, and eventually outputs the audio signals L and R. As described above, using the mixing coefficient Hij, it is possible to set a degree of correlation between the audio signals L and R, and directional characteristics of the audio signals, to the desired states.
$$L = H_{11} \times M + H_{12} \times M_{rev}, \qquad R = H_{21} \times M + H_{22} \times M_{rev} \qquad \text{[equation 1]}$$
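A minimal sketch of equation 1, applied coefficient by coefficient, is given below. The mixing coefficient values are illustrative only and are not derived from any IID or ICC values; the function name `upmix_two_channels` is likewise hypothetical.

```python
def upmix_two_channels(m, m_rev, h):
    """Apply equation 1; h is ((H11, H12), (H21, H22))."""
    (h11, h12), (h21, h22) = h
    left = [h11 * a + h12 * b for a, b in zip(m, m_rev)]
    right = [h21 * a + h22 * b for a, b in zip(m, m_rev)]
    return left, right


M = [1.0, 2.0]          # down-mixed signal (illustrative)
M_rev = [0.5, -1.0]     # decorrelated signal (illustrative)
H = ((1.0, 0.5), (1.0, -0.5))
L, R = upmix_two_channels(M, M_rev, H)
```

With H12 and H22 of opposite sign, the decorrelated component is added to one channel and subtracted from the other, so larger magnitudes of these two coefficients make L and R less similar.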
FIG. 4 is a block diagram showing a more detailed structure of the multiple-channel synthesis unit 1240.
The multiple-channel synthesis unit 1240 includes a pre-matrix processing unit 1251, a post-matrix processing unit 1252, the first arithmetic unit 1253, the second arithmetic unit 1255, a decorrelater 1254, an analysis filter bank 1256, and a synthesis filter bank 1257. Note that the pre-matrix processing unit 1251, the post-matrix processing unit 1252, the first arithmetic unit 1253, the second arithmetic unit 1255, and the decorrelater 1254 form a channel expansion unit 1270.
The analysis filter bank 1256 obtains the down-mixed signal M from the decoder 1220, then converts an expression format of the down-mixed signal M into a time/frequency hybrid expression, and eventually outputs the signal as the first frequency band signal x. Note that this analysis filter bank 1256 has the first stage and the second stage. For example, the first stage and the second stage are a Quadrature Mirror Filter (QMF) filter bank and a Nyquist filter bank, respectively. Regarding these stages, the QMF filter (first stage) divides a spectrum into a plurality of frequency bands, and then the Nyquist filter (second stage) divides a sub-band of low frequency into finer sub-bands, thereby improving resolution of a spectrum in the low-frequency sub-band.
The pre-matrix processing unit 1251 generates a matrix R1 using the BC information. The matrix R1 is a scaling factor that indicates scaling of signal intensity level for each channel.
For example, the pre-matrix processing unit 1251 generates the matrix R1, using the level information IID that represents a ratio of a signal intensity level of the down-mixed signal M to each signal intensity level of the first down-mixed signal M1, the second down-mixed signal M2, the third down-mixed signal M3, and the fourth down-mixed signal M4.
The first arithmetic unit 1253 obtains from the analysis filter bank 1256 the first frequency band signal x expressed by time/frequency hybrid, and multiplies the first frequency band signal x by the matrix R1 according to the following equations 2 and 3, for example. Then, the first arithmetic unit 1253 outputs an intermediate signal v that represents the result of the above matrix arithmetic operation. In other words, the first arithmetic unit 1253 separates four down-mixed signals M1 to M4 from the first frequency band signal x expressed by time/frequency hybrid outputted from the analysis filter bank 1256.
$$v = \begin{bmatrix} M \\ M_1 \\ M_2 \\ M_3 \\ M_4 \end{bmatrix} = R_1 x = R_1 \begin{bmatrix} M \end{bmatrix} \qquad \text{[equation 2]}$$
$$M_1 = L_f + R_f + C + LFE, \quad M_2 = L_f + R_f, \quad M_3 = C + LFE, \quad M_4 = L_s + R_s \qquad \text{[equation 3]}$$
The decorrelater 1254 functions in the same way as the all-pass filter 1261 shown in FIG. 3: it performs all-pass filter processing on the intermediate signal v, thereby generating and outputting a decorrelated signal w according to the following equation 4. Note that the factors Mrev and Mi,rev in the decorrelated signal w are signals obtained by performing decorrelation processing on the down-mixed signals M and Mi, respectively.
$$w = \begin{bmatrix} M \\ \mathrm{decorr}(v) \end{bmatrix} = \begin{bmatrix} M \\ M_{rev} \\ M_{1,rev} \\ M_{2,rev} \\ M_{3,rev} \\ M_{4,rev} \end{bmatrix} \qquad \text{[equation 4]}$$
The post-matrix processing unit 1252 generates a matrix R2 using the BC information. The matrix R2 represents scaling of reverberation for each channel. For example, the post-matrix processing unit 1252 derives the mixing coefficient Hij from the correlation information ICC which represents width and diffusion of sound, and then generates the matrix R2 including the mixing coefficient Hij.
The second arithmetic unit 1255 multiplies the decorrelated signal w by the matrix R2, and outputs an output signal y which represents the result of the matrix arithmetic operation. In other words, the second arithmetic unit 1255 separates six audio signals Lf, Rf, Ls, Rs, C, and LFE from the decorrelated signal w.
For example, as shown in FIG. 2, since the left-front audio signal Lf is divided from the second down-mixed signal M2, the dividing of the left-front audio signal Lf needs the second down-mixed signal M2 and a factor M2,rev of a decorrelated signal w corresponding to the second down-mixed signal M2. Likewise, since the second down-mixed signal M2 is divided from the first down-mixed signal M1, the dividing of the second down-mixed signal M2 needs the first down-mixed signal M1 and a factor M1,rev of a decorrelated signal w corresponding to the first down-mixed signal M1.
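Therefore, the left-front audio signal Lf is expressed by the following equation 5.
$$\begin{aligned} L_f &= H_{11,A} \times M_2 + H_{12,A} \times M_{2,rev} \\ M_2 &= H_{11,D} \times M_1 + H_{12,D} \times M_{1,rev} \\ M_1 &= H_{11,E} \times M + H_{12,E} \times M_{rev} \end{aligned} \qquad \text{[equation 5]}$$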
Here, in the equation 5, Hij,A is a mixing coefficient in the third dividing unit 1243, Hij,D is a mixing coefficient in the second dividing unit 1242, and Hij,E is a mixing coefficient in the first dividing unit 1241. The three expressions in the equation 5 can be combined into a single vector multiplication expression.
$$L_f = \begin{bmatrix} H_{11,A}H_{11,D}H_{11,E} & H_{11,A}H_{11,D}H_{12,E} & H_{11,A}H_{12,D} & H_{12,A} & 0 & 0 \end{bmatrix} w = R_{2,LF}\, w \qquad \text{[equation 6]}$$
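The equivalence between the chained divisions (M into M1, M1 into M2, M2 into Lf) and the single row vector R2,LF can be checked numerically. The sketch below works on one coefficient of each signal; every numeric value is an arbitrary illustration, and the variable names are hypothetical.

```python
import math

# Illustrative (H11, H12) pairs for the three dividing units on the Lf path.
H_E = (0.9, 0.3)   # first dividing unit 1241:  M  -> M1
H_D = (0.8, 0.4)   # second dividing unit 1242: M1 -> M2
H_A = (0.7, 0.5)   # third dividing unit 1243:  M2 -> Lf

# One coefficient of each entry of w = [M, Mrev, M1rev, M2rev, M3rev, M4rev].
w = [1.0, 0.2, -0.1, 0.3, 0.05, -0.4]
M, M_rev, M1_rev, M2_rev, _, _ = w

# Chained form: apply the three divisions one after another.
M1 = H_E[0] * M + H_E[1] * M_rev
M2 = H_D[0] * M1 + H_D[1] * M1_rev
Lf_chained = H_A[0] * M2 + H_A[1] * M2_rev

# Single-row form: the row vector R2,LF multiplied by w.
R2_LF = [
    H_A[0] * H_D[0] * H_E[0],   # weight on M
    H_A[0] * H_D[0] * H_E[1],   # weight on Mrev
    H_A[0] * H_D[1],            # weight on M1,rev
    H_A[1],                     # weight on M2,rev
    0.0,                        # M3,rev does not contribute to Lf
    0.0,                        # M4,rev does not contribute to Lf
]
Lf_row = sum(r * s for r, s in zip(R2_LF, w))

same = math.isclose(Lf_chained, Lf_row, rel_tol=1e-12)
```

The two zero entries reflect that M3 and M4 lie on other branches of the division, so their decorrelated versions play no part in reproducing Lf.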
Each of the audio signals Rf, C, LFE, Ls, and Rs other than the left-front audio signal Lf is likewise calculated by multiplying a corresponding row vector, formed in the same manner as R2,LF, by the decorrelated signal w. That is, an output signal y is expressed by the following equation 7.
$$y = \begin{bmatrix} L_f \\ R_f \\ L_s \\ R_s \\ C \\ LFE \end{bmatrix} = \begin{bmatrix} R_{2,LF} \\ R_{2,RF} \\ R_{2,LS} \\ R_{2,RS} \\ R_{2,C} \\ R_{2,LFE} \end{bmatrix} w = R_2 w \qquad \text{[equation 7]}$$
The synthesis filter bank 1257 converts the expression format of each of the reproduced audio signals, from the time/frequency hybrid expression to the time expression, and then outputs the plurality of audio signals in the time expression as multiple-channel signals. Note that the synthesis filter bank 1257 includes, for example, two stages, so that the synthesis filter bank 1257 matches the analysis filter bank 1256. Note also that the matrices R1 and R2 are generated as matrices R1(b) and R2(b), respectively, for each of the above-mentioned parameter bands b.
FIG. 5 is a block diagram showing a structure of the audio decoder 1200.
Note that, in FIG. 5, double-lined arrows show the flow of frequency band signals (the above-mentioned first frequency band signal x and output signal y) which are divided into a plurality of frequency bands.
In a coded signal obtained by the inverse-multiplexing unit 1210, (i) a coded down-mixed signal in which audio signals of six channels are down-mixed to a down-mixed signal M of two channels and coded and (ii) quantized BC information are multiplexed.
The inverse-multiplexing unit 1210 divides the coded signal into the coded down-mixed signal and the BC information. The coded down-mixed signal is coded data of two channels which is coded according to, for example, the AAC method of the MPEG standard.
The decoder 1220 decodes the coded down-mixed signal by an AAC decoder. As a result, the decoder 1220 outputs a down-mixed signal M that is a Pulse Code Modulation (PCM) signal (time-axis signal) of two channels.
The analysis filter bank 1256 has two analysis filters 1256a, each of which converts the down-mixed signal M outputted from the decoder 1220, into the first frequency band signal x.
The channel expansion unit 1270 expands the first frequency band signal x of two channels into the output signal y of six channels, using the BC information (see Patent Reference 1, for example).
The synthesis filter bank 1257 has six synthesis filters 1257a, each of which converts the output signal y outputted from the channel expansion unit 1270, into an audio signal that is a PCM signal.
FIG. 6 is a block diagram showing another structure of the audio decoder 1200.
In a coded signal obtained by the inverse-multiplexing unit 1210, (i) a coded down-mixed signal in which audio signals of six channels are down-mixed to a down-mixed signal M of one channel and coded and (ii) quantized BC information are multiplexed.
In the above case, the decoder 1220 decodes the coded down-mixed signal by, for example, an AAC decoder. As a result, the decoder 1220 outputs a down-mixed signal M that is a PCM signal (time-axis signal) of one channel.
The analysis filter bank 1256 has one analysis filter 1256a which converts the down-mixed signal M outputted from the decoder 1220, into the first frequency band signal x.
The channel expansion unit 1270 expands the first frequency band signal x of one channel into the output signal y of six channels, using the BC information.
[Non-Patent Reference 1] 118th AES convention, Barcelona, Spain, 2005, Convention Paper 6447
[Patent Reference 1] Japanese Patent Application Publication No. 2004-248989