Generally, stereo speech coding methods and multi-channel speech coding methods fall into two types.
One type encodes each channel signal individually, and can easily be applied to stereo speech signals or multi-channel speech signals. However, since this method does not remove inter-channel redundancy, the overall coding bit rate is proportional to the number of channels, which results in a high bit rate.
The other type parametrically encodes a stereo speech signal or a multi-channel speech signal. The basic principle of this method is as follows. First, a coding side down-mixes or transforms an input signal into a signal having fewer channels than (or the same number of channels as) the input signal. Next, the coding side encodes the down-mixed or transformed signal using a conventional speech coding method. In parallel with this, the coding side calculates, from the original signal, inter-channel parameters representing the inter-channel relationship, and encodes and transmits the inter-channel parameters to a decoding side so that the decoding side can generate a stereo image or a multi-channel image. Because the inter-channel parameters can be encoded with fewer bits than are needed to encode the speech signal itself, this method makes it possible to realize a lower bit rate.
Parametric stereo coding systems and multi-channel coding systems widely use principal component analysis (PCA) (Non-Patent Literature 1), binaural cue coding (BCC) (Non-Patent Literature 2), inter-channel prediction (ICP) (Non-Patent Literature 3), and intensity stereo (IS) (Non-Patent Literature 4). Each of these methods generates certain inter-channel parameters and transmits them to a decoding side. For example, binaural cue coding (BCC) generates an inter-channel level difference (ICLD), an inter-channel time difference (ICTD), and an inter-channel coherence (ICC) as the inter-channel parameters. Likewise, inter-channel prediction (ICP), intensity stereo (IS), and principal component analysis (PCA) generate an inter-channel prediction coefficient, an energy scale coefficient, and a rotation angle, respectively, as the inter-channel parameters.
Since BCC, ICP, IS, and PCA require highly precise inter-channel parameters, the inter-channel parameters are generally calculated and encoded on a subband basis.
FIG. 1 and FIG. 2 illustrate simplified configurations of a parametric multi-channel codec, and the symbols in FIG. 1 and FIG. 2 have the following meanings.
{xi—sb}: a series of multi-channel signals divided into a plurality of subbands (which represents signals in a frequency domain, a time domain, or a hybrid domain where the frequency domain and the time domain are combined)
{yi—sb}: a series of down-mixed or transformed signals calculated for each subband (which are signals in the same domain as {xi—sb})
{Pi—sb}: a series of inter-channel parameters calculated for each subband
The following description assumes that down-mixing is performed.
At the coding side illustrated in FIG. 1, inter-channel parameter generating section 101 down-mixes input signals {xi—sb} by BCC, PCA or the like, and generates down-mixed signals {yi—sb} and inter-channel parameters {Pi—sb}.
Coding section 102 encodes down-mixed signals {yi—sb}, and coding section 103 (inter-channel parameter coding section), which is provided separately, encodes inter-channel parameters {Pi—sb}.
Multiplexing section 104 multiplexes the coding parameters of down-mixed signals {yi—sb} and the coding parameters of inter-channel parameters {Pi—sb} to generate a bit stream. This bit stream is transmitted to a decoding side.
At the decoding side illustrated in FIG. 2, demultiplexing section 201 demultiplexes the bit stream to obtain coding parameters of the down-mixed signals and the inter-channel parameters.
Decoding section 202 performs decoding processing using the coding parameters of the down-mixed signals, and generates decoded down-mixed signals {ỹi—sb}.
Decoding section 203 (inter-channel parameter decoding section) performs decoding processing using the coding parameters of the inter-channel parameters, and generates decoded inter-channel parameters {P̃i—sb}.
Inter-channel parameter applying section 204 up-mixes decoded down-mixed signals {ỹi—sb} using spatial information represented by the decoded inter-channel parameters {P̃i—sb}, and generates decoded signals {x̃i—sb}.
Non-Patent Literature 1 describes a codec based on principal component analysis (PCA) in the frequency domain. FIG. 3 and FIG. 4 illustrate the configurations of a coding apparatus and a decoding apparatus based on PCA in Non-Patent Literature 1. The symbols have the following meanings.
{Lsb(f)}: left signals divided into a plurality of subbands
{Rsb(f)}: right signals divided into a plurality of subbands
{Pcsb(f)}: principal-component signals calculated for each subband by principal component analysis
{Asb(f)}: ambient signals calculated for each subband by principal component analysis
{θsb}: rotation angles calculated for each subband by principal component analysis
{PcARsb}: energy ratios of the principal-component signals to the ambient signals, calculated for each subband
At the coding side illustrated in FIG. 3, principal component analyzing section 301 transforms input left signals {Lsb(f)} and input right signals {Rsb(f)} into principal-component signals {Pcsb(f)} and ambient signals {Asb(f)}. In this transform processing, a rotation angle representing the degree of the transform is calculated for each subband as follows.
(Equation 1)
$$
\theta_{sb} = \frac{1}{2}\tan^{-1}\left(\frac{2\sum_{f=sb\_start}^{sb\_end} L_{sb}(f)\cdot R_{sb}(f)}{\sum_{f=sb\_start}^{sb\_end} L_{sb}(f)^{2} - \sum_{f=sb\_start}^{sb\_end} R_{sb}(f)^{2}}\right)
$$
$$
\theta_{sb} = \theta_{sb} + \frac{\pi}{2} \quad \text{if } \theta_{sb} < 0 \qquad [1]
$$
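As an illustration only, the rotation angle calculation of Equation 1 for a single subband might be sketched as follows. The function name `rotation_angle`, the use of real-valued coefficient lists, and the handling of a zero denominator (which Equation 1 does not define) are assumptions made for this sketch and are not taken from Non-Patent Literature 1.

```python
import math


def rotation_angle(left, right):
    """Rotation angle of one subband, following Equation 1.

    left, right: real-valued coefficients L_sb(f) and R_sb(f) for
    f = sb_start .. sb_end (real-valued input is an assumption).
    """
    cross = sum(l * r for l, r in zip(left, right))
    energy_left = sum(l * l for l in left)
    energy_right = sum(r * r for r in right)

    numerator = 2.0 * cross
    denominator = energy_left - energy_right

    if denominator == 0.0:
        # Assumption: Equation 1 leaves this case undefined; the limit of
        # the arctangent is used here, and 0 for an all-zero numerator.
        if numerator == 0.0:
            return 0.0
        angle = math.copysign(math.pi / 2.0, numerator)
    else:
        angle = math.atan(numerator / denominator)

    theta = 0.5 * angle
    if theta < 0.0:
        # Second line of Equation 1: shift negative angles by pi/2.
        theta += math.pi / 2.0
    return theta
```

When the right signal is a scaled copy of the left signal, R = g·L with g > 0, this sketch returns arctan(g), so the rotation follows the direction in which the two channels are correlated.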
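The principal component analysis transform is performed by the following equation.
(Equation 2)
$$
\begin{aligned}
Pc_{sb}(f) &= L_{sb}(f)\cdot\cos\theta_{sb} + R_{sb}(f)\cdot\sin\theta_{sb} \\
A_{sb}(f) &= R_{sb}(f)\cdot\cos\theta_{sb} - L_{sb}(f)\cdot\sin\theta_{sb}
\end{aligned}
\qquad [2]
$$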
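A minimal sketch of the rotation of Equation 2 for one subband is given below for illustration. The function name `pca_forward` and the representation of a subband as a list of real values are assumptions; the rotation angle is passed in as an already computed value.

```python
import math


def pca_forward(left, right, theta):
    """Rotate one subband of left/right coefficients per Equation 2.

    Returns (pc, amb): the principal-component signal Pc_sb(f) and the
    ambient signal A_sb(f) for the same range of f.
    """
    c = math.cos(theta)
    s = math.sin(theta)
    pc = [l * c + r * s for l, r in zip(left, right)]
    amb = [r * c - l * s for l, r in zip(left, right)]
    return pc, amb


# Example: the right channel is half the left channel, and theta is the
# matching angle arctan(0.5), so the ambient signal vanishes.
pc_example, amb_example = pca_forward([1.0, 2.0], [0.5, 1.0], math.atan(0.5))
```

In this example all of the energy is concentrated in the principal-component signal, which is the situation in which omitting the ambient signal from encoding costs the least.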
Monaural coding section 303 encodes principal-component signals {Pcsb(f)}.
Coding section 302 (rotation angle coding section) encodes rotation angles {θsb}.
Ambient signals {Asb(f)} are regarded as less important and are therefore not directly encoded. Instead, energy parameter extracting section 304 calculates energy ratios {PcARsb} of the principal-component signals to the ambient signals, and coding section 305 (energy ratio coding section) encodes the energy ratios {PcARsb} and generates energy ratio coding parameters. The energy ratios {PcARsb} are calculated by the following equation.
(Equation 3)
$$
PcAR_{sb} = \frac{\sum_{f=sb\_start}^{sb\_end} Pc_{sb}(f)^{2}}{\sum_{f=sb\_start}^{sb\_end} A_{sb}(f)^{2}} \qquad [3]
$$
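For illustration, the energy ratio of Equation 3 for one subband could be computed as in the following sketch. The function name `energy_ratio` is hypothetical, and returning infinity when the ambient energy is zero is an assumption, since Equation 3 does not cover that case.

```python
import math


def energy_ratio(pc, amb):
    """Energy ratio PcAR_sb of one subband, following Equation 3."""
    energy_pc = sum(p * p for p in pc)
    energy_amb = sum(a * a for a in amb)
    if energy_amb == 0.0:
        # Assumption: a subband without ambient energy is reported as an
        # infinite ratio; Equation 3 itself leaves this case open.
        return math.inf
    return energy_pc / energy_amb


# Example: principal-component energy 3^2 + 4^2 = 25, ambient energy
# 1^2 + 2^2 = 5, so the ratio is 5.
ratio_example = energy_ratio([3.0, 4.0], [1.0, 2.0])
```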
Multiplexing section 306 multiplexes coding parameters of principal-component signals {Pcsb(f)}, rotation angles {θsb}, and energy ratios {PcARsb}, and transmits a bit stream to a decoding side.
At the decoding side illustrated in FIG. 4, demultiplexing section 401 demultiplexes the bit stream, and obtains coding parameters of the principal-component signals, coding parameters of the rotation angles, and coding parameters of the energy ratios.
Decoding section 402 (rotation angle decoding section) decodes the coding parameters of the rotation angles and outputs decoded rotation angles {θ̃sb} to principal component combining section 406.
Monaural decoding section 403 decodes the coding parameters of the principal-component signals to generate decoded principal-component signals {P̃csb(f)}, and outputs them to principal component combining section 406 and ambient signal combining section 405.
Decoding section 404 (energy ratio decoding section) decodes the coding parameters of the energy ratios and generates decoded energy ratios {P̃cARsb} of the principal-component signals to the ambient signals.
Ambient signal combining section 405 generates decoded ambient signals {Ãsb(f)} by scaling the decoded principal-component signals {P̃csb(f)} by the decoded energy ratios.
Principal component combining section 406 inversely transforms decoded principal-component signals {P̃csb(f)} and decoded ambient signals {Ãsb(f)} using decoded rotation angles {θ̃sb}, and generates decoded left signals {L̃sb(f)} and decoded right signals {R̃sb(f)}. This inverse transform is performed by the following equation.
(Equation 4)
$$
\begin{aligned}
\tilde{L}_{sb}(f) &= \tilde{P}c_{sb}(f)\cdot\cos\tilde{\theta}_{sb} - \tilde{A}_{sb}(f)\cdot\sin\tilde{\theta}_{sb} \\
\tilde{R}_{sb}(f) &= \tilde{P}c_{sb}(f)\cdot\sin\tilde{\theta}_{sb} + \tilde{A}_{sb}(f)\cdot\cos\tilde{\theta}_{sb}
\end{aligned}
\qquad [4]
$$
In the case where the ambient signals are not encoded, the inverse transform is performed by the following equation.
(Equation 5)
$$
\begin{aligned}
\tilde{L}_{sb}(f) &= \tilde{P}c_{sb}(f)\cdot\cos\tilde{\theta}_{sb} \\
\tilde{R}_{sb}(f) &= \tilde{P}c_{sb}(f)\cdot\sin\tilde{\theta}_{sb}
\end{aligned}
\qquad [5]
$$
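The inverse transforms of Equation 4 and Equation 5 can be illustrated by the following sketch, which also checks that Equation 4 undoes the rotation of Equation 2 when no quantization is applied. The function names `pca_forward` and `pca_inverse`, the optional `amb` argument, and the sample values are assumptions chosen for this illustration; quantization and the synthesis of the ambient signal from the energy ratio are deliberately left out.

```python
import math


def pca_forward(left, right, theta):
    """Rotation of Equation 2, repeated here so the sketch is self-contained."""
    c, s = math.cos(theta), math.sin(theta)
    pc = [l * c + r * s for l, r in zip(left, right)]
    amb = [r * c - l * s for l, r in zip(left, right)]
    return pc, amb


def pca_inverse(pc, theta, amb=None):
    """Inverse rotation for one subband.

    With an ambient signal, Equation 4 is applied; with amb=None,
    the ambient terms are dropped and Equation 5 is applied.
    """
    c, s = math.cos(theta), math.sin(theta)
    if amb is None:
        # Equation 5: only the principal-component signal is available.
        return [p * c for p in pc], [p * s for p in pc]
    # Equation 4: full inverse rotation.
    left = [p * c - a * s for p, a in zip(pc, amb)]
    right = [p * s + a * c for p, a in zip(pc, amb)]
    return left, right


# Arbitrary sample values, for illustration only.
left_in = [1.0, -2.0, 0.5]
right_in = [0.3, 1.0, -0.7]
theta_in = 0.4

pc_sig, amb_sig = pca_forward(left_in, right_in, theta_in)
left_full, right_full = pca_inverse(pc_sig, theta_in, amb_sig)  # Equation 4
left_approx, right_approx = pca_inverse(pc_sig, theta_in)       # Equation 5
```

With unquantized signals, `left_full` and `right_full` reproduce the inputs up to rounding error, whereas `left_approx` and `right_approx` differ from the inputs by exactly the contribution of the discarded ambient signal.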