With a global drift towards broadband, users' expectations of communication systems have grown beyond mere clarity to include a sense of stereo and naturalness. Accordingly, providing stereo acoustic sound signals has become a trend, and an effective encoding method for storing and transmitting stereo acoustic sound signals is therefore desired.
Many stereo encoding methods, such as extended adaptive multi-rate wideband (AMR-WB+) (see, for example, Non-Patent Literature 1), adopt Mid-Side (sum-difference) coding (hereinafter referred to as M/S) to exploit the stereo redundancy contained in stereo signals.
In M/S stereo encoding, because the correlation between the two channels is often considerably high, the sum and the difference of the two signals (a left channel signal and a right channel signal) are computed. This removes the redundancy between the two signals, after which the sum (monaural or mid) signal and the difference (sub or side) signal are encoded. Consequently, relatively more bits can be allocated to the high-energy monaural signal than to the low-energy side signal, which enables high-quality encoding of stereo acoustic sound signals.
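The energy argument above can be illustrated with a small Python sketch. The test signals are hypothetical (a sinusoid and a 0.9-scaled copy of it standing in for two highly correlated channels), and the helper names `mid_side` and `energy` are illustrative only; the sketch uses the plain sum and difference M = L + R, S = L - R.

```python
import math

def mid_side(left, right):
    """Sum/difference (M/S) transform: M = L + R, S = L - R."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side

def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)

# Hypothetical, highly correlated channel pair: the right channel is a
# slightly attenuated copy of the left channel.
n_samples = 160
left = [math.sin(2 * math.pi * 5 * n / n_samples) for n in range(n_samples)]
right = [0.9 * v for v in left]

mid, side = mid_side(left, right)
print(f"mid energy:  {energy(mid):.2f}")
print(f"side energy: {energy(side):.2f}")
```

With these inputs the side signal carries well under 1% of the mid-signal energy, which is why it can be encoded with comparatively few bits.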
A problem of the M/S method, which relies on the redundancy of stereo acoustic sound signals, is that when the phases of the two components deviate from each other (i.e., one channel is temporally delayed relative to the other), the benefits of M/S encoding are lost. Because such time delays occur frequently in real audio signals, this is a fundamental issue. In addition, the stereo image perceived when listening to a stereo signal depends heavily on the time difference between the left channel signal and the right channel signal, particularly at low frequencies.
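The loss of the M/S advantage under a time delay can be seen in a sketch that compares the side-to-mid energy ratio for an aligned and a delayed copy of the same hypothetical source. A circular shift stands in for the delay; this is an assumption that avoids frame-edge effects for a periodic test signal, and the names used are illustrative.

```python
import math

def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)

def side_to_mid_ratio(left, right):
    """Energy of S = L - R relative to the energy of M = L + R."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return energy(side) / energy(mid)

n_samples = 160
# Hypothetical source: 5 full cycles per frame (period of 32 samples), so a
# circular shift behaves like a pure time delay.
source = [math.sin(2 * math.pi * 5 * n / n_samples) for n in range(n_samples)]
delay = 8  # samples; the right channel lags the left channel

left = source
right_aligned = source
right_delayed = source[-delay:] + source[:-delay]  # right(n) = source(n - 8)

ratio_aligned = side_to_mid_ratio(left, right_aligned)
ratio_delayed = side_to_mid_ratio(left, right_delayed)
print(f"aligned: S/M energy ratio = {ratio_aligned:.3f}")
print(f"delayed: S/M energy ratio = {ratio_delayed:.3f}")
```

For the aligned pair the ratio is exactly zero. An 8-sample delay, a quarter period of this sinusoid, raises it to about 1, so the side signal becomes as costly to encode as the mid signal.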
To solve this problem, Non-Patent Literature 2 proposes an adaptive M/S stereo encoding method that applies M/S encoding to phase-aligned (time-aligned) signal components.
FIG. 1 is a block diagram illustrating a configuration of an encoding apparatus based on a principle of an adaptive M/S stereo encoding method for stereo signals.
In the encoding process of the encoding apparatus shown in FIG. 1, time delay estimation section 101 estimates time delay D between left channel signal L(n) and right channel signal R(n) of a stereo signal using a time-domain cross-correlation technique, as in equation 1.
Equation 1

$$C_{LR}(\tau) = \frac{\left(\sum_{n=0}^{N-1-\tau} L(n)\,R(n+\tau)\right)^{2}}{\left(\sum_{n=0}^{N-1-\tau} L^{2}(n)\right)\left(\sum_{n=0}^{N-1-\tau} R^{2}(n+\tau)\right)}, \qquad D = \arg\max_{\tau \in [a,\,b]} C_{LR}(\tau)$$
In equation 1, [a, b] represents a predetermined search range of candidate delays, and N represents the frame size.
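Equation 1 can be sketched in Python as a brute-force search over the candidate lags. This is a minimal sketch, assuming non-negative lags with 0 <= a <= b < N (the formula as written indexes R(n + τ) and sums up to N - 1 - τ), plus a hypothetical guard that returns 0 when either energy term is zero; the function names and the noise test frame are illustrative.

```python
import random

def normalized_xcorr(left, right, tau):
    """C_LR(tau) from equation 1; assumes 0 <= tau < len(left)."""
    n_frame = len(left)
    num = sum(left[n] * right[n + tau] for n in range(n_frame - tau))
    e_left = sum(left[n] ** 2 for n in range(n_frame - tau))
    e_right = sum(right[n + tau] ** 2 for n in range(n_frame - tau))
    denom = e_left * e_right
    # Hypothetical guard against silent frames (not part of equation 1).
    return (num * num) / denom if denom > 0.0 else 0.0

def estimate_delay(left, right, a, b):
    """D = argmax of C_LR(tau) over tau in [a, b]."""
    return max(range(a, b + 1), key=lambda tau: normalized_xcorr(left, right, tau))

# Hypothetical test frame: white noise in which the right channel lags the
# left channel by 12 samples, i.e. R(n + 12) = L(n).
rng = random.Random(1234)
frame_size, true_delay = 320, 12
source = [rng.gauss(0.0, 1.0) for _ in range(frame_size + true_delay)]
left = source[true_delay:]   # L(n) = s(n + 12)
right = source[:frame_size]  # R(n) = s(n)
print(estimate_delay(left, right, 0, 40))  # expected: 12
```

Because C_LR is a squared, normalized correlation, it reaches 1 at the true lag and is insensitive to the sign and overall level of the correlation.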
Time delay encoding section 105 encodes time delay D, and multiplexing section 106 multiplexes the encoded parameters to form a bit stream.
Next, time alignment section 102 aligns right channel signal R(n) according to time delay D. The aligned right channel signal is denoted by Ra(n).
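The text does not spell out the alignment formula. Since equation 1 correlates L(n) with R(n + τ), one plausible reading is R_a(n) = R(n + D); the sketch below assumes that, and also assumes (hypothetically) that samples falling outside the current frame are replaced by zeros, whereas a practical codec would more likely take them from adjacent frames. The function name `time_align` is illustrative.

```python
def time_align(right, delay):
    """Hypothetical alignment R_a(n) = R(n + D), zero-filled outside the frame."""
    n_frame = len(right)
    return [right[n + delay] if 0 <= n + delay < n_frame else 0.0
            for n in range(n_frame)]

right = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0]
print(time_align(right, 3))  # [1.0, 2.0, 3.0, 0.0, 0.0, 0.0]
```

The explicit `0 <= n + delay` check matters for negative delays, since a negative Python list index would otherwise silently wrap around to the end of the frame.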
A downmix is performed on the aligned signal components to obtain monaural signal M(n) and side signal S(n), as in equation 2.
Equation 2

$$\begin{cases} M(n) = L(n) + R_a(n) \\ S(n) = L(n) - R_a(n) \end{cases}$$
Conversely, from equation 2, the temporally aligned signals can be recovered according to equation 3.
Equation 3

$$\begin{cases} R_a(n) = 0.5\,\bigl(M(n) - S(n)\bigr) \\ L(n) = 0.5\,\bigl(M(n) + S(n)\bigr) \end{cases}$$
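Equations 2 and 3 form an exact inverse pair, which a short round-trip sketch makes concrete. The sample values are hypothetical and the function names `downmix` and `upmix` are illustrative; no quantization is applied, so the reconstruction is exact.

```python
def downmix(left, right_aligned):
    """Equation 2: M(n) = L(n) + R_a(n), S(n) = L(n) - R_a(n)."""
    mid = [l + r for l, r in zip(left, right_aligned)]
    side = [l - r for l, r in zip(left, right_aligned)]
    return mid, side

def upmix(mid, side):
    """Equation 3: L(n) = 0.5 * (M(n) + S(n)), R_a(n) = 0.5 * (M(n) - S(n))."""
    left = [0.5 * (m + s) for m, s in zip(mid, side)]
    right_aligned = [0.5 * (m - s) for m, s in zip(mid, side)]
    return left, right_aligned

left = [1.0, 2.0, 3.0]
right_aligned = [1.0, 2.0, 2.5]
mid, side = downmix(left, right_aligned)
print(mid, side)         # [2.0, 4.0, 5.5] [0.0, 0.0, 0.5]
print(upmix(mid, side))  # ([1.0, 2.0, 3.0], [1.0, 2.0, 2.5])
```

The factor 0.5 in equation 3 undoes the doubling that occurs because M + S = 2L and M - S = 2R_a.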
Monaural encoding section 103 encodes monaural signal M(n), and side signal encoding section 104 encodes side signal S(n). Multiplexing section 106 multiplexes the encoded parameters received from monaural encoding section 103 and side signal encoding section 104 to form the bit stream.
FIG. 2 is a block diagram illustrating a configuration of a decoding apparatus based on the principle of the adaptive M/S stereo encoding method for stereo signals.
In the decoding process shown in FIG. 2, de-multiplexing section 201 separates all of the encoded parameters and quantized parameters from the bit stream. Monaural decoding section 202 decodes the encoded parameters of the monaural signal to obtain a decoded monaural signal. Side signal decoding section 203 decodes the encoded parameters of the side signal to obtain a decoded side signal. Time delay decoding section 204 decodes the encoded time delay to obtain decoded time delay D.
Next, a stereo signal is generated according to equation 4 by using the decoded monaural signal and the decoded side signal.
Equation 4

$$\begin{cases} \tilde{R}_a(n) = 0.5\,\bigl(\tilde{M}(n) - \tilde{S}(n)\bigr) \\ \tilde{L}(n) = 0.5\,\bigl(\tilde{M}(n) + \tilde{S}(n)\bigr) \end{cases}$$
where:
$\tilde{M}(n)$ represents the decoded monaural signal;
$\tilde{S}(n)$ represents the decoded side signal; and
$\tilde{R}_a(n)$ represents the input signal of time restoring section 205.
Time restoring section 205 de-aligns the phase of its input signal by shifting it in the reverse direction by decoded time delay D, thereby obtaining its output signal (the decoded right channel signal).
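The decoder side can be sketched as equation 4 followed by the reverse shift. This sketch assumes the encoder aligned the channels as R_a(n) = R(n + D), so that de-alignment is taken to be R(n) = R_a(n - D); it also assumes lossless decoding and zero-filling at the frame edges. The function names and sample values are hypothetical.

```python
def reconstruct(mid_dec, side_dec):
    """Equation 4 applied to the decoded mid and side signals."""
    left_dec = [0.5 * (m + s) for m, s in zip(mid_dec, side_dec)]
    right_aligned_dec = [0.5 * (m - s) for m, s in zip(mid_dec, side_dec)]
    return left_dec, right_aligned_dec

def time_restore(right_aligned_dec, delay):
    """Hypothetical de-alignment R(n) = R_a(n - D), zero-filled outside the frame."""
    n_frame = len(right_aligned_dec)
    return [right_aligned_dec[n - delay] if 0 <= n - delay < n_frame else 0.0
            for n in range(n_frame)]

# Toy decoded frame: the original right channel was [0, 0, 1, 2, 3, 4] with
# D = 2, so the aligned right channel equals the left channel and S is zero.
mid_dec = [2.0, 4.0, 6.0, 8.0, 0.0, 0.0]
side_dec = [0.0] * 6
left_dec, right_aligned_dec = reconstruct(mid_dec, side_dec)
right_dec = time_restore(right_aligned_dec, 2)
print(left_dec)   # [1.0, 2.0, 3.0, 4.0, 0.0, 0.0]
print(right_dec)  # [0.0, 0.0, 1.0, 2.0, 3.0, 4.0]
```

Because the sketch zero-fills at the frame edges, the round trip is exact only when the shifted-out samples are zero, as in this toy input; a practical decoder would presumably carry samples across frame boundaries instead.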