In the transmission of a coded and packetized audio signal through an Internet network with an IP (Internet Protocol) phone, a packet can be lost because of network congestion or the like (this phenomenon will be referred to hereinafter as “packet loss”). When a packet loss occurs, necessary audio codes are lost and the audio cannot be decoded, which causes an audio discontinuity. An audio packet loss concealment technology is a technology for preventing an audio discontinuity caused by a packet loss. It is designed to detect a packet loss and generate a pseudo audio signal corresponding to the lost packet (which will be referred to hereinafter as a “concealment signal”).
When an audio encoding technique used is a technique of performing audio encoding while updating internal states of encoder/decoder, encoding parameters to be originally received are not obtained and thus the audio packet loss concealment technology includes performing an update of the internal states of the decoder by use of artificially-generated parameters as well.
The CELP (Code Excited Linear Prediction) encoding is widely used as a technique for performing the audio encoding while updating the internal states of encoder/decoder. In the CELP encoding, an autoregressive model is assumed, and an excitation signal e(n) is filtered by an all-pole synthesis filter with coefficients a(i) to synthesize an audio signal. Namely, the audio signal s(n) is synthesized according to the equation below, in which a(i) represents linear prediction coefficients (LP (Linear Prediction) coefficients) and the prediction order P takes a value such as P=16.
$$s(n) = e(n) - \sum_{i=1}^{P} a(i) \cdot s(n-i)$$  [Mathematical Equation 1]
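Purely by way of illustration, the recursion of Mathematical Equation 1 may be sketched in Python as follows. The function name `synthesize`, the argument names, and the convention of returning the filter memory are assumptions made for this sketch and do not appear in the description above.

```python
def synthesize(excitation, lp_coeffs, history=None):
    """All-pole synthesis: s(n) = e(n) - sum_{i=1..P} a(i) * s(n - i).

    lp_coeffs[0] holds a(1), lp_coeffs[1] holds a(2), and so on.
    history holds s(-1), s(-2), ... (newest first); zeros are assumed if omitted.
    """
    order = len(lp_coeffs)
    past = list(history) if history is not None else [0.0] * order
    out = []
    for e_n in excitation:
        s_n = e_n - sum(a * s for a, s in zip(lp_coeffs, past))
        out.append(s_n)
        # Shift the memory so that past[0] is always the most recent output.
        past = ([s_n] + past)[:order]
    return out, past


# Example with a first-order filter a(1) = -0.5 driven by a unit impulse.
signal, memory = synthesize([1.0, 0.0, 0.0, 0.0], [-0.5])
```

The returned memory corresponds to the kind of internal state that a decoder of this type would have to carry from one frame to the next.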
In the CELP encoding, the internal states stored include ISF (Immittance Spectral Frequency) parameters as mathematically equivalent representation of the linear prediction coefficients, and a past excitation signal. With an occurrence of a packet loss, these are artificially generated, and there arises a deviation from the original parameters that would be obtained by decoding. An inconsistency of a synthesized audio caused by a deviation of the parameters is perceived as a noise by a listener, which significantly degrades the subjective quality.
The paragraphs below will describe a configuration and an operation of an audio decoder to perform the audio packet loss concealment, using an example where the CELP encoding is used as the audio encoding technique.
A configuration diagram and an operation of the audio decoder are shown in FIG. 1 and FIG. 2. As shown in FIG. 1, an audio decoder 1 has a packet loss detector 11, an audio code decoder 12, a concealment signal generator 13, and an internal state buffer 14.
The packet loss detector 11, when receiving an audio packet correctly, sends a control signal, and audio codes included in the audio packet, to the audio code decoder 12 (normal reception: YES in step S100 in FIG. 2). Thereafter, the audio code decoder 12 performs decoding of the audio codes and updating of the internal states as described below (steps S200 and S400 in FIG. 2). On the other hand, the packet loss detector 11, when failing to receive an audio packet correctly, sends a control signal to the concealment signal generator 13 (packet loss: NO in step S100 in FIG. 2). Thereafter, the concealment signal generator 13 generates a concealment signal and updates the internal states as described below (steps S300 and S400 in FIG. 2). The processes of steps S100 to S400 in FIG. 2 are repeated to the end of communication (or until step S500 results in a determination of YES).
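As a rough sketch only, the control flow of steps S100 to S500 may be pictured in Python as follows. The names `run_decoder`, `decode`, and `conceal`, the use of `None` to mark a lost packet, and the dictionary used as the internal state buffer are all hypothetical choices made for this example.

```python
def run_decoder(packets, decode, conceal):
    """Dispatch each frame to normal decoding or to concealment.

    packets: iterable whose items are packet payloads, or None for a lost packet.
    decode(packet, state) and conceal(state) both return one output frame and
    update the shared internal state (corresponding to step S400).
    """
    state = {}                      # stands in for the internal state buffer 14
    output = []
    for packet in packets:          # the loop ends with the communication (S500)
        if packet is not None:      # S100: normal reception
            frame = decode(packet, state)       # S200 and S400
        else:                       # S100: packet loss
            frame = conceal(state)              # S300 and S400
        output.append(frame)
    return output
```

The point of the sketch is that both branches write to the same state, which is why artificially generated parameters in one lost frame can affect the decoding of later frames.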
The audio codes include at least encoded ISF parameters $\dot{\omega}_i$ [Mathematical Equation 2], encoded pitch lags $T_j^p$ of the first to fourth subframes, encoded adaptive codebook gains $g_j^p$ of the first to fourth subframes, encoded fixed codebook gains $g_j^c$ of the first to fourth subframes, and encoded fixed codebook vectors $c_j(n)$ of the first to fourth subframes. The ISF parameters may be replaced by LSF (line spectral frequency) parameters, which are a mathematically equivalent representation thereof. Although the discussion below uses the ISF parameters, the same discussion also applies to the case using the LSF parameters.
The internal state buffer includes past ISF parameters $\dot{\omega}_i^{-1}$ [Mathematical Equation 3] and, as equivalent representation of $\dot{\omega}_i^{-1}$ [Mathematical Equation 4], ISP (Immittance Spectral Pair) parameters $\dot{q}_i^{-1}$ [Mathematical Equation 5], ISF residual parameters $\dot{r}_i^{-1}$ [Mathematical Equation 6], past pitch lags $T_j^p$, past adaptive codebook gains $g_j^p$, past fixed codebook gains $g_j^c$, and an adaptive codebook $u(n)$. The number of subframes of the past parameters to be included is determined by the design principle. It is assumed in the present specification that one frame includes four subframes, but another value may be adopted depending upon the design principle.
<Case of Normal Reception>
FIG. 3 shows an exemplary functional configuration of the audio code decoder 12. As shown in this FIG. 3, the audio code decoder 12 has an ISF decoder 120, a stability processor 121, an LP coefficient calculator 122, an adaptive codebook calculator 123, a fixed codebook decoder 124, a gain decoder 125, an excitation vector synthesizer 126, a post-filter 127, and a synthesis filter 128. It should be noted, however, that the post-filter 127 is not an indispensable constitutive element. In FIG. 3, for convenience of explanation, the internal state buffer 14 is indicated by a double-dot line inside the audio code decoder 12. However, the internal state buffer 14 is not included inside the audio code decoder 12, but is indeed the internal state buffer 14 itself shown in FIG. 1. The same is also true in the configuration diagrams of the audio code decoder hereinafter.
A configuration diagram of the LP coefficient calculator 122 is shown in FIG. 4 and a processing flow of calculation of LP coefficients from the encoded ISF parameters is shown in FIG. 5. As shown in FIG. 4, the LP coefficient calculator 122 has an ISF-ISP converter 122A, an ISP interpolator 122B, and an ISP-LPC converter 122C.
First described are a functional configuration and its operation associated with the process of calculating the LP coefficients from the encoded ISF parameters (FIG. 5).
The ISF decoder 120 decodes the encoded ISF parameters to obtain the ISF residual parameters $\dot{r}_i^{0}$ [Mathematical Equation 7] and calculates the ISF parameters $\dot{\omega}_i$ [Mathematical Equation 8] in accordance with the following equation (step S1 in FIG. 5). Here, $\mathrm{mean}_i$ represents mean vectors obtained in advance by learning or the like.
$$\dot{\omega}_i = \mathrm{mean}_i + \dot{r}_i^{0} + \frac{1}{3}\,\dot{r}_i^{-1}$$  [Mathematical Equation 9]
The example of using an MA prediction for the calculation of the ISF parameters is described herein, but it is also possible to adopt a configuration to perform calculation of the ISF parameters using an AR prediction as described below. Here, the ISF parameters of the immediately preceding frame are denoted by $\dot{\omega}_i^{-1}$ [Mathematical Equation 10] and weight factors of the AR prediction by $\rho_i$.
$$\dot{\omega}_i = \mathrm{mean}_i + \rho_i\left(\dot{\omega}_i^{-1} - \mathrm{mean}_i\right)$$  [Mathematical Equation 11]
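For illustration only, the MA prediction of Mathematical Equation 9 and the AR prediction of Mathematical Equation 11 may be sketched in Python as follows. The function names and the use of plain lists for the parameter vectors are assumptions of this sketch; the numeric values in the example are arbitrary.

```python
def decode_isf_ma(residual, prev_residual, mean):
    """MA prediction: isf_i = mean_i + r_i^0 + (1/3) * r_i^-1."""
    return [m + r0 + r1 / 3.0 for m, r0, r1 in zip(mean, residual, prev_residual)]


def decode_isf_ar(prev_isf, mean, rho):
    """AR prediction: isf_i = mean_i + rho_i * (isf_i^-1 - mean_i)."""
    return [m + p * (w - m) for m, p, w in zip(mean, rho, prev_isf)]


# Arbitrary one-element vectors, chosen only to make the arithmetic visible.
isf_ma = decode_isf_ma(residual=[3.0], prev_residual=[6.0], mean=[100.0])
isf_ar = decode_isf_ar(prev_isf=[110.0], mean=[100.0], rho=[0.5])
```

In both forms the current frame depends on a quantity from the preceding frame, which is the dependency that a packet loss disturbs.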
The stability processor 121 performs a process according to the below equation so as to place a distance of not less than 50 Hz between elements of the ISF parameters in order to secure stability of the filter (step S2 in FIG. 5). The ISF parameters are indicative of a line spectrum representing the shape of an audio spectrum envelope, and as the distance between them becomes shorter, peaks of the spectrum become larger, causing resonance. For this reason, the process for securing stability becomes necessary to prevent gains from becoming too large at the peaks of the spectrum. Here, min_dist represents a minimum ISF distance, and isf_min represents a minimum of ISF necessary for securing the distance of min_dist. isf_min is successively updated by adding the distance of min_dist to a value of neighboring ISF. On the other hand, isf_max represents a maximum of ISF necessary for securing the distance of min_dist. isf_max is successively updated by subtracting the distance of min_dist from a value of neighboring ISF.
    isf_min = min_dist = 50
    for i = 0 to 14
        if $\dot{\omega}_i$ < isf_min then $\dot{\omega}_i$ = isf_min
        isf_min = $\dot{\omega}_i$ + min_dist
    isf_max = 6400 − min_dist
    if $\dot{\omega}_{14}$ > isf_max
        for i = 14 down to 1
            if $\dot{\omega}_i$ > isf_max then $\dot{\omega}_i$ = isf_max
            isf_max = $\dot{\omega}_i$ − min_dist
[Mathematical Equation 12]
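A minimal Python sketch of the procedure of Mathematical Equation 12 is given below by way of example. It assumes that both update statements sit inside their respective loops, which is how the phrase "successively updated" is read here. The function name `stabilize_isf` and the parameter name `upper` (for the value 6400) are hypothetical.

```python
def stabilize_isf(isf, min_dist=50.0, upper=6400.0):
    """Enforce a spacing of at least min_dist between neighbouring ISF elements."""
    w = list(isf)
    last = len(w) - 1               # 14 for the 15 elements used in the text
    if last < 0:
        return w

    # Forward pass: push each element up to the running lower bound.
    isf_min = min_dist
    for i in range(last + 1):
        if w[i] < isf_min:
            w[i] = isf_min
        isf_min = w[i] + min_dist

    # Backward pass, needed only if the top element exceeded the upper bound.
    isf_max = upper - min_dist
    if w[last] > isf_max:
        for i in range(last, 0, -1):    # i = last down to 1
            if w[i] > isf_max:
                w[i] = isf_max
            isf_max = w[i] - min_dist
    return w
```

For instance, the input `[100, 6390, 6395]` is first pushed up to `[100, 6390, 6440]` by the forward pass and then pulled back to `[100, 6300, 6350]` by the backward pass.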
The ISF-ISP converter 122A in the LP coefficient calculator 122 converts $\dot{\omega}_i$ [Mathematical Equation 13] into ISP parameters $\dot{q}_i$ [Mathematical Equation 14] in accordance with the following equation (step S3 in FIG. 5). Here, C is a constant determined in advance.
$$\dot{q}_i = \cos\left(C \cdot \dot{\omega}_i\right)$$  [Mathematical Equation 15]
The ISP interpolator 122B calculates the ISP parameters for the respective subframes from the past ISP parameters $\dot{q}_i^{-1}$ [Mathematical Equation 16] included in the internal state buffer 14 and the foregoing ISP parameters $\dot{q}_i$ [Mathematical Equation 17] in accordance with the below equation (step S4 in FIG. 5). Other coefficients may be used for the interpolation.
$$q_i^{(1)} = 0.75 \cdot \dot{q}_i^{-1} + 0.25 \cdot \dot{q}_i$$
$$q_i^{(2)} = 0.5 \cdot \dot{q}_i^{-1} + 0.5 \cdot \dot{q}_i$$
$$q_i^{(3)} = 0.25 \cdot \dot{q}_i^{-1} + 0.75 \cdot \dot{q}_i$$
$$q_i^{(4)} = \dot{q}_i$$
[Mathematical Equation 18]
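Steps S3 and S4 may be sketched in Python as follows, for illustration only. The function names are hypothetical, and the constant C is passed in as a parameter because its value is not given in the text.

```python
import math


def isf_to_isp(isf, c):
    """Step S3: q_i = cos(C * isf_i), with C a predetermined constant."""
    return [math.cos(c * w) for w in isf]


def interpolate_isp(prev_isp, cur_isp, weights=(0.25, 0.5, 0.75, 1.0)):
    """Step S4: one ISP vector per subframe.

    Each weight is applied to the current frame and its complement to the
    previous frame, which reproduces the 0.75/0.25, 0.5/0.5, 0.25/0.75 and
    0/1 mixes of the equation above.
    """
    return [
        [(1.0 - w) * p + w * q for p, q in zip(prev_isp, cur_isp)]
        for w in weights
    ]


# Arbitrary one-element vectors to show the four subframe values.
subframe_isp = interpolate_isp(prev_isp=[0.0], cur_isp=[1.0])
```

Because the first three subframes mix in the previous frame's ISP parameters, an error in the stored past parameters spreads over most of the following frame.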
The ISP-LPC converter 122C converts the ISP parameters for the respective subframes into LP coefficients $\dot{a}_i^{j}$ $(0 < i \le P,\ 0 \le j < 4)$ [Mathematical Equation 19] (step S5 in FIG. 5). A specific conversion procedure to be used can be the processing procedure described in Non Patent Literature 1. The number of subframes included in a look-ahead signal is assumed to be 4 herein, but the number of subframes may differ, depending upon the design principle.
Next described are other configurations and operations in the audio code decoder 12.
The adaptive codebook calculator 123 decodes the encoded pitch lags to calculate the pitch lags $\hat{T}_p^{(j)}$ of the first to fourth subframes. Then, the adaptive codebook calculator 123 uses the adaptive codebook $u(n)$ to calculate adaptive codebook vectors for the respective subframes in accordance with the below equation. The adaptive codebook vectors are calculated by interpolating the adaptive codebook $u(n)$ with an FIR filter $\mathrm{Int}(i)$. Here, the length of the adaptive codebook is denoted by $N_{adapt}$. The filter $\mathrm{Int}(i)$ used for the interpolation is an FIR filter with a predetermined length $2l+1$, and $L'$ represents the number of samples in a subframe. By using the interpolation filter $\mathrm{Int}(i)$, the pitch lags can be utilized with fractional-sample precision. For the details of the interpolation filter, the method described in Non Patent Literature 1 can be referred to.
$$v_j(n) = \sum_{i=-l}^{l} \mathrm{Int}(i) \cdot u\left(n + N_{adapt} - \hat{T}_p^{(j)} + i\right) \quad (0 \le n < L')$$  [Mathematical Equation 20]
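The sum of Mathematical Equation 20 may be sketched in Python as follows. This is a simplified illustration: it assumes an integer pitch lag, takes the filter taps as an argument instead of using the filter of Non Patent Literature 1, and rejects any lag for which an index would fall outside the stored buffer. All names are hypothetical.

```python
def adaptive_codebook_vector(u, pitch_lag, taps, subframe_len):
    """v(n) = sum_{i=-l..l} Int(i) * u(n + N_adapt - T + i), for 0 <= n < L'.

    taps[i + l] holds Int(i), so taps must have odd length 2l + 1.
    """
    n_adapt = len(u)
    half = (len(taps) - 1) // 2             # l
    if len(taps) != 2 * half + 1:
        raise ValueError("interpolation filter must have odd length 2l + 1")
    # Simplifying assumption of this sketch: all indices stay inside u.
    if pitch_lag < subframe_len + half or pitch_lag + half > n_adapt:
        raise ValueError("pitch lag outside the range handled by this sketch")

    v = []
    for n in range(subframe_len):
        base = n + n_adapt - pitch_lag
        v.append(sum(taps[i + half] * u[base + i] for i in range(-half, half + 1)))
    return v


# With the pass-through filter [0, 1, 0] the result is the buffer delayed by T.
v = adaptive_codebook_vector(list(range(10)), pitch_lag=5, taps=[0.0, 1.0, 0.0], subframe_len=4)
```

With the pass-through taps the example simply copies the samples lying one pitch lag back, which makes the role of the stored past excitation explicit.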
The fixed codebook decoder 124 decodes the encoded fixed codebook vectors to acquire the fixed codebook vectors cj(n) of the first to fourth subframes.
The gain decoder 125 decodes the encoded adaptive codebook gains and the encoded fixed codebook gains to acquire the adaptive codebook gains and fixed codebook gains of the first to fourth subframes. The decoding of the adaptive codebook gains and the fixed codebook gains can be carried out, for example, by the technique described in Non Patent Literature 1 and outlined below. Since this technique does not use the interframe prediction used in the gain encoding of AMR-WB, it can enhance packet loss resistance.
For example, the gain decoder 125 acquires the fixed codebook gain in accordance with the below processing flow.
First, the gain decoder 125 calculates the power of the fixed codebook vector. Here, the length of the subframe is defined as Ns.
$$E_c = 10 \log\left(\frac{1}{N_s} \sum_{i=0}^{N_s-1} c^2(i)\right)$$  [Mathematical Equation 21]
Next, the gain decoder 125 decodes the vector-quantized gain parameter to acquire the adaptive codebook gain $\hat{g}_p$ [Mathematical Equation 22] and the quantized fixed codebook gain $\hat{E}_i$ [Mathematical Equation 23].
It then calculates a predictive fixed codebook gain as described below from the quantized fixed codebook gain and the aforementioned power of the fixed codebook vector.
$$g_c' = 10^{0.05\left(\hat{E}_i - E_c\right)}$$  [Mathematical Equation 24]
Finally, the gain decoder 125 decodes the prediction coefficient $\hat{\gamma}$ [Mathematical Equation 25] and multiplies the predictive gain by it to acquire the fixed codebook gain.
$$\hat{g}_c = \hat{\gamma} \cdot g_c'$$  [Mathematical Equation 26]
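The three steps of Mathematical Equations 21, 24, and 26 may be sketched in Python as follows, as an illustration only. The sketch assumes that the logarithm is taken to base 10 and that the fixed codebook vector has non-zero power. The function and argument names are hypothetical.

```python
import math


def decode_fixed_gain(fixed_vector, quantized_energy, gamma):
    """Return the fixed codebook gain gamma * 10 ** (0.05 * (E_hat - E_c))."""
    n_s = len(fixed_vector)
    mean_power = sum(c * c for c in fixed_vector) / n_s
    e_c = 10.0 * math.log10(mean_power)        # assumes base 10 and mean_power > 0
    g_pred = 10.0 ** (0.05 * (quantized_energy - e_c))
    return gamma * g_pred


# A unit-power vector gives E_c = 0, so the predictive gain is 10 ** (0.05 * 20) = 10.
gain = decode_fixed_gain([1.0, 1.0, 1.0, 1.0], quantized_energy=20.0, gamma=0.5)
```

Every input of this calculation comes from the current packet, which is consistent with the statement above that no interframe prediction is involved.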
The excitation vector synthesizer 126 multiplies the adaptive codebook vector by the adaptive codebook gain, multiplies the fixed codebook vector by the fixed codebook gain, and calculates the sum of them to acquire an excitation signal, as expressed by the following equation.
$$e_j(n) = g_p^{j} \cdot v_j(n) + g_c^{j} \cdot c_j(n)$$  [Mathematical Equation 27]
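For one subframe, Mathematical Equation 27 may be sketched in Python as follows. The function name and argument names are hypothetical, and the values in the example are arbitrary.

```python
def synthesize_excitation(adaptive_vec, fixed_vec, g_p, g_c):
    """e_j(n) = g_p * v_j(n) + g_c * c_j(n) for one subframe j."""
    if len(adaptive_vec) != len(fixed_vec):
        raise ValueError("both codebook vectors must cover the same subframe")
    return [g_p * v + g_c * c for v, c in zip(adaptive_vec, fixed_vec)]


# Arbitrary two-sample subframe.
excitation = synthesize_excitation([1.0, 2.0], [10.0, 20.0], g_p=0.5, g_c=0.1)
```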
The post-filter 127 subjects the excitation signal vectors, for example, to post-processes such as processes of pitch enhancement, noise enhancement, and low-frequency enhancement. The pitch enhancement, the noise enhancement, and the low-frequency enhancement can be effected by use of the techniques described in Non Patent Literature 1.
The synthesis filter 128 synthesizes a decoded signal with the excitation signal as a drive audio source, by linear prediction inverse filtering.
$$\hat{s}(n) = e_j(n) - \sum_{i=1}^{P} \dot{a}_i^{j} \cdot \hat{s}(n-i)$$  [Mathematical Equation 28]
If a pre-emphasis is done in the encoder, a de-emphasis is carried out.
$$\hat{s}_{\text{de-emph}}(n) = \hat{s}(n) + \beta \cdot \hat{s}(n-1)$$  [Mathematical Equation 29]
On the other hand, if a pre-emphasis is not done in the encoder, a de-emphasis is not carried out.
The paragraphs below will describe the operation concerning an internal state update.
In order to interpolate parameters upon an occurrence of packet loss, the LP coefficient calculator 122 updates the internal states of the ISF parameters by vectors calculated by the following equation.
$$\vec{\omega}_i = \beta\,\omega_i^{C} + (1-\beta)\,\frac{\omega_i^{(-3)} + \omega_i^{(-2)} + \omega_i^{(-1)}}{3}$$  [Mathematical Equation 30]
Here, $\omega_i^{(-j)}$ represents the ISF parameters j frames prior, which are stored in the buffer. $\omega_i^{C}$ represents the ISF parameters in speech intervals obtained in advance by learning or the like. $\beta$ is a constant and can be a value of, e.g., 0.75, although the value is not necessarily limited thereto. $\omega_i^{C}$ and $\beta$ may be varied by an index to express a property of an encoding target frame, for example, as in the ISF concealment described in Non Patent Literature 1.
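By way of illustration, the update of Mathematical Equation 30 may be sketched in Python as follows. The function name and the layout of `past_isf` (a list of the three preceding frames' vectors) are assumptions made for this sketch; the values in the example are arbitrary.

```python
def update_isf_state(past_isf, speech_isf, beta=0.75):
    """Blend the learned speech ISF with the mean of the last three frames.

    past_isf: three ISF vectors, for the frames 3, 2 and 1 frames prior.
    speech_isf: the ISF vector obtained in advance for speech intervals.
    """
    if len(past_isf) != 3:
        raise ValueError("this sketch expects exactly three past frames")
    w3, w2, w1 = past_isf
    return [
        beta * c + (1.0 - beta) * (a + b + d) / 3.0
        for c, a, b, d in zip(speech_isf, w3, w2, w1)
    ]


# 0.75 * 200 + 0.25 * mean(90, 100, 110) = 150 + 25
state = update_isf_state([[90.0], [100.0], [110.0]], speech_isf=[200.0])
```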
Furthermore, the LP coefficient calculator 122 also updates the internal states of the ISF residual parameters in accordance with the following equation.
$$\dot{r}_i^{-1} = \dot{r}_i^{0}$$  [Mathematical Equation 31]
The excitation vector synthesizer 126 updates the internal states by the excitation signal vectors in accordance with the below equation.
$$u(n) = u(n+L) \quad (0 \le n < N-L)$$
$$u(n+N-L+jL') = e_j(n) \quad (0 \le n < L')$$
[Mathematical Equation 32]
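The buffer update of Mathematical Equation 32 amounts to discarding the oldest L samples and appending the new frame's excitation. A minimal Python sketch follows, in which L is taken to be the total length of the subframe excitations supplied. The function name is hypothetical.

```python
def update_adaptive_codebook(u, subframe_excitations):
    """Shift u left by one frame and append the new excitation e_0, e_1, ...

    u: adaptive codebook of length N.
    subframe_excitations: list of per-subframe excitation vectors, each of length L'.
    """
    frame = [x for sub in subframe_excitations for x in sub]
    if len(frame) > len(u):
        raise ValueError("frame is longer than the adaptive codebook")
    # u(n) = u(n + L) for the retained part, then u(n + N - L + j*L') = e_j(n).
    return list(u[len(frame):]) + frame


# N = 6, two subframes of one sample each (L = 2).
new_u = update_adaptive_codebook([1, 2, 3, 4, 5, 6], [[7], [8]])
```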
Furthermore, the excitation vector synthesizer 126 updates the internal states of the gain parameters by the following equation.
$$g_c^{(-M_{la}+j)} = g_c^{j}$$  [Mathematical Equation 33]
The adaptive codebook calculator 123 updates the internal states of the parameters of the pitch lags by the following equation.
$$T_p^{(-M_{la}+j)} = T_p^{j}$$  [Mathematical Equation 34]
The range of j is defined as $(-2 \le j < M_{la})$, but different values may be selected as the range of j, depending upon the design principle.
<Case of Packet Loss>
FIG. 6 shows an exemplary functional configuration of the concealment signal generator 13. As shown in this FIG. 6, the concealment signal generator 13 has an LP coefficient interpolator 130, a pitch lag interpolator 131, a gain interpolator 132, a noise signal generator 133, a post-filter 134, a synthesis filter 135, an adaptive codebook calculator 136, and an excitation vector synthesizer 137. It should be noted, however, that the post-filter 134 is not an indispensable constitutive element.
The LP coefficient interpolator 130 calculates $\dot{\omega}_i$ [Mathematical Equation 35] by the following equation. In this respect, $\omega_i^{(-j)}$ represents the ISF parameters j frames prior, which are stored in the buffer.
$$\dot{\omega}_i = \alpha\,\omega_i^{(-1)} + (1-\alpha)\,\vec{\omega}_i$$  [Mathematical Equation 36]
In this equation, $\vec{\omega}_i$ [Mathematical Equation 37] represents the internal states of the ISF parameters calculated upon normal reception of a packet. $\alpha$ is also a constant and can be a value of, e.g., 0.9, although the value is not necessarily limited thereto. $\alpha$ may be varied by an index to express a property of an encoding target frame, for example, as in the ISF concealment described in Non Patent Literature 1.
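The concealment of Mathematical Equation 36 may be sketched in Python as follows, for illustration only. The function name and argument names are hypothetical, and the values in the example are arbitrary.

```python
def conceal_isf(prev_isf, state_isf, alpha=0.9):
    """isf_i = alpha * isf_i^(-1) + (1 - alpha) * state_i.

    prev_isf: ISF vector of the immediately preceding frame.
    state_isf: internal-state ISF vector kept from the last normal reception.
    """
    return [alpha * p + (1.0 - alpha) * s for p, s in zip(prev_isf, state_isf)]


# With alpha = 0.9 the concealed value stays close to the previous frame.
concealed = conceal_isf(prev_isf=[100.0], state_isf=[200.0])
```

When this is applied over several consecutive lost frames, each result becomes the next frame's previous value, so the ISF parameters drift gradually from the last received values toward the stored state.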
The procedure of obtaining the LP coefficients from the ISF parameters is the same as performed in the case of normal reception of a packet.
The pitch lag interpolator 131 uses the internal state parameters about the pitch lags $T_p^{(-M_{la}+j)}$ [Mathematical Equation 38] to calculate predictive values of the pitch lags $\hat{T}_p$ [Mathematical Equation 39]. A specific processing procedure to be used can be the technique disclosed in Non Patent Literature 1.
In order to interpolate the fixed codebook gains, the gain interpolator 132 can use the technique according to the below equation as described in Non Patent Literature 1.
$$g_s = 0.4 \cdot g_c^{-1} + 0.3 \cdot g_c^{-2} + 0.2 \cdot g_c^{-3} + 0.1 \cdot g_c^{-4}$$  [Mathematical Equation 40]
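The weighted sum of Mathematical Equation 40 may be sketched in Python as follows. The sketch reads the superscripts −1 to −4 as indices of the four most recent stored gains, newest first, which is an interpretation made here. The function name is hypothetical.

```python
def conceal_fixed_gain(past_gains, weights=(0.4, 0.3, 0.2, 0.1)):
    """Weighted sum of the stored fixed codebook gains, most recent first."""
    if len(past_gains) < len(weights):
        raise ValueError("not enough stored gains for this sketch")
    return sum(w * g for w, g in zip(weights, past_gains))


# Only the most recent gain is non-zero, so the result is 0.4 * 10.
gain = conceal_fixed_gain([10.0, 0.0, 0.0, 0.0])
```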
The noise signal generator 133 generates white noise for the same length as the fixed codebook vectors and uses the resultant noise for the fixed codebook vectors.
The operations of the post-filter 134, the synthesis filter 135, the adaptive codebook calculator 136, and the excitation vector synthesizer 137 are the same as those in the aforementioned case of normal reception of a packet.
The internal state update is the same as that performed in the case of normal reception of a packet, except for the update of the ISF residual parameters. The updating of the ISF residual parameters is carried out by the LP coefficient interpolator 130 in accordance with the following equation.
$$\dot{r}_i^{0} = \dot{\omega}_i^{0} - \mathrm{mean}_i - \frac{1}{3}\,\dot{r}_i^{-1}$$  [Mathematical Equation 41]
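Mathematical Equation 41 is Mathematical Equation 9 solved for the current residual, applied to the concealed ISF parameters. A Python sketch under that reading follows. The function name is hypothetical and the values in the example are arbitrary.

```python
def residual_from_isf(isf, prev_residual, mean):
    """r_i^0 = isf_i - mean_i - (1/3) * r_i^-1."""
    return [w - m - r1 / 3.0 for w, m, r1 in zip(isf, mean, prev_residual)]


# Consistency check against the MA prediction of Mathematical Equation 9:
# mean + r0 + r1 / 3 with mean = 100, r0 = 3, r1 = 6 gives an ISF of 105.
residual = residual_from_isf(isf=[105.0], prev_residual=[6.0], mean=[100.0])
```

Storing this residual keeps the MA predictor state consistent with the concealed ISF parameters, so that the first correctly received frame after the loss is decoded against a plausible previous residual.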