Speech coding technologies fall mainly into two types: transform coding and transform coded excitation (TCX) coding (see, for example, Non-Patent Literature 1).
Transform coding converts a signal from the time domain to the frequency domain using, for example, discrete Fourier transform (DFT) or modified discrete cosine transform (MDCT), and then quantizes and encodes the resulting spectrum coefficients. Typical transform coding schemes include MPEG MP3, MPEG AAC (see, for example, Non-Patent Literature 2), and Dolby AC3. Transform coding is efficient for music signals and general speech signals. FIG. 1 shows a simplified configuration of transform coding system 10.
In an encoder of transform coding system 10 shown in FIG. 1, time-frequency conversion section 11 converts time domain signal S(n) into frequency domain signal S(f) using discrete Fourier transform (DFT), modified discrete cosine transform (MDCT), or the like. Spectrum coefficient quantizing section 12 acquires a quantized parameter by quantizing frequency domain signal S(f). Multiplexing section 13 multiplexes the quantized parameter and transmits the result to the decoder side.
In a decoder of transform coding system 10 shown in FIG. 1, demultiplexing section 14 first demultiplexes all bit stream information to generate a quantized parameter. Spectrum coefficient decoding section 15 decodes the quantized parameter to generate decoded frequency domain signal S̃(f). Frequency-time conversion section 16 generates decoded time domain signal S̃(n) by converting decoded frequency domain signal S̃(f) into the time domain using inverse discrete Fourier transform (IDFT), inverse modified discrete cosine transform (IMDCT), or the like.
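The encoder and decoder chain of FIG. 1 can be illustrated with a minimal sketch. It assumes a naive DFT in place of a production transform and a uniform scalar quantizer with a hypothetical step size `step`; the function names are illustrative and do not come from any of the cited codecs, and multiplexing is omitted.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT: time domain S(n) -> frequency domain S(f)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT; returns the real part of the time domain signal."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def encode(signal, step):
    """Time-frequency conversion followed by uniform scalar quantization.

    The (real, imaginary) integer index pairs play the role of the
    quantized parameter. `step` is an assumed quantizer step size.
    """
    return [(round(c.real / step), round(c.imag / step)) for c in dft(signal)]

def decode(indices, step):
    """Spectrum coefficient decoding followed by frequency-time conversion."""
    spectrum = [complex(re * step, im * step) for re, im in indices]
    return idft(spectrum)

# Usage: encode and decode a short sinusoid.
signal = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
decoded = decode(encode(signal, step=0.01), step=0.01)
```

With this quantizer each spectrum coefficient is off by at most half a step per component, so the decoded time domain signal stays within one step of the input.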
In contrast, TCX coding combines a time domain (linear prediction) method with a frequency domain (transform coding) method. TCX coding applies linear prediction to an input speech signal to exploit the redundancy of the speech signal in the time domain, and thereby acquires a residual (excitation) signal. For a speech signal, especially an active speech section (with a resonance effect and a high pitch frequency component), this model generates a reproduced audio signal efficiently. After linear prediction, the residual (excitation) signal is converted into the frequency domain and efficiently encoded. Typical TCX coding schemes include AMR-WB-E, ITU-T G.729.1, and ITU-T G.718 (see, for example, Non-Patent Literature 4). FIG. 2 shows a simplified configuration of TCX coding system 20.
In an encoder of TCX coding system 20 shown in FIG. 2, LPC analysis section 21 performs LPC analysis on an input signal in order to exploit signal redundancy in the time domain. LPC inverse filtering section 22 acquires residual (excitation) signal Sr(n) by applying an LPC inverse filter, built from the LPC coefficients obtained in the LPC analysis, to input signal S(n). Time-frequency conversion section 23 converts residual signal Sr(n) into frequency domain signal Sr(f) using, for example, discrete Fourier transform (DFT) or modified discrete cosine transform (MDCT). Spectrum coefficient quantizing section 24 quantizes frequency domain signal Sr(f), and multiplexing section 25 multiplexes the quantized parameter and transmits the result to the decoder side.
In a decoder of TCX coding system 20 shown in FIG. 2, demultiplexing section 26 first demultiplexes all bit stream information to generate a quantized parameter. Spectrum coefficient decoding section 27 decodes the quantized parameter and generates decoded frequency domain residual signal S̃r(f). Frequency-time conversion section 28 generates decoded time domain residual signal S̃r(n) by converting decoded frequency domain residual signal S̃r(f) into the time domain using inverse discrete Fourier transform (IDFT), inverse modified discrete cosine transform (IMDCT), or the like. LPC synthesis filtering section 29 processes decoded time domain residual signal S̃r(n) using the decoded LPC parameter and acquires decoded time domain signal S̃(n).
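The linear prediction stages of FIG. 2 can be sketched as follows. This is an illustration under stated assumptions, not the method of any cited codec: LPC analysis is done with the autocorrelation method and the Levinson-Durbin recursion, the LPC coefficients are passed to the decoder unquantized, and the transform and quantization of the residual are skipped so that only the inverse and synthesis filters are shown. All function names are hypothetical.

```python
def autocorrelation(x, order):
    """Autocorrelation r[0..order] of a finite signal (zero outside the frame)."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(order + 1)]

def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.

    Assumes r[0] > 0, i.e. the analysed frame is not all zeros.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # prediction error energy
    return a

def lpc_inverse_filter(x, a):
    """Residual Sr(n) = sum_k a[k] * S(n - k), with zero history."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]

def lpc_synthesis_filter(residual, a):
    """Rebuild S(n) = Sr(n) - sum_{k>=1} a[k] * S(n - k), with zero history."""
    y = []
    for n, e in enumerate(residual):
        y.append(e - sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0))
    return y

# Usage: a strongly correlated sawtooth frame, prediction order 2 (assumed).
frame = [float((n % 8) - 4) for n in range(64)]
coeffs = levinson_durbin(autocorrelation(frame, 2), 2)
residual = lpc_inverse_filter(frame, coeffs)
rebuilt = lpc_synthesis_filter(residual, coeffs)
```

Because the residual is not quantized here, the synthesis filter restores the frame up to floating-point error. In a real TCX coder the residual would pass through the transform and quantization stages first.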
The transform coding part of both transform coding and TCX coding is normally carried out using some quantization method. One vector quantization method is referred to as pulse vector coding.
For example, Non-Patent Literature 3 discloses factorial pulse coding (FPC), which quantizes an LPC residual in the MDCT domain (see FIG. 4). Factorial pulse coding is one type of pulse vector coding, in which the coded information consists of unit magnitude pulses. In the newly standardized speech codec ITU-T G.718, factorial pulse coding is employed in the fifth layer to quantize an LPC residual in the MDCT domain.
In an encoder of TCX coding system 30 shown in FIG. 3, MDCT section 31 converts time domain signal Sr(n) into frequency domain signal Sr(f) by modified discrete cosine transform. FPC coding section 32 quantizes the LPC residual in the MDCT domain. In this encoder, pulse vector coding determines a plurality of pulses together with their positions, amplitudes, and polarities. Further, a global gain is calculated to normalize the pulses to unit magnitude. FIG. 4 shows one configuration example of FPC coding section 32. As shown in FIG. 4, the coding parameters of pulse vector coding are a global gain, pulse positions, pulse amplitudes, and pulse polarities.
FIG. 5 shows the relationship between the number of pulses that can be encoded (referred to as M) and the number of spectrum coefficients of an input signal (referred to as N). As shown in FIG. 5, in pulse vector coding, the number M of pulses that can be encoded depends on the number N of spectrum coefficients of the input signal and on the number of available bits. That is to say, when the number of available bits is fixed, M decreases as N increases, and M increases as N decreases. When N is fixed, M increases as the number of available bits increases, and M decreases as the number of available bits decreases.
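The trade-off among M, N, and the number of available bits can be made concrete with a counting sketch. It assumes that all pulse information (positions, amplitudes, and polarities of M unit magnitude pulses over N coefficients) is indexed jointly and that the global gain is coded separately. It uses the standard combinatorial count of integer vectors of length N whose absolute values sum to M. Whether FIG. 5 was derived this way is not stated in the text, and the function names are hypothetical.

```python
from math import comb

def pulse_configurations(n, m):
    """Count integer vectors of length n whose absolute values sum to m.

    Choose d occupied positions, split m unit pulses among them so that
    each gets at least one, and pick a polarity for each occupied position.
    """
    if m == 0:
        return 1
    return sum(comb(n, d) * comb(m - 1, d - 1) * 2 ** d
               for d in range(1, min(n, m) + 1))

def bits_needed(n, m):
    """Smallest number of bits that can index every configuration."""
    return (pulse_configurations(n, m) - 1).bit_length()

def max_pulses(n, bits):
    """Largest M whose configurations all fit in the given bit budget."""
    if n < 2:
        raise ValueError("n must be at least 2 for the count to grow with m")
    m = 0
    while pulse_configurations(n, m + 1) <= 2 ** bits:
        m += 1
    return m

# Usage: fixed bit budget with growing N, then fixed N with a larger budget.
m_short = max_pulses(16, 20)
m_long = max_pulses(32, 20)
m_more_bits = max_pulses(16, 30)
```

Under these assumptions, a 20-bit budget supports fewer pulses over 32 coefficients than over 16, and a 30-bit budget supports more pulses over 16 coefficients than a 20-bit budget does, which matches the relationship described for FIG. 5.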
FIG. 6 shows the concept of pulse vector coding. For input spectrum S(f) of length N, M pulses are encoded together with their positions, amplitudes, and polarities, and with one global gain. In contrast, in generated decoded spectrum S̃(f), only the M pulses with their positions, amplitudes, and polarities are generated, and all other spectrum coefficients are set to zero.
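The concept of FIG. 6 can be sketched in a few lines. The pulse search below is an assumption made for illustration only: it distributes M unit magnitude pulses in proportion to coefficient magnitude using largest-remainder rounding, then fits one least-squares global gain. It is not the search procedure of factorial pulse coding or of ITU-T G.718, and the function names are hypothetical.

```python
def pulse_vector_encode(spectrum, m):
    """Return (global_gain, pulses) for a spectrum of length N.

    pulses[i] is a signed integer: its magnitude is the number of unit
    pulses stacked at position i and its sign is the polarity.
    """
    total = sum(abs(c) for c in spectrum)
    if total == 0 or m <= 0:
        return 0.0, [0] * len(spectrum)
    ideal = [m * abs(c) / total for c in spectrum]
    counts = [int(v) for v in ideal]            # floor of each ideal share
    remaining = m - sum(counts)
    # Give the leftover pulses to the largest fractional remainders.
    by_remainder = sorted(range(len(spectrum)),
                          key=lambda i: ideal[i] - counts[i], reverse=True)
    for i in by_remainder[:remaining]:
        counts[i] += 1
    pulses = [cnt if c >= 0 else -cnt for cnt, c in zip(counts, spectrum)]
    # Least-squares gain that best matches gain * pulses to the spectrum.
    energy = sum(p * p for p in pulses)
    gain = sum(c * p for c, p in zip(spectrum, pulses)) / energy
    return gain, pulses

def pulse_vector_decode(gain, pulses):
    """Decoded spectrum: scaled pulses, zero at every other coefficient."""
    return [gain * p for p in pulses]

# Usage: N = 6 coefficients, M = 4 pulses.
spectrum = [0.1, -2.0, 0.0, 1.0, 0.05, -0.9]
gain, pulses = pulse_vector_encode(spectrum, 4)
decoded = pulse_vector_decode(gain, pulses)
```

In this example the small coefficients receive no pulse, so the corresponding entries of the decoded spectrum are zero, which is the behavior described for FIG. 6.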