Low-bit-rate speech compression coding methods have been required to accommodate the growing number of subscribers in digital mobile communications such as portable telephones, and many research institutions have pursued their research and development. In Japan, the coding systems adopted as standards for portable telephones are VSELP at a bit rate of 11.2 kbps, developed by Motorola, and PSI-CELP at a bit rate of 5.6 kbps, developed by NTT Mobile Communications Network, Inc., and portable telephones using these systems are in production.
Internationally, the ITU-T selected CS-ACELP, which was co-developed by Nippon Telegraph and Telephone Corporation and France Telecom, as the international standard speech coding system G.729 at 8 kbps. The system is scheduled to be used in Japan as a speech coding system for portable telephones.
All of the above-described systems are modifications of the CELP system (Code Excited Linear Prediction: M. R. Schroeder, “High Quality Speech at Low Bit Rates,” Proc. ICASSP '85, pp. 937-940). This system divides speech into excitation information and vocal tract information. The excitation information is coded as indices of a plurality of excitation samples stored in a codebook, while the vocal tract information is coded as the LPC (Linear Prediction Coefficients). In coding the excitation information, the system compares synthesized speech, which takes the vocal tract information into account, with the input speech (A-b-S: Analysis by Synthesis).
The basic algorithm of the CELP system will be described with reference to FIG. 1. FIG. 1 is a block diagram illustrating a configuration of a speech coding apparatus in the CELP system. In the speech coding apparatus illustrated in FIG. 1, LPC analyzing section 2 executes autocorrelation analysis and LPC analysis on input speech data 1 to obtain the LPC. LPC analyzing section 2 then codes the obtained LPC to obtain the coded LPC, and decodes the coded LPC to obtain the decoded LPC.
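As an illustration only, the autocorrelation analysis and LPC analysis described above can be sketched with the Levinson-Durbin recursion, a commonly used way of solving for the LPC; the source does not state which solver LPC analyzing section 2 uses. The function names, the sign convention (prediction of s[n] as the sum of a[k]·s[n-k]) and the omission of windowing and of LPC coding/decoding are assumptions of this sketch.

```python
def autocorrelation(frame, order):
    """Return autocorrelation values r[0..order] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a[1..order].

    Assumed convention: s[n] is predicted as sum_k a[k] * s[n - k].
    Returns (coefficients, residual prediction error energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. silent) frame: stop early
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        k_i = acc / err  # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k_i
        for k in range(1, i):
            new_a[k] = a[k] - k_i * a[i - k]
        a = new_a
        err *= (1.0 - k_i * k_i)
    return a[1:], err
```

For example, `levinson_durbin(autocorrelation(frame, 10), 10)` would give a tenth-order predictor for `frame`; the order of 10 is a typical choice for narrowband speech, not a value taken from the source.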
Excitation generating section 5 fetches excitation samples stored in adaptive codebook 3 and stochastic codebook 4 (referred to respectively as an adaptive code vector (or adaptive excitation) and a stochastic code vector (or stochastic excitation)) and provides each excitation sample to LPC synthesis section 6. LPC synthesis section 6 filters the two excitations obtained at excitation generating section 5 with the decoded LPC obtained at LPC analyzing section 2 to obtain two synthesized speeches.
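The filtering at LPC synthesis section 6 can be pictured as an all-pole filter driven by an excitation. The following is a minimal sketch under the assumption that the decoded LPC are given as predictor coefficients a[1..p], so that each output sample is the excitation sample plus a weighted sum of past outputs; the function name and the optional `state` argument are hypothetical.

```python
def lpc_synthesis(excitation, lpc, state=None):
    """All-pole filter: y[n] = x[n] + sum_k lpc[k-1] * y[n-k].

    `state` optionally holds the previous outputs y[n-1], y[n-2], ...
    carried over from the preceding block; it defaults to zeros.
    """
    order = len(lpc)
    history = list(state) if state is not None else [0.0] * order
    out = []
    for x in excitation:
        y = x + sum(c * h for c, h in zip(lpc, history))
        out.append(y)
        history = ([y] + history)[:order]  # newest output first
    return out


# A unit pulse through a first-order filter decays geometrically.
print(lpc_synthesis([1.0, 0.0, 0.0, 0.0], [0.5]))  # [1.0, 0.5, 0.25, 0.125]
```

Running the filter once on the adaptive code vector and once on the stochastic code vector would give the two synthesized speeches that are passed on for comparison.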
Comparing section 7 analyzes the relation between the two synthesized speeches obtained at LPC synthesis section 6 and the input speech, obtains an optimal value (optimal gain) for each of the two synthesized speeches, adds the synthesized speeches after adjusting the power of each with its optimal gain to obtain a total synthesized speech, and calculates the distance between the total synthesized speech and the input speech. Comparing section 7 further calculates, for all excitation samples in adaptive codebook 3 and stochastic codebook 4, the distance between the input speech and each of the many other synthesized speeches obtained by operating excitation generating section 5 and LPC synthesis section 6, and obtains the indices of the excitation samples that give the smallest of the obtained distances. Then, comparing section 7 provides the obtained optimal gains, the indices of the excitation samples of the respective codebooks, and the two excitation samples corresponding to the respective indices to parameter coding section 8.
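The gain calculation and distance comparison can be sketched as follows. This is a simplified illustration, assuming that the distance is the squared error and that the optimal gains are the joint least-squares solution for the two synthesized speeches; the source specifies neither. The function names are hypothetical, and the exhaustive search over every pair is written for clarity rather than efficiency.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def optimal_gains(target, syn_a, syn_s):
    """Gains (g_a, g_s) minimising |target - g_a*syn_a - g_s*syn_s|^2."""
    aa, ss, cross = dot(syn_a, syn_a), dot(syn_s, syn_s), dot(syn_a, syn_s)
    xa, xs = dot(target, syn_a), dot(target, syn_s)
    det = aa * ss - cross * cross
    if abs(det) < 1e-12:
        # Collinear or zero vectors: fall back to a single gain on syn_a.
        return (xa / aa if aa > 0.0 else 0.0), 0.0
    return (xa * ss - xs * cross) / det, (xs * aa - xa * cross) / det


def distance(target, syn_a, syn_s, g_a, g_s):
    """Squared error between the target and the gain-scaled sum."""
    return sum((x - g_a * a - g_s * s) ** 2
               for x, a, s in zip(target, syn_a, syn_s))


def search(target, adaptive_syn, stochastic_syn):
    """Try every pair of synthesized candidates.

    Returns (distance, adaptive index, stochastic index, g_a, g_s)
    for the pair with the smallest distance.
    """
    best = None
    for i, ya in enumerate(adaptive_syn):
        for j, ys in enumerate(stochastic_syn):
            g_a, g_s = optimal_gains(target, ya, ys)
            d = distance(target, ya, ys, g_a, g_s)
            if best is None or d < best[0]:
                best = (d, i, j, g_a, g_s)
    return best
```

Here `adaptive_syn` and `stochastic_syn` are lists of already synthesized candidates, one per codebook entry. Practical coders commonly reduce the cost by searching the adaptive codebook first and the stochastic codebook second instead of testing every pair.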
Parameter coding section 8 codes the optimal gain to obtain the coded gain and provides the coded gain, the coded LPC and the indices of the excitation samples to transmission path 9. Further, parameter coding section 8 generates an actual excitation signal (synthesized excitation) using the coded gain and the two excitations corresponding to the respective indices, and stores the excitation signal in adaptive codebook 3 while deleting old excitation samples.
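The update of adaptive codebook 3 can be pictured as a fixed-length buffer of past excitation. The sketch below assumes that the codebook is a flat list of samples and that the oldest samples are the ones deleted; the function names are hypothetical, and the gains passed in stand for the gains obtained from the coded gain.

```python
def build_excitation(adaptive_vec, stochastic_vec, g_a, g_s):
    """Synthesized excitation: gain-scaled sum of the two selected excitations."""
    return [g_a * a + g_s * s for a, s in zip(adaptive_vec, stochastic_vec)]


def update_adaptive_codebook(buffer, excitation):
    """Append the new excitation and drop as many of the oldest samples.

    The returned buffer has the same length as the input buffer.
    """
    combined = list(buffer) + list(excitation)
    return combined[len(excitation):]


excitation = build_excitation([1.0, 0.0], [0.0, 1.0], 2.0, 3.0)
print(update_adaptive_codebook([0.1, 0.2, 0.3, 0.4], excitation))
# [0.3, 0.4, 2.0, 3.0]
```

Because a decoder can form the same excitation from the transmitted indices and gain, it can keep an identical buffer without the buffer itself being transmitted.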
In addition, the synthesis at LPC synthesis section 6 generally uses the linear prediction coefficients together with a high-frequency enhancement filter or a perceptual weighting filter using long-term prediction coefficients (which are obtained by long-term prediction analysis of the input speech). The excitation search on the adaptive codebook and the stochastic codebook is also generally executed at an interval (called a subframe) obtained by further dividing the analysis interval.
The stochastic codebook will be described next.
The adaptive codebook is a codebook for effective compression that exploits the long-term correlation existing at the period of human vocal cord vibration, and it stores previous synthesized excitations. In contrast, the stochastic codebook is a fixed codebook that reflects the statistical characteristics of excitation signals. Excitation samples stored in the stochastic codebook include, for example, a random number sequence, a pulse sequence, a random number sequence or pulse sequence obtained by statistical training with speech data, and a pulse sequence with a relatively small number of pulses generated algebraically (algebraic codebook). The algebraic codebook has attracted particular attention recently and is known to provide good sound quality at bit rates such as 8 kbps with a small amount of calculation.
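To make the idea of an algebraically generated pulse sequence concrete, the sketch below builds a code vector directly from a few pulse positions and signs, so that no table of stored vectors is needed. The function name, the subframe length of 8 and the pulse layout are illustrative assumptions and do not reproduce the pulse structure of any particular standard.

```python
def algebraic_code_vector(length, positions, signs):
    """Build a sparse excitation: +1 or -1 pulses at the given positions."""
    vec = [0.0] * length
    for pos, sign in zip(positions, signs):
        vec[pos] += float(sign)
    return vec


# Two pulses in an 8-sample subframe: +1 at position 1, -1 at position 6.
print(algebraic_code_vector(8, [1, 6], [1, -1]))
# [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, -1.0, 0.0]
```

Only the positions and signs need to be coded, which is consistent with the small calculation amount noted above.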
However, applying an algebraic codebook with a small number of pulses to coding at lower bit rates greatly degrades sound quality, mainly for unvoiced consonants and background noise. On the other hand, applying an excitation with a large number of pulses, such as a random number sequence, to coding at lower bit rates greatly degrades sound quality, mainly for voiced speech. To reduce this degradation, a multi-codebook method in which a voiced/unvoiced judgement is performed has been examined. However, this method requires complicated processing and sometimes generates an allophone caused by a judgement error on the speech signal.
As described above, there has been no algebraic codebook that effectively codes all of voiced speech, unvoiced speech and background noise. Therefore, a speech coding apparatus and a speech decoding apparatus capable of effectively coding any of voiced speech, unvoiced speech and background noise have been required.