1. Field of the Invention
The present invention relates generally to a speech coding system, and in particular, to a method for searching an excitation codebook.
2. Description of the Related Art
There are several types of vocoders, which compress speech signals. A vocoder typically used in current mobile communication systems is the CELP (Code Excited Linear Predictive coding) vocoder, which is based on a linear prediction technique. The CELP vocoder is divided into a linear prediction filter for managing a linear prediction operation and a section for generating an excitation signal corresponding to an input signal from the linear prediction filter. Further, the CELP vocoder includes a pitch filter for modeling the pitch of the speech. Information on the pitch filter is collected through a so-called adaptive codebook search. Methods for generating the excitation signal are classified into a method that uses a stored physical codebook and a method that calculates a code vector algebraically. The latter method is called “ACELP (Algebraic Code Excited Linear Predictive coding)”. In the field of speech coding, searching for a code vector using either of the above two methods is referred to as a “codebook search”. In contrast to the adaptive codebook, which is searched for the information on the pitch filter, the codebook searched for an excitation signal is called a “fixed codebook” or “excitation codebook”. For example, a speech coding system using a physical codebook and a linear prediction filter is disclosed in detail in U.S. Pat. Nos. 3,624,302 and 4,701,954.
The CELP technique using the physical codebook requires a large amount of memory and takes a great deal of time to search the codebook. Therefore, in most cases, the ACELP technique is used in international vocoder standards. For example, vocoders using the ACELP technique include (i) EVRC (Enhanced Variable Rate Coding) used in a CDMA (Code Division Multiple Access) system, standardized by TIA/EIA/IS-127, EVRC and Speech Service Option 3 for Wideband Spread Spectrum Digital Systems, and (ii) EFR (Enhanced Full Rate coding) chiefly used in a GSM (Global System for Mobile communication) mobile communication system, standardized by ETSI (European Telecommunications Standards Institute) and disclosed in a paper entitled “GSM Enhanced Full Rate Speech Codec,” K. Jarvinen et al., Proceedings ICASSP 1997 Int'l Conf.
The ACELP technique segments an excitation signal applied to the pitch filter and the linear prediction filter into several subgroups, and imposes the condition that each subgroup has a predetermined number of pulses with non-zero amplitude. Also, the ACELP technique reduces the number of multiplications by imposing the condition that each pulse has an amplitude of “+1” or “−1”, resulting in a remarkable reduction in the calculation time required for the codebook search. In addition, the ACELP technique separately codes the pulses in the respective subgroups before transmission, thereby preventing interference between the pulses in different subgroups. As a result, even if a channel error occurs in several bits during transmission, the channel error affects only the pulses in the same subgroup and does not affect the pulses in the other subgroups. Thus, the ACELP technique is less susceptible to the channel environment. Compared with the ACELP technique, an LD-CELP (Low-Delay Code Excited Linear Predictive coding) technique using a stochastic codebook is susceptible to channel errors, since even a single-bit error in a codebook index affects the overall excitation signal.
A process of searching a fixed codebook for a code vector in CELP coding, in order to find an excitation signal, will now be described.
The EFR or EVRC, each a conventional ACELP technique, performs the code vector search process by segmenting an excitation signal with L samples into several subgroups and then searching for positions and amplitudes of a predetermined number of pulses in each subgroup, in order to reduce calculations and secure robustness to the channel environment. For example, as illustrated in Table 1, the EFR segments an excitation signal with L (=40) samples into 5 subgroups each having 8 samples, and searches for positions and amplitudes of a total of 10 pulses by searching for positions and amplitudes of 2 pulses in each subgroup. The positions of the pulses in each subgroup are coded with 6 bits (i.e., 3 bits for each pulse), and the amplitudes of the pulses in each subgroup are fixed to “+1” or “−1”. Here, the sign of the 2 pulses in each subgroup is coded with 1 bit. As a result, an excitation signal is coded with a total of 35 bits (i.e., 7 bits for each subgroup). Whether the amplitude of each pulse is “+1” or “−1” is determined by referring to a residual of the linear prediction filter and a residual of the pitch filter at the positions of the respective pulses.
TABLE 1
Subgroup    Positions
0           0, 5, 10, 15, 20, 25, 30, 35
1           1, 6, 11, 16, 21, 26, 31, 36
2           2, 7, 12, 17, 22, 27, 32, 37
3           3, 8, 13, 18, 23, 28, 33, 38
4           4, 9, 14, 19, 24, 29, 34, 39
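The interleaved subgroup layout and the 35-bit budget described above can be reproduced with a few lines of Python. This is only a minimal sketch: it assumes that subgroup t holds every fifth sample position starting at t within the 40-sample excitation signal, and the names `track_positions` and `bits_per_subgroup` are hypothetical, not taken from any standard's reference code.

```python
import math

L = 40                # samples per excitation signal (from the text)
NUM_TRACKS = 5        # number of subgroups
PULSES_PER_TRACK = 2  # pulses searched in each subgroup

def track_positions(track, length=L, num_tracks=NUM_TRACKS):
    """Positions of one interleaved subgroup: track, track + 5, track + 10, ..."""
    return list(range(track, length, num_tracks))

def bits_per_subgroup(positions_in_track, pulses=PULSES_PER_TRACK, sign_bits=1):
    """Position bits for every pulse plus the single shared sign bit."""
    position_bits = int(math.log2(positions_in_track))  # 8 positions -> 3 bits
    return pulses * position_bits + sign_bits

tracks = [track_positions(t) for t in range(NUM_TRACKS)]
total_bits = sum(bits_per_subgroup(len(p)) for p in tracks)

for t, positions in enumerate(tracks):
    print(t, positions)
print("total bits:", total_bits)
```

Each subgroup contributes 2 × 3 position bits and 1 sign bit, which gives the 7 bits per subgroup stated in the text.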
For the positions of the excitation pulses, it is necessary to search for the pulse positions that minimize a weighted error between the reference speech and the synthetic speech obtained by passing the positions and amplitudes of the candidate pulses through a synthesis filter. If all of the pulse positions are taken into consideration, the number of searches becomes too large, even though the excitation signal is segmented into 5 subgroups and there are only 2 pulses in each subgroup. Therefore, the EFR uses the following suboptimal method.
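The error measure and the size of an exhaustive search can be illustrated as follows. This is a sketch under stated assumptions, not the procedure of any particular standard: the weighted synthesis filter is represented by a truncated impulse response `h`, a single gain is chosen optimally for each candidate, and the function names are hypothetical. With the optimal gain, the squared error between a target x and the filtered code vector y equals x·x − (x·y)²/(y·y), so minimizing the error is equivalent to maximizing (x·y)²/(y·y).

```python
def filtered(code, h):
    """Pass a code vector through a filter with impulse response h (zero state)."""
    n = len(code)
    y = [0.0] * n
    for pos, amp in enumerate(code):
        if amp:  # only the non-zero pulses contribute
            for k in range(min(n - pos, len(h))):
                y[pos + k] += amp * h[k]
    return y

def min_weighted_error(target, code, h):
    """Squared error between target and g*y, with the gain g chosen optimally."""
    y = filtered(code, h)
    energy = sum(v * v for v in y)
    corr = sum(x * v for x, v in zip(target, y))
    target_energy = sum(x * x for x in target)
    if energy == 0:
        return target_energy
    return target_energy - corr * corr / energy

# Exhaustive search size: 8 positions for each of 2 pulses in each of 5 subgroups.
exhaustive_candidates = (8 ** 2) ** 5
```

Under these assumptions an exhaustive search would evaluate 2^30 (about 1.07 billion) position combinations per excitation signal, which is why a suboptimal search is used instead.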
It will be assumed herein that the 10 pulse positions to be searched for are (m0, m1, . . . , m9). First, one pulse position is searched for in advance in each of the 5 tracks (subgroups). m0 is placed at the position of a selected one of these 5 pulses and is kept to the very end. Next, a repetitive operation is performed four times. In each repetition, m1 is fixed to the previously searched pulse position in one of the remaining 4 tracks. The remaining 8 pulses are searched for in pairs, (m2,m3), (m4,m5), (m6,m7), and (m8,m9), respectively. At each repetition, the start points of the 9 pulses are cyclically shifted, so the pulse pairs have different track combinations in every repetition. As a result, 2 of the 10 searched pulses belong to the 5 previously searched pulses.
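The staged search described above can be sketched in simplified form as follows. This is an illustration only, not the EFR reference procedure. It assumes that the preliminary pulse in each track is the position with the largest magnitude of the backward-filtered target `d`, that m0 is the largest of those, that each pulse takes the sign of `d` at its position, and that candidates are scored by (x·y)²/(y·y). All function names are hypothetical.

```python
L, NUM_TRACKS = 40, 5  # 40 samples, 5 interleaved tracks (from the text)

def criterion(target, h, pulses):
    """Return (x.y)^2 / (y.y), where y is the filtered sum of signed unit pulses."""
    y = [0.0] * L
    for pos, sign in pulses:
        for k in range(L - pos):
            y[pos + k] += sign * h[k]
    energy = sum(v * v for v in y)
    corr = sum(x * v for x, v in zip(target, y))
    return corr * corr / energy if energy > 0 else 0.0

def staged_search(target, h):
    """Simplified staged pulse search; returns (score, list of (position, sign))."""
    # Assumption: pulse signs follow the sign of the backward-filtered target d.
    d = [sum(target[n + k] * h[k] for k in range(L - n)) for n in range(L)]
    sign = [1 if v >= 0 else -1 for v in d]
    # One preliminary pulse per track: the position with the largest |d|.
    peak = [max(range(t, L, NUM_TRACKS), key=lambda n: abs(d[n]))
            for t in range(NUM_TRACKS)]
    # Assumption: m0 is the preliminary pulse with the largest |d| overall.
    first = max(range(NUM_TRACKS), key=lambda t: abs(d[peak[t]]))
    others = [t for t in range(NUM_TRACKS) if t != first]
    best = (-1.0, [])
    for i in range(4):                      # four repetitions
        rotated = others[i:] + others[:i]   # cyclic shift of the start track
        order = [first] + rotated           # tracks of m0, m1, then the rest
        slots = order[2:] + order           # tracks of m2..m9
        pulses = [(peak[order[0]], sign[peak[order[0]]]),
                  (peak[order[1]], sign[peak[order[1]]])]
        for j in range(0, 8, 2):            # pairs (m2,m3), (m4,m5), ...
            candidates = [pulses + [(p, sign[p]), (q, sign[q])]
                          for p in range(slots[j], L, NUM_TRACKS)
                          for q in range(slots[j + 1], L, NUM_TRACKS)]
            # Only the pulses placed so far are scored; later pairs are ignored.
            pulses = max(candidates, key=lambda c: criterion(target, h, c))
        score = criterion(target, h, pulses)
        if score > best[0]:
            best = (score, pulses)
    return best
```

In this sketch each call scores 4 × 4 × 64 = 1,024 pair candidates, and every pair is chosen without regard to the pulses that have not yet been placed. Two pulses in the same track may coincide here, which is a simplification of the sketch.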
It should be noted herein that the applicant is interested in the fact that the EFR does not consider the effects of the remaining pulses m4, m5, . . . , m9 when searching for the positions of the pulses (m2,m3). The calculation is performed in this way because the pulses m4, m5, . . . , m9 have not yet been searched for when the pulses (m2,m3) are searched for. However, it is uncertain whether ignoring the remaining pulses in this way is reasonable. Instead, there is a possibility that presuming the remaining pulse positions as well will attain more reasonable results.
As described above, the conventional ACELP technique uses a method of searching for the positions and amplitudes of the pulses in stages. This method, however, increases the amount of calculation, and even though the codebook is searched in various ways, it cannot reliably find a code vector having a higher cost function value than the previously searched code vector.