The present invention relates to adaptive codebooks that generate signals according to indexes. More specifically, the present invention relates to speech coding techniques used in radio communication systems or in communication systems based on packet-switched networks, and particularly to adaptive codebooks used for emphasizing pitch components.
Many communication systems such as cellular communication systems or personal communication systems are based on radio channels for data communication. In such data communication, the radio channel is affected by some error sources such as multi-path fading. Such error sources may give rise to a problem of frame missing. By the term "missing" is meant total or partial destruction of the group of bits transmitted to the receiver. By the term "frame" is meant a fixed number of bits dealt with as an entity for communication in a communication system.
In the event of complete missing of the bits of one frame, the receiver no longer has any bits to interpret. In such a case, the receiver may generate a meaningless result. When the received frame is destroyed and thus unreliable, the receiver may generate an extremely distorted result. Increasing demand for radio system capacity has made it necessary to utilize the available radio system bandwidth as fully as possible. One method for improving the efficiency of bandwidth utilization is to use signal compression techniques. In a radio system for transmitting a speech signal, speech compression (or speech coding) techniques may be used to this end. Such a speech coding technique is implemented by an analysis-by-synthesis speech coder, such as the well-known Code Excited Linear Prediction (CELP) speech coder.
The problem of packet missing in a packet-switched network adopting a speech coding system is closely analogous to frame missing in radio communication. Specifically, in the event of packet missing, the receiver, that is, the speech decoder, may receive no frame at all, or may receive a frame in which a considerable number of bits are missing. In either case, the speech decoder faces essentially the same problem: it must synthesize speech in spite of the missing compressed speech data. Both "frame missing" and "packet missing" concern a problem in the communication channel (or network) that causes transmitted bits to go missing. In the following description, the term "frame missing" may be regarded as a synonym of packet missing.
A CELP speech coder uses an excitation signal codebook for coding an original speech signal. The excitation signals are used for "exciting" a linear prediction (LPC) filter to synthesize a speech signal (or some precursor thereto). The synthesized speech signal is compared with the signal to be coded. The index of the codebook entry whose synthesized signal best matches the original signal is transmitted to the CELP decoder. Other types of data may also be communicated, depending on the type of the CELP system. For brevity, in the present specification the indexes, and data obtained by applying code correction or a like process to the indexes, are collectively referred to as "index data".
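The codebook search described above can be illustrated with a minimal Python sketch. It is not the coder of any particular standard: the function names `synthesize` and `search_codebook`, the one-tap predictor, and the three-entry codebook are assumptions made for illustration, and the gain search and perceptual weighting used in practical CELP coders are omitted.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis filter: s[n] = e[n] - sum_k lpc[k] * s[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out


def search_codebook(target, codebook, lpc):
    """Return (index, error) of the entry whose synthesized output is closest to target."""
    best_index, best_error = -1, float("inf")
    for index, vector in enumerate(codebook):
        synth = synthesize(vector, lpc)
        # Squared error between the signal to be coded and the synthesized candidate.
        error = sum((t - s) ** 2 for t, s in zip(target, synth))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error


# Hypothetical toy data: a three-entry excitation codebook and a one-tap LPC filter.
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, -1.0, 0.0, 0.0]]
lpc = [-0.5]  # s[n] = e[n] + 0.5 * s[n-1]
target = synthesize(codebook[2], lpc)
index, error = search_codebook(target, codebook, lpc)
```

Only `index` would be transmitted; the decoder, holding the same codebook, regenerates the excitation from it.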
In the prior art CELP coder, excitation signals are generated with a structure as shown in FIG. 3, as is well known in, for instance, "Vector Sum Excited Linear Prediction (VSELP) Speech Coding for Japan Digital Cellular", RCS90-26, (TRREDCE) Technical Research Reports of the Institute of Electronics and Data Communication Engineers of Japan.
FIG. 3 is a block diagram illustrating the excitation signal generation described in the reports, i.e., a summary of typical excitation signal generation. Referring to the Figure, a multiplier 302 adjusts the output signal level of a fixed codebook 301 by multiplying the signal by a gain Gc. Another multiplier 304 adjusts the output signal level of an adaptive codebook 303 by multiplying the signal by a gain Gp. An adder 305 adds together the two level adjusted signals to generate an excitation signal. The excitation signal thus generated is fed back to the adaptive codebook to realize reproduction of the pitch lag of speech. Generally, the transfer function of the adaptive codebook is given as:
$$P_1(z) = G_p z^{-p},$$
where p is the group delay, i.e., the pitch lag. In the excitation signal generation, the CELP speech coder searches for the indexes that best match the input speech signal. In FIG. 3, the best-matching indexes in the current frame are labeled I fcb curr and I acb curr, and the gains obtained by converting the gain indexes I Gc curr and I Gp curr are labeled Gc curr and Gp curr. The CELP speech decoder receives the best-matching index data from the CELP speech coder and, like the coder, generates an excitation signal. However, an error occurring on the transmission line due to multi-path fading or the like results in frame missing and deterioration of the speech quality.
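The excitation generation of FIG. 3 can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the names `adaptive_vector` and `generate_excitation` are hypothetical, the adaptive codebook is modeled as a plain buffer of past excitation samples, and a lag shorter than the vector length is handled by repeating the delayed segment, which is one common simplification rather than the method of the cited report.

```python
def adaptive_vector(history, pitch_lag, length):
    """Read `length` samples starting `pitch_lag` samples back in the past excitation.

    When pitch_lag < length the delayed segment is repeated periodically
    (an assumed simplification).
    """
    start = len(history) - pitch_lag
    return [history[start + (n % pitch_lag)] for n in range(length)]


def generate_excitation(history, fixed_vector, pitch_lag, gc, gp):
    """Return (excitation, updated history) for one vector."""
    acb = adaptive_vector(history, pitch_lag, len(fixed_vector))
    # Multipliers 302 and 304 scale the two codebook outputs; adder 305 sums them.
    excitation = [gc * f + gp * a for f, a in zip(fixed_vector, acb)]
    # Feedback: the new excitation becomes part of the adaptive codebook state.
    new_history = (history + excitation)[-len(history):]
    return excitation, new_history


# Hypothetical toy state: a single pulse two samples from the end of the buffer.
history = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
excitation, new_history = generate_excitation(
    history, fixed_vector=[0.0, 0.0, 0.0, 0.0], pitch_lag=2, gc=1.0, gp=0.5
)
```

With a zero fixed-codebook contribution, the output is the past pulse repeated at the pitch lag and scaled by Gp, which is the pitch-reproducing behavior attributed to the adaptive codebook.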
Heretofore, "a method of improving the performance of coding systems" which is disclosed in Japanese Laid-Open Patent Publication 8-227300, has been well known as a method of improving the performance of coding systems against frame missing.
FIG. 4 shows a prior art radio communication system disclosed in this Laid-Open Patent Publication.
Referring to the Figure, the illustrated radio communication system comprises a G.728 speech coder 401, a decoder pre-processor 403 and a G.728 speech decoder 404.
The G.728 speech coder 401 codes input speech and transmits the resulting coded speech signal to a communication channel 402. The coded speech signal is affected by error sources such as multi-path fading as it passes through the communication channel 402, and is received by the decoder pre-processor 403 as a coded speech signal with frame missing. The decoder pre-processor 403 "decodes" the coded speech signal of frames that are not missing, to the extent necessary to generate the same excitation signal as is generated in the coder. When frame missing is recognized, the "decoded" excitation signal of the preceding frame is externally inserted (that is, extrapolated) throughout the period of the missing frame. The externally inserted excitation signal is then coded using the best available codebook entries, which are found by executing a series of codebook "retrievals". Specifically, the codebook vector that best matches each vector of the externally inserted excitation signal is selected. The pre-processor identifies the index that represents the best-matching codebook vector, and generates a corrected coded speech signal based on this index. Using this corrected signal, the decoder can approximate the externally inserted excitation signal from the pre-processor, thus minimizing the adverse effects of destroyed frames on the reconstituted speech signal.
FIG. 5 is a flow chart of the operation of the decoder pre-processor. In this example, a CELP speech coder is used, and the target signal is an excitation signal formed by externally inserting the excitation signal represented by the coded signal of the preceding frame. The pre-processor "decodes" the coded speech signal of frames that are not missing, to the extent necessary for the excitation signal generation. In other words, it executes the same codebook lookup as is executed in the excitation signal generator 405 in the decoder. This means that the pre-processor 403 includes the same codebook as is present in both the coder and the decoder. When a missing frame is recognized, the pre-processor 403 externally inserts the decoded excitation signal of the preceding frame into the missing frame period. Subsequently, the best-matching codebook index representing the externally inserted excitation signal is generated by executing codebook retrieval.
With reference to FIG. 4, the pre-processor 403, receiving each frame from the communication channel 402 (step 500), checks whether the coded speech signal corresponding to the received frame has been destroyed (step 501). The check may be made by using a usual error detection signal. When the pre-processor 403 determines that the given frame has not been destroyed (step 502), it supplies the coded speech signal without correction to the decoder 404 (step 503). The pre-processor 403 also executes codebook lookup for each codebook index contained in the given frame and, as a result, generates and stores an excitation signal (step 504). This process is essentially the same as that executed by the excitation signal generator 405 in the decoder 404 shown in FIG. 3. The stored data is preserved for use in processing the next frame (in case the next frame is found to be a missing frame).
When the pre-processor 403 recognizes in the step 502 that the given frame has been destroyed, it executes the steps 505 to 507. In the step 505, the pre-processor 403 corrects the coded speech signal. Specifically, in this step the pre-processor 403 externally inserts the excitation signal of the preceding frame (i.e., the signal decoded and stored in the step 504) as a corrected signal corresponding to the pertinent frame.
In the next step 506, the pre-processor 403 executes the "coding" of the externally inserted excitation signal. Specifically, the pre-processor 403 searches the codebook for the entry that best matches the externally inserted signal. The codebook is searched for each vector of the missing frame, and the entry that best matches the corresponding part of the externally inserted excitation signal is selected. The criterion for the best match may be the mean square error measure or another error criterion well known to the person skilled in the art.
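The retrieval of step 506 with a mean square error criterion might look like the following Python sketch. The names `mean_square_error` and `retrieve_indexes` and the toy codebook are assumptions for illustration; gain quantization is left out, and the signal length is assumed to be a multiple of the codebook vector length.

```python
def mean_square_error(a, b):
    """Mean of the squared sample differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def retrieve_indexes(inserted_signal, codebook):
    """Return the best-matching codebook index for each vector of the signal."""
    size = len(codebook[0])
    indexes = []
    for start in range(0, len(inserted_signal), size):
        segment = inserted_signal[start:start + size]
        # Exhaustive search: keep the entry with the smallest error.
        best = min(
            range(len(codebook)),
            key=lambda i: mean_square_error(codebook[i], segment),
        )
        indexes.append(best)
    return indexes


# Hypothetical toy data: four two-sample codebook entries, one four-sample signal.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
inserted_signal = [0.9, 0.1, 0.2, 0.8]
indexes = retrieve_indexes(inserted_signal, codebook)
```

Any other error measure can be substituted by replacing `mean_square_error` in the `key` function.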
Finally, in the step 507 the pre-processor 403 replaces the missing frame part of the coded speech signal with the codebook index generated in the step 506. Using this codebook index, the decoder can generate an excitation signal which approximates the externally inserted excitation signal generated in the step 505, thus permitting improvement of the performance of the coding system. After the pre-processor 403 has transmitted the coded speech signal to the decoder in the step 503 (and generated the excitation signal in the step 504), or after it has corrected the coded speech signal in the steps 505 to 507, the control routine returns to the step 500 to receive the next frame.
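The control flow of steps 500 to 507 can be summarized in a short Python sketch. It is a toy model, not the pre-processor of the cited publication: the function name `preprocess`, the representation of a frame as an (indexes, is_destroyed) pair, and the gain-free codebook lookup are all assumptions made for illustration.

```python
def preprocess(frames, codebook):
    """Return the index lists passed on to the decoder, one per received frame."""
    size = len(codebook[0])
    stored = [0.0] * size  # excitation kept from the last good frame
    output = []
    for indexes, is_destroyed in frames:  # step 500: receive a frame
        if not is_destroyed:  # steps 501-502: frame is intact
            output.append(list(indexes))  # step 503: pass through unchanged
            # Step 504: decode and store the excitation for possible later use.
            stored = [s for i in indexes for s in codebook[i]]
        else:
            target = stored  # step 505: reuse the preceding excitation
            new_indexes = []
            for start in range(0, len(target), size):  # step 506: retrieval
                seg = target[start:start + size]
                new_indexes.append(min(
                    range(len(codebook)),
                    key=lambda i: sum((c - t) ** 2 for c, t in zip(codebook[i], seg)),
                ))
            output.append(new_indexes)  # step 507: substitute the new indexes
    return output


# Hypothetical toy data: the second frame is flagged as destroyed.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
frames = [([1, 2], False), ([0, 0], True), ([2, 2], False)]
passed_on = preprocess(frames, codebook)
```

In this toy model the retrieval simply recovers the indexes of the preceding frame, because the stored excitation is built from codebook entries alone. In a coder with gains and an adaptive codebook contribution the stored excitation is generally not an exact codebook entry, so the retrieval yields only the closest approximation.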
In the technique described above, when a transmission line error occurs on the communication channel, the internal states of the adaptive codebooks of the coder and decoder may fail to be identical. Such a mismatch may result in abnormal sound generation and deterioration of the speech quality when the decoder decodes the index transmitted from the coder, even though the coder has searched for the best-matching index.
This is because the adaptive codebook has a feedback structure, in which the adaptive codebook is generated by using the excitation signal of the preceding frame. Due to an error occurring during voiced speech, the internal state of the adaptive codebook of the decoder becomes different from that of the adaptive codebook of the coder. When the signal level is reduced, for example when a non-voice state is brought about, the signal level of the adaptive codebook internal state is also reduced, so that an error occurring on the transmission line has less adverse effect. An error occurring on the transmission line during a voiced speech signal period, however, has effects that persist through the feedback loop until a non-voice period is reached. During the period from the occurrence of the transmission line error until the non-voice period sets in, the index combination may lead to generation of abnormal noise and extreme deterioration of the speech quality.
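The persistence of the mismatch can be illustrated with a small Python simulation. It is a sketch under stated assumptions: the names `run` and `mismatch_per_frame` are hypothetical, the adaptive codebook is modeled by the sample recursion e[n] = f[n] + Gp e[n-p], and the lost frame is modeled by zeroing the fixed codebook contribution at the decoder, which is a simplification of real frame missing.

```python
def run(fixed_frames, gains, lag, lost_frame=None):
    """Generate excitation frame by frame with e[n] = f[n] + gp * e[n - lag]."""
    history = [0.0] * lag
    frames_out = []
    for k, (fixed, gp) in enumerate(zip(fixed_frames, gains)):
        if k == lost_frame:
            fixed = [0.0] * len(fixed)  # assumed model of a lost frame
        frame = []
        for f in fixed:
            sample = f + gp * history[-lag]  # feedback from `lag` samples back
            history.append(sample)
            frame.append(sample)
        frames_out.append(frame)
    return frames_out


def mismatch_per_frame(fixed_frames, gains, lag, lost_frame):
    """Squared difference per frame between coder and decoder excitation."""
    coder = run(fixed_frames, gains, lag)
    decoder = run(fixed_frames, gains, lag, lost_frame)
    return [sum((a - b) ** 2 for a, b in zip(c, d)) for c, d in zip(coder, decoder)]


# Hypothetical toy data: one pulse in frame 0, which the decoder loses.
fixed_frames = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
voiced = mismatch_per_frame(fixed_frames, [1.0, 1.0, 1.0, 1.0], lag=2, lost_frame=0)
with_pause = mismatch_per_frame(fixed_frames, [1.0, 1.0, 0.0, 0.0], lag=2, lost_frame=0)
```

With the gain held at 1.0 the mismatch is the same in every frame after the error, although all later frames are received correctly. Once the gain drops to zero, as in a non-voice period, the two states coincide again.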