The present invention relates to a speech packet transmission system and, more particularly, to a high efficiency speech packet transmission system for allowing a speech coder/decoder, which is applicable to a telephone line switching network using high-speed digital dedicated lines, to code speech signals with a high efficiency speech coding system and send only the voiced portions of the signals in the form of packets.
To send only the voiced portions of speech coded data in the form of packets, it is a common practice to code a linear speech signal by PCM (Pulse Code Modulation) using A-Law or μ-Law. The PCM coding scheme protects the speech of voiced portions from deterioration even though the coded data of unvoiced portions are not sent. This is because the A-Law/μ-Law coded data and the PCM linear signal correspond one-to-one to each other, i.e., one sample of the PCM linear signal can be decoded without being affected by past coded data.
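The memoryless property described above can be illustrated with a minimal sketch. It assumes the continuous μ-law companding curve with μ = 255 rather than the segmented tables of an actual telephony codec, and the names `mulaw_encode` and `mulaw_decode` are hypothetical helpers introduced only for this illustration.

```python
import math

MU = 255.0  # assumed companding constant for this sketch


def mulaw_encode(x: float) -> int:
    """Compress a linear sample in [-1, 1] to an 8-bit code (continuous mu-law curve)."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1.0) / 2.0 * 255.0))


def mulaw_decode(code: int) -> float:
    """Expand one 8-bit code back to a linear sample; no past codes are needed."""
    y = code / 255.0 * 2.0 - 1.0
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


samples = [0.0, 0.01, -0.25, 0.5, -0.9]
codes = [mulaw_encode(s) for s in samples]

# Decode every code, then decode only the last three as if the first two
# (an "unvoiced" stretch) had never been sent.
full = [mulaw_decode(c) for c in codes]
tail_only = [mulaw_decode(c) for c in codes[2:]]
```

Because each code is expanded on its own, `tail_only` is identical to the last three entries of `full`: dropping earlier codes does not change how later ones decode.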
By contrast, a high efficiency speech coding system, e.g., the 16 kbps LD-CELP (Low-Delay Code Excited Linear Prediction) system prescribed by ITU-T Recommendation G.728, codes and decodes speech by backward linear prediction or a similar scheme using past input signals. The precondition for such a high efficiency speech coding system is that the decoder decodes incoming continuous coded data while maintaining exactly the same inside conditions as the coder. The coder, in turn, codes speech while predicting the decoded signals by analyzing past input signals.
Assume that the above high efficiency speech coding system is applied to a speech packet transmission system which sends only the voiced portions as coded data. Then, the inside conditions, particularly the prediction coefficients, of the coder and those of the decoder do not coincide with each other in unvoiced portions, in which no coded data are sent. As a result, speech quality is lowered at the leading end of each voiced period. To solve this problem, the coder and decoder may both be initialized, or reset, during unvoiced periods so as to have their inside conditions forcibly brought into coincidence, as taught in Japanese Patent Laid-Open Publication No. 2-181552 (Prior Art 1). Alternatively, a delay circuit may be inserted in the input side of the coder in order to handle even the part preceding each voiced portion as part of the voiced portion (Prior Art 2). With the delay circuit scheme, it is possible to drive both the coder and the decoder during the unvoiced portion preceding each voiced portion. Consequently, the inside conditions of the coder and those of the decoder are caused to coincide before the beginning of the actual voiced portion, preventing speech quality from deteriorating at the leading end of a speech.
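The state mismatch and the reset remedy can be sketched with a toy model. The sketch below is not LD-CELP: it assumes a first-order DPCM coder whose only inside condition is one predicted value, and the class `Dpcm`, the function `run`, the step size and the test signal are all hypothetical. It is meant only to show how a decoder that receives no data during an unvoiced stretch falls out of step with a coder that kept running.

```python
STEP = 0.05  # assumed quantizer step for this toy model


class Dpcm:
    """Toy first-order DPCM; `pred` stands in for the codec's inside condition."""

    def __init__(self):
        self.pred = 0.0

    def reset(self):
        self.pred = 0.0

    def encode(self, x):
        code = round((x - self.pred) / STEP)  # quantized prediction error
        self.pred += code * STEP              # track the reconstructed signal
        return code

    def decode(self, code):
        self.pred += code * STEP
        return self.pred


def run(signal, voiced_from, reset_at_onset):
    """Code the whole signal, but deliver codes to the decoder only from `voiced_from` on."""
    coder, decoder = Dpcm(), Dpcm()
    decoded = []
    for n, x in enumerate(signal):
        if n == voiced_from and reset_at_onset:
            coder.reset()
            decoder.reset()
        code = coder.encode(x)
        if n >= voiced_from:                  # unvoiced codes are never sent
            decoded.append(decoder.decode(code))
    return decoded


# Ten samples of low-level background followed by a short voiced burst.
signal = [0.2] * 10 + [0.5, 0.6, 0.7, 0.6]
voiced = signal[10:]

err_no_reset = max(abs(d - v) for d, v in zip(run(signal, 10, False), voiced))
err_reset = max(abs(d - v) for d, v in zip(run(signal, 10, True), voiced))
```

In this toy, `err_no_reset` stays near the background level of 0.2 because the decoder never saw the codes that moved the coder's state, while `err_reset` is negligible. The toy has no adaptive prediction coefficients, so it says nothing about how well a reset works for a backward-adaptive coder.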
Prior Art 1 cannot fully obviate the deterioration of speech quality at the leading end of a speech, for the following reason. The initialization resets both the coder and the decoder to the inside conditions which would result from the continuous receipt of a fully unvoiced signal. In unvoiced compression processing, however, at the transition from an unvoiced portion to a voiced portion, a speech signal with power great enough for a speech detector to detect a voiced portion is input to the coder at the leading portion of the voiced portion. When such a speech signal, which is discontinuous with and different from the unvoiced signal condition, is input to the forcibly reset coder or decoder, the coder or decoder processes the input signal with the linear prediction coefficients assigned to the fully unvoiced condition. This lowers speech quality at the leading end of a speech; moreover, the discontinuity between the inside conditions and the input signal causes the linear prediction to fail, resulting in unexpected sound.
In the LD-CELP system prescribed by ITU-T Recommendation G.728, a linear predictor is driven once every 2.5 ms by using the past 13.125 ms of signal, i.e., 105 samples, stored in a buffer. As a result, the prediction coefficients are updated at the above period. Further, when the linear prediction computation becomes faulty, the linear predictor ends the processing halfway, with the result that the prediction coefficients are not updated at all.
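The figures above can be checked with a short calculation. The sketch assumes an 8 kHz sampling rate, which is what the stated correspondence between 105 samples and 13.125 ms implies; the variable names are illustrative, and `updates_to_refill` is a derived quantity that the text does not state.

```python
import math

SAMPLE_RATE_HZ = 8000      # assumed; implied by 105 samples == 13.125 ms
BUFFER_SAMPLES = 105       # past samples held for linear prediction
UPDATE_PERIOD_MS = 2.5     # interval between predictor updates

# Duration of signal covered by the prediction buffer.
buffer_ms = BUFFER_SAMPLES * 1000 / SAMPLE_RATE_HZ

# New samples that arrive between two consecutive predictor updates.
samples_per_update = int(UPDATE_PERIOD_MS * SAMPLE_RATE_HZ / 1000)

# Update periods needed before every sample in the buffer has been replaced.
updates_to_refill = math.ceil(BUFFER_SAMPLES / samples_per_update)
refill_ms = updates_to_refill * UPDATE_PERIOD_MS
```

Under these assumptions the buffer spans 13.125 ms, each update sees 20 new samples, and about six update periods (15 ms) pass before the buffer holds only fresh samples.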
Therefore, Prior Art 2 is not practicable without the following process. First, all the 105 samples stored in the buffer for linear prediction are updated in order to render all the signals in the buffer continuous, so that the linear predictor can perform its normal processing. Subsequently, while the linear predictor repeats the normal processing a plurality of times, the linear prediction coefficients of the coder and those of the decoder are caused to gradually converge to each other. A period of time as long as 20 ms to 60 ms is necessary for the prediction coefficients to converge to a degree sufficient to obviate the deterioration of speech quality at the leading end of a speech and the unexpected sound. Such a great delay undesirably increases the delay of the entire speech packet transmission section. Moreover, the part of the unvoiced period preceding the actual voiced period is also dealt with as the voiced period and sent together with the actual voiced period. Consequently, the inserted delay, i.e., the part of the unvoiced portion preceding the voiced portion, lowers the unvoiced compression effect available with a system which does not send unvoiced periods, the effect being expressed by the ratio of voiced packets to the total packets appearing in a given period of time.
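The cost of the inserted delay can be put into rough numbers. In the sketch below, the 2.5 ms update period and the 20 ms to 60 ms range come from the text; the talk-spurt and pause durations are hypothetical values chosen only for illustration, and `predictor_updates` and `sent_fraction` are hypothetical helper names.

```python
UPDATE_PERIOD_MS = 2.5  # predictor update interval stated in the text


def predictor_updates(delay_ms):
    """Number of predictor updates that fit into the inserted delay."""
    return int(delay_ms / UPDATE_PERIOD_MS)


def sent_fraction(spurt_ms, gap_ms, preroll_ms):
    """Fraction of time sent as packets when each talk spurt is preceded by
    `preroll_ms` of unvoiced signal that is also treated as voiced."""
    sent = spurt_ms + min(preroll_ms, gap_ms)
    return sent / (spurt_ms + gap_ms)


# Hypothetical conversation pattern: 1 s of speech followed by a 1.5 s pause.
SPURT_MS, GAP_MS = 1000, 1500

baseline = sent_fraction(SPURT_MS, GAP_MS, 0)     # no inserted delay
with_20ms = sent_fraction(SPURT_MS, GAP_MS, 20)   # shortest delay in the text
with_60ms = sent_fraction(SPURT_MS, GAP_MS, 60)   # longest delay in the text
```

With these assumed durations, the 20 ms to 60 ms delay corresponds to 8 to 24 predictor updates, and the sent fraction rises from 0.4 to roughly 0.408 and 0.424. Shorter talk spurts would make the relative penalty larger.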