When a speech signal is processed, it is generally divided into frames to reduce the computational complexity of the codec and the processing delay. After framing, the speech signal remains stable within each time segment, and its parameters change slowly. Therefore, requirements such as quantization precision can be fulfilled only if the signal is processed frame by frame in the short-term prediction for the signal. In addition, when a person utters a sound, the glottis vibrates at a certain frequency, and the frequency of the vibration is regarded as the pitch. When the pitch is short and the selected frame length is too long, multiple different pitches may exist in one speech signal frame, and consequently the calculated pitch is inaccurate. Therefore, a frame needs to be split evenly into sub-frames.
In some lossless or lossy compression fields, to reduce the impact of network packet loss on sound quality, the current frame needs to be independent of the previous frame. For example, the G.711 LossLess Coding (LLC) standard does not allow the data in the history buffer to be used to predict the signal of the current frame. Therefore, the first part of the signal in the current frame is used to predict the remaining part of the signal in the current frame. If the prior art, which splits the entire signal frame evenly into several sub-frames, is still applied, few samples in the first several sub-frames undergo Long Term Prediction (LTP) synthesis. As shown in FIG. 1, for an 8 kHz sampling rate and a 20 ms frame length, a frame of 160 samples is split evenly into four sub-frames of 40 samples each. Assuming the pitch of the first sub-frame is T[0]=34, only 40−34=6 samples in the first sub-frame are synthesized through the LTP algorithm. The first 34 samples are treated as the history buffer of the subsequent sub-frames. As a result, the calculated gain of the first sub-frame differs sharply from that of the subsequent sub-frames, which complicates subsequent processing. If T[0] is greater than the sub-frame length, even the second sub-frame is affected.
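The arithmetic of this example can be checked with a short sketch. It is an illustration only, not part of any codec: the function name `ltp_samples_per_subframe` and its parameters are hypothetical, and it assumes, as a simplification, that every sample at or after index T[0] can be synthesized by LTP while the first T[0] samples of the frame serve only as the history buffer.

```python
def ltp_samples_per_subframe(sample_rate_hz, frame_ms, num_subframes, first_pitch):
    """Count, per sub-frame, the samples available for LTP synthesis.

    Assumption (illustrative): the first `first_pitch` samples of the frame
    act only as the history buffer, because no data from the previous frame
    may be used; all later samples are treated as LTP-synthesized.
    """
    frame_len = sample_rate_hz * frame_ms // 1000   # samples per frame
    subframe_len = frame_len // num_subframes       # even split
    counts = []
    for i in range(num_subframes):
        start = i * subframe_len
        end = start + subframe_len
        # Only the part of the sub-frame at or beyond index first_pitch counts.
        counts.append(max(0, end - max(start, first_pitch)))
    return counts


# 8 kHz sampling rate, 20 ms frame, four sub-frames, T[0] = 34
print(ltp_samples_per_subframe(8000, 20, 4, 34))  # [6, 40, 40, 40]

# T[0] greater than the sub-frame length: the second sub-frame is affected too
print(ltp_samples_per_subframe(8000, 20, 4, 50))  # [0, 30, 40, 40]
```

Under these assumptions, the first call reproduces the 40−34=6 samples of the example, and the second call shows how a pitch longer than the sub-frame length leaves the first sub-frame with no LTP-synthesized samples and reduces the count in the second.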