In the field of voice encoding, voice encoders based on the Code Excited Linear Prediction (CELP) encoding model are the most widely used. Compared with voice encoders of other types, a CELP-based encoder obtains good voice quality at a very low bit rate, and its performance remains outstanding at a high bit rate. In the CELP model, the adaptive codebook and the fixed codebook, which together model the excitation signal, play a very important role. The role of the adaptive codebook is to remove long-term correlation from the voice residual signal. After the long-term correlation is removed, the voice residual signal becomes similar to white noise, so it is difficult to effectively quantize the target signal of the fixed codebook. Currently, one method of dealing with this is to replace the fixed codebook with an algebraic codebook.
In a voice encoder, the location and symbol (that is, sign) information about all pulses on each track is obtained through an algebraic codebook search. To transmit this information effectively to the decoder end, it must be processed properly. The processing must ensure that no location or symbol information about any pulse is lost; that is, the decoder end must be able to uniquely recover the location and symbol information about all pulses. Meanwhile, to reduce the bit rate as much as possible, the processing method must encode the location and symbol information about the pulses by using the minimum number of bits. In theory, the number of bits needed to encode the location and symbol information about all pulses on a track can be obtained by counting the permutations and combinations of all pulse locations on that track. The number of bits needed to represent this count is a theoretical lower limit. When the permutations and combinations of all pulse locations on the same track are categorized and encoded in an orderly manner, the required number of bits equals the theoretical lower limit if that limit is an integer, and equals the integral part of the theoretical lower limit plus one if it is not an integer.
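The relation between the number of combinations and the required number of bits can be sketched in Python as follows. This is a minimal illustration, not the encoder's actual procedure; the function name `bits_required` is hypothetical, and the count of combinations is assumed to be supplied by the caller.

```python
import math


def bits_required(num_combinations: int) -> tuple[float, int]:
    """Return (theoretical lower limit in bits, whole bits actually needed).

    The theoretical limit is log2 of the number of distinct cases.
    The whole-bit count is that limit rounded up to an integer.
    """
    if num_combinations < 1:
        raise ValueError("num_combinations must be at least 1")
    theoretical = math.log2(num_combinations)
    # (x - 1).bit_length() is an exact integer ceil(log2(x)) for x >= 1,
    # which avoids floating-point rounding near powers of two.
    needed = (num_combinations - 1).bit_length()
    return theoretical, needed


print(bits_required(1024))  # limit is an integer: (10.0, 10)
print(bits_required(1025))  # limit is fractional, so 11 bits are needed
```

When the limit is an integer, no index value is left unused; otherwise, rounding up leaves part of the index range unassigned.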
Assume that n pulses exist on the same track, that the location and symbol information about the n pulses is obtained through an algebraic codebook search algorithm, and that multiple pulses may exist on the same location on the track. In this case, if the multiple pulses on the same location are encoded separately, many bits are wasted. From the perspective of permutations and combinations, a p0 and a p1 on the same location and a p1 and a p0 on the same location are the same case. Therefore, to save as many bits as possible and to avoid encoding the same case multiple times, the locations that hold a pulse are counted (a location with multiple pulses is regarded as one pulse location), and information about the pulse locations, the number of pulses on each location, and the corresponding pulse symbols is output; the pulse locations are then handled by category. If n pulses exist on the same track, the cases can be categorized by the number of pulse locations. Assume that the number of pulse locations is m; the value range of m is 1≤m≤n. For each specific value of m, calculate the number of permutations and combinations of a pulse statistical function (assuming that the total number of permutations and combinations is W), and divide data within a specific range into w segments. The corresponding number of permutations and combinations W is related to the total number of pulses on a track.
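As a rough illustration of the statistics step described above, the following Python sketch merges pulses that share a location and counts the combinations for each value of m. It relies on assumptions not stated in the text: pulses on the same location share one sign (the "symbol" is read as polarity, +1 or -1), the track has 16 locations, and the count for a given m is C(track_len, m) × C(n−1, m−1) × 2^m. The function names are hypothetical. Under these assumptions, the total for n = 6 equals 1549824, the size of the index range quoted for 6 pulses.

```python
from collections import Counter
from math import comb


def pulse_statistics(pulses):
    """Merge pulses on the same location.

    pulses: list of (location, sign) pairs, sign being +1 or -1.
    Returns (locations, pulses_per_location, sign_per_location).
    Assumption: pulses sharing a location also share a sign.
    """
    counts = Counter()
    signs = {}
    for location, sign in pulses:
        if signs.setdefault(location, sign) != sign:
            raise ValueError("pulses on one location are assumed to share a sign")
        counts[location] += 1
    locations = sorted(counts)
    return (locations,
            [counts[p] for p in locations],
            [signs[p] for p in locations])


def combinations_per_category(n, track_len=16):
    """Number of distinct cases for each m (number of occupied locations).

    comb(track_len, m): choice of the m occupied locations
    comb(n - 1, m - 1): ways to spread n pulses over m locations, each >= 1
    2 ** m:             one sign per occupied location
    """
    return {m: comb(track_len, m) * comb(n - 1, m - 1) * 2 ** m
            for m in range(1, n + 1)}


# Two pulses land on location 3, so m = 2 although n = 3.
print(pulse_statistics([(3, +1), (7, -1), (3, +1)]))
# W for 6 pulses on an assumed 16-location track.
print(sum(combinations_per_category(6).values()))
```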
To encode the foregoing categories in an orderly manner, the categories are combined. An existing combination method is described as follows:
1. Calculate the total number of the foregoing permutations and combinations, calculate the number of bits BIT that is ultimately required, and divide the bit stream into several segments. Each segment represents a category.
2. Classify the categories according to the value of m. Generally, the number of categories is n. The first category indicates that n pulse locations remain after pulse locations are combined; the second category indicates that n−1 pulse locations remain after pulse locations are combined; the rest may be deduced by analogy. The nth category indicates that one pulse location remains after pulse locations are combined. Each category occupies only one closed range, such as [0x100000, 0x17FFFF].
3. For a given category, take several bits (the number of bits is determined by the total number of permutations and combinations of pulse location functions) to represent a sub-category.
After the categorization processing according to the foregoing three steps, the categories are sequenced. The categories can be sequenced in many ways; here they are sequenced in descending order of the number of pulse locations. Finally, each category is put in a different segment to form the ultimate index value. After receiving the ultimate index value, the decoder end performs decoding according to the foregoing sequencing order of each category, sub-category, and the number of combinations to obtain the location and symbol information about each pulse. This completes the entire encoding and decoding process.
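The segment layout described above can be sketched in Python as follows. This is only an illustration under assumptions: the segments are taken to be packed back to back starting at 0, the size of each category is supplied by the caller in the agreed sequencing order, and the index inside a category (the sub-category and combination information) is treated as a single number. The function names and the example sizes are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def segment_offsets(category_sizes):
    """Start of each category's segment when segments are packed contiguously."""
    return [0] + list(accumulate(category_sizes))[:-1]


def encode_index(category, index_in_category, category_sizes):
    """Map (category, index inside the category) to the ultimate index value."""
    if not 0 <= index_in_category < category_sizes[category]:
        raise ValueError("index_in_category is outside the category's segment")
    return segment_offsets(category_sizes)[category] + index_in_category


def decode_index(index, category_sizes):
    """Recover (category, index inside the category) from the ultimate index."""
    if not 0 <= index < sum(category_sizes):
        raise ValueError("index is outside every segment")
    offsets = segment_offsets(category_sizes)
    # The owning segment is the last one whose start does not exceed the index.
    category = bisect_right(offsets, index) - 1
    return category, index - offsets[category]


sizes = [5, 3, 2]                 # toy category sizes, in sequencing order
print(encode_index(1, 2, sizes))  # 7
print(decode_index(7, sizes))     # (1, 2)
```

Because encoder and decoder derive the same offsets from the same sequencing order, the decoder needs only the index value to identify the category and the position within it.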
In researching the prior art, the inventor finds the following: Although the foregoing technical solution allows encoding with the integral value of the theoretical number of bits, when the number of pulses on each track is fixed, the number of bits used for any pulse combination is also fixed. For example, if 6 pulses exist on a track, the theoretical number of encoding bits is 20.5637, so 21 bits need to be used for encoding. 21 bits can indicate a range from 0 to 2^21−1, while the index of 6 pulses ranges only from 0 to 1549823. The 547328 numerical values from 1549824 to 2^21−1 are never used. Therefore, about 26.1% of the index space is wasted, which means that encoding bits are wasted and encoding efficiency is low.
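The arithmetic in this example can be checked with a short Python sketch. The function name `wasted_index_space` is hypothetical; the two inputs are the figures given in the text (1549824 index values and 21 bits).

```python
def wasted_index_space(num_indices: int, bits: int) -> tuple[int, float]:
    """Return (unused index values, unused fraction of the 2**bits range)."""
    capacity = 1 << bits  # number of values that `bits` bits can represent
    if num_indices > capacity:
        raise ValueError("bits is too small to hold num_indices values")
    unused = capacity - num_indices
    return unused, unused / capacity


unused, fraction = wasted_index_space(1549824, 21)
print(f"{unused} unused values, {fraction:.1%} of the index space")
# 547328 unused values, 26.1% of the index space
```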