Speech compression in traditional voice communication systems uses bandwidth more efficiently than sending uncompressed signals, because more data can be transferred within the same bandwidth allocation. Speech compression is a technique for representing an analog speech signal in digital format with as few bits as possible while preserving signal quality. The number of bits used to represent the speech signal directly determines the bit rate of the encoder, and higher bit rates require more bandwidth. Thus, a lower encoding bit rate generally results in more efficient use of bandwidth.
Furthermore, speech encoders and decoders typically operate under time constraints within which compression and decompression must be completed. Thus, the goals in the design of a speech codec (coder/decoder) are generally to minimize the bit rate of the encoded speech signal while reducing the complexity of the speech compression algorithms and minimizing encoding delay. These goals must be balanced against a further goal, which is to preserve the speech quality of the signal.
Speech compression standards are often used as guides in designing speech codecs, because many of the above issues have been contemplated in the standards, and codecs implementing the standards may be interoperable with other devices supporting the standards. The speech compression standards may set forth a bit allocation scheme for the encoder, such as defining a frame of data with certain bit positions within the frame having a standard meaning. Such frames may be subdivided into two or more subframes, and each subframe may include several data tracks. The bit allocation scheme encodes the codec parameters, including pulse positions and/or signs, as a sequence of bits. Taken together, the pulses in the various tracks define an excitation vector, which is encoded in a set number of bits. A receiving decoder uses the encoded bit stream to generate excitation vectors to decode the compressed signal.
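The relationship between pulses and an excitation vector can be illustrated with a minimal sketch. The function name `build_excitation` and the representation of a pulse as a (position, sign) pair are assumptions made for illustration only and are not taken from any standard; a real codec would also apply gains and filtering that are omitted here.

```python
def build_excitation(length, pulses):
    """Return a sparse excitation vector of `length` samples.

    `pulses` is an iterable of (position, sign) pairs, where sign is +1 or -1.
    Pulses that land on the same position are summed.
    (Hypothetical helper, for illustration only.)
    """
    vector = [0] * length
    for position, sign in pulses:
        if not 0 <= position < length:
            raise ValueError(f"pulse position {position} out of range")
        if sign not in (1, -1):
            raise ValueError("sign must be +1 or -1")
        vector[position] += sign
    return vector


# Three pulses placed in a 10-sample vector.
example = build_excitation(10, [(0, 1), (3, -1), (7, 1)])
```

Every sample not named by a pulse stays at zero, which is why only the pulse positions and signs need to be transmitted.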
Some standards may provide for an uneven distribution of pulses among the tracks, such that one or more tracks may include one or more extra pulses. Thus, the encoder should include, in the frame sent to the decoder, an indication of which tracks include extra pulses. This indication is often referred to as the starting track indicator, the starting track being the first track with extra pulses. The track indicator may be a set of bits sent with each subframe indicating which track in the subframe is the starting track.
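A starting track indicator of this kind might be sketched as follows. The plain binary mapping from track number to bit pattern is an assumption for illustration; an actual standard may assign code words differently. The names `indicator_bits`, `encode_start_track`, and `decode_start_track` are hypothetical.

```python
def indicator_bits(num_tracks):
    """Smallest number of bits able to label `num_tracks` distinct tracks."""
    return max(1, (num_tracks - 1).bit_length())


def encode_start_track(track, num_tracks):
    """Encode a track number as a fixed-width bit string (assumed mapping)."""
    if not 0 <= track < num_tracks:
        raise ValueError(f"track {track} out of range")
    width = indicator_bits(num_tracks)
    return format(track, f"0{width}b")


def decode_start_track(bits, num_tracks):
    """Recover the track number, rejecting code words that label no track."""
    track = int(bits, 2)
    if track >= num_tracks:
        raise ValueError(f"code word {bits!r} does not name a track")
    return track


# With five tracks, the indicator needs three bits.
word = encode_start_track(3, 5)
```

The decoder rejects any bit pattern that does not correspond to a track, which matters whenever the number of tracks is not a power of two.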
One compression technique is Algebraic-Code-Excited Linear-Prediction (ACELP), with a derivative being Conjugate-Structure ACELP (CS-ACELP). The ITU-T (International Telecommunication Union Telecommunication Standardization Sector) has defined the G.729 digital transmission system standard based on CS-ACELP. As with other speech coding standards, ITU-T G.729 specifies a coding bit rate. G.729 is defined to operate at 8.0 kbit/s for compression of normal speech signals, with extension G.729E defined to operate at 11.8 kbit/s for compression of a wider range of signals, including speech with noise, music, etc. G.729E divides each frame into 2 subframes, each with 5 tracks, each track containing 8 pulse positions, for a total of 40 pulse positions per subframe in which to define an excitation vector. The pulse positions are interleaved in such a way that track T0 has positions (0, 5, 10, . . . ), T1 has positions (1, 6, 11, . . . ), and so forth.
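The interleaving described above can be expressed compactly: position p belongs to track p mod 5, and track k holds positions k, k + 5, k + 10, and so on. The following is a minimal sketch using the five tracks and eight positions per track stated above; the function names are hypothetical.

```python
NUM_TRACKS = 5           # tracks per subframe, as described above
POSITIONS_PER_TRACK = 8  # pulse positions per track, as described above


def track_positions(track, num_tracks=NUM_TRACKS, per_track=POSITIONS_PER_TRACK):
    """List the interleaved pulse positions that belong to one track."""
    return [track + num_tracks * i for i in range(per_track)]


def track_of(position, num_tracks=NUM_TRACKS):
    """Return the track that a given pulse position falls in."""
    return position % num_tracks


t0 = track_positions(0)  # [0, 5, 10, 15, 20, 25, 30, 35]
t1 = track_positions(1)  # [1, 6, 11, 16, 21, 26, 31, 36]
```

Because the tracks are interleaved rather than contiguous, the five tracks together cover every one of the 40 positions exactly once.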
G.729E specifies the use of 12 pulses, which cannot be divided evenly across the five tracks. Accordingly, two of the five tracks have three pulses, and the other three have two pulses. One of the two tracks with an extra pulse is identified as a starting track, which is indicated to the decoder for proper decoding alignment. A starting track indicator is sent at the beginning of each subframe, using 3 bits to indicate one of the five tracks. Because three bits provide eight possible combinations and only five are needed, three combinations are unused. This means that in a frame with two subframes, there will be three unused combinations in each of the two starting track indicators.
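The counts in this paragraph can be checked with a short calculation. This is only an arithmetic sketch using the figures given above; the function name `indicator_summary` is hypothetical, and the order of entries in `pulses_per_track` is illustrative and does not say which tracks actually carry the extra pulses.

```python
NUM_TRACKS = 5            # tracks per subframe
INDICATOR_BITS = 3        # bits per starting track indicator
SUBFRAMES_PER_FRAME = 2   # subframes per frame


def indicator_summary():
    """Recompute the pulse and code-word counts stated in the text."""
    # Two tracks carry three pulses, three tracks carry two (order illustrative).
    pulses_per_track = [3, 3, 2, 2, 2]
    code_words = 2 ** INDICATOR_BITS
    return {
        "total_pulses": sum(pulses_per_track),
        "code_words": code_words,
        "unused_per_indicator": code_words - NUM_TRACKS,
        "indicator_bits_per_frame": INDICATOR_BITS * SUBFRAMES_PER_FRAME,
    }


summary = indicator_summary()
```

Under these figures, each frame spends six bits on starting track indicators, and each 3-bit indicator leaves three of its eight code words unassigned.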