AMR-WB (Adaptive Multi-Rate-Wideband) is a speech codec with a sampling rate of 16 kHz that is described in ETSI TS 126 190 V.8.0.0 (2009-01), hereby incorporated by reference in its entirety. AMR-WB has nine speech coding rates. In kilobits per second, they are 23.85, 23.05, 19.85, 18.25, 15.85, 14.25, 12.65, 8.85, and 6.60. The bands 50 Hz-6.4 kHz and 6.4 kHz-7 kHz are coded separately. The 50 Hz-6.4 kHz band is encoded using ACELP (Algebraic Codebook Excited Linear Prediction), which is the technology used in the AMR, EFR, and G.729 speech codecs, among others.
CELP (Codebook Excited Linear Prediction) codecs model speech as the output of a digital filter driven by an excitation signal, where the digital filter is representative of the human vocal tract and the excitation is representative of the vibration of the vocal cords for voiced sounds or of air being forced through the vocal tract for unvoiced sounds. The speech is encoded as the parameters of the filter and the excitation.
The filter parameters are computed on a frame basis and interpolated on a subframe basis. The excitation is usually computed on a subframe basis and consists of an adaptive codebook excitation added to a fixed codebook excitation. The purpose of the adaptive codebook is to efficiently code the redundancy due to the pitch in the case of voiced sounds. The purpose of the fixed codebook is to code what is left in the excitation after the pitch redundancy is removed.
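The source-filter model described above can be illustrated with a minimal sketch. It assumes an all-pole synthesis filter 1/A(z) and a pair of scalar gains for the two codebook contributions. The function name, the gain parameters, and the filter representation are hypothetical choices made for illustration, not details taken from the AMR-WB specification.

```python
def synthesize_subframe(adaptive, fixed, g_pitch, g_code, lpc, history):
    """Sketch of CELP-style synthesis for one subframe (illustrative only).

    adaptive, fixed: adaptive and fixed codebook vectors of equal length.
    g_pitch, g_code: assumed scalar gains applied to each vector.
    lpc: coefficients a_1..a_p of A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.
    history: last p output samples, most recent first.
    Returns the synthesized samples and the updated filter memory.
    """
    out = []
    mem = list(history)
    for v, c in zip(adaptive, fixed):
        # Total excitation: adaptive contribution added to fixed contribution.
        u = g_pitch * v + g_code * c
        # All-pole filter: s[n] = u[n] - sum_k a_k * s[n-k].
        s = u - sum(a * m for a, m in zip(lpc, mem))
        out.append(s)
        mem = [s] + mem[:-1]
    return out, mem
```

With a first-order filter `lpc=[-0.5]`, zero initial memory, and a single unit pulse in the fixed codebook vector scaled by `g_code=2.0`, the output decays as 2.0, 1.0, 0.5, which shows the filter shaping a sparse excitation into a smoother waveform.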
AMR-WB operates on frames of 20 ms. The input to AMR-WB is downsampled to 12.8 kHz to encode the band 50 Hz-6.4 kHz. Each frame is divided into four subframes of 5 ms each. At a 12.8 kHz sampling rate, this means that the subframe size is 64 samples. The four subframes are used to choose the linear prediction filter and identify the excitation using known techniques. To produce 64 samples at the output of the linear prediction filter thus determined, an excitation with 64 pulse positions is needed.
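The frame and subframe sizes follow from simple arithmetic, sketched below. The bits-per-frame figures are only the nominal rate multiplied by the 20 ms frame duration. The constant and function names are hypothetical.

```python
from fractions import Fraction

CORE_RATE_HZ = 12_800      # sampling rate after downsampling
FRAME_MS = 20              # frame duration in milliseconds
SUBFRAMES_PER_FRAME = 4

frame_samples = CORE_RATE_HZ * FRAME_MS // 1000           # samples per frame
subframe_samples = frame_samples // SUBFRAMES_PER_FRAME   # samples per subframe

# Nominal coding rates in kbit/s, kept as strings so the arithmetic is exact.
MODES_KBPS = ["23.85", "23.05", "19.85", "18.25", "15.85",
              "14.25", "12.65", "8.85", "6.60"]

def bits_per_frame(kbps):
    """kbit/s multiplied by ms gives bits; Fraction avoids float rounding."""
    return int(Fraction(kbps) * FRAME_MS)

if __name__ == "__main__":
    print(frame_samples, subframe_samples)
    for mode in MODES_KBPS:
        print(mode, bits_per_frame(mode))
```

For example, the 23.05 kbps mode carries 461 bits in each 20 ms frame, and the 6.60 kbps mode carries 132.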
With ACELP, the fixed codebook component of the excitation is implemented using an “algebraic codebook” approach. An algebraic codebook approach involves choosing the locations for signed pulses of equal amplitude as the subframe excitation.
In the case of AMR-WB, the 64-position component of the excitation is divided into 4 interleaved tracks of 16 positions each. Each of the 16 positions can have a signed pulse or not. Encoding all 16 positions for each track as a signed pulse or not will result in the least amount of distortion. However, for bandwidth efficiency purposes, rather than encoding all 16 pulse positions, only the positions of some maximum number of pulses are encoded. The higher the maximum number, the lower the distortion. With AMR-WB, the number of positions that are encoded varies with bit rate.
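The interleaved track layout can be sketched as follows. The sketch assumes zero-based numbering in which track t holds positions t, t+4, t+8, and so on up to t+60. It also assumes that pulses landing on the same position are summed. The function names and the input format are hypothetical.

```python
SUBFRAME_SIZE = 64
NUM_TRACKS = 4

def track_positions(track):
    """Positions owned by one track, assuming a stride-4 interleave."""
    return list(range(track, SUBFRAME_SIZE, NUM_TRACKS))

def build_excitation(pulses_by_track):
    """Place signed unit pulses into a 64-sample fixed codebook vector.

    pulses_by_track[t] is a list of (index_in_track, sign) pairs, where
    index_in_track is 0..15 and sign is +1 or -1.
    """
    excitation = [0] * SUBFRAME_SIZE
    for track, pulses in enumerate(pulses_by_track):
        positions = track_positions(track)
        for index_in_track, sign in pulses:
            # Assumption: coincident pulses add rather than overwrite.
            excitation[positions[index_in_track]] += sign
    return excitation
```

Under this layout, each track index needs only 4 bits to address one of its 16 positions, which is why limiting the number of pulses per track directly limits the bits spent on the algebraic codebook index.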
The 23.05 kbps and 23.85 kbps modes both use 6 pulses per track. The AMR-WB speech codec defined in ETSI TS 126 190 V.8.0.0 (2009-01) encodes the algebraic codebook index for one subframe with 88 bits. The pulses are encoded with 22 bits per track.
The 19.85 kbps mode uses 5 pulses in 2 of the 4 tracks and 4 pulses in the other 2. The AMR-WB speech codec defined in ETSI TS 126 190 V.8.0.0 (2009-01) encodes the algebraic codebook index for one subframe with 72 bits.
The 18.25 kbps mode uses 4 pulses in each of the 4 tracks. The AMR-WB speech codec defined in ETSI TS 126 190 V.8.0.0 (2009-01) encodes the algebraic codebook index for one subframe with 64 bits.
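The per-subframe totals above can be cross-checked with a short calculation. The 22-bit figure for 6 pulses is stated above. The 16-bit figure for 4 pulses assumes the 64 bits are split evenly over the 4 tracks, and the 20-bit figure for 5 pulses is inferred from the 72-bit total as (72 - 2*16) / 2. The names below are hypothetical, and the order of the tracks within each list is arbitrary.

```python
# Bits per track, keyed by number of pulses in the track.
# 22 is stated in the text; 16 and 20 are inferred from the totals.
BITS_PER_TRACK = {6: 22, 5: 20, 4: 16}

# Pulses per track for each mode, as described in the text.
PULSES_PER_TRACK = {
    "23.05/23.85": [6, 6, 6, 6],
    "19.85": [5, 5, 4, 4],
    "18.25": [4, 4, 4, 4],
}

SUBFRAMES_PER_FRAME = 4

def index_bits(pulses_per_track):
    """Algebraic codebook index size for one subframe, in bits."""
    return sum(BITS_PER_TRACK[n] for n in pulses_per_track)

if __name__ == "__main__":
    for mode, pulses in PULSES_PER_TRACK.items():
        per_subframe = index_bits(pulses)
        print(mode, per_subframe, per_subframe * SUBFRAMES_PER_FRAME)
```

Multiplying by the four subframes in a frame gives 352, 288, and 256 algebraic codebook bits per 20 ms frame for these modes, respectively.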