The following acronyms may be used throughout this description. They are listed in TABLE 1 below for ease of reference.
TABLE 1
ACRONYM       Definition
ACS           Active Codec Set
AFS           AMR Full rate Speech service
AHS           AMR Half rate Speech service
AMR           Adaptive Multi Rate speech service
ASIC          Application Specific Integrated Circuit
BER           Bit Error Rate
BSS           Base Station Subsystem
BTS           Base Transceiver Station
CDMA          Code Division Multiple Access
CHD           Channel Decoder
CHE           Channel Encoder
C/I           Carrier-to-Interference ratio (used to measure link quality)
CMI           Codec Mode Indication (speech rate used on attached link)
CMC           Codec Mode Command (speech rate commanded to be used by an MS on its uplink)
CMR           Codec Mode Request (speech rate requested by an MS to be used on its receiving link)
CRC           Cyclic Redundancy Check
dB            decibels
DL            Downlink
DSP           Digital Signal Processor
DTX           Discontinuous Transmission
EFR           Enhanced Full Rate speech codec for GSM
EVRC          Enhanced Variable-Rate Codec, used in IS-95 CDMA
FEC           Forward Error Correction
FACCH         Fast Associated Control Channel
FER           Frame Erasure Rate
FPGA          Field Programmable Gate Array
FR            Full Rate speech codec for GSM
GSM           Global System for Mobile communications, common digital cellular standard
HR            Half Rate speech codec for GSM
KBPS          Kilo Bits Per Second
MS            Mobile Station, e.g. a cellular phone
RATSCCH       Robust AMR Traffic Synchronized Control Channel
RBER          Residual Bit Error Rate
RF            Radio Frequency
RXQUAL        Received Signal Quality
SID           Silence Descriptor
SID_UPDATE    AMR Frame Used to Convey Comfort Noise Characteristics During DTX
SNR           Signal to Noise Ratio
SPD           Speech Decoder
SPE           Speech Encoder
TDMA          Time Division Multiple Access
TRAU          Transcoding and Rate Adapting Unit
UL            Uplink
Currently, the primary usage of digital cellular systems is for the transmission of voice. The limited available spectrum (bandwidth) of such systems requires that speech be encoded using a minimal number of bits in order to reduce the redundancy of the source data. Potentially poor channel conditions typical in cellular systems, e.g. low SNR and fading, necessitate the use of a channel coding scheme to add redundancy back in an efficient manner. Typically, the channel coding consists of a forward error correction scheme (block or convolutional code) and an error detection scheme, e.g. CRC.
Within the context of the GSM digital cellular standard, several speech codecs are standardized and in use. The original GSM speech codec is commonly referred to as the Full-Rate (FR) speech codec and encodes speech at a rate of 13 kbps. The next generation of codecs took divergent paths. The Half-Rate (HR) codec allowed for a doubling of system capacity but at the expense of voice quality. The Enhanced Full-Rate (EFR) kept the speech rate approximately the same (12.2 kbps), but improved algorithms and increased DSP processing power provided significantly higher voice quality. This codec has been well received and is currently used in most GSM systems. All of these voice services use convolutional codes for error correction and some form of CRC for error detection.
In 1997, the process of standardizing a new GSM speech service was begun in order to take advantage of speech coding advances. A set of requirements was established that included both quality and capacity increases over previous GSM codecs. The improved quality requirements primarily related to operation during poor channel conditions. A new voice service was defined that contained multiple speech coding rates and could adapt the level of channel coding to the channel conditions. This new service became known as the Adaptive Multi-Rate (AMR) speech service for GSM.
To meet both the capacity and quality goals of the AMR service, it was defined with half and full-rate modes of operation. In the full-rate mode, there are 8 speech codec rates defined. Each includes an associated channel coding scheme. For the half-rate mode, there are 6 speech codec rates defined, each having a unique channel coding scheme. Hence, there are a total of 14 channel codes defined for AMR voice and 8 speech rates. The 6 AHS speech rates are a subset of the 8 AFS rates.
Not all of the codec modes may be used within a given call. Specifically, at call setup AMR configurations are downloaded to the MS and BTS. The AMR configuration includes an Active Codec Set (ACS) together with thresholds and hysteresis values. The ACS may contain anywhere from 1 to 4 codecs. The thresholds and hysteresis values are used by an AMR receiver to determine the optimal receive link codec mode from those within the ACS.
The advantage of AMR stems from its ability to dynamically adapt channel coding to meet the current needs of the link, wherein the link may include degradations due to low signal strength, fading, shadowing, noise, etc. This link adaptation is assisted by measurements within the AMR receiver of both the BTS and MS. The general operation of AMR link adaptation is shown in the block diagram of FIG. 1.
With respect to the MS, its receiver is required to constantly monitor channel quality in order to determine an appropriate downlink codec mode. The channel quality is quantified as a logarithmic (dB) C/I ratio. It is typically measured on a TDMA burst basis or a speech frame basis and then filtered to remove fast-varying random components. The filtered channel quality is compared against the BSS-commanded thresholds and hysteresis values to determine the optimal codec mode. The resultant mode is encoded as a Codec Mode Request (CMR) and returned to the BSS in the reverse link. Normally, the BSS will grant the request and use the requested mode for encoding the downlink channel to the MS.
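The threshold and hysteresis comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the first-order filter, the function names (smooth, select_mode), and the exact switching rule (move up when the filtered C/I exceeds threshold plus hysteresis, move down when it falls below the threshold) are hypothetical choices and are not taken from this description.

```python
def smooth(previous_db, measured_db, alpha=0.1):
    """One step of a first-order filter on the C/I estimate in dB.

    alpha is a hypothetical smoothing factor (assumption).
    """
    return (1.0 - alpha) * previous_db + alpha * measured_db


def select_mode(current, ci_db, thresholds, hysteresis):
    """Return the ACS index to request, given the filtered C/I in dB.

    thresholds[i] is the assumed boundary between ACS index i and i + 1,
    listed in increasing order; hysteresis[i] is the extra margin (>= 0)
    required before moving up across that boundary.
    """
    mode = current
    # Move to a higher-rate mode only when quality clears threshold + hysteresis.
    while mode < len(thresholds) and ci_db > thresholds[mode] + hysteresis[mode]:
        mode += 1
    # Move to a more robust mode when quality drops below the lower threshold.
    while mode > 0 and ci_db < thresholds[mode - 1]:
        mode -= 1
    return mode


if __name__ == "__main__":
    thr = [4.0, 7.0, 10.0]   # example values only
    hys = [2.0, 2.0, 2.0]
    print(select_mode(0, 6.5, thr, hys))  # crosses 4.0 + 2.0, moves up
    print(select_mode(1, 5.0, thr, hys))  # inside the hysteresis band, stays
```

With a 4-entry ACS there are three boundaries; a measurement that sits between a threshold and threshold plus hysteresis leaves the mode unchanged, which is what suppresses rapid toggling.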
A similar procedure is followed within the BTS. Specifically, the BTS receiver monitors the uplink channel quality from the MS and determines an optimal codec mode based on threshold/hysteresis values together with potential constraints from the network control. The resultant mode is transmitted in the downlink to the MS. This mode is termed the Codec Mode Command (CMC) and is similar to the CMR with the notable exception that the CMC commands the MS as to which rate to use on the uplink whereas the CMR requests that the BTS use a rate on the downlink.
Rate adaptation must occur quickly in order to be effective and, hence, is signaled using inband data encoded within each AMR traffic frame. Every frame includes inband data, but its meaning alternates between describing its host link and commanding/requesting a mode for the opposite link. When representing its host link, this data is termed the Codec Mode Indication (CMI) and it indicates how that link was encoded. A given CMI value is associated both with the frame in which it was encoded and the succeeding frame. When representing the opposite link, this data provides the CMC (transmitted in the forward link) or the CMR (transmitted in the reverse link). Regardless of its meaning, the inband data always represents two source bits (a value from 0 to 3) and can be thought of as an index into the ACS.
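The alternating meaning of the inband field can be illustrated with a short sketch. The function name track_inband and the assumption that the first frame of the sequence is in the CMI phase are hypothetical; the sketch only shows that a CMI value governs its own frame and the succeeding one, while the intervening frame carries a CMC or CMR.

```python
def track_inband(inband_values, first_is_cmi=True):
    """Return (frame_number, cmi_in_effect, cmc_or_cmr) for each frame.

    inband_values holds the decoded 2-bit inband value (0 to 3) of each
    successive frame. cmc_or_cmr is None on CMI-phase frames.
    """
    result = []
    cmi = None          # CMI currently in effect (None until one is seen)
    is_cmi_phase = first_is_cmi
    for frame_number, value in enumerate(inband_values):
        if is_cmi_phase:
            cmi = value                      # applies to this frame and the next
            result.append((frame_number, cmi, None))
        else:
            result.append((frame_number, cmi, value))  # value is a CMC or CMR
        is_cmi_phase = not is_cmi_phase
    return result


if __name__ == "__main__":
    for row in track_inband([2, 1, 3, 0]):
        print(row)
```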
With respect to channel coding, the allocation of bits between speech, inband data, FEC, and CRC error protection bits is summarized in the diagram of FIG. 2 for both AFS and AHS frames. For each AFS frame, 8 bits are allocated for encoding the 2-bit inband data. This coding is effectively a ¼ rate block code. For AHS frames, 4 bits are allocated for the encoded inband data, effecting a ½ rate block code. The speech bits are subjectively ordered and broken into three classes according to their importance. Class 1a (most important) bits have a 6 bit CRC calculated and appended to them. The class 1a bits, class 1a CRC bits, and class 1b bits are encoded using a systematic, punctured, recursive convolutional code. Any remaining speech bits are classified as class 2 and receive no channel coding. There are no class 2 bits for AFS frames as all speech bits are protected. The channel coded AMR frames are block diagonally interleaved and mapped onto bursts in the same manner as existing (HR, FR, EFR) GSM speech frames.
Given the aforementioned coding schemes, it remains to be determined how a receiver turns RF information into bits appropriate for the speech decoder and, ultimately, pleasing audio for the listener, e.g. MS user.
The GSM standard allows considerable flexibility regarding receiver design. The transmit side, particularly channel encoding scheme and related, is precisely specified while the receive side is restricted only by performance limits regarding sensitivity and the like. MS and BTS manufacturers are thus allowed alternative designs according to their appropriateness within a given architecture. For example, poor RF receiver performance may be compensated by a good baseband receiver (channel decoding) and vice versa. It is to be understood that the receiver described herein is typical and that the novel aspects of the invention are applicable to alternate receiver designs.
FIG. 3 provides a block diagram of a typical AMR baseband receiver. RF samples, e.g. I/Q, are collected for bursts of data and passed to an equalizer/demodulation block 302. The equalizer block typically outputs soft bits corresponding to the demodulated data. These bursts are accumulated into blocks of data corresponding to speech frames where 4 bursts comprise an AFS block and 2 bursts comprise an AHS frame. The data blocks are de-interleaved 304 and passed to the AMR channel decoder 306 for processing as well as a frame classification block 308. Provided the data block represented speech (or comfort noise during DTX periods), the resultant speech frame output is input to the speech decoder which converts the data into PCM samples appropriate for converting into audio.
Some other data paths are also possible out of the channel decoder. Specifically, the frame classification procedure analyzes each frame out of the deinterleaver to determine its type, e.g. speech 310, FACCH 312, RATSCCH 314, SID_UPDATE, etc. The resultant classification determines how the channel decoder should be run, i.e. which channel coding scheme should be decoded.
The block diagram of FIG. 4 describes the dataflow of the AMR channel decoder for speech frames in more detail. Received blocks of data first have the encoded inband bit field extracted. This data is block decoded 402 to determine the 2-bit source data. The decoding is accomplished by finding the codeword that is closest, in a squared-distance sense, to the received sequence. This is typically done using soft received bits. The 2-bit source data indicated by the codeword is output 404 from the inband decoder.
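A minimal sketch of the minimum squared-distance inband decode follows. The codeword table, the soft-bit convention (a transmitted 0 is received near +1.0 and a transmitted 1 near -1.0), and the name decode_inband are assumptions made for illustration; the normative codewords are those of the GSM channel coding specification and are not reproduced in this description.

```python
# Illustrative 8-bit codewords for the four 2-bit inband values (assumption;
# consult the GSM channel coding specification for the normative table).
CODEBOOK = {
    0: (0, 0, 0, 0, 0, 0, 0, 0),
    1: (1, 0, 1, 1, 1, 0, 1, 0),
    2: (0, 1, 0, 1, 1, 1, 0, 1),
    3: (1, 1, 1, 0, 0, 1, 1, 1),
}


def squared_distance(soft_bits, codeword):
    """Squared Euclidean distance between soft bits and a codeword.

    Assumed mapping: bit 0 -> +1.0, bit 1 -> -1.0.
    """
    return sum((s - (1.0 - 2.0 * b)) ** 2 for s, b in zip(soft_bits, codeword))


def decode_inband(soft_bits, codebook=CODEBOOK):
    """Return the 2-bit index whose codeword is closest to soft_bits."""
    return min(codebook, key=lambda idx: squared_distance(soft_bits, codebook[idx]))


if __name__ == "__main__":
    # Noisy reception of the codeword for index 2 (one soft bit has the wrong sign).
    received = [0.9, -0.8, 1.1, -0.2, -1.0, 0.3, 0.7, -1.2]
    print(decode_inband(received))
```

Because every candidate is compared against the full soft sequence, a single soft bit with the wrong sign does not by itself cause a decode error.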
For frames corresponding to the CMR/CMC phase, the inband bits are passed out of the channel decoder 406 for use on the opposite link. For the remaining CMI-phase frames, the inband bits are used to determine how the associated current (and next) frame should be channel decoded 408.
The source inband data is a 2-bit index into the ACS with a maximum value of ACS_size−1. The 2-bit index corresponding to the CMI is mapped to an absolute AMR mode from the entire codec set, i.e. a 3-bit value from 0 to 7 for AFS and 0 to 5 for AHS. This absolute form of CMI is used to determine which channel decoding to perform.
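The mapping from the 2-bit ACS index to an absolute mode amounts to a table lookup, sketched below. The function name absolute_mode and the representation of the ACS as a tuple of absolute mode numbers are assumptions for illustration; the rate list is the standard set of eight AMR rates, of which AHS uses the first six.

```python
# Absolute AMR modes 0..7 and their speech rates in kbps.
AMR_RATES_KBPS = (4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2)


def absolute_mode(inband_index, acs, half_rate=False):
    """Map a decoded 2-bit inband index to an absolute AMR mode number.

    acs is a tuple of 1 to 4 absolute mode numbers (assumed representation).
    Raises ValueError if the index exceeds ACS_size - 1 or the mode is not
    valid for the channel type.
    """
    if not 0 <= inband_index < len(acs):
        raise ValueError("inband index outside the Active Codec Set")
    mode = acs[inband_index]
    highest = 5 if half_rate else 7   # AHS: modes 0..5, AFS: modes 0..7
    if not 0 <= mode <= highest:
        raise ValueError("mode not defined for this channel type")
    return mode


if __name__ == "__main__":
    acs = (0, 2, 4, 7)                # example ACS only
    mode = absolute_mode(3, acs)
    print(mode, AMR_RATES_KBPS[mode])
```

An index that is out of range for the configured ACS is a useful early sign of an inband decode error, which is why the sketch raises rather than clamping.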
The next steps involve channel decoding the frame according to the absolute CMI inband data of the current or previous frame. First, the encoded data, stripped of the inband data portion, is convolutionally decoded 410 to remove channel-induced bit errors. This is typically done using a recursive Viterbi (maximum likelihood) decoder operating on soft-bit data. The resultant (hard-bit) output data includes a 6 bit CRC field. The next step involves checking the CRC against the original source data 412 to ensure the CRC is correct and removing the CRC bits from the bitstream.
The outputs of the channel decoder are a speech frame, a bad frame indication 414 derived from the CRC status (and possibly other inputs), and the codec mode that also indicates the speech rate to decode. The standard also allows for the classification of a frame as a “degraded frame” if the CRC passes but other parameters indicate that the frame is unreliable.
This method is also described by the flow chart of FIG. 5 that applies only to CMI-phase frames. Received blocks of data first have the encoded inband bit field extracted and block decoded 502 to determine the 2-bit source data. The source inband data is a 2-bit index into the ACS with a maximum value of ACS_size−1. The 2-bit index corresponding to the CMI is mapped 504 to an absolute AMR mode from the entire codec set, i.e. a 3-bit value from 0 to 7 for AFS and 0 to 5 for AHS. Next, the encoded data, stripped of the inband data portion, is convolutionally decoded 506 to remove channel-induced bit errors. The next step involves checking the CRC against the original source data 508 to ensure the CRC is correct and removing the CRC bits from the bitstream. This is followed by a bad frame metric calculation 510. Next, a check is made to determine if the frame is good 512. If it is, the speech frame is passed to the speech decoder 514. Otherwise, the speech frame undergoes a second check to determine how bad the speech frame is 516. If it is a degraded frame it is marked as such and the decoded bits are passed to the speech decoder 518. If it is more than degraded, then the speech frame is masked with respect to the speech decoder 520.
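The decision portion of the flow chart (steps 510 through 520) might be sketched as follows. The bad frame metric is not defined in this description, so the metric scale, the two threshold values, and the names classify_frame and route_frame are all hypothetical.

```python
from enum import Enum


class FrameQuality(Enum):
    GOOD = "good"
    DEGRADED = "degraded"
    BAD = "bad"


def classify_frame(crc_ok, metric, degraded_threshold=0.3, bad_threshold=0.6):
    """Classify a decoded speech frame.

    metric is a hypothetical badness measure where larger means worse;
    both thresholds are illustrative assumptions.
    """
    if not crc_ok or metric >= bad_threshold:
        return FrameQuality.BAD
    if metric >= degraded_threshold:
        return FrameQuality.DEGRADED      # CRC passed but frame looks unreliable
    return FrameQuality.GOOD


def route_frame(speech_bits, crc_ok, metric):
    """Return (quality, bits for the speech decoder or None if masked)."""
    quality = classify_frame(crc_ok, metric)
    if quality is FrameQuality.BAD:
        return quality, None              # masked: speech decoder conceals the frame
    return quality, speech_bits           # good or degraded frames are passed on


if __name__ == "__main__":
    print(route_frame([1, 0, 1], crc_ok=True, metric=0.4))
```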
An important factor in user-perceived audio quality during marginal channel conditions is receiver (RF and baseband) sensitivity. This is quantified using a variety of measures. The measures of interest in this context are the Frame Erasure Rate (FER) and the Residual Bit Error Rate (RBER).
The FER refers to the rate at which frames are “erased” due to CRC failures or excessive bit errors. Such frames are not recoverable and typically require bad frame masking within the speech decoder, e.g. repetition of a previous frame or muting/comfort noise generation. RBER refers to the bit error rate which is present in the received bitstream when those frames which are erased are excluded from the statistics.
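The two measures can be computed from per-frame results as in the sketch below. The record layout (erased flag, bit error count, bit count) and the function name fer_rber are assumptions for illustration; in practice the bit error counts are available only in a test setup where the transmitted bits are known.

```python
def fer_rber(frames):
    """Compute (FER, RBER) from an iterable of per-frame records.

    Each record is (erased, bit_errors, num_bits). Erased frames count
    toward the FER and are excluded from the RBER statistics.
    """
    total_frames = 0
    erased_frames = 0
    residual_errors = 0
    residual_bits = 0
    for erased, bit_errors, num_bits in frames:
        total_frames += 1
        if erased:
            erased_frames += 1
        else:
            residual_errors += bit_errors
            residual_bits += num_bits
    fer = erased_frames / total_frames if total_frames else 0.0
    rber = residual_errors / residual_bits if residual_bits else 0.0
    return fer, rber


if __name__ == "__main__":
    records = [(False, 1, 100), (True, 40, 100), (False, 0, 100), (False, 3, 100)]
    print(fer_rber(records))
```

In the example, one of four frames is erased, and the 40 bit errors in that frame do not contribute to the RBER.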
A good-performing inband decoder is necessary to achieve both low FER and RBER. For purposes of explanation, consider a marginal channel in which the inband data is decoded incorrectly. For any such frame, the wrong channel decoder will be run leading to a frame erasure or (on rare occasions) very high RBER.
Inband bit decoding problems are more pronounced due to the fact that the channel codes are relatively strong. For example, in AFS service the lowest speech codec mode (4.75 kbps) is coded using a ⅕ rate recursive systematic convolutional code which is punctured to an effective rate of 101/448. The corresponding inband data is coded using a simple ¼ rate block code. Likewise, the lowest AHS speech service is coded using a recursive systematic punctured rate ⅓ convolutional code whereas the inband data is coded using a simple rate ½ block code. For these and other low rates, the channel codes for the payload data have more error-correcting capability than those of the inband data bits. In other words, in marginal channel conditions the channel decoder (Viterbi and CRC check) may be capable of salvaging many of the frames (correcting the errors/minimizing the BER & FER) provided the inband decoding commands that the correct channel decoder run. However, the inband decoder will tend to fail often and the normal channel decoder will not get the chance to salvage bad frames.
There are a couple of issues that further compound the aforementioned inband decode problem. First, the relatively weak inband coding and strong channel coding occur at the lower operational modes, e.g. 4.75 kbps speech. It is at these rates that the problems are most likely to occur. Such lower rates are used when the channel is quite poor and the combination inband/channel decode needs to perform its best. Second, due to the fact that a given frame's CMI controls the current and next frame, a bad decode will erase 2 frames rather than just one.
Performance degradation due to bad inband decodes can be reduced by using a priori knowledge when performing the decode, e.g. using Markov modeling with statistical information. In qualitative terms, the CMI inband data does not often change. It is derived from a channel quality measure which is typically heavily filtered and associated with threshold values. Provided adequate hysteresis values are used, the filtering effectively prevents many mode changes from occurring, e.g. mode changes would typically occur with a mean time between changes on the order of seconds. Hence, for a given CMI-phase frame, the inband data is most likely to stay the same as that previously decoded. By biasing the inband decoder to stay in the same state, its performance can be significantly increased.
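One simple way to express such a bias is to add a prior-probability penalty to the squared-distance metric, as in the sketch below. This is an illustrative maximum a posteriori formulation under assumed Gaussian noise, not necessarily the method of the invention; the codeword table, the stay probability of 0.95, the noise variance, and the name decode_inband_biased are all hypothetical.

```python
import math

# Illustrative 8-bit codewords for the four 2-bit inband values (assumption).
CODEBOOK = {
    0: (0, 0, 0, 0, 0, 0, 0, 0),
    1: (1, 0, 1, 1, 1, 0, 1, 0),
    2: (0, 1, 0, 1, 1, 1, 0, 1),
    3: (1, 1, 1, 0, 0, 1, 1, 1),
}


def decode_inband_biased(soft_bits, previous_index, stay_prob=0.95,
                         noise_var=1.0, codebook=CODEBOOK):
    """Pick the index minimizing d^2 - 2 * noise_var * ln(prior).

    The prior is stay_prob for the previously decoded index and is shared
    equally among the other indices. Assumed mapping: bit 0 -> +1.0,
    bit 1 -> -1.0, with additive Gaussian noise of variance noise_var.
    """
    change_prob = (1.0 - stay_prob) / (len(codebook) - 1)

    def metric(idx):
        d2 = sum((s - (1.0 - 2.0 * b)) ** 2
                 for s, b in zip(soft_bits, codebook[idx]))
        prior = stay_prob if idx == previous_index else change_prob
        return d2 - 2.0 * noise_var * math.log(prior)

    return min(codebook, key=metric)


if __name__ == "__main__":
    weak = [0.3, 0.2, 0.1, 0.1, -0.1, 0.2, 0.2, -0.1]   # heavily faded frame
    print(decode_inband_biased(weak, previous_index=2))
    print(decode_inband_biased([1.0] * 8, previous_index=2))
```

In the first call the received field is too weak to be trusted and the decoder holds the previous index; in the second the evidence for a change is clean, so the bias is overridden. Setting stay_prob to 0.25 makes the prior uniform and reduces the rule to the plain minimum-distance decode.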