The invention relates to error control in digital communication of speech data, as in radio telephones. Bit errors occur in received speech frames due to communication channel disturbances. Error correction decoding of received speech frames is used to remove the bit errors prior to speech decoding of the speech frame. A speech frame that still contains residual bit errors after error correction decoding is termed a bad frame. If the bad frame is passed through the speech decoder, unacceptable speech quality may occur due to distortions caused by the residual bit errors in the speech frame. To mitigate this problem, received bad speech frames are identified as such prior to speech decoding by a binary bad frame indicator (BFI), wherein a value of 1 signifies a bad frame. The speech decoder does not decode a bad frame but instead outputs previously decoded speech data or another substitute which is known to provide a more pleasant audio output than a decoded bad frame. The rate at which bad frames are indicated (BFI=1) is termed the frame erasure rate (FER). The GSM standard specifies stringent performance requirements for FER. The GSM requirement for BFI performance states that the number of undetected bad frames in a specified interval of time must be no greater than a specified number. Simultaneously, the GSM requirement states that the FER must be less than a specified number. In order to achieve an optimum compromise between these two opposing GSM requirements, a bad frame criterion is needed.
In a conventional system, a cyclic redundancy code (CRC) is typically used to determine a bad frame. However, under the GSM standard, only a 3-bit CRC is provided, which is a very weak code for identifying bad frames. A bad frame indication based solely on the CRC cannot meet the GSM requirements for BFI and FER performance. In fact, the CRC may indicate no frame errors even though nearly half the frame bits are in error. This failure event can occur with non-negligible probability, so the CRC criterion alone does not suffice as a satisfactory bad frame indicator.
Accordingly, the present invention utilizes four signal quality metrics to specify the bad frame criterion. In addition to the frame CRC, the estimated signal-to-noise ratio (ESNR) for each of the 8 bursts comprising the frame, the estimated bit error count (EBEC) for the frame, and the stealing flag values for the frame are used. By using the four metrics jointly in a bad frame criterion, the present invention substantially improves decoded speech quality and provides an optimum compromise between the opposing GSM performance requirements for low false bad frame indication and maximum frame erasure rate (FER). In addition to the frame cyclic redundancy check (CRC), the invention uses three novel signal quality metrics to detect the presence of speech frame errors. The first aspect of the invention for detecting frame errors is a method that uses the estimated signal-to-noise ratio (ESNR) for each of the bursts comprising a speech frame to measure signal quality, e.g., eight bursts for the GSM system. Also, the present invention defines ESNR in a manner different from conventional practice in the art. The present invention defines ESNR as estimated signal power divided by estimated noise power. This ESNR can be obtained from the training sequence in a received TDMA burst. Under the GSM standard, the training sequence in a burst has 26 bits for synchronization. For the signal power measurement, the 26 received training sequence bits are correlated with a 16-bit local bit sequence selected from the 26-bit training sequence. The optimal 16-bit local bit sequence is the central 16 bits of the whole 26-bit training sequence.
Then, 11 correlation values are obtained by a “sliding 16 bit” correlation of the 16 bits of the local sequence against 11 consecutive 16-bit sequences obtained by selecting 11 subsets of 16 consecutive bits contained in the 26 bits of the training sequence. Each 16-bit subset is offset from its nearest neighbor by one bit. The first correlation starts at the first bit position of the 26-bit training sequence, and the last ends with the last bit of the local sequence aligned with the last bit of the training sequence.
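The sliding correlation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: bits are mapped to real +1/-1 symbols, the received sequence is noise-free, and the 26-bit pattern, the function name `sliding_correlation`, and all variable names are hypothetical.

```python
def sliding_correlation(received, local):
    """Correlate `local` against every window of `received` of the same length.

    For a 26-symbol received sequence and a 16-symbol local sequence this
    yields 26 - 16 + 1 = 11 correlation values, one per one-bit offset.
    """
    width = len(local)
    n_offsets = len(received) - width + 1
    return [
        sum(r * l for r, l in zip(received[k:k + width], local))
        for k in range(n_offsets)
    ]


# Illustrative 26-bit pattern (an assumption for this sketch).
training_bits = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0,
                 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
# Map bits to +1/-1 symbols (0 -> +1, 1 -> -1); the mapping is an assumption.
training = [1 - 2 * b for b in training_bits]
# Central 16 symbols serve as the local sequence.
local = training[5:21]

corr = sliding_correlation(training, local)
```

With a noise-free input, the value at offset 5 equals 16 because the local sequence lines up exactly with the central 16 symbols; a real receiver would correlate noisy, possibly complex-valued samples instead.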
An integer number, L, is defined for embodiments of the present invention. For the GSM system, L is defined as equal to 6. L absolute magnitude summations are formed from the L available sequences of L consecutive values among the 11 correlation values obtained above (with 11 correlation values and L equal to 6, there are 11 − 6 + 1 = 6 such sequences). The maximum value of the L absolute magnitude summations obtained is used as the estimated signal power. The choice of 11 correlation values is made for GSM system applications, and the integer L being equal to 6 is selected to match the multi-path model for GSM systems, as is widely known.
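The window search described above can be sketched as follows. The function name `estimate_signal_power` and the sample correlation values are hypothetical, and ties are resolved in favor of the earliest window, which is an assumption the text does not address.

```python
def estimate_signal_power(corr, L=6):
    """Return (max_sum, start) over all windows of L consecutive values.

    Each window sum adds the absolute magnitudes of L consecutive
    correlation values; the largest sum is taken as the signal power.
    """
    best_sum, best_start = -1.0, 0
    for start in range(len(corr) - L + 1):
        window_sum = sum(abs(c) for c in corr[start:start + L])
        if window_sum > best_sum:  # strict '>' keeps the earliest window on ties
            best_sum, best_start = window_sum, start
    return best_sum, best_start


# Hypothetical correlation values, for illustration only.
example_corr = [0, 1, -2, 0, 0, 16, 3, -1, 0, 0, 0]
power, start = estimate_signal_power(example_corr)
taps = example_corr[start:start + 6]  # the L values forming the winning window
```

For 11 correlation values and L = 6, the loop examines exactly six windows, matching the count given in the text.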
It is also known that the L consecutive correlation values whose summation is the maximum, corresponding to the estimated signal power, constitute the estimated channel impulse response. Next, the estimated received training sequence is obtained by convolving the local training sequence with the estimated channel impulse response. Convolution may be done in hardware or software, but is preferably performed in a commercially available DSP processor having sufficient processing power for the intended application, e.g., a chip or packaged device.
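As a software illustration of the convolution step, the following sketch computes a full linear convolution in plain Python. The function name `convolve` and the sample values are hypothetical, and any normalization of the channel taps is left out because the text does not specify one.

```python
def convolve(sequence, impulse_response):
    """Full linear convolution of two real-valued sequences.

    The output has len(sequence) + len(impulse_response) - 1 samples.
    """
    out = [0.0] * (len(sequence) + len(impulse_response) - 1)
    for i, s in enumerate(sequence):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


# Hypothetical 3-symbol sequence and 2-tap channel, for illustration only.
estimated = convolve([1, -1, 1], [1.0, 0.5])
```

A 16-symbol local sequence convolved with L = 6 taps would produce 21 output samples under this definition.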
One novel feature of the present invention is the computation of an estimate for signal-to-noise ratio (SNR). Estimated noise power for the received channel is obtained from the difference between the received training sequence and the estimated received training sequence. The estimated signal-to-noise ratio (ESNR) for the present invention is computed as the ratio of the estimated received training sequence power to the estimated noise power for the received channel. If any M ESNRs are less than a given threshold (SNRt) within the first four bursts or the last four bursts, then BFI=1, and otherwise BFI=0.
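The ESNR computation can be sketched as below. The sketch assumes that power is measured as a sum of squared samples and that the two sequences are already time-aligned and of equal length; neither detail is stated in the text, and the function name `esnr` is hypothetical.

```python
def esnr(received, estimated):
    """Estimated SNR: estimated-sequence power divided by noise power.

    Noise is taken as the sample-by-sample difference between the received
    training sequence and the estimated received training sequence.
    """
    signal_power = sum(e * e for e in estimated)
    noise_power = sum((r - e) ** 2 for r, e in zip(received, estimated))
    if noise_power == 0:
        return float("inf")  # perfect match: no measurable noise
    return signal_power / noise_power


# Hypothetical samples, for illustration only.
ratio = esnr([2.5, 1.5], [2.0, 2.0])
```

Here the estimated sequence has power 8 and the difference has power 0.5, giving a ratio of 16.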
The second aspect of the invention for detecting frame errors is a method that uses stealing flags, which are normally generated under the GSM standard, in combination with an estimated frame bit error count (EBEC). A summation is formed of the values of 8 consecutive stealing flags detected in the received signal. When the absolute value of the summation of the 8 stealing flag values is less than a given threshold value, SFv, and EBEC is greater than a second threshold value, EBv, then the speech frame is declared to be a bad frame, i.e., BFI=1. Taking the two aspects of frame error detection jointly, the invention uses the following form of criteria for bad frame indication:
BFI=1 if any of the following conditions are true:
(1) CRC decoding indicates an error (parity failure);
(2) Three burst ESNRs are less than a threshold SNRt among the first four bursts or among the last four bursts of a speech frame; or
(3) The absolute value of the sum of 8 stealing flags is within a value range [−y, +y] for some threshold value y, and the frame EBEC is greater than threshold EBv. Otherwise BFI=0.
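The three conditions above can be combined in a short sketch. It assumes that the stealing flags are available as signed soft values, that "three burst ESNRs" means at least three within one half of the frame, and that the range [−y, +y] is inclusive; the function name `bad_frame_indicator` and all threshold values used in the example are hypothetical.

```python
def bad_frame_indicator(crc_ok, esnrs, stealing_flags, ebec,
                        snr_t, y, eb_v, m=3):
    """Return 1 (bad frame) if any of the three conditions holds, else 0."""
    # (1) CRC decoding indicates an error.
    if not crc_ok:
        return 1
    # (2) At least m low-ESNR bursts among the first or the last four bursts.
    for half in (esnrs[:4], esnrs[4:8]):
        if sum(1 for value in half if value < snr_t) >= m:
            return 1
    # (3) Stealing-flag sum near zero and a high estimated bit error count.
    if abs(sum(stealing_flags)) <= y and ebec > eb_v:
        return 1
    return 0


# Hypothetical thresholds and measurements, for illustration only.
thresholds = dict(snr_t=5.0, y=3, eb_v=40)
clean = bad_frame_indicator(True, [10.0] * 8, [1] * 8, 2, **thresholds)
noisy = bad_frame_indicator(True, [1.0, 1.0, 1.0, 10.0] + [10.0] * 4,
                            [1] * 8, 2, **thresholds)
```

Checking the CRC first is only an ordering choice; since the conditions are combined with a logical OR, the result does not depend on the order of evaluation.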