This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
Tandem free operation (TFO) and transcoder free operation (TrFO) in a 3rd Generation Partnership Project (3GPP) core network, as well as the receiver logic in services such as VoIP, may inject empty frames or packets, passed to the speech decoder with the transmission type code RX_NO_DATA, into the adaptive multi-rate wideband (AMR-WB) bit stream. In other words, an active speech bitstream may occasionally contain empty frames or packets. The capacity of these empty frames or packets is typically reused for other purposes. For example, such frames or packets are often replaced with urgent signaling data, such as TFO/TrFO signaling or other system-level signaling. In order to prevent the decoder from processing such “non-speech” data frames/packets as speech frames/packets, they are labelled RX_NO_DATA. As another example, a frame that is lost or corrupted along the transmission path may be replaced with a RX_NO_DATA frame, e.g., by some intermediate entity.
When an AMR-WB decoder receives a RX_NO_DATA frame within a segment of active speech while discontinuous transmission (DTX) operation is enabled, a decoder implementation according to TS 26.173 v7.0.0 (fixed-point implementation) or TS 26.204 v7.0.0 (floating-point implementation) may mute or attenuate the output of the speech synthesis, sometimes for a period of up to 100 ms. This muting or attenuation of the output causes significant speech quality degradation.
The intended AMR-WB decoder functionality, according to TS 26.193 v7.0.0, “Source controlled rate operation,” notes that NO_DATA frames received when the decoder is in a SPEECH mode should be treated as SPEECH_LOST frames from a DTX handler perspective. In particular, TS 26.193 v7.0.0 states “if the RX DTX handler is in mode SPEECH, then frames classified as SPEECH_DEGRADED, SPEECH_BAD, SPEECH_LOST or NO_DATA shall be substituted and muted as defined in 3GPP TS 26.191. Frames classified as NO_DATA shall be handled like SPEECH_LOST frames without valid speech information.”
It may be desirable for the AMR-WB decoder to be made robust so that it can handle any frame type input combination that may be created by the network or by implementations in terminals/gateways. However, certain problems arise in the case of DTX synchronization. The AMR-WB encoder has voice activity detection (VAD) functionality that detects inactive speech, and the encoder sets the VAD flag to zero accordingly in order to indicate a frame containing inactive speech. The DTX functionality is invoked after a DTX hangover period of eight frames, during which the comfort noise parameters are determined. The decoder needs to be synchronized with the encoder with regard to this DTX hangover; if it is not, the comfort noise calculation in the decoder will be misaligned with that of the encoder.
Conventionally, a received NO_DATA frame is simply classified as a frame belonging to a DTX period, i.e., as indicating that there was no transmission. A problem arises in this case because, although the transmitter or network was in fact transmitting signaling frames, the DTX synchronization logic becomes misaligned; synchronization is restored only after the first Silence Descriptor (SID) frame containing the comfort noise parameters is received. On the other hand, when the NO_DATA frame is classified as part of an active speech bit stream and is replaced by the SPEECH_LOST frame type (and is therefore subject to an error concealment operation in the decoder), a problem can arise with the DTX handling. For example, if the receiver has lost the SID_FIRST frame (the first frame of a DTX period), then the NO_DATA frame is erroneously classified as a lost speech frame. Again, synchronization is restored only after the next SID_UPDATE frame has been received.
In the fixed-point AMR-WB reference implementation (3GPP TS 26.173), this DTX synchronization handling is implemented in C code, as shown in Example 1 below (function “rx_dtx_handler” in source file “dtx.c”).
EXAMPLE 1

 1  if ((sub(frame_type, RX_SID_FIRST) == 0) ||
 2      (sub(frame_type, RX_SID_UPDATE) == 0) ||
 3      (sub(frame_type, RX_SID_BAD) == 0) ||
 4      (sub(frame_type, RX_NO_DATA) == 0))
 5  {
 6      encState = DTX;    move16();
 7  } else
 8  {
 9      encState = SPEECH; move16();
10  }
At lines 1-3 of the above, the algorithm checks whether the frame is a SID_FIRST frame, a SID_UPDATE frame, or a corrupted SID frame. At line 4, the algorithm determines whether the frame is a NO_DATA frame. If any of these conditions is true, the decoder switches into (or stays in) the DTX state. From this piece of source code it is clear that if a NO_DATA frame is inserted in place of a speech frame that was dropped to make room for signaling data in the middle of a segment of active speech, the decoder will erroneously switch to DTX mode even though the correct action would be to stay in the speech state.
One prior suggestion for handling the above situation is depicted in Example 2 below.
EXAMPLE 2

 1  if ((sub(frame_type, RX_SID_FIRST) == 0) ||
 2      (sub(frame_type, RX_SID_UPDATE) == 0) ||
 3      (sub(frame_type, RX_SID_BAD) == 0) ||
 4      ((sub(frame_type, RX_NO_DATA) == 0) &&
 4b      (sub(st->dtxGlobalState, SPEECH) != 0)))
 5  {
 6      encState = DTX;    move16();
 7  } else
 8  {
 9      encState = SPEECH; move16();
10  }
Although the condition added at line 4b above ensures that a NO_DATA frame inserted in the middle of a segment of active speech does not cause an erroneous switch into the DTX state, it still does not fully solve the problem of incorrect handling of an inserted NO_DATA frame.