1. Field
One or more embodiments relate to technologies and techniques for encoding and decoding audio, and more particularly, to technologies and techniques for encoding and decoding audio with improved frame error concealment using a multi-rate speech and audio codec.
2. Description of the Related Art
In the technical field of speech and audio coding for environments where frames of encoded speech or audio are expected to be subjected to occasional losses during their transport, coded speech and audio transporting or decoding systems are designed to limit frame losses to the order of a few percent.
To limit these frame losses, or to compensate for the loss of frames, frame erasure concealment (FEC) algorithms may be implemented by a decoding system independent of the speech codec used to encode or decode the speech or audio. Many codecs use decoder-only algorithms to reduce the degradation caused by frame loss.
Such FEC algorithms have recently been utilized in cellular communication networks or environments, which operate in accordance with a given standard or specification. For example, the standard or specification may define the communication protocols and/or parameters that shall be used for a connection and communication. Examples of the different standards and/or specifications include Global System for Mobile Communications (GSM), GSM/Enhanced Data rates for GSM Evolution (EDGE), American Mobile Phone System (AMPS), Wideband Code Division Multiple Access (WCDMA) or 3rd generation (3G) Universal Mobile Telecommunications System (UMTS), International Mobile Telecommunications 2000 (IMT 2000), for example. Here, speech coding has previously been performed with either variable rate or fixed rate encoding. In variable rate encoding, the source uses an algorithm to classify speech into different rates, and encodes the classified speech according to respective predetermined bit rates. Alternatively, speech coding has been performed using fixed bit rates, where detected voice speech audio may be coded according to a fixed bit rate. An example of such fixed rate codecs include multi-rate speech codecs developed by the 3rd Generation Partnership Project (3GPP) for GSM/EDGE and WCDMA communication networks, such as the adaptive multi-rate (AMR) codec and the adaptive multi-rate wideband (AMR-WB) codec, which code the speech according to such detected voice information, and further based upon factors such as the network capacity and radio channel conditions of the air interface. The term multi-rate refers to fixed rates being available depending on the mode of operation of the codec. For example, AMR contains eight available bit-rates from 4.7 kbit/s to 12.2 kbit/s for speech, while AMR-WB contains nine bit-rates from 6.6 kbit/s to 23.85 kbit/s for speech. The specifications of the AMR and AMR-WB codecs are respectively available in the 3GPP TS 26.090 and 3GPP TS 26.190 technical specifications for the third generation of the 3GPP wireless systems, and voice detection aspect of the AMR-WB can be found in the 3GPP TS 26.194 technical specification for the third generation of the 3rd 3GPP wireless systems, the disclosures of which are incorporated herein.
In such cellular environments, for example, losses may be due to interference in a cellular radio link or router overflow in an IP network, for example. Currently, a new fourth generation of the 3GPP wireless system is currently being developed, known as Enhanced Packet Services (EPS), with a primary air interface for EPS being referred to as Long Term Evolution (LTE). As an example, FIG. 1 illustrates EPS 10, with a speech media component 12, wherein voice data is coded according to an example AMR-WB codec for wideband speech audio data and the AMR codec for narrowband speech audio data, this AMR may also be referred to as AMR Narrowband (AMR-NB). EPS 10 conforms to UMTS and LTE voice codecs in 3GPP Release 8 and 9, for example. The UMTS with LTE voice codecs in the 3GPP Releases 8 and 9 may also be referred to as Multimedia Telephony Service for IP Multimedia Core Network Subsystem (IMS) over EPS in the 3GPP Releases 8 and 9, which are the first releases for the fourth generation of the 3rd 3GPP wireless systems. IMS is an architectural framework for delivering Internet Protocol (IP) multimedia services.
Even though LTE has been developed in view of the potential transmission interference and failing in cellular or wireless networks, speech frames transported in 3GPP cellular networks will still be subject to erasure, with a small percentage of frames and/or packets being lost during transmission. Erasure is a classification, e.g., by a decoder, for the decoder to assume information of that packet has been lost or unusable. In the case of the EPS network, for example, frame erasures may still be expected. To address the erased frames, the decoder will typically implement frame error concealment (FEC) algorithms to mitigate the impact of the corresponding lost frames.
Some FEC approaches use only the decoder to address the concealment of the erased frame, i.e., the lost frame. For example, the decoder is aware or is made aware that a frame erasure has occurred, and estimates the contents of the erased frame from known good frames that arrive at the decoder just before and sometimes also just after the erased frame.
A feature of some 3GPP cellular networks is the ability to identify and notify the receiving station of frame erasures that take place. Therefore, the speech decoder knows whether a received speech frame is to be considered a good frame or considered an erased frame. Due to the nature of speech and audio, a small percentage of frame erasures can be tolerated if proper frame erasure mitigation or concealment measures are put in place. Some FEC algorithms may merely substitute noise in place of the lost packet, silence, some type of fading out/in, or some type of interpolation, for example, to help make the loss of the frame less noticeable.
Alternate FEC approaches include having the encoder send specific information in a redundant fashion. For example, the ITU Telecommunication Standardization Sector G.718 (ITU-T G.718) standard, incorporated herein by reference, recommends sending redundant information pertaining to a core encoder output, in an enhancement layer. This enhancement layer could be sent in a different packet from the core layer.