The 3rd Generation Partnership Project (3GPP) specifies Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) as mandatory speech codecs for voice services in 3G networks. These codecs are also mandatory for the 3GPP Voice over IP (VoIP) service that is specified within 3GPP multimedia telephony via the IP Multimedia Subsystem (IMS). The governing specification for media handling and interaction is 3GPP TS 26.114. Despite the mandatory status of these codecs, 3GPP is presently working to specify a new voice codec, the Enhanced Voice Services (EVS) codec, that will enable even higher service quality than is possible with AMR-WB.
However, introducing a new speech codec into a speech communications system may be problematic in some respects. One problem is that there is always an installed base of legacy equipment (both terminals and network infrastructure) that supports only the existing 3GPP codecs, or just one of them, for instance AMR-WB, and not the new codec. This may lead to interoperability problems in which communication between new and legacy equipment is not possible unless proper mechanisms are implemented in the system. Traditional ways to address this problem are the provisioning of transcoders, e.g. in media gateways, that translate between the new and the old coding formats, or the provisioning of the legacy codecs alongside the new codec in new terminals, which allows the legacy coding format to be chosen when a connection to a legacy terminal is established. The latter method requires a capability exchange between the terminals, prior to the actual speech connection, that identifies a common codec supported by both terminals. Within the IMS, the Session Description Protocol (SDP), IETF RFC 4566, is used to carry out this capability exchange.
The above-described ways of ensuring interoperability when introducing a new codec into a communication system are, however, not the only possibilities, and they have various disadvantages. The provisioning of transcoders means additional equipment, which raises network investment and maintenance costs. Transcoding is also associated with undesirable speech quality degradation. A capability exchange between the terminals prior to the call is an elegant approach, but it may not always be possible. Examples where it may not be possible are multi-party conferencing, hand-over scenarios with mobile users roaming to cells without Multimedia Telephony Service for IMS (MTSI) support, and voice messaging. From a terminal implementation point of view it may also be undesirable to support the complete set of new and legacy codecs, as this may increase implementation and technology licensing costs.
Consequently, there is a need to enable the introduction of new speech codecs into telecommunication systems, in particular 3GPP systems, in order to provide an improved quality of service whilst maintaining backwards compatibility with old or legacy codecs.
A third possibility, chosen by 3GPP to let the EVS codec interoperate with legacy AMR-WB equipment, is to include AMR-WB interoperable coding modes as one part of the EVS codec, alongside completely new operation modes. This approach alleviates all of the problems discussed above. However, 3GPP does not specify how a sending side UE is to signal to a receiving side UE which of the available EVS modes, AMR-WB interoperable or non-interoperable, has been used for coding, and at what bit rate.
One possible solution to this signaling problem is disclosed in US20120035918: “Method and arrangement for providing a backwards compatible payload format”. This solution relates to methods of introducing new speech codecs into legacy systems. In particular, it discloses a backwards compatible payload format that allows inclusion of a new speech codec. In a concrete application of this solution, the AMR-WB interoperable modes of the EVS codec are Real-time Transport Protocol (RTP) packetized like AMR-WB packets according to IETF RFC 4867. In addition, a signaling bit is included in the previously unused bits of the AMR-WB payload format in order to signal the possible use of the new non-interoperable EVS codec modes. If the corresponding bit in the RTP payload header is set, this is treated as a signal that the speech/audio payload data bits that follow represent a bit stream associated with the new non-interoperable EVS codec modes rather than the AMR-WB interoperable modes.
The problem with the above-described approach of US20120035918 is, however, that a corresponding RTP payload format for the EVS codec inevitably makes use of the RTP payload header of the included legacy codec (AMR-WB). In applications where transmission resources are extremely limited, such overhead is undesirable.
To avoid this overhead, other solutions do not use an RTP payload header at all (for example, the Enhanced Variable Rate Codec (EVRC) or the ITU-T G.729 codec). In such cases the necessary signaling information related to the payload is derived from other information elements of the RTP packets, e.g. information provided in the IP/UDP/RTP header fields, which are distinct from an RTP payload header. One important information element that can be used is the size of the RTP payload or the size of the packet. If it is known that each RTP packet always contains only a single frame of coded speech/audio (corresponding to e.g. 20 ms of speech/audio), then the bit rate used for coding the speech/audio signal is easily obtained from the RTP payload size. This is a practical solution if the codec uses only a limited and discrete set of rates and if the operation modes of the codec are directly tied to the respective bit rates. If, however, frame aggregation is used, meaning that a plurality of coded speech/audio frames are transmitted within one packet, this solution does not always work.
This is illustrated by the following example. Assume that up to 3 coded frames can be transmitted in each RTP packet and that the codec has two codec modes with rates of 8 kbps and 16 kbps. Each frame corresponds to 20 ms. Assume further that the sender operates with frame aggregation and places two frames into each packet. The first frame of the packet is encoded at 8 kbps, meaning that it comprises 20 bytes of data. The second frame is encoded at 16 kbps, meaning that it comprises 40 bytes of data. The payload size of the packet containing both aggregated frames is hence 60 bytes. The receiver receives this RTP packet with a 60-byte payload and must determine how the data included in it is encoded.
From the reception of this packet and its payload size, the receiver might conclude that it contains either 3 frames of data encoded at 8 kbps, or one frame encoded at 16 kbps and one frame encoded at 8 kbps. In the latter case it is still not clear whether the 8 kbps frame comes first or second. As the example shows, this ambiguity makes it impossible for the decoder in the receiver to decode the received frames properly. Hence, allowing frame aggregation (or not excluding the possibility of frame aggregation) may introduce ambiguities that make header-less RTP payload formats impossible. Frame aggregation is nevertheless a very desirable feature for VoIP in certain IP networks, e.g. those with WLAN access.
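The ambiguity in this example can be reproduced with a short calculation. The following Python sketch is illustrative only and is not part of any 3GPP or IETF specification. It assumes the two hypothetical codec rates of the example (8 kbps and 16 kbps), 20 ms frames, and an aggregation limit of three frames per packet, as implied by the three-frame interpretation above. The names `frame_bytes` and `candidate_layouts` are introduced here for illustration.

```python
from itertools import product

FRAME_MS = 20            # frame duration assumed in the example
RATES_KBPS = (8, 16)     # the two hypothetical codec modes of the example
MAX_FRAMES = 3           # assumed aggregation limit (illustrative)


def frame_bytes(rate_kbps, frame_ms=FRAME_MS):
    """Size in bytes of one coded frame: kbps * ms gives bits, / 8 gives bytes."""
    return rate_kbps * frame_ms // 8


def candidate_layouts(payload_bytes, rates=RATES_KBPS, max_frames=MAX_FRAMES):
    """List every ordered sequence of frame rates whose sizes add up to the payload size."""
    layouts = []
    for n in range(1, max_frames + 1):
        for combo in product(rates, repeat=n):
            if sum(frame_bytes(r) for r in combo) == payload_bytes:
                layouts.append(combo)
    return layouts


if __name__ == "__main__":
    # A 60-byte payload matches three different layouts, so the size alone is ambiguous.
    print(candidate_layouts(60))
    # With exactly one frame per packet, the size identifies the rate uniquely.
    print(candidate_layouts(40, max_frames=1))
```

For the 60-byte payload the sketch returns three layouts: 8 kbps followed by 16 kbps, 16 kbps followed by 8 kbps, and three frames at 8 kbps. When the aggregation limit is set to a single frame per packet, each payload size maps to exactly one rate, which is the case in which a header-less format works.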
Another problem pertains to the possible interoperation of the AMR-WB interoperable modes of the EVS codec with legacy equipment supporting only the AMR-WB codec. For the purpose of mode adaptation, the AMR-WB RTP payload format provides in its header a 4-bit field to carry so-called codec mode requests (CMRs). The purpose of a CMR is to signal to a sending side UE the preferred codec mode that it should use in its encoding operation. This allows the bit rate to be adapted in response to e.g. transmission channel changes or system capacity limitations, which is known as AMR adaptation using inband signaling. A header-less payload format of the EVS codec for the AMR-WB interoperable modes would not be able to transport these CMRs. Hence, in interoperation scenarios with legacy AMR-WB equipment, codec mode adaptation based on the AMR inband signaling concept using CMRs would not be possible.
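To make the role of the CMR field concrete, the following Python sketch reads the 4-bit CMR from the first octet of an AMR-WB RTP payload. It is a minimal illustration under stated assumptions: the CMR occupies the four most significant bits of the first payload octet, as in the RFC 4867 payload header; the value 15 is taken to mean that no mode is requested; and the mode-to-rate table lists the nine AMR-WB speech modes. The helper names `read_cmr` and `requested_rate_kbps` are hypothetical.

```python
# AMR-WB speech modes 0..8 and their bit rates in kbps (assumed table for illustration).
AMR_WB_MODE_KBPS = {
    0: 6.60, 1: 8.85, 2: 12.65, 3: 14.25, 4: 15.85,
    5: 18.25, 6: 19.85, 7: 23.05, 8: 23.85,
}
CMR_NO_REQUEST = 15  # assumed meaning: the receiver has no mode preference


def read_cmr(payload: bytes) -> int:
    """Return the 4-bit CMR carried in the top four bits of the first payload octet."""
    if not payload:
        raise ValueError("empty RTP payload")
    return payload[0] >> 4


def requested_rate_kbps(payload: bytes):
    """Map the CMR to a requested bit rate; None if no mode (or an unknown mode) is requested."""
    return AMR_WB_MODE_KBPS.get(read_cmr(payload))


if __name__ == "__main__":
    print(requested_rate_kbps(bytes([0x2F, 0x00])))  # CMR = 2
    print(requested_rate_kbps(bytes([0xF0])))        # CMR = 15, no request
```

A header-less payload has no such octet to inspect, which is why the inband mode request cannot be conveyed in that case.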