Push-to-talk over Cellular (PoC) provides a half-duplex, or one-way, communications service between two or more users. Users often form a group and then communicate with each other in a “point-to-multipoint” fashion. The communications are one-way: while one user speaks, the others listen. A “turn” to speak is generally granted on a first-come, first-served basis in response to a user pressing a push-to-talk button on the user's wireless terminal/user equipment. PoC functionality is typically delivered across operator networks using Voice over IP (VoIP) protocols, although other technology implementations are possible.
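The first-come, first-served granting of a turn to speak can be illustrated with a minimal sketch. The class name `FloorControl` and its methods are hypothetical and chosen only for illustration; they are not taken from any PoC specification, and a real server would also handle timers, queuing, and signaling.

```python
from typing import Optional


class FloorControl:
    """First-come, first-served floor arbitration for one half-duplex group."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None  # user currently allowed to speak

    def request(self, user: str) -> bool:
        """Grant the floor if it is idle; report whether `user` now holds it."""
        if self.holder is None:
            self.holder = user
        return self.holder == user

    def release(self, user: str) -> None:
        """Free the floor; only the current holder can release it."""
        if self.holder == user:
            self.holder = None


floor = FloorControl()
granted_alice = floor.request("alice")    # floor is idle, so alice may speak
granted_bob = floor.request("bob")        # alice still holds the floor
floor.release("alice")
granted_bob_retry = floor.request("bob")  # floor is idle again
```

In this sketch a denied request is simply dropped; whether denied users are queued or must press the button again is a design choice the text leaves open.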
Push-to-talk over Cellular can be viewed as an IP Multimedia Subsystem (IMS) based “voice chat” service for cellular telecommunication systems. As shown in FIG. 1, a sending PoC client terminal sends packet data traffic to a PoC server, and in the case of a group call, the PoC server duplicates the traffic to all recipients in the group. As an IMS service, PoC utilizes the Session Initiation Protocol (SIP) to set up a voice communication between two or more PoC clients. FIG. 2 illustrates an example communications protocol stack for PoC. The PoC application operates on an IP related stack that includes SIP, and the Real-time Transport Protocol (RTP) is used to handle the voice packet delivery on the user plane. The SIP and RTP protocols employ the underlying User Datagram Protocol (UDP) and IP protocols, which themselves operate on top of the link layer (L2) and physical layer (L1) protocols used in the cellular radio access network.
FIG. 3 shows one example of mapping voice packets to an IP/UDP packet. The voice is divided into 20 msec speech encoded frames. The example speech encoding technique shown is adaptive multi-rate (AMR). AMR is a variable rate speech codec selected by the 3GPP for 3G WCDMA cellular communications. Using the Algebraic Code Excited Linear Predictive (ACELP) compression technology, AMR provides toll quality sound at transmission rates from 4.75 to 12.2 kbps. Multiple AMR frames are used to fill the AMR payload of the IP/UDP packet.
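As a rough worked example of the figures above, the number of speech bits in one 20 msec frame follows directly from the codec rate. The sketch below covers only the two end-point rates quoted in the text and ignores any payload header or padding bits; the function name `bits_per_frame` is illustrative.

```python
FRAME_MS = 20  # speech frame duration stated in the text


def bits_per_frame(rate_bps: int, frame_ms: int = FRAME_MS) -> int:
    """Speech bits produced in one frame at a given codec rate."""
    return rate_bps * frame_ms // 1000


amr_lowest = bits_per_frame(4750)     # 4.75 kbps mode -> 95 bits per frame
amr_highest = bits_per_frame(12200)   # 12.2 kbps mode -> 244 bits per frame
frames_per_second = 1000 // FRAME_MS  # 50 frames per second
```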
The SIP protocol carries the Session Description Protocol (SDP), which is used to exchange session details between two PoC clients, such as the type of media, the codec, and the sampling rate. This SDP information is carried by the SIP message in a way that is analogous to a document attachment being carried by an email message, or a web page being carried in an HTTP message. The SDP media capabilities exchanged during voice communication session set-up include the speech codec supported by the PoC client and the media transport port(s) to be used for that speech codec.
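To make the codec and port exchange concrete, the following sketch parses a small SDP body of the kind a SIP message might carry. The addresses, port, and payload type number in `EXAMPLE_SDP` are invented for illustration, and the helper `parse_audio` is hypothetical; it reads only the `m=audio` and `a=rtpmap` lines and is not a complete SDP parser.

```python
EXAMPLE_SDP = """\
v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=PoC session
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 97
a=rtpmap:97 AMR/8000
a=ptime:20
"""


def parse_audio(sdp: str) -> dict:
    """Extract the audio transport port and the offered codecs."""
    port = None
    codecs = {}
    for line in sdp.splitlines():
        if line.startswith("m=audio "):
            # m=<media> <port> <proto> <payload type list>
            port = int(line.split()[1])
        elif line.startswith("a=rtpmap:"):
            # a=rtpmap:<payload type> <encoding name>/<clock rate>
            payload_type, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            name, clock_rate = encoding.split("/")[:2]
            codecs[int(payload_type)] = (name, int(clock_rate))
    return {"port": port, "codecs": codecs}


info = parse_audio(EXAMPLE_SDP)
```

Here `info` holds the media port (49170) and a mapping from payload type 97 to the AMR codec at an 8000 Hz clock rate.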
PoC is currently being standardized and agreed upon in the Open Mobile Alliance (OMA) forum. Ideally, the PoC services specified should be access technology independent. But in practice, this probably will not happen, particularly with respect to speech codec technology. OMA has proposed that different access technology organizations, e.g., 3GPP2 for CDMA2000 (IMT-2000) access technology and 3GPP for WCDMA access technology, each choose the speech codec that best suits its associated access technology. In this example, it may be that 3GPP will choose an AMR codec for PoC, and 3GPP2 will choose an Enhanced Variable Rate Coder (EVRC) as the speech codec for PoC. An EVRC codec is a Relaxation Code Excited Linear Prediction (RCELP) based codec that works with a 20 msec speech frame and uses three rates: full rate at 8.5 kbps, half rate at 4 kbps, and eighth rate at 800 bps.
A problem arises when users have different codecs: the codecs do not interoperate. But interoperability is essential regardless of the user's access network. Interoperability is required both in the user equipment and in the network server supporting the service. In the PoC context, this means speech codec interoperability must be provided by PoC clients and PoC servers.
One approach to providing interoperability is for the network infrastructure to support transcoding. In a PoC example, a transcoder located in the PoC server would translate between different speech coding and decoding techniques. But the drawbacks with transcoding are substantial. First, transcoding between two low rate codec modes significantly reduces speech quality. Second, transcoding operations between thousands of PoC clients would require powerful and expensive data processing resources in the PoC server. Third, transcoding would likely increase end-to-end delay between the PoC clients, reducing the quality of the PoC service. Fourth, there is no standardized transcoder currently available. Another approach might be to employ multiple codecs in each PoC client and PoC server to ensure a common codec. But here the cost is likely prohibitive, at least in a commercial context.
Another interoperability problem is how to handle the use of different radio access bearers/transport formats. Even though two PoC clients may use the same “native” speech codec, those clients may use different radio access bearers for the PoC service. Consider an example where a 3GPP2 client terminal uses a “conversational class” bearer optimized for VoIP, which produces a media stream with one EVRC full rate frame per IP packet. The other terminal may also be a 3GPP2 client terminal, but it uses a general purpose “interactive class” packet switched bearer, and thus would prefer a media stream with several frames per IP packet (e.g., four EVRC frames per IP packet) to avoid unacceptable end-to-end media delay.
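The difference between the two packetizations can be estimated with simple arithmetic. The sketch below assumes uncompressed IPv4, UDP, and RTP headers (20 + 8 + 12 bytes), which the text does not specify, and rounds the EVRC full rate frame (8.5 kbps over 20 msec) up to whole bytes. The function name `stream_cost` is illustrative.

```python
import math

HEADER_BYTES = 20 + 8 + 12  # assumed IPv4 + UDP + RTP headers, uncompressed
FRAME_MS = 20               # speech frame duration
EVRC_FULL_RATE_BPS = 8500   # full rate figure quoted in the text


def stream_cost(frames_per_packet: int) -> dict:
    """Packet size, packet spacing, and on-the-wire bit rate for a bundling."""
    frame_bytes = math.ceil(EVRC_FULL_RATE_BPS * FRAME_MS / 1000 / 8)
    packet_bytes = HEADER_BYTES + frames_per_packet * frame_bytes
    packet_interval_ms = frames_per_packet * FRAME_MS
    return {
        "packet_bytes": packet_bytes,
        "packet_interval_ms": packet_interval_ms,
        "bitrate_bps": packet_bytes * 8 * 1000 // packet_interval_ms,
    }


one_frame = stream_cost(1)    # 62-byte packet every 20 ms, 24800 bps
four_frames = stream_cost(4)  # 128-byte packet every 80 ms, 12800 bps
```

Under these assumptions, bundling four frames roughly halves the bit rate the bearer must carry, at the price of waiting 80 msec to fill each packet; header compression on the bearer would change the numbers.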
The invention overcomes these problems and achieves interoperability between wireless user devices having different speech processing capabilities and/or different transport bearer formats tailored to a particular speech encoding format. A first wireless user communication device includes a primary speech codec that encodes a first speech message using a first speech encoding format. The encoded speech is then sent to a second wireless user communications device that includes a primary speech codec supporting a second speech encoding format. The first user device receives from the second user device a second speech message encoded using the second speech encoding format. The second speech message is then decoded by the first user device using a second speech decoder supporting decoding of the second speech encoding format. But the first communication device does not support speech encoding using the second speech encoding format, regardless of whether the first communication device includes or does not include an encoder for encoding speech using the first speech encoding format.
The first speech message is transported using a first type of transport bearer that uses a first packetizing of speech encoded frames. The second speech message is transported using a second type of transport bearer that uses a second packetizing of speech encoded frames. The communication system includes a service support server for supporting the communication between the first and second devices. The server re-packetizes at least a portion of the first speech message before the first speech message is sent to the second wireless communication device.
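The re-packetizing step can be sketched as a regrouping of already-encoded frames, with no decoding or re-encoding involved. In the sketch below each packet is modeled simply as a list of opaque frame byte strings; the function name `repacketize` and this packet model are assumptions made for illustration, and real RTP payload headers and timestamps are omitted.

```python
from typing import Iterable, Iterator, List


def repacketize(
    packets: Iterable[List[bytes]], frames_per_packet: int
) -> Iterator[List[bytes]]:
    """Regroup encoded speech frames into packets of a different size."""
    pending: List[bytes] = []
    for packet in packets:
        for frame in packet:
            pending.append(frame)
            if len(pending) == frames_per_packet:
                yield pending
                pending = []
    if pending:
        yield pending  # flush a short final packet at the end of the message


# One frame per packet in, four frames per packet out.
incoming = [[b"f1"], [b"f2"], [b"f3"], [b"f4"], [b"f5"]]
bundled = list(repacketize(incoming, 4))

# The reverse direction: split bundled packets back into single frames.
unbundled = list(repacketize(bundled, 1))
```

Because the frames are copied untouched, this kind of regrouping avoids the speech quality loss associated with transcoding.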
The first user device sends the service support server a signaling message that includes one or more attributes indicating that the first user device supports speech encoding and decoding using the first speech encoding format and decoding of speech encoded using the second speech encoding format, but does not support encoding speech using the second speech encoding format. The signaling message preferably also includes one or more attributes indicating that the first wireless user communication device supports a first transport bearer format for speech encoded using the first speech encoding format and a second transport bearer format for speech encoded using the second speech encoding format.
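The asymmetric capability attributes described above can be modeled as separate encode and decode sets per device. The sketch below is hypothetical: the class `CodecCapabilities`, the function `codec_for_direction`, and the choice of AMR and EVRC as the first and second formats are illustrative, and it assumes the second device can likewise decode the first format. It does not show how the attributes are actually encoded in the signaling message.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class CodecCapabilities:
    encode: FrozenSet[str]  # formats the device can encode
    decode: FrozenSet[str]  # formats the device can decode


def codec_for_direction(
    sender: CodecCapabilities, receiver: CodecCapabilities
) -> Optional[str]:
    """Pick a format the sender can encode and the receiver can decode."""
    common = sender.encode & receiver.decode
    # min() only makes the choice deterministic when several formats match.
    return min(common) if common else None


first_device = CodecCapabilities(
    encode=frozenset({"AMR"}), decode=frozenset({"AMR", "EVRC"})
)
second_device = CodecCapabilities(
    encode=frozenset({"EVRC"}), decode=frozenset({"AMR", "EVRC"})
)
legacy_device = CodecCapabilities(
    encode=frozenset({"EVRC"}), decode=frozenset({"EVRC"})
)

first_to_second = codec_for_direction(first_device, second_device)  # "AMR"
second_to_first = codec_for_direction(second_device, first_device)  # "EVRC"
first_to_legacy = codec_for_direction(first_device, legacy_device)  # None
```

Each direction uses the sender's own primary format, so neither device needs a second encoder. The `None` result for the decode-limited device shows the case this arrangement does not cover on its own.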
In one non-limiting example application, the first and second wireless user communications devices are Push-to-talk (PTT) type communications devices. One example of a PTT communication is a PTT over Cellular (PoC) communication.