1. Field
The present invention relates generally to voice data communication and particularly to providing tandem-free communications between wireless communication systems having different native vocoder types.
2. Description of the Related Art
The field of wireless communication includes many applications such as wireless telephones, paging, wireless local loops, personal digital assistants (PDAs), Internet telephony, and satellite communication systems. Various over-the-air interfaces developed for such wireless communication systems include frequency division multiple access (FDMA), time division multiple access (TDMA), and code division multiple access (CDMA).
In order to maximize voice quality and system capacity, many wireless communication systems have been designed to use digital compression of voice data. Such digital compression is used to compress a digitized speech signal into a low-bit-rate signal that can be accommodated by wireless data channels having limited bandwidth or throughput capacity. This digital compression of voice data is referred to as speech coding or vocoding. Speech coding is used in wireless communication systems designed in accordance with various well-known CDMA wireless standards such as the TIA/EIA IS-95 standard and its progeny, W-CDMA, and cdma2000. In addition, speech coding is used in wireless communication systems designed in accordance with TDMA standards such as North American TDMA and GSM.
Many current speech coders operate by extracting parameters relating to a model of human speech generation and then using these parameters to compress the speech for transmission. A speech coder typically includes an encoder and a decoder to accommodate bi-directional speech communication. The encoder receives as input a continuous stream of digital voice data samples representative of a speech signal. The encoder divides the incoming speech signal into blocks of time, or analysis frames. The encoder then analyzes the analysis frames to extract certain relevant parameters and incorporates the parameter information into digital speech frames. The decoder receives digital speech frames as input and extracts or reproduces speech parameter information from the digital speech frames. The decoder then resynthesizes the analysis frames using the speech parameter information. Speech coders are also referred to as voice coders, or “vocoders,” and the terms will be used interchangeably herein.
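The framing step described above can be sketched as follows. This is a minimal illustration only, not the method of any particular vocoder: the function names, the 160-sample frame length (20 ms at 8,000 samples per second), and the use of mean energy as a stand-in "parameter" are all assumptions made for the example.

```python
def split_into_frames(samples, frame_len=160):
    """Divide a sample sequence into consecutive analysis frames.

    frame_len=160 is an assumed value (20 ms at 8,000 samples/s).
    A trailing partial frame is dropped in this sketch.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    n_full = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_full)]


def frame_energy(frame):
    """Mean squared amplitude of one frame.

    A hypothetical placeholder for the parameters an encoder would
    extract; actual speech coders extract model-specific parameters.
    """
    return sum(s * s for s in frame) / len(frame)


if __name__ == "__main__":
    samples = list(range(400))              # stand-in for digitized speech
    frames = split_into_frames(samples)
    print(len(frames), [frame_energy(f) for f in frames])
```

An actual encoder would replace frame_energy with its own parameter extraction and would pack the results into a digital speech frame.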
Different types of speech coders have been proposed or deployed for use in various wireless communication systems. The different types of speech coders often employ substantially dissimilar speech compression techniques and digital speech frame formats. In general, a digital speech frame generated using a particular speech encoder can only be properly decoded using a decoder of the corresponding type.
In addition to using different types of speech compression, speech coders may also differ based on the type of wireless interface to be used between a wireless terminal and a wireless network. Some wireless networks require continuous transmissions, even when there is no speech activity (the user is not speaking). Other wireless networks permit the wireless terminal to stop transmitting entirely during such periods of speech inactivity. During periods of speech inactivity, speech coders used in continuous transmission (CTX) wireless systems are designed to provide a continuous series of small, or low-rate, frames, such as eighth-rate frames, containing minimal audio information. In contrast, speech coders used in discontinuous transmission (DTX) wireless systems are designed to generate a single frame at the beginning of a period of speech inactivity and then to generate no frames until speech activity resumes. The frame generated at the beginning of a period of speech inactivity is referred to as a silence descriptor (SID) frame. The decoder used in DTX wireless systems uses the data within a single SID frame to generate speechless “comfort noise” over multiple frame periods. The CTX and DTX approaches to providing some comfort noise during periods of speech inactivity are generally incompatible. In other words, a DTX decoder cannot decode an eighth-rate CTX frame. Similarly, a CTX decoder cannot decode a SID frame.
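The difference between the two approaches can be illustrated by the frame sequence each produces for the same pattern of speech activity. The sketch below is a simplified model under stated assumptions: the function names and the frame labels ("FULL", "EIGHTH", "SPEECH", "SID") are hypothetical, active speech is shown at a single rate, and the DTX model emits exactly one SID frame at the start of each inactive period, as described above.

```python
def ctx_frames(activity):
    """CTX model: one frame in every frame period.

    Active periods yield "FULL" (a single active rate is assumed here);
    inactive periods yield a low-rate "EIGHTH" frame.
    """
    return ["FULL" if active else "EIGHTH" for active in activity]


def dtx_frames(activity):
    """DTX model: one "SID" frame at the start of each inactive period,
    then no frame (None) until speech activity resumes."""
    frames = []
    previously_active = True   # so an initial inactive period starts with SID
    for active in activity:
        if active:
            frames.append("SPEECH")
        elif previously_active:
            frames.append("SID")
        else:
            frames.append(None)
        previously_active = active
    return frames


if __name__ == "__main__":
    activity = [True, False, False, False, True]
    print(ctx_frames(activity))
    print(dtx_frames(activity))
```

For the same input, the CTX sequence contains a frame in every period while the DTX sequence contains gaps, which is why neither decoder can consume the other's inactive-speech output directly.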
A wireless communication system generally supports two types of voice calls based on whether one or both conversing parties are using wireless terminal equipment. In a mobile-to-land call, a first party uses a wireless terminal to converse with a second party who is using landline terminal equipment. In a mobile-to-mobile call, both parties converse using wireless terminals.
For example, a mobile-to-land call is established between a cellular phone user and a person using a landline phone. In such a connection, the voice signal of a person speaking into the microphone of the cellular phone is digitized and encoded before the cellular phone transmits the resultant digital speech frames to a wireless communication network. The wireless communication network decodes the digital speech frames and converts them into an uncompressed digital or analog speech signal, which it provides to a landline telephone network (also referred to as a “plain old telephone system” or POTS network). The landline, or POTS, network then transports the uncompressed speech signal to the landline phone, and the landline phone amplifies the speech signal and plays it through the speaker built into the landline terminal. The processing of speech data in the opposite direction (from the person speaking into the microphone of the landline phone to the signal emitted from the speaker built into the cellular phone) is substantially the reverse of the process just described, with the speech encoding occurring in the wireless communication network and the decoding occurring in the cellular phone.
An example of a mobile-to-mobile call is established between two users of cellular phones. In such a connection, each cellular phone digitizes and encodes the speech signal of its respective user and provides the digital speech frames to one or more wireless communication networks. In some networks, the wireless communication network decodes and then re-encodes the digital speech frames received from a first cellular phone before sending the re-encoded speech frames to the second cellular phone. Such decoding and re-encoding of the speech data, also known as tandem vocoding, causes additional delays in the transmission of speech data between the two cellular phones. More importantly, decoding and re-encoding of speech data causes needless degradation of the voice quality of the speech signal eventually emanating from the built-in speaker of the destination cellular phone. In order to avoid such delay and voice quality degradation, many wireless communication networks are designed to deliver a digital speech frame received from a first wireless terminal to a second wireless terminal essentially unchanged. Such “tandem-free” operation avoids the delay and voice quality degradation inherent in tandem vocoding.
The greatest voice quality achievable in a POTS network is limited by various legacy parameters of landline phone systems. For example, each unidirectional voice data stream in a POTS network is carried within a pulse code modulation (PCM) channel. A PCM channel is a 64-kbps (kilobit-per-second) channel carrying 8,000 8-bit voice samples per second. The 8,000 samples-per-second sampling rate limits the fidelity of the speech that can be sent through a PCM channel. Specifically, because a sampled signal can represent only frequencies below half its sampling rate, only voice frequencies below 4 kHz (kilohertz) can be transmitted through a PCM channel.
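The PCM figures above follow from simple arithmetic, sketched here for illustration. The function names are hypothetical; the values are those stated in the preceding paragraph.

```python
def pcm_bit_rate_bps(sample_rate_hz, bits_per_sample):
    """Bit rate of an uncompressed PCM stream, in bits per second."""
    return sample_rate_hz * bits_per_sample


def max_representable_frequency_hz(sample_rate_hz):
    """Upper frequency bound for a sampled signal: half the sampling rate."""
    return sample_rate_hz / 2


if __name__ == "__main__":
    print(pcm_bit_rate_bps(8000, 8))               # 8,000 x 8 bits
    print(max_representable_frequency_hz(8000))    # half of 8,000 Hz
```

With 8,000 samples per second and 8 bits per sample, the bit rate is 64,000 bits per second (64 kbps) and the frequency bound is 4,000 Hz (4 kHz).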
Many existing and proposed wireless communication networks utilize internal channels that are capable of data rates higher than 64 kbps. In order to provide better voice quality through such networks, new “wideband speech coders” have been proposed that use higher sampling rates than 8,000 samples per second, and are thus able to capture voice frequencies above 4 kHz. The voice quality achieved using wideband speech coders in tandem-free operation for mobile-to-mobile calls exceeds the voice quality that is possible when voice data is transmitted through a PCM channel. So long as the voice signal is never reduced, even temporarily, to PCM channel format, tandem-free operation enables the use of voice coders that can achieve better voice quality in a mobile-to-mobile call than is possible in a mobile-to-land call.
If the cellular phones in the previous example have speech coders of different types, tandem-free operation is generally not possible. A decoder using one type of speech coding cannot properly decode a digital speech frame encoded using a different type of encoder. Decoding the speech signal using a decoder of the first type within the wireless communication network, and then re-encoding the speech signal using an encoder of the second type will enable the receiving wireless terminal to reproduce an intelligible speech signal. However, as discussed above, this speech signal will suffer from added delay and speech quality degradation.
CDMA standards such as IS-95 and cdma2000 support the use of variable-rate vocoder frames in a spread spectrum environment, while standards based on GSM, such as W-CDMA, support the use of fixed-rate vocoder frames and multi-rate vocoder frames. Similarly, Universal Mobile Telecommunications Systems (UMTS) standards also support fixed-rate and multi-rate vocoders, but not variable-rate vocoders. An example of a variable-rate vocoder is the Selectable Mode Vocoder (SMV), which is promulgated in TIA IS-893. An example of a multi-rate vocoder is the Adaptive Multi-Rate (AMR) vocoder, which is promulgated in “ETSI EN 301 704 Digital Cellular Telecommunications System; Adaptive Multi-Rate (AMR) Speech Transcoding” (the AMR standard); and an example of a fixed-rate vocoder is an Enhanced Full Rate vocoder, which is promulgated in 3GPP TS 46.060: “Digital cellular telecommunications system (Phase 2+); Enhanced Full Rate (EFR) speech transcoding.”
A typical wireless communication network supports only one type of speech coder. The selection of a speech coder type is often tightly coupled to the type of wireless interface utilized by the wireless communication network. For example, IS-95 and cdma2000 networks use continuous transmission (CTX) wireless interfaces that are most compatible with variable-rate speech coders. A variable-rate speech coder can encode active speech data using any of multiple different data rates, varying the frame-by-frame data rate based on speech activity. A variable-rate speech coder provides one speech frame for every speech frame period. A conventional variable-rate speech coder for CTX CDMA systems encodes active speech at either full-rate, half-rate, or quarter-rate depending on speech activity. The same speech coder generates eighth-rate frames during pauses in speech. Decoding an eighth-rate frame produces “comfort noise.” Because the rate of frames depends upon the characteristics of data received from the “source,” variable rate speech coders are called “source-controlled.”
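Source-controlled rate selection can be sketched as a per-frame decision driven only by the speech signal itself. The example below is illustrative and is not the rate-decision algorithm of any standardized vocoder: the function name, the activity score in the range 0.0 to 1.0, and the thresholds 0.3 and 0.7 are all assumptions chosen for the example.

```python
def select_frame_rate(activity_level):
    """Pick a frame rate from a hypothetical speech-activity score.

    activity_level: assumed score in [0.0, 1.0]; 0.0 means a pause.
    The thresholds below are illustrative only.
    """
    if not 0.0 <= activity_level <= 1.0:
        raise ValueError("activity_level must be within [0.0, 1.0]")
    if activity_level == 0.0:
        return "eighth"      # pause in speech: comfort-noise frame
    if activity_level < 0.3:
        return "quarter"
    if activity_level < 0.7:
        return "half"
    return "full"


if __name__ == "__main__":
    per_frame_activity = [0.9, 0.5, 0.1, 0.0, 0.0, 0.8]
    # One frame is produced for every frame period, including pauses.
    print([select_frame_rate(a) for a in per_frame_activity])
```

Note that the decision takes no input describing the wireless channel, which is what distinguishes source-controlled operation.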
In contrast to CTX networks, W-CDMA networks utilize a discontinuous transmission (DTX) wireless interface. An adaptive multi-rate (AMR) vocoder is an example of a vocoder designed for use with a DTX network, for example a W-CDMA network. Rather than varying the data rate of voice frames based on speech activity, an AMR vocoder varies an active speech data rate, or “mode,” based on channel quality and generates discontinuous inactive speech data frames. An AMR speech encoder uses a single data rate for active speech data, and transmits inactive speech using a combination of silence descriptor (SID) frames followed by empty speech frame periods. On the receiving end, an AMR speech decoder decodes active speech frames, and generates comfort noise for empty speech frame periods occurring after and between received SID frames. Each AMR mode is characterized by a data rate used for active speech. When characteristics such as noise limit the capacity of the wireless channel being used to transmit AMR frames, the AMR mode can be adjusted so as not to exceed the capacity of the channel. This type of rate control is referred to as “channel-controlled.”
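Channel-controlled mode selection can be sketched as choosing the highest active-speech rate that the channel can currently carry. The sketch below is a simplification under stated assumptions: the function name and the single "capacity in kbps" input are hypothetical, and actual link adaptation relies on channel quality measurements and signaling not modeled here. The listed rates are the eight narrowband AMR codec modes.

```python
# Active-speech data rates of the narrowband AMR modes, in kbps.
AMR_MODES_KBPS = (4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2)


def select_amr_mode(channel_capacity_kbps):
    """Return the highest AMR rate not exceeding the given capacity.

    channel_capacity_kbps is a hypothetical input summarizing what the
    channel can carry for speech. Returns None if no mode fits.
    """
    usable = [rate for rate in AMR_MODES_KBPS if rate <= channel_capacity_kbps]
    return max(usable) if usable else None


if __name__ == "__main__":
    for capacity in (13.0, 8.0, 5.0, 4.0):
        print(capacity, select_amr_mode(capacity))
```

Unlike the source-controlled case, the selection here depends only on the channel and not on the content of the speech signal.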
Different speech coders may utilize different and mutually incompatible algorithms. Thus, an SMV speech decoder cannot decode a speech data frame generated by an AMR speech encoder, and vice versa. In addition, the different handling of periods of speech inactivity leads to further incompatibilities between speech coders. For example, an SMV speech decoder requires at least an eighth-rate frame for every speech frame period, while an AMR speech decoder can generate multiple frames of comfort noise following the receipt of a single SID frame.
A conventional wireless communication network is generally incapable of providing tandem-free or even tandem vocoding between two wireless terminals that use different types of speech coder in an intra-system mobile-to-mobile call. As customers become more accustomed to the high quality available with wideband speech coders, and as carriers widely deploy different types of wireless communication networks, demand will increase for inter-system mobile-to-mobile calls that provide wideband speech quality. There is therefore a need in the art for a way to provide tandem-free operation between wireless terminals that use different types of speech coder or communicate using different types of wireless interface.