Various over-the-air interfaces have been developed for wireless communication systems including, e.g., Frequency Division Multiple Access (“FDMA”), Time Division Multiple Access (“TDMA”), Code Division Multiple Access (“CDMA”), and Orthogonal Frequency Division Multiple Access (“OFDMA”). In connection therewith, various domestic and international standards have been established including, e.g., Advanced Mobile Phone Service (“AMPS”), Global System for Mobile Communications (“GSM”), and Interim Standard 95 (“IS-95”).
An exemplary wireless telephony communication system is a Code Division Multiple Access (“CDMA”) system. The IS-95 standard and its derivatives, IS-95A, ANSI J-STD-008, IS-95B, and third generation standards IS-95C and IS-2000, etc. (referred to collectively herein as IS-95), are promulgated by the Telecommunication Industry Association (TIA). Other well-known standards bodies, such as The 3rd Generation Partnership Project 2 (“3GPP2”), specify the use of a CDMA over-the-air interface for cellular or PCS telephony communication systems. Exemplary wireless communication systems configured substantially in accordance with the IS-95 standard are described in U.S. Pat. Nos. 5,103,459 and 4,901,307, which are assigned to the assignee of the present invention and fully incorporated herein by reference.
Multimedia streams may include speech, and may be from one or more sources that communicate with or are otherwise associated with a broadcast system. The broadcast system can use, without limitation, CDMA principles, GSM principles, or other wireless principles including wideband CDMA (WCDMA), cdma2000 (such as cdma2000 1x or 3x air interface standards, for example), TDMA, TD-SCDMA, or OFDM. The multimedia content, including speech, can alternatively be provided, for example, over a bidirectional point-to-point link if desired, such as, e.g., a Bluetooth link, an 802.11 link, a CDMA link, or a GSM link. Likewise, speech content may also be transmitted using Voice over Internet Protocol (“VoIP”). VoIP is a protocol optimized for the transmission of voice through the Internet or other packet switched networks, which may interface with and/or merge with CDMA and GSM based systems.
Transmission of voice by digital techniques has become widespread, particularly in long distance and digital air-interface radio telephone applications. This, in turn, has created interest in determining the least amount of information which can be sent over the channel while maintaining the perceived quality of the reconstructed speech. If speech is transmitted by simply sampling and digitizing, a data rate on the order of 64 kilobits per second (kbps) is required to achieve the speech quality of a conventional analog telephone, known as “toll quality.” However, through the use of speech analysis, followed by the appropriate coding, transmission, and resynthesis at the receiver, a significant reduction in the data rate can be achieved.
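By way of illustration only, the 64 kbps figure can be reproduced with simple arithmetic. The sketch below assumes a sampling rate of 8000 samples per second and 8 bits per sample, which are the parameters commonly associated with conventional telephone-band pulse code modulation but are not stated in the text above. The function names are hypothetical, and the 4.8 kbps coded rate is used only as an example of a reduced rate.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate, in bits per second, of speech that is simply sampled and digitized."""
    return sample_rate_hz * bits_per_sample


def reduction_factor(pcm_bps: float, coded_bps: float) -> float:
    """Ratio of the uncompressed rate to the rate produced by a speech coder."""
    return pcm_bps / coded_bps


if __name__ == "__main__":
    # Assumed parameters: 8000 samples/s and 8 bits/sample (not given in the text).
    uncompressed = pcm_bit_rate(8000, 8)
    print(uncompressed)  # 64000 bits per second, i.e. 64 kbps

    # Example only: a coder operating at 4.8 kbps.
    print(round(reduction_factor(uncompressed, 4800), 2))  # 13.33
```

Under these assumptions, a coder operating at 4.8 kbps would send roughly one thirteenth of the bits required by direct digitization.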
Devices which employ techniques to compress voiced speech by extracting parameters that relate to a model of human speech generation are typically called vocoders. Such devices are composed of an encoder, which analyzes the incoming speech to extract the relevant parameters, and a decoder, which resynthesizes the speech using the parameters which it receives over the transmission channel. In order to enhance quality, the speech codec model adapts to the changing speech signal. Modern vocoders typically operate on a digitized input signal that has been divided into blocks of time called analysis frames. Parameters are then extracted corresponding to the analysis frames.
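The division of a digitized signal into analysis frames, followed by per-frame parameter extraction, may be sketched as follows. This is a minimal illustration and not the method of any particular vocoder. The frame length of 160 samples (20 ms at an assumed 8 kHz sampling rate) is an assumption, the function names are hypothetical, and mean energy is used only as a simple stand-in for the model parameters that an actual encoder would extract.

```python
from typing import List, Sequence


def split_into_frames(samples: Sequence[float], frame_len: int) -> List[Sequence[float]]:
    """Split a signal into consecutive, non-overlapping analysis frames.

    A trailing partial frame, if any, is dropped.
    """
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame (a stand-in for real vocoder parameters)."""
    return sum(x * x for x in frame) / len(frame)


if __name__ == "__main__":
    FRAME_LEN = 160  # assumed: 20 ms at 8000 samples/s
    signal = [float(n % 8) for n in range(400)]  # synthetic input, not real speech
    frames = split_into_frames(signal, FRAME_LEN)
    params = [frame_energy(f) for f in frames]  # one parameter value per frame
    print(len(frames), params)
```

In this sketch one parameter value is produced per frame, so the quantity of data passed to the decoder depends on the number of frames and not on the number of samples within each frame.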
Among the various classes of speech coders, Code Excited Linear Predictive Coding (“CELP”), Stochastic Coding, and Vector Excited Speech Coding belong to one class. An example of a coding algorithm of this particular class is described in the paper “A 4.8 kbps Code Excited Linear Predictive Coder” by Thomas E. Tremain et al., Proceedings of the Mobile Satellite Conference, 1988. Modern vocoders typically operate at variable rates, and are defined by standards. While various types of vocoders exist, modern commercial telecommunications vocoders generally fall into two classes, namely the CDMA type and the GSM type.
A modern CDMA type network speech codec is known as Enhanced Variable Rate CODEC (“EVRC”). A version of EVRC is defined by The Telecommunications Industry Association as IS-127-B, and is formally entitled “Enhanced Variable Rate Codec Speech Service Option 3 and YY for Wideband Spread Spectrum Digital Systems,” dated December 2006. A modern GSM type of network speech codec is known as Adaptive Multi-Rate (“AMR”). A version of AMR is defined by The 3rd Generation Partnership Project (“3GPP”) as 3G TS 26.090, version 3.1.0, release 1999, and is formally entitled “Universal Mobile Telecommunications System (“UMTS”); Mandatory Speech Codec speech processing functions AMR speech codec; Transcoding functions,” dated January 2000.
Modern Second Generation (“2G”) and Third Generation (“3G”) radio telephone communication systems have sought to produce voice quality commensurate with the conventional public switched telephone network (“PSTN”). The PSTN has traditionally been limited in bandwidth to the frequency range of 300-3400 Hz. New networks for voice communications, such as cellular telephony and Voice over IP (“VoIP”), are not necessarily constrained by the same bandwidth limits. Accordingly, it may be desirable to transmit and receive voice communications that include a wideband frequency range over such networks. For example, it may be desirable to support an audio frequency range that extends down to 50 Hz and/or up to 7 or 8 kHz. It may also be desirable to support other applications, such as high-quality audio or audio/video conferencing, that may have audio speech content in ranges outside the traditional PSTN limits. Codecs which seek to extend the audio frequency range as set forth above are commonly referred to as wideband codecs.
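The effect of a wider audio band on the digitized signal may be illustrated with the sampling theorem, under which a signal should be sampled at no less than twice its highest frequency. The sketch below is illustrative only. The function name is hypothetical, and the upper band edges used are the traditional telephone-band limit of 3400 Hz and the 7 kHz and 8 kHz wideband limits mentioned above.

```python
def nyquist_rate_hz(highest_frequency_hz: int) -> int:
    """Twice the highest frequency of interest, per the sampling theorem."""
    return 2 * highest_frequency_hz


if __name__ == "__main__":
    for label, f_max in [("narrowband", 3400), ("wideband", 7000), ("wideband", 8000)]:
        print(label, f_max, "Hz ->", nyquist_rate_hz(f_max), "samples/s")
```

On these figures, the traditional telephone band fits within a commonly used sampling rate of 8000 samples per second, whereas a band extending to 7 or 8 kHz calls for roughly twice that rate, such as 16000 samples per second. A wideband codec therefore typically has about twice as many input samples to represent per unit of time.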
Extension of the range supported by a speech coder into higher frequencies may improve intelligibility. For example, the information that differentiates fricatives such as ‘s’ and ‘f’ is largely in the high frequencies. Highband extension may also improve other qualities of speech, such as presence. For example, even a voiced vowel may have spectral energy far above the PSTN limit.
The term signal processing may refer to the processing and interpretation of signals. Signals of interest may include sound, images, and many others. Processing of such signals may include storage and reconstruction, separation of information from noise, compression, and feature extraction. The term digital signal processing may refer to the study of signals in a digital representation and the processing methods of these signals. Digital signal processing is an element of many communications technologies such as mobile phones and the Internet. The algorithms that are utilized for digital signal processing may be performed using specialized computers, which may make use of specialized microprocessors called digital signal processors (sometimes abbreviated as DSPs).