The present invention relates generally to the communication of digital information, such as speech data communicated in a cellular, or other radio, communication system. More particularly, the present invention relates to a variable bit rate coder, and an associated method, by which to encode the digital information at a selected bit rate. Selection of the coding rate is made responsive to indicia of actual coding performance, subsequent to encoding of the information at more than one coding rate.
Advancements in communication technologies have permitted the introduction and popularization of new types of communication systems, as well as improvements to existing ones. Increasingly large amounts of data can be communicated at increasing throughput rates through the use of such new, or improved, communication systems. As a result of such improvements, new types of communications, requiring high data throughput rates, are possible. Digital communication techniques, for instance, are increasingly utilized in communication systems to communicate data efficiently, and the use of such techniques has facilitated the increase of data throughput rates.
When digital communication techniques are used, information which is to be communicated is digitized. For example, when the information is formed of speech, such as that generated by a user using a mobile station of a cellular communication system, the speech is digitized, then signal processing operations are performed upon the digitized speech, and, then, quantization operations are performed upon the digitized speech. The result forms a compressed bit stream, referred to as speech data.
Conventionally, the speech, initially in the form of a speech waveform, is first partitioned into a sequence of successive frames of constant length. Then, the operations noted above are performed to form the compressed bit stream, which is sometimes formatted into packets of data. Such packets typically also include groups of bits which specify parameters used at a receiving station to reconstruct the speech.
In conventional analysis-by-synthesis ("AbS") coding of speech, the speech waveform is partitioned into a sequence of successive frames. Each frame has a fixed length and is partitioned into an integer number of equal-length subframes. The encoder generates an excitation signal by a trial-and-error search process whereby each candidate excitation for a subframe is applied to a synthesis filter and the resulting segment of synthesized speech is compared with a corresponding segment of target speech. A measure of distortion is computed and a search mechanism identifies the best (or nearly best) choice of excitation for each subframe among an allowed set of candidates. The candidates are sometimes stored as vectors in a codebook; in this case, the coding method is called CELP (code excited linear prediction). At other times, the candidates are generated as they are needed for the search by a predetermined generating mechanism; this case includes, in particular, multipulse linear predictive coding (MP-LPC) and algebraic code excited linear prediction (ACELP). The bits needed to specify the chosen excitation subframe are part of the package of data that is transmitted to a receiving station in each frame. Usually the excitation is formed in two stages, where the first approximation to the excitation subframe is selected by the above-described procedure, and then a modified target signal for the subframe is formed as the new target for a second AbS search operation. Depending on the periodic or aperiodic character of the speech, different coding strategies can be employed. In order to eliminate as much redundancy as possible in coding the excitation signal for each frame, it is often desirable to classify the frames into categories. The coding method can then be tailored to each category.
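The trial-and-error search described above can be illustrated with a minimal sketch. It assumes a plain squared-error distortion measure, a zero-state all-pole synthesis filter, and a per-candidate gain chosen to minimize the error; none of these details are specified in the text, and practical coders typically use a perceptually weighted distortion instead. The function names `synthesize` and `abs_search` and the toy codebook are hypothetical.

```python
def synthesize(excitation, lpc):
    """Zero-state all-pole synthesis: s[n] = e[n] - sum_k lpc[k] * s[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out


def abs_search(target, codebook, lpc):
    """Return (index, gain, error) of the candidate whose synthesized
    output best matches the target subframe in the squared-error sense.

    Assumption: each candidate is scaled by its own optimal gain.
    """
    best_index, best_gain, best_err = None, 0.0, float("inf")
    for i, candidate in enumerate(codebook):
        synth = synthesize(candidate, lpc)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue  # an all-zero candidate cannot match a nonzero target
        gain = sum(t * s for t, s in zip(target, synth)) / energy
        err = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if err < best_err:
            best_index, best_gain, best_err = i, gain, err
    return best_index, best_gain, best_err


# Toy usage: the target is candidate 1 passed through the filter and doubled.
lpc = [-0.5]  # hypothetical first-order predictor coefficient
codebook = [[1, 0, 0, 0], [0, 1, 0, -1], [1, 1, 1, 1]]
target = [2 * s for s in synthesize(codebook[1], lpc)]
index, gain, err = abs_search(target, codebook, lpc)
```

An exhaustive loop like this is the simplest form of the search; the "nearly best" choice mentioned in the text corresponds to pruned or structured searches that examine only some of the candidates.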
In voiced speech, the energy peaks of the smoothed residual energy contour generally occur at pitch period intervals and correspond to pitch pulses. Pitch here refers to the fundamental frequency of periodicity in a segment of voiced speech and pitch period refers to the fundamental period of periodicity. In some transitional regions of the speech signal, the waveform does not have the character of being periodic or stationary random and often it contains one or more isolated energy bursts, as in plosive sounds. The unvoiced class consists of frames which are aperiodic and where the speech appears random-like in character, without strong isolated energy peaks. The silent class refers to frames where speech is absent but some background noise may be present.
In a typical implementation, the sampling rate is 8000 samples per second and the frame size is 160 samples, so that each frame spans 20 milliseconds. Each frame is classified into one of several classes, e.g., voiced, unvoiced, silence, and transition. Other ways of classification include the use of two voicing classes, e.g., weakly voiced and strongly voiced classes.
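The framing arithmetic can be sketched as follows. The sampling rate and frame size are the values given above; the choice of four subframes per frame and the dropping of a trailing partial frame are assumptions made only for illustration, and the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 8000  # from the typical implementation above
FRAME_SIZE = 160       # samples per frame, from the text


def frame_duration_s(frame_size=FRAME_SIZE, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration of one frame in seconds."""
    return frame_size / sample_rate_hz


def split_into_frames(samples, frame_size=FRAME_SIZE):
    """Split samples into complete frames.

    Assumption: a trailing partial frame is dropped.
    """
    count = len(samples) // frame_size
    return [samples[i * frame_size:(i + 1) * frame_size] for i in range(count)]


def split_into_subframes(frame, num_subframes):
    """Split one frame into an integer number of equal-length subframes."""
    if len(frame) % num_subframes != 0:
        raise ValueError("frame length must be a multiple of num_subframes")
    size = len(frame) // num_subframes
    return [frame[i * size:(i + 1) * size] for i in range(num_subframes)]


frames = split_into_frames(list(range(400)))    # 400 samples -> 2 full frames
subframes = split_into_subframes(frames[0], 4)  # 4 subframes is hypothetical
```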
Coding techniques in general can be categorized according to the manner in which a frame of speech is encoded.
For instance, one category of encoding is referred to as fixed bit-rate coding. In a fixed bit-rate coding technique, every encoded frame of speech encoded by a particular fixed bit-rate coding technique is formed of the same number of bits. That is to say, an encoded frame of speech, encoded by a fixed bit-rate coding technique, is formed of a fixed number of bits.
In a discontinuous transmission (DTX) technique, a determination is made whether a frame of speech which is to be encoded is formed of active speech bits. If the frame is determined to be formed of active speech bits, a fixed bit allocation is applied to each of such frames. If a determination is made that the frame does not contain active speech bits, a reduced bit allocation is applied to such frames, such as "silent" frames.
In a dynamically-variable, bit-rate coding technique, each frame of speech may be encoded using a different number of bits. In this technique, a large range of bit allocations for the encoded frame is possible, e.g., any integral number of bits up to some maximum value.
And, in a multi-class, variable bit-rate coding technique, each frame of speech is assigned, by way of a class selection procedure, to one of a set of allowed classes. Each of such classes is associated with a particular allocation of bits for various parameters of the frame. And, all frames assigned to a single class have the same bit allocation. Class selection of a speech frame is based, for instance, upon a phonetic classification of the frame in which the major characteristics of the frame are classified according to the phonetic character of that frame of speech. More generally, a classifier is utilized to operate upon input speech applied to an encoder, once frame-formatted, or upon a linear prediction residual obtained from the input speech, to extract parameters which are then combined to make a class decision. Typically, a relatively small number of classes, e.g., between three and six classes, is employed in speech coding when using a multi-class, variable bit-rate coding technique.
In some situations, different coding algorithms are applied to different classes. In some coders, two different classes may have the same total number of bits allocated for the frame but may differ in how the bits are allocated to different speech parameters of the frame. As long as all the classes do not have the same total bit allocation for the frame, a coder is considered to be a variable rate coder. In multi-class coders, each class has a different bit allocation so that any class selection mechanism controls the instantaneous bit rate of the coder. Such a mechanism is referred to as a rate determination algorithm. The instantaneous bit rate at a particular time is simply the number of bits allocated to the current frame divided by the time duration of the frame.
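As a worked illustration of the bit-rate definition above, the following sketch computes the instantaneous bit rate of a single frame and the average over a run of frames. The 160-sample frame at 8000 samples per second comes from the typical implementation described earlier; the per-frame bit counts (80 and 170) and the function names are hypothetical.

```python
def instantaneous_bit_rate(bits_in_frame, frame_size=160, sample_rate_hz=8000):
    """Bits in the frame divided by the frame duration, in bits per second.

    Written as bits * sample_rate / frame_size, which is the same ratio
    because the frame duration equals frame_size / sample_rate.
    """
    return bits_in_frame * sample_rate_hz / frame_size


def average_bit_rate(bits_per_frame, frame_size=160, sample_rate_hz=8000):
    """Total bits over a run of frames divided by the total duration."""
    total_bits = sum(bits_per_frame)
    return total_bits * sample_rate_hz / (len(bits_per_frame) * frame_size)


# Hypothetical allocations: 80 bits and 170 bits per 20 ms frame.
low_rate = instantaneous_bit_rate(80)        # 4000.0 bits per second
high_rate = instantaneous_bit_rate(170)      # 8500.0 bits per second
mixed = average_bit_rate([170, 80, 80, 170]) # 6250.0 bits per second
```

Because the average depends on how often each allocation is chosen, changing the class selection rule shifts the average even when the per-class allocations stay fixed.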
Fixed bit-rate coding techniques do not require a rate control mechanism and, therefore, are typically less complex than counterparts which require rate control mechanisms. Multi-class, variable bit-rate coding techniques and dynamically-variable, bit-rate coding techniques, in contrast, require a rate determination algorithm. But variable rate coding techniques are generally more efficient, as such techniques exploit the time-varying statistical properties of speech. A rate determination algorithm utilized in such techniques generally attempts to minimize the average bit rate while ensuring that at least a minimum speech quality is maintained. The average bit rate is particularly important in a cellular communication system which utilizes a CDMA (code-division, multiple-access) communication scheme, as well as in communication applications in which voice data is stored.
The average bit rate of a multi-class, variable bit-rate coding technique depends upon the rate determination algorithm as well as on the statistical character of input speech frames that are to be encoded. By modifying the parameters of the rate determination algorithm, the average bit rate can be altered.
Multi-class, variable bit-rate coding techniques are needed, for instance, for CDMA, cellular communication systems proposed for future installation, capable of operating at several different average bit rates. A coder which would be operable in such a manner would be operable pursuant to a selected one of several operating modes, wherein each operating mode is associated with a particular average bit rate.
A multi-class, variable bit-rate coding technique, and associated coder, capable of operating in more than one mode and capable of selecting the mode in which to encode a frame of data, would therefore be advantageous.
It is in light of this background information related to the communication of digital information that the significant improvements of the present invention have evolved.
The present invention, accordingly, advantageously provides a variable bit rate coder, and an associated method, by which to encode a frame of data at a selected encoding rate.
Selection of which of at least two bit rates at which to encode a frame of data is made responsive to indicia of actual coding performance of the coder at the different bit rates. Thereby, selection of the rate at which to encode a frame of data is made responsive to actual encoding of the data, not merely an estimate of the encoding of the data. Because indicia of actual coding of the frame of data are utilized to select the bit rate at which the resultant, encoded frame is to be formed, a better tradeoff between coding rate and throughput rate is obtainable.
In one aspect of the present invention, a multi-class, variable bit-rate coder is provided for a radio transmitter, such as the transmitter portion of a cellular mobile terminal. The coder is operable to receive a frame of speech and to generate an output frame of encoded speech data, encoded at a selected bit rate. The coder is operable to encode the frame of speech at two or more bit rates. Analysis is made of the frame of speech encoded at each of the two or more bit rates. Responsive to the analysis of the frame of speech data, subsequent to encoding of the corresponding frame of speech at the at least two coding rates, a decision is made as to the coding rate at which the encoded frame should be formed. If the characteristics of the frame, encoded at the lower of two or more coding rates, are acceptable, a decision is made to utilize the frame of speech data encoded at the lower coding rate. Thereby, improved throughput rates of the resultant, transmitted frame are possible while still ensuring that, if necessary, a higher coding rate shall be used.
In another aspect of the present invention, a coder is provided for a communication station operable in a cellular communication system, such as a CDMA (code-division, multiple-access) system. Speech, once digitized and formatted into frames, is provided to the coder. The speech frames are either voiced frames, unvoiced frames, or silent frames. Each frame of speech is first applied to a classifier which classifies the frame to be one of the aforementioned frame types. When the frame is determined to be a silent frame, the frame is applied to a silent encoder which encodes the silent frame of speech at a silent-encoding rate. If, conversely, the classifier determines the frame of speech to be an unvoiced frame, the frame is applied to an unvoiced encoder which encodes the frame of speech at an unvoiced-encoding rate. And, if the classifier classifies the frame of speech to be a voiced frame, the classifier applies the frame of speech to at least two voiced encoders, each capable of encoding the frame at a different coding rate. For instance, in one implementation, the coder includes two voiced coder elements, one operable to encode the frame of speech at a bit rate of 4.0 kb/s, and a second voiced coder element operable to encode the frame at a bit rate of 8.5 kb/s. The voiced coders encode the frame of speech applied thereto, and indicia of the encoded frames formed by the respective voiced coders are provided to a selector. The selector is operable responsive to the indicia provided thereto to select one of the voiced coder elements to be used to form the resultant, encoded frame of speech when the classifier determines the frame of speech to be a voiced frame. Because selection is made by the selector of the coding rate responsive to actual indicia of the encoded frame of speech data, improved selection of the coding rate is provided.
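The classify-then-select flow of this aspect can be sketched as below. The text does not specify which indicia the selector examines or the exact selection rule, so this sketch assumes a single hypothetical quality figure (`snr_db`) per encoded frame and a simple rule: use the lowest-rate voiced candidate whose measured quality meets a threshold, otherwise fall back to the highest rate. All class, function, and parameter names are illustrative, and the encoders are stand-in callables.

```python
from dataclasses import dataclass


@dataclass
class EncodedFrame:
    rate_bps: int    # coding rate used for this frame
    payload: bytes   # encoded bits (placeholder)
    snr_db: float    # hypothetical indicium of actual coding performance


def select_voiced_frame(candidates, min_snr_db):
    """Pick the lowest-rate candidate with acceptable measured quality.

    Assumption: if no candidate is acceptable, the highest rate is used.
    """
    ordered = sorted(candidates, key=lambda c: c.rate_bps)
    for candidate in ordered:
        if candidate.snr_db >= min_snr_db:
            return candidate
    return ordered[-1]


def encode_frame(frame, classify, silent_encoder, unvoiced_encoder,
                 voiced_encoders, min_snr_db):
    """Route a frame by class; voiced frames are encoded at every voiced
    rate and the selector decides after the encodings actually exist."""
    label = classify(frame)
    if label == "silent":
        return silent_encoder(frame)
    if label == "unvoiced":
        return unvoiced_encoder(frame)
    candidates = [encoder(frame) for encoder in voiced_encoders]
    return select_voiced_frame(candidates, min_snr_db)


# Stand-in encoders; the rates for silent and unvoiced frames are made up.
voiced = [lambda f: EncodedFrame(4000, b"", 12.0),
          lambda f: EncodedFrame(8500, b"", 20.0)]
chosen = encode_frame([0] * 160, lambda f: "voiced",
                      lambda f: EncodedFrame(800, b"", 0.0),
                      lambda f: EncodedFrame(2000, b"", 0.0),
                      voiced, min_snr_db=10.0)
```

The point of the structure is that the selector sees results of real encodings at both voiced rates, so the decision does not rest on a prediction of how each rate would perform.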
In another aspect of the present invention, a coder is provided for a communication station, also operable in a cellular communication system, such as a CDMA (code-division, multiple-access) cellular communication system. Frames of speech are provided to the coder subsequent to digitizing and formatting of the speech into the frames. The frames are selectively of voiced data, unvoiced data, and silent data. Each frame is provided to a silence coder, an unvoiced coder, and at least two voiced coders. Each coder encodes the frame of speech applied thereto according to a respective coding rate. The two voiced coder elements are operable at separate coding rates. Indicia of the encoded frames encoded by each of the coders are provided to a selector. The selector is operable responsive to such indicia to determine from which coder element the resultant, encoded frame should be formed. Thereby, selection is made responsive to actual encoded frames of speech rather than estimates of such coded frames.
In these and other aspects, therefore, a variable bit rate coder, and an associated method, are provided for a sending station operable in a communication system. The sending station sends an encoded set of data upon a communication channel. The encoded data is an encoded representation of digital information. The variable bit rate coder codes the digital information into the encoded data. A first bit rate coder element is coupled to receive the digital information. The first bit rate coder element codes the digital information at a first coding rate to form a first-coded set of data. A second bit rate coder element is also coupled to receive the digital information. The second bit rate coder element codes the digital information at a second coding rate to form a second-coded set of data. A coding rate selector is coupled to receive at least indicia of the coding-rate performance of the first bit rate coder element and indicia of the coding-rate performance of the second bit rate coder element. The coding rate selector selects the encoded data to be formed of a selected one of the first-coded set of data and at least the second-coded set of data. Selection by the coding rate selector is responsive to values of the indicia of the coding-rate performance of the first and at least second bit rate coder elements, respectively.
A more complete appreciation of the present invention and the scope thereof can be obtained from the accompanying drawings, which are briefly summarized below, the following detailed description of the presently preferred embodiments of the invention, and the appended claims.