In the modern information society, data in digital form, such as speech, is transferred in ever increasing volumes. A large share of this information is transferred over wireless telecommunication connections, e.g. in various mobile communication systems. It is here in particular that high requirements are set on the efficiency of data transfer, in order to utilize the limited number of radio frequencies as efficiently as possible. In addition, new services bring a simultaneous need for both a higher data transfer capacity and a better voice quality. In order to achieve these targets, different encoding algorithms are developed continuously, with the aim of reducing the average number of bits of a data transfer connection without compromising the standard of the services offered. In general this target is pursued according to two basic principles: either by trying to make fixed line speed encoding algorithms more efficient or by developing encoding algorithms utilizing variable line speed.
The relative efficiency of a speech codec operating at a variable bit rate is based upon the fact that speech is variable in character; in other words, a speech signal contains a different amount of information at different times. If a speech signal is divided into speech frames of standard length (e.g. 20 ms) and each of them is encoded separately, the number of bits used for modelling each speech frame can be adjusted. In this way speech frames containing a small amount of information can be modelled using a lower number of bits than speech frames containing plenty of information. It is then possible to keep the average bit rate lower than in speech codecs utilizing fixed line speed while maintaining the same subjective voice quality.
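The saving described above can be illustrated with a short calculation. The following is only a sketch: the 20 ms frame length comes from the text, while the per-frame bit counts (160, 80 and 20 bits) and the function name are assumptions chosen for illustration and do not belong to any particular codec.

```python
FRAME_MS = 20  # frame length given as an example in the text


def average_bit_rate(bits_per_frame, frame_ms=FRAME_MS):
    """Average bit rate in bit/s for a sequence of per-frame bit counts."""
    total_bits = sum(bits_per_frame)
    duration_ms = len(bits_per_frame) * frame_ms
    return total_bits * 1000 / duration_ms


# Hypothetical allocations over the same ten frames (200 ms of speech).
# Fixed rate: every frame gets 160 bits regardless of its content.
fixed = [160] * 10
# Variable rate: 160 bits for information-rich frames, 80 bits for
# easier frames, 20 bits for frames carrying very little information.
variable = [160, 160, 80, 80, 20, 20, 20, 80, 160, 160]

fixed_rate = average_bit_rate(fixed)
variable_rate = average_bit_rate(variable)
```

With these assumed figures the fixed allocation averages 8000 bit/s and the variable allocation 4700 bit/s. Whether the subjective quality is actually preserved depends on how well the frames are classified, which this sketch does not model.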
Encoding algorithms based upon variable bit rate can be utilized in various ways. Packet networks, such as the Internet and ATM (Asynchronous Transfer Mode) networks, are well suited for variable bit rate speech codecs. The network provides the data transfer capacity currently required by the speech codec by adjusting the length and/or transmission frequency of the data packets to be transferred in the data transfer connection. Speech codecs using variable bit rate are also well suited for digital recording of speech, e.g. in telephone answering machines and voice mail services.
It is possible to adjust the bit rate of a speech codec operating at a variable bit rate in a number of ways. In generally known variable rate speech codecs the transmitter bit rate is decided before the encoding of the signal to be transmitted. This is the procedure e.g. in connection with the speech codec of QCELP-type used in the CDMA (Code Division Multiple Access) mobile communication system prior known to a person skilled in the art, in which system certain predetermined bit rates are available for speech encoding. These solutions, however, only have a limited number of different bit rates: typically two speeds for a speech signal (e.g. full speed (1/1) and half speed (1/2) encoding) and a separate, low bit rate for background noise (e.g. 1/8 speed). Patent publication WO 9605592 A1 presents a method in which the input signal is divided into frequency bands and the required encoding bit rate is assessed for each frequency band based upon the energy content of that frequency band. The final decision upon the encoding speed (bit rate) to be used is made based upon these frequency band specific bit rate decisions. Another method is to adjust the bit rate as a function of the available data transfer capacity. This means that the bit rate to be used at any given time is selected based upon how much data transfer capacity is available. This kind of procedure results in reduced voice quality when the telecommunication network is heavily loaded (the number of bits available for speech encoding is limited). On the other hand, the procedure unnecessarily loads the data transfer connection at moments which are "easy" for speech encoding.
Another method, prior known to a person skilled in the art, used in variable bit rate speech codecs for adjusting the bit rate of the speech encoder is voice activity detection (VAD). It is possible to use voice activity detection e.g. in connection with a fixed line speed codec. In this case the speech encoder can be switched off entirely when the voice activity detector finds that the speaker is silent. The result is the simplest possible speech codec operating at variable line speed.
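The on/off behaviour described above can be sketched as follows. The text does not state how voice activity is detected, so the mean-energy threshold used here is an assumption made purely for illustration, as are the function names, the threshold value and the figure of 160 bits per active frame.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def is_speech(frame, threshold=1e-4):
    """Assumed detector: a frame counts as active if its energy exceeds a threshold."""
    return frame_energy(frame) > threshold


def bits_spent(frames, bits_per_active_frame=160, threshold=1e-4):
    """Bits produced per frame when a fixed-rate encoder is gated by the detector.

    Active frames cost the full fixed rate; silent frames cost nothing
    because the encoder is switched off for them.
    """
    return [bits_per_active_frame if is_speech(f, threshold) else 0
            for f in frames]


# One silent frame followed by one frame with constant amplitude 0.1.
example_frames = [[0.0] * 160, [0.1] * 160]
example_bits = bits_spent(example_frames)
```

The resulting codec has only two rates, zero and the fixed rate, which is why the text calls it the simplest possible variable line speed codec.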
Speech codecs operating at a fixed bit rate, which nowadays are very widely used e.g. in mobile communication systems, operate at the same bit rate independent of the contents of the speech signal. In these speech codecs one is forced to select a compromise bit rate, which on the one hand does not waste too much of the data transfer capacity and on the other hand provides a sufficient speech quality even for speech signals which are difficult to encode. With this procedure the bit rate used for speech encoding is always unnecessarily high for so-called easy speech frames, the modelling of which could be carried out successfully even by a speech codec with a lower bit rate. In other words, the data transfer channel is not used effectively. Among easy speech frames are e.g. silent moments detected utilizing a voice activity detector (VAD), strongly voiced sounds (resembling sinusoidal signals, which can successfully be modelled based upon amplitude and frequency) and some of the phonemes resembling noise. Due to the characteristics of hearing, noise need not be modelled equally accurately, because the ear will not detect small differences between the original and the coded (even if poor) signal. Instead, voiced sections easily mask noise. Voiced sections must be encoded accurately (accurate parameters, i.e. plenty of bits, are to be used), because the ear will hear even small differences in such signals.
FIG. 1 presents a typical speech encoder utilizing code-excited linear prediction (CELP, Code Excited Linear Predictor). It comprises several filters used for modelling speech production. A suitable excitation signal is selected for these filters from an excitation code book containing a number of excitation vectors. A CELP speech encoder typically comprises both short-term and long-term filters, with which an attempt is made to synthesize a signal resembling the original speech signal as closely as possible. Normally all excitation vectors stored in the excitation code book are checked in order to find the best excitation vector. During the excitation vector search each suitable excitation vector is forwarded to the synthesizing filters. The synthesized speech signal is compared with the original speech signal, and the excitation vector which produces the signal best corresponding to the original signal is selected. The selection criterion generally utilizes the ability of the human ear to detect different errors, and the excitation vector producing the smallest error signal for each speech frame is selected. The excitation vectors used in a typical CELP speech encoder have been determined experimentally. When a speech encoder of ACELP-type (Algebraic Code Excited Linear Predictor) is used, the excitation vector consists of a fixed number of non-zero pulses, which are calculated mathematically. In this case an actual excitation code book is not required. The best excitation is obtained by selecting optimal pulse positions and amplitudes using the same error criterion as in the above CELP encoder.
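The exhaustive code book search described above can be sketched as follows. This is a simplified illustration, not the encoder of FIG. 1: it assumes a short-term all-pole synthesis filter only (no long-term filter), a plain squared-error criterion instead of a perceptually weighted one, and a least-squares gain per vector. The function names, the filter coefficient and the tiny code book are hypothetical.

```python
def synthesize(excitation, lpc):
    """All-pole short-term synthesis: y[n] = x[n] - sum_k lpc[k] * y[n-1-k]."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out


def search_codebook(target, codebook, lpc):
    """Try every excitation vector and return (index, gain) of the best match.

    Each vector is passed through the synthesis filter, scaled by its
    least-squares gain, and compared with the target by squared error.
    """
    best_index, best_gain, best_err = None, 0.0, float("inf")
    for i, vec in enumerate(codebook):
        y = synthesize(vec, lpc)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero output cannot match anything
        gain = sum(t * v for t, v in zip(target, y)) / energy
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if err < best_err:
            best_index, best_gain, best_err = i, gain, err
    return best_index, best_gain


# Hypothetical first-order filter and three-entry code book.
lpc = [-0.5]
codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, -1, 0, 0]]
# Target built from entry 1 with gain 2, so the search should recover it.
target = [2 * v for v in synthesize(codebook[1], lpc)]
index, gain = search_codebook(target, codebook, lpc)
```

The cost of the search grows with the size of the code book, since every vector is filtered and compared for each frame.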
Speech encoders of CELP- and ACELP-types, prior known to a person skilled in the art, use fixed rate excitation calculation. The maximum number of pulses per excitation vector is fixed, as is the number of different pulse positions within a speech frame. When, in addition, each pulse is quantized with fixed accuracy, the number of bits to be generated for each excitation vector is constant regardless of the incoming speech signal. CELP-type codecs use a large number of bits for the quantizing of excitation signals. When high quality speech is generated, a relatively large code book of excitation signals is required in order to have access to a sufficient number of different excitation vectors. The codecs of ACELP-type have a similar problem. The quantization of the location, amplitude and sign of the pulses used consumes a large number of bits. A fixed-rate ACELP speech encoder calculates a certain number of pulses for each speech frame (or subframe) regardless of the original source signal. In this way it consumes the capacity of the data transfer line, reducing the total efficiency unnecessarily.
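The constant bit cost of such a fixed-rate pulse excitation can be made concrete with a small calculation. The following is a sketch under stated assumptions: each pulse is coded independently with a position index, an amplitude index and one bit for its polarity, and the pulse counts and position counts are hypothetical values, not those of any specific codec.

```python
def index_bits(levels):
    """Bits needed to index `levels` distinct values (ceil(log2(levels)))."""
    return (levels - 1).bit_length()


def excitation_bits(num_pulses, positions_per_pulse, amplitude_levels):
    """Bits per excitation vector when every pulse is coded independently.

    Assumed layout per pulse: position index + amplitude index + 1 polarity bit.
    """
    per_pulse = index_bits(positions_per_pulse) + index_bits(amplitude_levels) + 1
    return num_pulses * per_pulse


# Hypothetical configurations:
# 4 pulses, 8 candidate positions each, unit amplitude (nothing to code).
unit_amplitude_bits = excitation_bits(4, 8, 1)
# Same pulses, but with 4 quantized amplitude levels each.
quantized_amplitude_bits = excitation_bits(4, 8, 4)
```

Under these assumptions the two configurations cost 16 and 24 bits per excitation vector. The result depends only on the fixed parameters and not on the speech signal, which is the inefficiency the paragraph above points out.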