The present invention relates to a transmission system comprising a transmitting arrangement with a speech encoder for deriving an encoded speech signal from an input speech signal, the transmitting arrangement comprising transmit means for transmitting the encoded speech signal to a receiving arrangement, the receiving arrangement comprising a speech decoder for decoding the encoded speech signal.
Such transmission systems are used in applications in which speech signals have to be transmitted over a transmission medium with a limited transmission capacity, or have to be stored on storage media with a limited storage capacity. Examples of such applications are the transmission of speech signals over the Internet, transmission of speech signals from a mobile phone to a base station and vice versa and storage of speech signals on a CD-ROM, in a solid state memory or on a hard disk drive.
In a speech encoder the speech signal is analyzed by analysis means which determine a plurality of analysis coefficients for a block of speech samples, also known as a frame. A group of these analysis coefficients describes the short-time spectrum of the speech signal. Another example of an analysis coefficient is a coefficient representing the pitch of a speech signal. The analysis coefficients are transmitted via the transmission medium to the receiver, where they are used as coefficients for a synthesis filter.
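By way of illustration only, the determination of the coefficients describing the short-time spectrum of one frame can be sketched as follows. The sketch assumes linear prediction with the autocorrelation method and the Levinson-Durbin recursion, which is a common choice but is not prescribed by the text above. The function names and the default order of 10 are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """Return autocorrelation values r[0..max_lag] of one frame."""
    return [
        sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve for a[1..order] of A(z) = 1 + sum_k a[k] z^-k.

    Returns (coefficients, residual_energy).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: remaining coefficients stay zero
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def analyze_frame(frame, order=10):
    """Hypothetical analysis means: one set of coefficients per frame."""
    return levinson_durbin(autocorrelation(frame, order), order)
```

With this sign convention the corresponding synthesis filter is 1/A(z). A coefficient representing the pitch would be determined separately and is not covered by this sketch.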
Besides the analysis coefficients, the speech encoder also determines a number of excitation sequences (e.g. 4) per frame of speech samples. The interval of time covered by such an excitation sequence is called a sub-frame. The speech encoder is arranged for finding the excitation signal resulting in the best speech quality when the synthesis filter, using the above-mentioned analysis coefficients, is excited with said excitation sequences.
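The search for the excitation sequence of one sub-frame can be sketched as below. This is a simplified illustration under stated assumptions: the synthesis filter is taken to be an all-pole filter 1/A(z) with A(z) = 1 + sum_k a[k] z^-k, the filter memory of preceding sub-frames is ignored, the error is a plain squared error without perceptual weighting, and the codebook and function names are hypothetical.

```python
def synthesize(excitation, lpc):
    """Excite the all-pole filter 1/A(z) from a zero initial state."""
    order = len(lpc)
    history = [0.0] * order  # most recent output sample first
    out = []
    for x in excitation:
        y = x - sum(a * h for a, h in zip(lpc, history))
        out.append(y)
        history = ([y] + history)[:order]
    return out


def search_codebook(target, codebook, lpc):
    """Return (index, gain, error) of the entry that best matches target."""
    best = None
    for index, code in enumerate(codebook):
        synth = synthesize(code, lpc)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue  # an all-zero entry cannot be scaled to match anything
        gain = sum(t * s for t, s in zip(target, synth)) / energy
        error = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best
```

Each candidate sequence is passed through the synthesis filter, scaled with its optimal gain and compared with the target; the index and gain of the winning entry form the representation that is transmitted.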
A representation of said excitation sequences is transmitted via the transmission channel to the receiver. In the receiver, the excitation sequences are recovered from the received signal and applied to an input of the synthesis filter. At the output of the synthesis filter a synthetic speech signal is available.
Experiments have shown that the speech quality of such a transmission system is substantially deteriorated when the input signal of the speech encoder comprises a substantial amount of background noise.
The object of the present invention is to provide a transmission system according to the preamble in which the speech quality is improved when the input signal of the speech encoder comprises a substantial amount of background noise.
To achieve said purpose, the transmission system according to the present invention is characterized in that the speech encoder and/or the speech decoder comprises background noise determining means for determining a background noise property of the speech signal, in that the speech encoder and/or the speech decoder comprises at least one background noise dependent element, and in that the speech encoder and/or speech decoder comprises adaptation means for changing at least one property of the background noise dependent element in dependence on the background noise property.
Experiments have shown that it is possible to enhance the speech quality if background noise dependent processing is performed in the speech encoder and/or in the speech decoder by using a background noise dependent element. The background noise property can e.g. be the level of the background noise, but it is conceivable that other properties of the background noise signals are used. The background noise dependent element can e.g. be the codebook used for generating the excitation signals, or a filter used in the speech encoder or decoder.
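As a minimal sketch of background noise determining means, the level of the background noise can be tracked from frame to frame as shown below. The tracking rule (follow decreases at once, follow increases slowly) and all names, thresholds and default values are assumptions made for illustration; the text above does not prescribe how the level is determined.

```python
import math


def frame_level_db(frame):
    """Mean power of one frame in dB (small offset avoids log of zero)."""
    energy = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(energy + 1e-12)


class NoiseLevelTracker:
    """Hypothetical noise floor tracker operating on successive frames."""

    def __init__(self, rise_db_per_frame=0.5):
        self.rise = rise_db_per_frame
        self.level_db = None

    def update(self, frame):
        level = frame_level_db(frame)
        if self.level_db is None or level < self.level_db:
            self.level_db = level  # a quieter frame lowers the estimate at once
        else:
            # a louder frame (possibly speech) raises the estimate only slowly
            self.level_db = min(level, self.level_db + self.rise)
        return self.level_db


def noisiness(noise_db, quiet_db=-60.0, noisy_db=-30.0):
    """Map the noise level onto 0.0 (quiet) .. 1.0 (noisy) for the adaptation means."""
    t = (noise_db - quiet_db) / (noisy_db - quiet_db)
    return max(0.0, min(1.0, t))
```

The value returned by `noisiness` could then be used by adaptation means to change a property of a background noise dependent element, for example a filter parameter.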
A first embodiment of the invention is characterized in that the speech encoder comprises a perceptual weighting filter for deriving a perceptually weighted error signal representing a perceptually weighted error between the input speech signal and a synthetic speech signal, and in that the background noise dependent element comprises the perceptual weighting filter.
In speech encoders, it is common to use a perceptual weighting filter for obtaining a perceptually weighted error signal representing a perceptual difference between the input speech signal and a synthetic speech signal based on the encoded speech signal. Experiments have shown that making the properties of the perceptual weighting filter dependent on the background noise property results in an improvement of the quality of the reconstructed speech.
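For illustration, a perceptual weighting filter whose properties can be changed by adaptation means may be sketched as follows. The sketch assumes the widely used form W(z) = A(z/g_num) / A(z/g_den) with A(z) = 1 + sum_k a[k] z^-k; this form, the linear interpolation between two settings and all names are assumptions and are not taken from the text above.

```python
def bandwidth_expand(lpc, gamma):
    """Coefficients of A(z/gamma): a[k] is scaled by gamma**k."""
    return [a * gamma ** (k + 1) for k, a in enumerate(lpc)]


def weighting_filter(signal, lpc, gamma_num, gamma_den):
    """Filter signal with W(z) = A(z/gamma_num) / A(z/gamma_den), zero state."""
    num = bandwidth_expand(lpc, gamma_num)
    den = bandwidth_expand(lpc, gamma_den)
    order = len(lpc)
    x_hist = [0.0] * order  # past inputs, most recent first
    y_hist = [0.0] * order  # past outputs, most recent first
    out = []
    for x in signal:
        y = (x
             + sum(b * h for b, h in zip(num, x_hist))
             - sum(a * h for a, h in zip(den, y_hist)))
        out.append(y)
        x_hist = ([x] + x_hist)[:order]
        y_hist = ([y] + y_hist)[:order]
    return out


def interpolate_gammas(quiet, noisy, noisiness):
    """Blend two (gamma_num, gamma_den) settings; noisiness runs from 0.0 to 1.0."""
    w = max(0.0, min(1.0, noisiness))
    return tuple(q + w * (n - q) for q, n in zip(quiet, noisy))
```

The two settings passed to `interpolate_gammas` are left to the caller, since suitable values would have to be found experimentally.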
A further embodiment of the invention is characterized in that the speech encoder comprises analysis means for deriving analysis parameters from the input speech signal, in that the properties of the perceptual weighting filter are derived from the analysis parameters, and in that the adaptation means are arranged for providing, to the perceptual weighting filter, altered analysis parameters representing the speech signal being subjected to a high pass filtering operation.
Experiments have shown that the best results are obtained when some of the analysis parameters to be used with the perceptual weighting filter represent a high pass filtered input signal. These analysis parameters can be obtained by performing the analysis on a high pass filtered input signal, but it is also possible that the altered analysis parameters are obtained by performing a transformation on the analysis parameters.
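The first of the two routes mentioned above, performing the analysis on a high pass filtered input signal, can be sketched as follows. The first-order high pass filter, its coefficient of 0.9, and the one-coefficient analysis used as a stand-in for the analysis means are hypothetical simplifications chosen to keep the example short.

```python
def high_pass(frame, alpha=0.9):
    """First-order high pass filter y[n] = x[n] - alpha * x[n-1] (assumed form)."""
    prev = 0.0
    out = []
    for x in frame:
        out.append(x - alpha * prev)
        prev = x
    return out


def first_order_analysis(frame):
    """Toy stand-in for the analysis means: a single coefficient a1 = -r1/r0."""
    r0 = sum(x * x for x in frame)
    r1 = sum(frame[n] * frame[n - 1] for n in range(1, len(frame)))
    return [-r1 / r0] if r0 > 0.0 else [0.0]


def weighting_filter_parameters(frame, noisy, analyze=first_order_analysis):
    """Analysis parameters handed to the perceptual weighting filter.

    When noisy is True the parameters represent the high pass filtered frame.
    """
    source = high_pass(frame) if noisy else frame
    return analyze(source)
```

For a frame dominated by low frequencies, the altered parameters describe a flatter spectrum than the unaltered ones, because the high pass filter has removed part of the low-frequency energy before the analysis.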
A further embodiment of the invention is characterized in that the speech decoder comprises a synthesis filter for deriving a synthetic speech signal from the encoded speech signal, in that the speech decoder comprises post processing means for processing the output signal from the synthesis filter, and in that the background noise dependent element comprises the post processing means.
In speech coding systems, post processing means comprising e.g. a post filter are often used to enhance the speech quality. Such a post filter enhances the formants with respect to the valleys in the spectrum. Under low background noise conditions, the use of these post processing means results in an improved speech quality. However, experiments have shown that the post processing means deteriorate the speech quality if a substantial amount of background noise is present. By making one or more properties of the post processing means dependent on a property of the background noise, the speech quality can be improved. An example of such a property is the transfer function of the post processing means.
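One conceivable way of making the transfer function of a post filter dependent on the background noise is sketched below. It assumes a formant post filter of the form H(z) = A(z/beta) / A(z/alpha) and assumes that the filter is weakened as the noise increases, so that H(z) approaches 1 under heavy noise. Both assumptions, the default values of 0.5 and 0.8, and the function name are illustrative and are not taken from the text above.

```python
def postfilter_coefficients(lpc, noisiness, beta=0.5, alpha=0.8):
    """Return (numerator, denominator) coefficients of the post filter.

    lpc holds a[1..p] of A(z) = 1 + sum_k a[k] z^-k; the leading 1 is implied.
    noisiness runs from 0.0 (quiet, full post filtering) to 1.0 (noisy, none).
    """
    strength = 1.0 - max(0.0, min(1.0, noisiness))
    # Move the denominator factor toward the numerator factor as noise grows;
    # when both are equal the numerator and denominator cancel and H(z) = 1.
    alpha_eff = beta + strength * (alpha - beta)
    num = [a * beta ** (k + 1) for k, a in enumerate(lpc)]
    den = [a * alpha_eff ** (k + 1) for k, a in enumerate(lpc)]
    return num, den
```

With a noisiness of 0.0 the post filter has its full formant-enhancing effect, whereas with a noisiness of 1.0 the synthetic speech signal passes unchanged.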