The present invention relates generally to encoding of an acoustic source signal such that a corresponding signal reconstructed on the basis of the encoded information has a perceived sound quality that is higher than that obtained with known encoding solutions. More particularly, the invention relates to encoding of acoustic signals to produce encoded information for transmission over a transmission medium according to the preambles of claims 1 and 31, and to decoding of encoded information that has been transmitted over a transmission medium according to the preambles of claims 15 and 37. The invention also relates to a communication system according to claim 44, computer programs according to claims 13 and 29 respectively and computer readable media according to claims 13 and 30 respectively.
There are many different applications for speech codecs (codec=coder and decoder). Encoding and decoding schemes are used for bit-rate efficient transmission of acoustic signals in fixed and mobile communications systems and in videoconferencing systems. Speech codecs can also be utilised in secure telephony and for voice storage.
The trend in fixed and mobile telephony and in videoconferencing is towards improved quality of the reconstructed acoustic signal. This trend reflects the customer expectation that these systems provide a sound quality equal to or better than that of today's fixed telephone network. One way to meet this expectation is to broaden the frequency band for the acoustic signal and thus convey more of the information contained in the source signal to the receiver. It is true that the majority of the energy of a speech signal is spectrally located between 0 kHz and 4 kHz (i.e. the typical bandwidth of a state-of-the-art codec). However, a substantial amount of the energy is also distributed in the frequency band 4 kHz to 8 kHz. The frequency components in this band represent information that is perceived by a human listener as “clearness” and a feeling of the speaker “being close” to the listener.
The frequency resolution of human hearing decreases with increasing frequency. The frequency components between 4 kHz and 8 kHz therefore require comparatively few bits to model with sufficient accuracy. Nevertheless, there are today no known bit-rate efficient broadband codecs that provide a reconstructed acoustic signal with a satisfying perceived quality. The existing ITU-T G.722 wideband coding standard, which operates at bit-rates of 48, 56 and 64 kbps, offers a quality that is unsatisfying in relation to the bit-rates employed (ITU-T=International Telecommunication Union, standardisation sector).
The U.S. Pat. No. 5,956,686 describes an adaptive transform coding/decoding arrangement in which the spectrum of an envelope is divided into frequency bands, so that different coding methods can be applied to the envelopes of the individual bands. This makes it possible to exploit different redundancies between the bands of the spectrum envelope. The spectrum envelope is also adjusted to the coding and/or transmission method to compensate for the time fluctuation in each frequency band.
The U.S. Pat. No. 5,526,464 describes a code excited linear prediction coding method where the residual signal is divided into frequency bands. A particular codebook is provided for each band and the size of the codebook decreases with increasing frequency band. The sampling rate is reduced with decreasing frequency in order to reduce the codebook search complexity.
Hence, there exist examples in the art where the applied coding schemes take into consideration the varying properties of different frequency bands. However, the different properties have only been utilised to obtain a bit-efficient coding of the source signal. There are as yet no teachings of any special measures taken to compensate for inherent deficiencies in the applied coding when a coding scheme optimised for a first frequency band is used for coding signals in a second frequency band.
Today, most speech coding models are designed for narrowband signals (typically 0-4 kHz). If such speech coding models are applied for coding of an acoustic signal having a larger bandwidth, say 0-8 kHz, the coding will only be optimised for a part of the relevant frequency band, namely the lower part.
One reason for this is that the quantisation of coding parameters generally involves correlation in the time domain between a target signal and a reproduced signal. Such correlation will primarily be based on signal matching in the low-frequency region since the higher frequency components of a speech signal have a low power density in comparison to the low frequency components. As a result of this, the high frequency components will be poorly reproduced at the receiver side.
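By way of illustration only, the dominance of the low-frequency region in a time-domain matching criterion can be sketched as follows. The sketch is not taken from any particular codec. The sampling rate, the tone frequencies, the amplitudes and the two hypothetical candidate signals are all assumptions chosen for the example, and a plain squared-error measure stands in for whatever matching criterion a given coder actually employs.

```python
import math

# Assumed illustration parameters (not taken from the source text).
FS = 16000   # sampling rate in Hz, giving a 0-8 kHz signal bandwidth
N = 320      # one 20 ms frame at 16 kHz


def tone(freq_hz, amp, n=N, fs=FS):
    """Return n samples of a sinusoid with the given frequency and amplitude."""
    return [amp * math.sin(2.0 * math.pi * freq_hz * k / fs) for k in range(n)]


def add(a, b):
    """Sample-wise sum of two equally long signals."""
    return [x + y for x, y in zip(a, b)]


def squared_error(target, candidate):
    """Time-domain squared error between a target and a candidate signal."""
    return sum((t - c) ** 2 for t, c in zip(target, candidate))


# Target: a strong low-band component plus a weak high-band component,
# mimicking the lower power density of speech above 4 kHz.
low = tone(500.0, 1.0)
high = tone(6000.0, 0.1)
target = add(low, high)

# Hypothetical candidate A: perfect low band, high band missing entirely.
candidate_a = low
# Hypothetical candidate B: perfect high band, 20 % amplitude error in the low band.
candidate_b = add(tone(500.0, 0.8), high)

err_a = squared_error(target, candidate_a)
err_b = squared_error(target, candidate_b)

# The time-domain criterion prefers the candidate that discards the high band.
chosen = "A" if err_a < err_b else "B"
print("error A:", err_a, "error B:", err_b, "chosen:", chosen)
```

Under these assumptions the criterion selects the candidate in which the high band is absent altogether, because a modest error in the strong low-band component outweighs the complete loss of the weak high-band component.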
Unfortunately, this poor reproduction cannot be excused either by flaws in the human hearing or by the characteristics of voice signals. When voice sounds are generated, the vocal tract operates as a filter on airwaves originating from the lungs. The so-called formants correspond to the resonance frequencies of this filter. In the lower frequency band of a voice signal, the target signal has distinct formants. However, for higher frequencies the formants are more diffuse. Due to the limitations of the speech model used, an acoustic signal having a relatively large bandwidth that is encoded by means of a conventional narrowband coder will be reproduced as a signal having a distinct spectral structure (i.e. peaks and valleys) also in its upper frequency band. A human listener generally perceives an acoustic signal with such characteristics as unnatural and as having a metallic-like sound.
Occasionally, a secondary coder is applied either to the output signal of the first coder or in parallel with the first coder in order to further increase the quality of the reconstructed signal. If this measure is taken for a conventional narrowband coder that is used for encoding a broadband source signal, the spectral structure in the high end of the frequency band will occasionally be even more pronounced. While this is desirable for narrowband acoustic signals in terms of improved sound quality, the effect may be the contrary for wideband acoustic signals.
The object of the present invention is therefore to provide an improved coding scheme for acoustic signals, which alleviates the problems above.
According to one aspect of the invention the object is achieved by a method of encoding an acoustic source signal to produce encoded information for transmission over a transmission medium as initially described, which is characterised by the primary coded signal and the target signal each comprising coefficients, each of which represents a frequency component. At least one smoothed signal is produced, corresponding to the primary coded signal or the target signal respectively. The smoothed signal is a selectively modified version of the primary coded signal or the target signal, respectively, wherein a variation is reduced in the coefficient values representing frequency information above a threshold value.
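Purely as an illustrative sketch, a selective modification of this kind may be pictured as follows. The text above does not prescribe a particular smoothing rule, so the rule used here (pulling each coefficient above the threshold towards the mean of the upper band) is an assumption, as are the function name smooth_above_threshold, the strength parameter, the representation of the coefficients as real-valued magnitudes and the example values.

```python
from statistics import pvariance


def smooth_above_threshold(coeffs, threshold_index, strength=0.5):
    """Reduce the variation of coefficients at or above threshold_index.

    coeffs          -- sequence of coefficient values, one per frequency component
                       (assumed here to be real-valued magnitudes)
    threshold_index -- index of the first coefficient regarded as lying above
                       the frequency threshold (hypothetical parameter)
    strength        -- 0.0 leaves the upper band untouched, 1.0 flattens it
                       completely to its mean (hypothetical parameter)
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie between 0.0 and 1.0")

    lower = list(coeffs[:threshold_index])   # passed through unchanged
    upper = list(coeffs[threshold_index:])   # selectively modified
    if not upper:
        return lower

    mean_upper = sum(upper) / len(upper)
    # Shrink each deviation from the upper-band mean. This keeps the mean
    # level of the band while making peaks and valleys less pronounced.
    smoothed_upper = [mean_upper + (1.0 - strength) * (c - mean_upper)
                      for c in upper]
    return lower + smoothed_upper


# Example values are invented for illustration: four low-band coefficients
# followed by four high-band coefficients with a pronounced peak/valley pattern.
spectrum = [9.0, 7.5, 8.0, 6.0, 1.0, 3.0, 0.5, 2.5]
threshold_index = 4

smoothed = smooth_above_threshold(spectrum, threshold_index, strength=0.5)

print("upper-band variance before:", pvariance(spectrum[threshold_index:]))
print("upper-band variance after: ", pvariance(smoothed[threshold_index:]))
```

In this sketch the coefficients below the threshold are left untouched, so that the distinct formant structure of the lower band is retained, while the variance of the coefficients above the threshold is reduced by a factor determined by the assumed strength parameter.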
According to a further aspect of the invention the object is achieved by a computer program directly loadable into the internal memory of a computer, comprising software for controlling the method described in the above paragraph when said program is run on a computer.
According to another aspect of the invention the object is achieved by a computer readable medium, having a program recorded thereon, where the program is to make a computer control the method described in the penultimate paragraph above.
According to still another aspect of the invention the object is achieved by a method of decoding an estimate of an acoustic source signal as initially described, which is characterised by a smoothed primary decoded spectrum comprising coefficients of which each represents a frequency component. The smoothed primary decoded spectrum is a selectively modified version of one of the at least one primary decoded spectrum wherein a variation is reduced in the coefficient values representing frequency information above a threshold value.
According to a further aspect of the invention the object is achieved by a computer program directly loadable into the internal memory of a computer, comprising software for controlling the method described in the above paragraph when said program is run on a computer.
According to another aspect of the invention the object is achieved by a computer readable medium, having a program recorded thereon, where the program is to make a computer control the method described in the penultimate paragraph above.
According to yet another aspect of the invention the object is achieved by a transmitter as initially described, which is characterised in that at least one spectral smoothing unit is devised to produce a smoothed output signal from a primary coded signal by selectively modifying the primary coded signal such that a variation is reduced in coefficient values thereof representing frequency information above a threshold value.
According to yet an additional aspect of the invention the object is achieved by a receiver as initially described, which is characterised in that a smoothed primary decoded spectrum comprises coefficients of which each represents a frequency component. A spectral smoothing unit in the receiver is devised to produce the smoothed primary decoded spectrum by selectively modifying at least one primary decoded spectrum such that a variation is reduced in the coefficient values representing frequency information above a threshold value.
According to yet an additional aspect of the invention the object is achieved by a communication system for transmission of an acoustic source signal from a first node to a second node. The communication system includes, in the first node, the proposed transmitter for encoding the acoustic source signal and producing encoded information. The second node includes the proposed receiver for receiving the encoded information produced by the transmitter and for decoding an estimate of the encoded information into an estimate of the acoustic source signal. A transmission medium is used for transmitting the at least one enhanced coded signal from the transmitter to the receiver.
The proposed reduction of the variation in coefficient values representing frequency information above a threshold value, in one or more of the signals from which an acoustic signal is to be reconstructed by a receiver, improves the perceived naturalness of typical acoustic signals, such as voice sounds or music. Particularly, the metallic sound generated by the prior-art coding techniques is mitigated to a considerable extent. This is an especially desired effect, since the perceived sound quality will be a key factor in the success of future wide band applications.