1. Field of the invention
The present invention relates to a perceptual weighting device and method for producing a perceptually weighted signal in response to a wideband signal (0-7000 Hz) in order to reduce a difference between a weighted wideband signal and a subsequently synthesized weighted wideband signal.
2. Brief description of the prior art
The demand for efficient digital wideband speech/audio encoding techniques with a good subjective quality/bit rate trade-off is increasing for numerous applications such as audio/video teleconferencing, multimedia, and wireless applications, as well as Internet and packet network applications. Until recently, telephone bandwidths filtered in the range 200-3400 Hz were mainly used in speech coding applications. However, there is an increasing demand for wideband speech applications in order to increase the intelligibility and naturalness of the speech signals. A bandwidth in the range 50-7000 Hz was found sufficient for delivering a face-to-face speech quality. For audio signals, this range gives an acceptable audio quality, but is still lower than the CD quality which operates on the range 20-20000 Hz.
A speech encoder converts a speech signal into a digital bit stream which is transmitted over a communication channel (or stored in a storage medium). The speech signal is digitized (sampled and quantized, usually with 16 bits per sample) and the speech encoder has the role of representing these digital samples with a smaller number of bits while maintaining a good subjective speech quality. The speech decoder or synthesizer operates on the transmitted or stored bit stream and converts it back to a sound signal.
One of the best prior art techniques capable of achieving a good quality/bit rate trade-off is the so-called Code Excited Linear Prediction (CELP) technique. According to this technique, the sampled speech signal is processed in successive blocks of L samples usually called frames where L is some predetermined number (corresponding to 10-30 ms of speech). In CELP, a linear prediction (LP) synthesis filter is computed and transmitted every frame. The L-sample frame is then divided into smaller blocks called subframes of size N samples, where L=kN and k is the number of subframes in a frame (N usually corresponds to 4-10 ms of speech). An excitation signal is determined in each subframe, which usually consists of two components: one from the past excitation (also called pitch contribution or adaptive codebook) and the other from an innovative codebook (also called fixed codebook). This excitation signal is transmitted and used at the decoder as the input of the LP synthesis filter in order to obtain the synthesized speech.
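The frame and subframe relation L=kN can be illustrated with a short sketch. The 16000 samples/sec rate, the 20 ms frame and the 5 ms subframe are illustrative assumptions chosen inside the ranges quoted above, and the function names are hypothetical.

```python
def frame_layout(sample_rate_hz, frame_ms, subframe_ms):
    """Return (L, N, k): samples per frame, samples per subframe, subframes per frame."""
    L = sample_rate_hz * frame_ms // 1000
    N = sample_rate_hz * subframe_ms // 1000
    if N == 0 or L % N != 0:
        raise ValueError("a frame must hold a whole number of subframes (L = kN)")
    return L, N, L // N


def split_into_subframes(frame, N):
    """Cut one L-sample frame into k consecutive N-sample subframes."""
    if len(frame) % N != 0:
        raise ValueError("frame length must be a multiple of N")
    return [frame[i:i + N] for i in range(0, len(frame), N)]


# Assumed values: 16 kHz sampling, 20 ms frames, 5 ms subframes.
L, N, k = frame_layout(16000, 20, 5)
subframes = split_into_subframes(list(range(L)), N)
print(L, N, k, len(subframes))
```

With these assumed values, the LP synthesis filter would be updated once per frame while an excitation is determined for each of the k subframes.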
An innovative codebook, in the CELP context, is an indexed set of N-sample-long sequences which will be referred to as N-dimensional codevectors. Each codebook sequence is indexed by an integer k ranging from 1 to M, where M represents the size of the codebook, often expressed as a number of bits b, where M = 2^b.
To synthesize speech according to the CELP technique, each block of N samples is synthesized by filtering an appropriate codevector from a codebook through time varying filters modelling the spectral characteristics of the speech signal. At the encoder end, the synthesis output is computed for all, or a subset, of the codevectors from the codebook (codebook search). The retained codevector is the one producing the synthesis output closest to the original speech signal according to a perceptually weighted distortion measure. This perceptual weighting is performed using a so-called perceptual weighting filter, which is usually derived from the LP synthesis filter.
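The codebook search described above can be sketched as follows. This is a minimal illustration, not the search of any particular standard: it assumes that the target vector is already expressed in the weighted domain, that the list `a` holds the coefficients a1..ap of an all-pole filter 1/A(z) standing in for the weighted synthesis filter, and that the gain is chosen optimally for each codevector. The function names are hypothetical, and the indices are 0-based here whereas the text counts from 1 to M.

```python
def synthesize(codevector, a):
    """Zero-state all-pole filtering by 1/A(z), A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ..."""
    out = []
    for n, x in enumerate(codevector):
        y = x
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                y -= ai * out[n - i]
        out.append(y)
    return out


def search_codebook(target, codebook, a):
    """Return (index, gain) minimizing ||target - gain * synthesize(c)||^2."""
    best_index, best_gain, best_score = -1, 0.0, -1.0
    for index, codevector in enumerate(codebook):
        y = synthesize(codevector, a)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, y))
        # With the optimal gain corr/energy, the squared error equals
        # ||target||^2 - corr^2/energy, so the largest score wins.
        score = corr * corr / energy
        if score > best_score:
            best_index, best_gain, best_score = index, corr / energy, score
    return best_index, best_gain


# Toy codebook: M = 4 codevectors (b = 2 bits) of N = 4 samples each.
codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 0, -1, 0], [0, 0, 0, 1]]
a = [-0.9]  # assumed first-order filter 1 / (1 - 0.9 z^-1)
target = [2.0 * v for v in synthesize(codebook[2], a)]
best_index, best_gain = search_codebook(target, codebook, a)
print(best_index, best_gain)
```

Because the toy target is built from the third codevector scaled by 2, the search should return that codevector with a gain close to 2.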
The CELP model has been very successful in encoding telephone band sound signals, and several CELP-based standards exist in a wide range of applications, especially in digital cellular applications. In the telephone band, the sound signal is band-limited to 200-3400 Hz and sampled at 8000 samples/sec. In wideband speech/audio applications, the sound signal is band-limited to 50-7000 Hz and sampled at 16000 samples/sec.
Some difficulties arise when applying the telephone-band optimized CELP model to wideband signals, and additional features need to be added to the model in order to obtain high quality wideband signals. Wideband signals exhibit a much wider dynamic range compared to telephone-band signals, which results in precision problems when a fixed-point implementation of the algorithm is required (which is essential in wireless applications). Furthermore, the CELP model will often spend most of its encoding bits on the low-frequency region, which usually has higher energy contents, resulting in a low-pass output signal. To overcome this problem, the perceptual weighting filter has to be modified in order to suit wideband signals, and pre-emphasis techniques which boost the high frequency regions become important to reduce the dynamic range, yielding a simpler fixed-point implementation, and to ensure a better encoding of the higher frequency contents of the signal.
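The high-frequency boost provided by pre-emphasis can be quantified with a short sketch. It assumes the first-order form 1 − μz⁻¹ with μ = 0.7, which is the form and value given further on among the preferred embodiments, and a 16000 samples/sec sampling rate; the function name is hypothetical.

```python
import cmath
import math


def preemphasis_gain_db(freq_hz, sample_rate_hz=16000, mu=0.7):
    """Magnitude response in dB of P(z) = 1 - mu z^-1 evaluated at freq_hz."""
    omega = 2.0 * math.pi * freq_hz / sample_rate_hz
    return 20.0 * math.log10(abs(1.0 - mu * cmath.exp(-1j * omega)))


# Gain at a few frequencies inside the 50-7000 Hz wideband range.
for f in (50, 1000, 4000, 7000):
    print(f, round(preemphasis_gain_db(f), 1))

# Difference between the gain at the Nyquist frequency and the gain at DC.
tilt_db = preemphasis_gain_db(8000) - preemphasis_gain_db(0)
print(round(tilt_db, 2))
```

Under these assumptions the filter attenuates the low end by roughly 10 dB and lifts the top of the band by a few dB, for a total tilt of about 15 dB, which is the mechanism by which the spread between low-frequency and high-frequency energy is reduced.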
In CELP-type encoders, the optimum pitch and innovative parameters are searched by minimizing the mean squared error between the input speech and synthesized speech in a perceptually weighted domain. This is equivalent to minimizing the error between the weighted input speech and weighted synthesis speech, where the weighting is performed using a filter having a transfer function W(z) of the form:
W(z) = A(z/Γ₁)/A(z/Γ₂), where 0 < Γ₂ < Γ₁ ≤ 1.
In analysis-by-synthesis (AbS) coders, analysis shows that the quantization error is weighted by the inverse of the weighting filter, W⁻¹(z), which exhibits some of the formant structure in the input signal. Thus, the masking property of the human ear is exploited by shaping the error, so that it has more energy in the formant regions, where it will be masked by the strong signal energy present in those regions. The amount of weighting is controlled by the factors Γ₁ and Γ₂.
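A minimal sketch of this conventional weighting filter follows. It assumes that A(z) = 1 + a1 z⁻¹ + ... + ap z⁻ᵖ is the LP inverse filter, so that A(z/Γ) is obtained by scaling the i-th coefficient by Γ to the power i. The values 0.9 and 0.6 for the two factors are illustrative assumptions that merely satisfy the stated inequality, and the function names are hypothetical.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): the i-th coefficient of A(z) scaled by gamma**i."""
    return [ai * gamma ** i for i, ai in enumerate(a)]


def pole_zero_filter(num, den, signal):
    """Direct-form filtering by num(z)/den(z) with den[0] == 1 and zero initial state."""
    out = []
    for n in range(len(signal)):
        acc = sum(num[i] * signal[n - i] for i in range(len(num)) if n - i >= 0)
        acc -= sum(den[i] * out[n - i] for i in range(1, len(den)) if n - i >= 0)
        out.append(acc)
    return out


def conventional_weighting(a, signal, gamma1=0.9, gamma2=0.6):
    """Apply W(z) = A(z/gamma1) / A(z/gamma2); a = [1, a1, ..., ap]."""
    num = bandwidth_expand(a, gamma1)
    den = bandwidth_expand(a, gamma2)
    return pole_zero_filter(num, den, signal)


# Assumed first-order A(z) = 1 - 0.9 z^-1, used only to show the mechanics.
a = [1.0, -0.9]
impulse = [1.0, 0.0, 0.0, 0.0]
print(conventional_weighting(a, impulse))
```

Both the numerator and the denominator depend on the same A(z), which is why the formant weighting and the spectral tilt cannot be adjusted independently in this form.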
This filter works well with telephone band signals. However, it was found that this filter is not suitable for efficient perceptual weighting when it was applied to wideband signals. It was found that this filter has inherent limitations in modelling the formant structure and the required spectral tilt concurrently. The spectral tilt is more pronounced in wideband signals due to the wide dynamic range between low and high frequencies. It was suggested to add a tilt filter into filter W(z) in order to control the tilt and formant weighting separately.
An object of the present invention is therefore to provide a perceptual weighting device and method adapted to wideband signals, using a modified perceptual weighting filter to obtain a high quality reconstructed signal, this device and method enabling fixed-point algorithmic implementation.
More specifically, in accordance with the present invention, there is provided a perceptual weighting device for producing a perceptually weighted signal in response to a wideband signal in order to reduce a difference between a weighted wideband signal and a subsequently synthesized weighted wideband signal. This perceptual weighting device comprises:
a) a signal preemphasis filter responsive to the wideband signal for enhancing the high frequency content of the wideband signal to thereby produce a preemphasised signal;
b) a synthesis filter calculator responsive to the preemphasised signal for producing synthesis filter coefficients; and
c) a perceptual weighting filter, responsive to the preemphasised signal and the synthesis filter coefficients, for filtering the preemphasised signal in relation to the synthesis filter coefficients to thereby produce the perceptually weighted signal. The perceptual weighting filter has a transfer function with fixed denominator whereby weighting of the wideband signal in a formant region is substantially decoupled from a spectral tilt of that wideband signal.
The present invention also relates to a method for producing a perceptually weighted signal in response to a wideband signal in order to reduce a difference between a weighted wideband signal and a subsequently synthesized weighted wideband signal. This method comprises: filtering the wideband signal to produce a preemphasised signal with enhanced high frequency content; calculating, from the preemphasised signal, synthesis filter coefficients; and filtering the preemphasised signal in relation to the synthesis filter coefficients to thereby produce a perceptually weighted speech signal. The filtering comprises processing the preemphasised signal through a perceptual weighting filter having a transfer function with fixed denominator whereby weighting of the wideband signal in a formant region is substantially decoupled from a spectral tilt of the wideband signal.
In accordance with preferred embodiments of the subject invention:
reduction of the dynamic range comprises filtering the wideband signal through a transfer function of the form:
P(z) = 1 − μz⁻¹
wherein μ is a preemphasis factor having a value located between 0 and 1;
the preemphasis factor μ is 0.7;
the perceptual weighting filter has a transfer function of the form:
W(z) = A(z/γ₁)/(1 − γ₂z⁻¹)
where 0 < γ₂ < γ₁ ≤ 1 and γ₂ and γ₁ are weighting control values; and
the variable γ₂ is set equal to μ.
Therefore, the overall perceptual weighting of the quantization error is obtained by a combination of a preemphasis filter and a modified weighting filter to enable high subjective quality of the decoded wideband sound signal.
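The effect of setting the second weighting control value equal to the preemphasis factor can be made explicit with a short derivation. It rests on an assumption not stated in this section, namely that the decoder undoes the preemphasis with the inverse filter 1/P(z), so that an error which is flat in the weighted domain is shaped at the output by the cascade of the two inverse filters:

```latex
W^{-1}(z)\,P^{-1}(z)
  = \frac{1-\gamma_2 z^{-1}}{A(z/\gamma_1)} \cdot \frac{1}{1-\mu z^{-1}}
  \;\overset{\gamma_2=\mu}{=}\; \frac{1}{A(z/\gamma_1)}
```

Under that assumption the fixed denominator of W(z) cancels against the de-emphasis, and the error spectrum follows 1/A(z/γ1) alone, with A(z) computed from the preemphasised signal.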
The solution to the problem exposed in the brief description of the prior art is accordingly to introduce a preemphasis filter at the input, compute the synthesis filter coefficients based on the preemphasized signal, and use a modified perceptual weighting filter by fixing its denominator. By reducing the dynamic range of the wideband signal, the preemphasis filter renders the wideband signal more suitable for fixed-point implementation, and improves the encoding of the high frequency contents of the spectrum.
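The three steps named above (preemphasis, computation of the synthesis filter coefficients from the preemphasised signal, and weighting with a fixed denominator) can be sketched end to end as follows. This is an illustration under stated assumptions rather than the implementation of the invention: the LP analysis uses a plain autocorrelation method with a Levinson-Durbin recursion and no windowing, the order 4, the value 0.9 for the first weighting control value and the synthetic test signal are arbitrary, and all function names are hypothetical. Only the preemphasis factor 0.7 and the equality between it and the second weighting control value come from the text.

```python
import random


def preemphasize(signal, mu=0.7):
    """P(z) = 1 - mu z^-1 applied with zero initial state."""
    return [s - mu * (signal[n - 1] if n > 0 else 0.0) for n, s in enumerate(signal)]


def autocorrelation(signal, order):
    """Autocorrelations r[0..order] of the (unwindowed) signal."""
    return [sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Return [1, a1, ..., ap] of A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # updated prediction error energy
    return a


def modified_weighting(a, signal, gamma1=0.9, gamma2=0.7):
    """W(z) = A(z/gamma1) / (1 - gamma2 z^-1): the denominator does not depend on A(z)."""
    num = [ai * gamma1 ** i for i, ai in enumerate(a)]
    out = []
    for n in range(len(signal)):
        acc = sum(num[i] * signal[n - i] for i in range(len(num)) if n - i >= 0)
        if n > 0:
            acc += gamma2 * out[n - 1]
        out.append(acc)
    return out


# Synthetic low-pass test signal (assumption): one 320-sample frame.
rng = random.Random(0)
speech, prev = [], 0.0
for _ in range(320):
    prev = 0.95 * prev + rng.gauss(0.0, 1.0)
    speech.append(prev)

pre = preemphasize(speech)                           # step 1: preemphasis
a = levinson_durbin(autocorrelation(pre, 4), 4)      # step 2: LP analysis on preemphasised signal
weighted = modified_weighting(a, pre)                # step 3: fixed-denominator weighting
print(len(weighted), [round(c, 3) for c in a])
```

In this sketch the formant weighting is carried entirely by the numerator A(z/γ1), while the tilt is set by the one-pole denominator, which is the separation the fixed denominator is meant to provide.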
The present invention further relates to an encoder for encoding a wideband signal, comprising: a) a perceptual weighting device as described hereinabove; b) a pitch codebook search device responsive to the perceptually weighted signal for producing pitch codebook parameters and an innovative search target vector; c) an innovative codebook search device, responsive to the synthesis filter coefficients and to the innovative search target vector, for producing innovative codebook parameters; and d) a signal forming device for producing an encoded wideband signal comprising the pitch codebook parameters, the innovative codebook parameters, and the synthesis filter coefficients.
Still further in accordance with the present invention, there is provided:
a cellular communication system for servicing a large geographical area divided into a plurality of cells, comprising: a) mobile transmitter/receiver units; b) cellular base stations respectively situated in the cells; c) a control terminal for controlling communication between the cellular base stations; d) a bidirectional wireless communication sub-system between each mobile unit situated in one cell and the cellular base station of this cell, this bidirectional wireless communication sub-system comprising, in both the mobile unit and the cellular base station:
i) a transmitter including an encoder as described hereinabove for encoding a wideband signal and a transmission circuit for transmitting the encoded wideband signal; and
ii) a receiver including a receiving circuit for receiving a transmitted encoded wideband signal and a decoder for decoding the received encoded wideband signal;
a cellular mobile transmitter/receiver unit comprising:
a) a transmitter including an encoder as described hereinabove for encoding a wideband signal and a transmission circuit for transmitting the encoded wideband signal; and
b) a receiver including a receiving circuit for receiving a transmitted encoded wideband signal and a decoder for decoding the received encoded wideband signal;
a cellular network element comprising:
a) a transmitter including an encoder as described hereinabove for encoding a wideband signal and a transmission circuit for transmitting the encoded wideband signal; and
b) a receiver including a receiving circuit for receiving a transmitted encoded wideband signal and a decoder for decoding the received encoded wideband signal; and
a bidirectional wireless communication sub-system between each mobile unit situated in one cell and the cellular base station of this cell, this bidirectional wireless communication sub-system comprising, in both the mobile unit and the cellular base station:
a) a transmitter including an encoder as described hereinabove for encoding a wideband signal and a transmission circuit for transmitting the encoded wideband signal; and
b) a receiver including a receiving circuit for receiving a transmitted encoded wideband signal and a decoder for decoding the received encoded wideband signal.
The objects, advantages and other features of the present invention will become more apparent upon reading of the following non-restrictive description of preferred embodiments thereof, given by way of example only with reference to the accompanying drawings.