The present invention relates to techniques for transmitting voice information in communication networks, and more particularly to techniques for enhancing narrowband speech signals at a receiver.
In the transmission of voice signals, there is a trade-off between network capacity (i.e., the number of calls transmitted) and the quality of the speech signal on those calls. Most telephone systems in use today encode and transmit speech signals in the narrow frequency band between about 300 Hz and 3.4 kHz with a sampling rate of 8 kHz, in accordance with the Nyquist theorem. Since human speech contains frequencies between about 50 Hz and 13 kHz, sampling human speech at an 8 kHz rate and transmitting the narrow frequency range of approximately 300 Hz to 3.4 kHz necessarily omits information in the speech signal. Accordingly, telephone systems necessarily degrade the quality of voice signals.
Various methods of extending the bandwidth of speech signals transmitted in telephone systems have been developed. The methods can be divided into two categories. The first category includes systems that extend the bandwidth of the speech signal transmitted across the entire telephone system to accommodate a broader range of frequencies produced by human speech. These systems impose additional bandwidth requirements throughout the network, and therefore are costly to implement.
A second category includes systems that use mathematical algorithms to manipulate narrowband speech signals used by existing phone systems. Representative examples include speech coding algorithms that compress wideband speech signals at a transmitter, such that the wideband signal may be transmitted across an existing narrowband connection. The wideband signal must then be decompressed at a receiver. These methods can be expensive to implement since the structure of the existing systems needs to be changed.
Other techniques implement a "codebook" approach. A codebook is used to translate from the narrowband speech signal to the new wideband speech signal. Often the translation from narrowband to wideband is based on two models: one for narrowband speech analysis and one for wideband speech synthesis. The codebook is trained on speech data to "learn" the diversity of most speech sounds (phonemes). When using the codebook, the narrowband speech is modeled and the codebook is searched for the entry at minimum distance from the narrowband model. The chosen model is converted to its wideband equivalent, which is used for synthesizing the wideband speech. One drawback associated with codebooks is that they need significant training.
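The minimum-distance search described above can be illustrated with a short sketch. The function name `lookup_wideband`, the Euclidean distance measure, and the toy codebook values are assumptions made for illustration only; the text does not specify the feature representation or the distance measure used by any particular codebook system.

```python
import numpy as np

def lookup_wideband(nb_features, nb_codebook, wb_codebook):
    """Return the index of the nearest narrowband entry and its paired wideband entry.

    Assumption: Euclidean distance between feature vectors; row i of
    nb_codebook is paired with row i of wb_codebook.
    """
    distances = np.linalg.norm(nb_codebook - nb_features, axis=1)
    idx = int(np.argmin(distances))
    return idx, wb_codebook[idx]

# Toy, hand-made codebooks (hypothetical values, not trained on speech data).
nb_codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]])
wb_codebook = np.array([[0.0, 0.0, 0.1], [1.0, 1.0, 0.9], [2.0, 0.5, 1.5]])

idx, wb_entry = lookup_wideband(np.array([0.9, 1.2]), nb_codebook, wb_codebook)
```

In a real system both codebooks would be produced by the training stage mentioned above, which is the costly part of the approach.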
Another method is commonly referred to as spectral folding. Spectral folding techniques are based on the principle that content in the lower frequency band may be folded into the upper band. Normally the narrowband signal is re-sampled at a higher sampling rate to introduce aliasing in the upper frequency band. The upper band is then shaped with a low-pass filter, and the wideband signal is created. These methods are simple and effective, but they often introduce high frequency distortion that makes the speech sound metallic.
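The folding effect can be seen in a minimal sketch, assuming the re-sampling is done by inserting a zero between successive samples (one common way to introduce the aliasing; the text does not prescribe a particular re-sampling method). The function name `spectral_fold` and the 1 kHz test tone are illustrative choices, and the subsequent filter shaping of the upper band is omitted.

```python
import numpy as np

def spectral_fold(x):
    """Double the sampling rate by zero insertion.

    The spectrum of the input is mirrored around the old Nyquist
    frequency, so narrowband content reappears in the upper band.
    """
    y = np.zeros(2 * len(x))
    y[::2] = x
    return y

fs = 8000                                # narrowband sampling rate in Hz
n = np.arange(800)
x = np.sin(2 * np.pi * 1000 * n / fs)    # 1 kHz test tone

y = spectral_fold(x)                     # now sampled at 2 * fs = 16 kHz
spectrum = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y), d=1 / (2 * fs))

# The two strongest components: the original tone and its folded image.
top_two = sorted(freqs[np.argsort(spectrum)[-2:]])
```

For this tone the image lands at 8 kHz minus 1 kHz, that is, 7 kHz. Because the image is a mirror of the lower band rather than a continuation of the harmonic series, it is one plausible source of the metallic quality noted above.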
Accordingly, there is a need in the art for additional systems and methods for transmitting narrowband speech signals. Further, there is a need in the art for systems and methods for processing narrowband speech signals at a receiver to simulate wideband speech signals.
The present invention addresses these and other needs by adding synthetic information to a narrowband speech signal received at a receiver. Preferably, the speech signal is split into a vocal tract model and an excitation signal. One or more resonance frequencies may be added to the vocal tract model, thereby synthesizing an extra formant in the speech signal. Additionally, a new synthetic excitation signal may be added to the original excitation signal in the frequency range to be synthesized. The speech may then be synthesized to obtain a wideband speech signal. Advantageously, methods of the invention are of relatively low computational complexity, and do not introduce significant distortion into the speech signal.
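One way to picture the split into a vocal tract model and an excitation signal is linear prediction, sketched below. This is an assumption for illustration: the text does not state which speech model is used, and the function names, the model order of 8, and the synthetic test frame are all hypothetical. The sketch shows only the split and the resynthesis, not the addition of a formant or of synthetic excitation.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method linear prediction: x[n] ~ sum_k a[k] * x[n-1-k]."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

def excitation_from(frame, a):
    """Inverse-filter the frame to obtain the excitation (residual) signal."""
    e = frame.copy()
    for lag, coeff in enumerate(a, start=1):
        e[lag:] -= coeff * frame[:-lag]
    return e

def synthesize(e, a):
    """Drive the all-pole vocal tract model with an excitation signal."""
    x = np.zeros_like(e)
    for n in range(len(e)):
        past = sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        x[n] = e[n] + past
    return x

# Synthetic test frame (hypothetical): two tones plus a little noise, 8 kHz.
rng = np.random.default_rng(0)
t = np.arange(240) / 8000
frame = (np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)
         + 0.05 * rng.standard_normal(240))

a = lpc_coefficients(frame, order=8)      # vocal tract model parameters
e = excitation_from(frame, a)             # excitation signal
rebuilt = synthesize(e, a)                # model + excitation -> speech
```

Because synthesis is the exact inverse of the inverse filter, modifying either the coefficients or the excitation before synthesis changes the output in a controlled way, which is the property the approach described above relies on.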
In one aspect, the present invention provides a method for processing a speech signal. The method comprises the steps of: analyzing a received, narrowband signal to determine synthetic upper band content; reproducing a lower band of the speech signal using the received, narrowband signal; and combining the reproduced lower band with the determined, synthetic upper band to produce a wideband speech signal having a synthesized component.
According to further aspects of the invention, the step of analyzing further comprises the steps of: performing a spectral analysis on the received narrowband signal to determine parameters associated with a speech model and a residual error signal; determining a pitch associated with the residual error signal; identifying peaks associated with the received, narrowband signal; and copying information from the received, narrowband signal into an upper frequency band based on at least one of the determined pitch and the identified peaks to provide the synthetic upper band content.
According to further aspects of the invention, a predetermined frequency range of the wideband signal may be selectively boosted. The wideband signal may also be converted to an analog format and amplified.
In accordance with another aspect, the invention provides a system for processing a speech signal. The system comprises means for analyzing a received, narrowband signal to determine synthetic upper band content; means for reproducing a lower band of the speech signal using the received, narrowband signal; and means for combining the reproduced lower band with the determined, synthetic upper band to produce a wideband speech signal having a synthesized component.
According to further aspects of the system, the means for analyzing a received, narrowband signal to determine synthetic upper band content comprises: a parametric spectral analysis module for analyzing the formant structure of the narrowband signal and generating parameters descriptive of the narrowband voice signal and an error signal; a pitch decision module for determining the pitch of the sound segment represented by the narrowband signal; and a residual extender and copy module for processing information derived from the narrowband voice signal and generating a synthetic upper band signal component.
According to additional aspects of the invention, the residual extender and copy module comprises a Fast Fourier Transform module for converting the error signal from the parametric spectral analysis module into the frequency domain; a peak detector for identifying the harmonic frequencies of the error signal; and a copy module for copying the peaks identified by the peak detector into the upper frequency range.
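The three stages named above (transform, peak detection, copy) might be sketched as follows. The function name `extend_residual_spectrum`, the local-maximum peak rule with a relative threshold, and the choice to copy each peak upward by a fixed shift equal to the band edge are all assumptions for illustration; the text does not specify how peaks are detected or where they are placed.

```python
import numpy as np

def extend_residual_spectrum(residual, fs, band_edge_hz, rel_threshold=0.1):
    """Copy spectral peaks found below band_edge_hz into the band above it."""
    spectrum = np.fft.rfft(residual)                  # to the frequency domain
    mag = np.abs(spectrum)
    freqs = np.fft.rfftfreq(len(residual), d=1 / fs)

    # Peak detector (assumed rule): local maxima above a relative threshold.
    is_peak = np.zeros(len(mag), dtype=bool)
    is_peak[1:-1] = ((mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:])
                     & (mag[1:-1] > rel_threshold * mag.max()))
    peak_bins = np.flatnonzero(is_peak & (freqs < band_edge_hz))

    # Copy stage (assumed rule): shift each peak up by the band edge.
    shift = int(round(band_edge_hz * len(residual) / fs))
    extended = spectrum.copy()
    for k in peak_bins:
        if k + shift < len(extended):
            extended[k + shift] = spectrum[k]
    return np.fft.irfft(extended, n=len(residual)), freqs[peak_bins]

# Hypothetical residual: harmonics of 200 Hz up to 3.4 kHz, sampled at 16 kHz.
fs = 16000
n = np.arange(1600)
residual = sum(np.sin(2 * np.pi * 200 * h * n / fs) for h in range(1, 18))

extended, peak_freqs = extend_residual_spectrum(residual, fs, band_edge_hz=4000)
extended_mag = np.abs(np.fft.rfft(extended))
```

In this example the shift of 4 kHz is a multiple of the 200 Hz pitch, so the copied peaks continue the harmonic series; tying the shift to the determined pitch is one way such alignment could be maintained in general.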
In yet another aspect, the invention provides a system for processing a narrowband speech signal at a receiver. The system includes an upsampler that receives the narrowband speech signal and increases the sampling frequency to generate an output signal having an increased frequency spectrum; a parametric spectral analysis module that receives the output signal from the upsampler and analyzes the output signal to generate parameters associated with a speech model and a residual error signal; a pitch decision module that receives the residual error signal from the parametric spectral analysis module and generates a pitch signal that represents the pitch of the speech signal and an indicator signal that indicates whether the speech signal represents voiced speech or unvoiced speech; and a residual extender and copy module that receives and processes the residual error signal and the pitch signal to generate a synthetic upper band signal component.
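A pitch decision module of the kind described above, producing both a pitch signal and a voiced/unvoiced indicator, could be sketched with a normalized autocorrelation. This is one common technique and is assumed here for illustration; the function name `pitch_decision`, the 60 to 400 Hz search range, and the voicing threshold of 0.3 are hypothetical parameters not taken from the text.

```python
import numpy as np

def pitch_decision(residual, fs, fmin=60.0, fmax=400.0, voicing_threshold=0.3):
    """Return (pitch_hz, voiced) from the residual error signal.

    Assumption: the pitch period is the lag with the largest normalized
    autocorrelation inside the search range, and the frame is voiced when
    that correlation exceeds voicing_threshold.
    """
    r = np.correlate(residual, residual, mode="full")[len(residual) - 1:]
    if r[0] <= 0:
        return 0.0, False                 # silent frame
    r = r / r[0]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(r[lo:hi + 1]))
    return fs / lag, bool(r[lag] > voicing_threshold)

fs = 16000                                # sampling rate after the upsampler

# Voiced-like residual: an impulse train with an 80-sample (200 Hz) period.
voiced_residual = np.zeros(1600)
voiced_residual[::80] = 1.0
pitch_hz, voiced = pitch_decision(voiced_residual, fs)

# Unvoiced-like residual: white noise.
noise = np.random.default_rng(1).standard_normal(1600)
_, noise_voiced = pitch_decision(noise, fs)
```

The pitch value and the indicator returned here correspond to the two outputs that the residual extender and copy module receives alongside the residual error signal.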