Telephone speech transmitted in public wireline and wireless telephone networks is band-limited to 300-3400 Hz. The upper boundary is specified in order to reduce the bandwidth requirements for digitization at 8 kilosamples per second; it retains sufficient intelligibility but sacrifices naturalness. In particular, the absence of components in the range above 3400 Hz leads to muffled sounds. This makes it difficult to distinguish between unvoiced phonemes (e.g., /s/ and /f/), whose differentiating components are largely to be found in the missing highband range.
With the rapid evolution of telecommunications technology, devices capable of generating and processing wideband speech (hereinafter, “wideband-capable devices”) have been developed. Wideband speech refers to speech having a large bandwidth (e.g., up to 7000 Hz), which has the advantage of yielding high perceived voice quality. As wideband-capable devices enter the marketplace, voice communications increasingly tend to involve such devices. While this allows for very high quality speech communication over private, high-bandwidth networks, the wideband capabilities of wideband-capable devices are largely wasted when the communication involves a public telephone network, since the speech transmitted in such networks is quite severely band-limited.
Nevertheless, the perceived speech quality at a wideband-capable device may be improved by enhancing the band-limited speech with artificially generated spectral content in the highband range. Based on a classical speech production model, artificial generation of the spectral content in the highband range comprises determining certain highband spectral parameters and a highband excitation signal. The highband excitation signal is passed through a linear prediction synthesis filter defined by the highband spectral parameters in order to generate the spectral content in the highband range. The combination of the artificially generated spectral content and the band-limited speech results in semi-artificial wideband speech. The wideband speech so created is considered to be of high quality when it sounds, perceptually, as if it had been issued directly from the source.
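By way of illustration only, the synthesis step described above may be sketched as follows. The sketch assumes that the highband spectral parameters are expressed as linear prediction coefficients a_1..a_p of A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and that the band-limited speech has already been upsampled to the wideband sampling rate. The function names `lp_synthesis` and `extend_bandwidth` and the `gain` parameter are hypothetical and are not taken from the description above.

```python
import numpy as np

def lp_synthesis(excitation, lpc):
    """All-pole synthesis filter 1/A(z), with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.

    `lpc` holds [a_1, ..., a_p] (assumed coefficient convention).
    """
    excitation = np.asarray(excitation, dtype=float)
    lpc = np.asarray(lpc, dtype=float)
    order = len(lpc)
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        # Subtract the weighted past outputs: s[n] = e[n] - sum_k a_k * s[n-k]
        for k in range(1, min(order, n) + 1):
            acc -= lpc[k - 1] * out[n - k]
        out[n] = acc
    return out

def extend_bandwidth(narrowband_upsampled, highband_excitation, highband_lpc, gain=1.0):
    """Add artificially generated highband content to the band-limited speech.

    Both input signals are assumed to share the same (wideband) sampling rate
    and length; `gain` is a hypothetical scaling of the highband contribution.
    """
    highband = gain * lp_synthesis(highband_excitation, highband_lpc)
    return np.asarray(narrowband_upsampled, dtype=float) + highband
```

For example, `lp_synthesis([1, 0, 0, 0], [-0.5])` returns the decaying impulse response of a single-pole filter. In practice the synthesized highband would typically also be restricted to the highband range before being added, a step omitted here for brevity.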
Two existing methods of generating the aforesaid highband excitation signal are (i) spectral-folding techniques and (ii) full-wave rectification of prediction residuals. However, these techniques tend to produce unsatisfactory results. For example, it has been found that the use of certain prior art techniques for generating the highband excitation signal causes artifacts in the resulting wideband speech when the band-limited speech contains nasal phonemes (e.g., /n/, /m/).
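For orientation, the two existing approaches named above may be sketched in their commonly understood forms. This is a minimal sketch under stated assumptions, not a reproduction of any particular prior art system: spectral folding is taken here to mean upsampling by a factor of two through zero insertion, which mirrors the narrowband spectrum about the original Nyquist frequency, and full-wave rectification is taken to mean the absolute value of a residual already at the wideband rate, with the mean removed as an assumed design choice. The function names are hypothetical.

```python
import numpy as np

def spectral_fold(residual_nb):
    """Upsample by 2 via zero insertion (assumed form of spectral folding).

    Inserting a zero after every sample mirrors the 0-4 kHz spectrum of an
    8 kHz signal into the 4-8 kHz range of the resulting 16 kHz signal.
    """
    residual_nb = np.asarray(residual_nb, dtype=float)
    folded = np.zeros(2 * len(residual_nb))
    folded[::2] = residual_nb
    return folded

def full_wave_rectify(residual_wb):
    """Full-wave rectify a residual that is already at the wideband rate.

    The absolute value is a nonlinearity that creates higher-frequency
    components; the mean is subtracted (an assumption of this sketch) to
    remove the DC offset that rectification introduces.
    """
    rectified = np.abs(np.asarray(residual_wb, dtype=float))
    return rectified - rectified.mean()
```

As a quick check of the folding behaviour, a 1 kHz tone sampled at 8 kHz and passed through `spectral_fold` shows equal-magnitude spectral peaks at 1 kHz and at its mirror image, 7 kHz, in the 16 kHz output.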
Against this background, there is a need in the industry for an improved technique of extending the bandwidth of a speech signal.