1. Field of the Invention
This invention relates to speech synthesis, in particular to the synthesis of wideband speech from a bandlimited speech signal, for example from a speech signal which has been transmitted via the public switched telephone network.
This invention is based on the observation that, owing to the nature of the vocal tract, the parts of an original wideband speech signal which are missing from a bandlimited version of that signal are correlated with the bandlimited version itself. Because of this correlation, speech from within the bandwidth of a bandlimited speech signal can be used to predict the missing parts of the original wideband speech signal. The correlation is stronger for voiced sounds than for unvoiced sounds.
2. Description of Related Art
Known systems for constructing a wideband speech signal from a telephone bandwidth speech signal use a training process to define a transformation whereby an estimate of the missing signal can be generated from a narrowband input signal. In general, a lookup table is constructed during a training phase which defines a correspondence between a representation of a narrowband signal and a representation of the required wideband signal. The lookup table can be used for performing a translation from an actual narrowband spectrum to an estimated wideband spectrum. To generate a wideband speech signal from a narrowband speech signal, received narrowband speech is analysed and the closest representation in the lookup table is identified. The corresponding wideband signal representation is used to synthesise the required wideband signal. The whole of the wideband signal may be synthesised, or the original narrowband signal may be added to a synthesised version of the signal outside the bandwidth of the narrowband signal.
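The translation step described above can be sketched as a nearest-neighbour search over the lookup table. This is a minimal illustration only: the table contents, the vector lengths, the function name `translate`, and the use of Euclidean distance as the measure of "closest" are all assumptions made for the example and are not taken from any of the known systems.

```python
from math import dist

# Hypothetical toy table: each entry pairs a narrowband representation
# with the wideband representation learned for it during training.
LOOKUP_TABLE = [
    ((0.2, 0.9), (0.2, 0.9, 0.4)),
    ((0.8, 0.1), (0.8, 0.1, 0.7)),
]


def translate(narrowband, table=LOOKUP_TABLE):
    """Return the wideband representation paired with the closest narrowband entry.

    "Closest" is taken here to mean smallest Euclidean distance (an assumption).
    """
    _, wideband = min(table, key=lambda entry: dist(entry[0], narrowband))
    return wideband
```

In such a scheme the returned wideband representation would then be passed to the synthesis stage, either to synthesise the whole wideband signal or only the part outside the narrowband.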
Abe and Yoshida, ‘Method for reconstructing a wideband speech signal’, Japanese patent application no 6-118995, construct such a lookup table using linear predictive coding (LPC) analysis to characterise the spectrum of wideband training speech. LPC coefficients are extracted from wideband training signals. These wideband LPC coefficients are clustered to form wideband codewords. The wideband training signal is then band-pass filtered to provide a bandlimited signal, the spectrum of which is also characterised using LPC analysis. The narrowband LPC coefficients thus obtained are paired with the corresponding wideband codeword, and for each wideband codeword the corresponding sets of narrowband coefficients are averaged to form a narrowband codeword. Thus the narrowband signal and the wideband signal are both represented by a set of LPC coefficients. Synthesis of the wideband signal from the LPC coefficients is performed using conventional techniques. In an alternative system (Abe and Yoshida, ‘Method for reconstructing a wideband speech signal’, Japanese patent application no 7-56599) the wideband signal is represented by speech waveforms, and synthesis of the wideband signal is achieved by concatenation of speech waveforms.
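The pairing and averaging step of such a training process might look like the sketch below. It assumes that the wideband codewords have already been obtained by clustering, that each training frame supplies a wideband vector and the narrowband vector obtained after band-pass filtering, and that a frame belongs to the wideband codeword nearest to it in Euclidean distance. The function name `build_narrowband_codebook` and the data layout are hypothetical and are not taken from the cited applications.

```python
from math import dist


def build_narrowband_codebook(wideband_codewords, training_pairs):
    """Average the narrowband vectors paired with each wideband codeword.

    training_pairs is a list of (wideband_vector, narrowband_vector) tuples,
    one per training frame. The result is aligned with wideband_codewords;
    a codeword to which no frame was assigned yields None.
    """
    groups = [[] for _ in wideband_codewords]
    for wide, narrow in training_pairs:
        # Assign the frame to its nearest wideband codeword (assumed rule).
        nearest = min(
            range(len(wideband_codewords)),
            key=lambda i: dist(wideband_codewords[i], wide),
        )
        groups[nearest].append(narrow)

    codebook = []
    for members in groups:
        if not members:
            codebook.append(None)
            continue
        count = len(members)
        # Element-wise mean of the narrowband coefficient vectors.
        codebook.append(tuple(sum(column) / count for column in zip(*members)))
    return codebook
```

Each narrowband codeword produced this way shares an index with its wideband codeword, so the two lists together form the lookup table.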
According to one exemplary aspect of the present invention, an apparatus for synthesising speech from a bandlimited speech signal comprises: means for extracting a spectral signal from the bandlimited signal; peak-picking means arranged to receive said spectral signal and to search a predetermined frequency range to provide a set of one or more peak frequency output values corresponding to the frequency of one or more peaks in said spectral signal; codebook means containing a plurality of codebook entries, each codebook entry comprising a set of one or more codebook frequency values and a set of one or more corresponding synthesis parameters; look-up means arranged to receive said peak frequency value set and arranged to access the codebook means to extract a required synthesis parameter set corresponding to a codebook frequency value set which is close to said peak frequency value set; and speech synthesis means arranged to receive the required synthesis parameter set and to generate speech using said required synthesis parameter set.
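One possible form of the peak-picking step is sketched below, purely by way of illustration. It assumes that the spectral signal is available as sampled magnitudes at known frequencies, that a peak is a simple local maximum, and that the strongest peaks are the ones kept. The function name `pick_peaks`, the `max_peaks` parameter and the 300 Hz to 3400 Hz range used in the usage note are assumptions for the example, not features stated in this specification.

```python
def pick_peaks(freqs_hz, magnitudes, low_hz, high_hz, max_peaks=3):
    """Return, in ascending order, the frequencies of the strongest local
    maxima of the spectrum that lie within [low_hz, high_hz]."""
    candidates = []
    for i in range(1, len(magnitudes) - 1):
        if not (low_hz <= freqs_hz[i] <= high_hz):
            continue  # outside the predetermined search range
        # Local maximum: higher than the left neighbour, not lower than the right.
        if magnitudes[i] > magnitudes[i - 1] and magnitudes[i] >= magnitudes[i + 1]:
            candidates.append((magnitudes[i], freqs_hz[i]))
    # Keep the strongest peaks, then report them by ascending frequency.
    strongest = sorted(candidates, reverse=True)[:max_peaks]
    return sorted(freq for _, freq in strongest)
```

For example, with samples every 500 Hz from 0 Hz to 4000 Hz and local maxima at 500 Hz, 2000 Hz and 3500 Hz, searching an assumed range of 300 Hz to 3400 Hz returns only the peaks at 500 Hz and 2000 Hz. The resulting set of peak frequency values would then be passed to the look-up means.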
The codebook synthesis parameter set may contain a synthesis parameter relating to the amplitude of a peak in the spectrum of the synthesised speech, the frequency of the peak being outside the predetermined frequency range.
The codebook synthesis parameter set may contain a synthesis parameter which relates to the frequency of a peak in the spectrum of the synthesised speech, the frequency of the peak being outside the predetermined frequency range.
In a preferred embodiment the peak-picking means is capable of recognising more than one peak in said spectral signal and, in such an event, of providing a set containing a plurality of peak frequency output values, and some of the codebook frequency value sets contain a plurality of codebook frequency values.
In a possible embodiment of the present invention a codebook synthesis parameter set contains three synthesis parameters each relating to the amplitude of a high frequency peak in the spectrum of the synthesised speech, the frequency of each high frequency peak being higher than the upper band limit of the predetermined frequency range.
In another embodiment of the present invention, a codebook synthesis parameter set contains a synthesis parameter relating to the frequency of a low frequency peak in the spectrum of the synthesised speech, the frequency of the low frequency peak being lower than the lower band limit of the predetermined frequency range; and a synthesis parameter relating to the amplitude of the low frequency peak.
Additionally, a pitch extracting means may be connected to receive the bandlimited speech signal and, in the event that the spectral signal represents voiced speech, to provide a pitch frequency value corresponding to the pitch of the received bandlimited speech signal. Some of the codebook frequency value sets contain a frequency value relating to pitch. In the event that the spectral signal represents voiced speech, the look-up means may be arranged to extract a required synthesis parameter set corresponding to a codebook frequency value set which is also close to said pitch frequency value.
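A look-up that takes pitch into account for voiced speech could be sketched as follows. This is an illustration under stated assumptions rather than the arrangement of any particular embodiment: the codebook entries, the fixed count of two peak frequencies per entry, the Euclidean distance on peak frequencies, the additive pitch term and the `pitch_weight` parameter are all hypothetical.

```python
from math import dist

# Hypothetical entries: (peak frequencies in Hz, pitch in Hz or None,
# synthesis parameter set). Real parameter sets would hold amplitudes
# and frequencies of peaks outside the searched range.
CODEBOOK = [
    ((500.0, 1500.0), 100.0, {"label": "A"}),
    ((500.0, 1500.0), 200.0, {"label": "B"}),
    ((700.0, 1200.0), None, {"label": "C"}),
]


def look_up(peaks, pitch_hz=None, codebook=CODEBOOK, pitch_weight=1.0):
    """Return the synthesis parameter set of the closest codebook entry.

    pitch_hz is supplied only for voiced speech; when it is None the
    comparison uses the peak frequencies alone.
    """
    def distance(entry):
        entry_peaks, entry_pitch, _ = entry
        d = dist(entry_peaks, peaks)
        if pitch_hz is not None and entry_pitch is not None:
            # Assumed combination rule: add a weighted pitch difference.
            d += pitch_weight * abs(entry_pitch - pitch_hz)
        return d

    return min(codebook, key=distance)[2]
```

In this sketch two entries with identical peak frequencies are distinguished only when a pitch value is supplied. An entry that holds no pitch value incurs no pitch term, so a practical scheme might need a penalty or a separate voiced codebook to avoid favouring such entries.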
Corresponding methods are also provided by this invention.
In the present invention a peak picker 2 is used to provide estimates of formant frequencies. Owing to the nature of the vocal tract, constraints arising from the shape of the vocal and nasal cavities and from the physical limitations of the muscles mean that the formant frequencies give a good indication, for voiced sounds, of the shape of the vocal tract. Hence, for voiced sounds, formants within the known narrowband speech signal are a good indicator of the position of any formants outside the bandwidth of the narrowband speech signal.