1. Field of the Invention
The invention relates to a method for synthesising speech by concatenation and, in particular, to the synthesis of voiceless consonants.
2. Discussion of the Background
It is known, in a speech synthesis method, to link together, i.e. concatenate, small sections of sound which have been recorded from a human speaker. The sounds consist of diphones (i.e. sounds from two phonemes), or polyphones (i.e. sounds from a number of phonemes). The advantage of the known method is that the main part of the coarticulation (i.e. common articulation, that part of the pronunciation of a phoneme that is influenced by surrounding phonemes) is located in the area around the phoneme boundary, which is included in the recorded sounds and, as a consequence, is reproduced in a natural, human-like manner in the synthesised speech. The known method also covers the generation of synthetic speech with arbitrary phoneme durations and optional fundamental tone curves, even in those cases where the fundamental tone is in the same register as that of the person who made the recording from which the speech is synthesised.
In accordance with the known speech synthesis method, a synthetic waveform is created by arranging for suitably selected parts of the recorded polyphones to be "out-windowed" with a Hanning-window and copied into suitably selected places in the synthetic waveform. For voiced speech, i.e. voiced sounds, the Hanning-windows are placed in such a manner that the centre of each window is located at the excitation point of a glottal pulse, i.e. at the point in time at which the vocal cords are closed.
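By way of illustration only, the out-windowing and copying described above may be sketched as follows. The sketch assumes that the recording is held as a list of samples and that the excitation points have already been located; the function names, the parameter `half_width` and the representation of excitation points as sample indices are hypothetical and are not taken from the known method.

```python
import math


def hann(n):
    """Return a symmetric Hanning-window of n samples."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def window_segment(signal, centre, half_width):
    """Out-window 2*half_width+1 samples centred on `centre`.

    Samples that would fall outside the recording are treated as zero.
    """
    w = hann(2 * half_width + 1)
    segment = []
    for k in range(-half_width, half_width + 1):
        idx = centre + k
        sample = signal[idx] if 0 <= idx < len(signal) else 0.0
        segment.append(sample * w[k + half_width])
    return segment


def overlap_add(output, segment, centre):
    """Add `segment` into `output` so that its middle sample lands on `centre`."""
    half = len(segment) // 2
    for k, value in enumerate(segment):
        idx = centre - half + k
        if 0 <= idx < len(output):
            output[idx] += value


def synthesise(recorded, source_marks, target_marks, half_width, out_len):
    """Copy windowed segments from source positions to target positions.

    source_marks: assumed excitation points in the recording (sample indices).
    target_marks: assumed positions in the synthetic waveform (sample indices).
    """
    out = [0.0] * out_len
    for src, dst in zip(source_marks, target_marks):
        overlap_add(out, window_segment(recorded, src, half_width), dst)
    return out
```

In this sketch, choosing target positions that are closer together or further apart than the source positions alters the fundamental tone, and repeating or omitting source positions alters the duration.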
With unvoiced speech, for example voiceless consonants, there is no known principled way of placing the Hanning-windows for effecting speech synthesis, since there are no glottal pulses to serve as reference points. This problem is, however, generally overcome, in accordance with the known methods, by using a fixed interval between the Hanning-windows. The use of this method for the synthesis of phonemes of long duration gives rise to problems, especially in those cases where the synthesised sound needs to be longer than the recorded sound. In such cases, it is necessary to copy the same "out-windowed" signal, in a sequential manner, into a number of suitably selected places in the synthetic waveform. The human ear readily perceives the resulting periodicities, so that the synthesised consonants are heard as sounds having a whistling character. If the Hanning-window is longer, a "chuff-chuff"-like sound is heard instead. This problem can be reduced by reversing the content of every second Hanning-window, i.e. by playing it back in reverse. However, this does not totally eliminate the problem.
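The origin of the periodicity may be illustrated by the following sketch, which is offered under stated assumptions and not as a description of any particular known implementation. A single out-windowed piece of a noise-like recording is copied at a fixed interval, optionally with every second copy reversed. The names `repeat_windowed`, `max_period_error`, `hop` and `flip_alternate` are hypothetical.

```python
import math


def hann(n):
    """Return a symmetric Hanning-window of n samples."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def repeat_windowed(recorded, centre, half_width, hop, copies, flip_alternate=False):
    """Copy one out-windowed segment `copies` times at a fixed interval `hop`.

    Assumes centre - half_width >= 0 and centre + half_width < len(recorded).
    If flip_alternate is True, every second copy is reversed in time.
    """
    length = 2 * half_width + 1
    w = hann(length)
    start = centre - half_width
    segment = [recorded[start + k] * w[k] for k in range(length)]
    out = [0.0] * (hop * (copies - 1) + length)
    for c in range(copies):
        piece = segment[::-1] if (flip_alternate and c % 2 == 1) else segment
        for k, value in enumerate(piece):
            out[c * hop + k] += value
    return out


def max_period_error(x, period, first, last):
    """Largest |x[p] - x[p + period]| for first <= p < last.

    A value near zero means x repeats with the given period over that range.
    """
    return max(abs(x[p] - x[p + period]) for p in range(first, last))
```

In this sketch, away from the two ends of the output, plain repetition yields a waveform that repeats exactly at the interval `hop`. With every second copy reversed, the waveform no longer repeats at `hop` but still repeats at twice that interval, which is consistent with the observation that reversal reduces the problem without eliminating it.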