1. Field of the Invention
The present invention relates to a method and apparatus for converting text to speech.
2. Related Art
Although text-to-speech conversion apparatus has improved markedly over recent years, the sound of such apparatus reading a piece of text is still distinguishable from the sound of a human reading the same text. One reason for this is that text-to-speech converters occasionally apply phrasing that differs from that which would be applied by a human reader. This makes speech synthesised from text more onerous to listen to than speech read by a human.
The development of methods for predicting the phrasing for an input sentence has, thus far, largely mirrored developments in language processing. Initially, automatic language processing was not available, so early text-to-speech converters relied on punctuation for predicting phrasing. It was found that punctuation represented only the most significant boundaries between phrases, and often did not indicate how a boundary was to be conveyed acoustically. Hence, although this method was simple and reasonably effective, there was still room for improvement. Thereafter, as automatic language processing developed, lexicons which indicated the part-of-speech associated with each word in the input text were used. Associating part-of-speech tags with words in the text increased the complexity of the apparatus without offering a concomitant improvement in the prediction of phrasing. More recently, the possibility of using rules to predict phrase boundaries from the length and syntactic structure of the sentence has been discussed (Bachenko J and Fitzpatrick E: ‘A computational grammar of discourse-neutral prosodic phrasing in English’, Computational Linguistics, vol. 16, no. 3, pp 155–170 (1990)). Others have proposed deriving statistical parameters from a database of sentences which have natural prosodic phrase boundaries marked (Wang M and Hirschberg J: ‘Predicting intonational boundaries automatically from text: the ATIS domain’, Proc. of the DARPA Speech and Natural Language Workshop, pp 378–383 (February 1991)). These recent approaches to the prediction of phrasing still do not provide entirely satisfactory results.