1. Field of the Invention
The present invention relates to a method for determining prosodic markers and a device for implementing the method.
2. Description of the Related Art
In the conditioning of unknown text for speech synthesis in a TTS (“text-to-speech”) system, also referred to as a text/speech conversion system, an essential step is the conditioning and structuring of the text for the subsequent generation of the prosody. Prosodic parameters for speech synthesis systems are generated using a two-stage approach: prosodic markers are generated in the first stage, and these markers are then converted into physical parameters in the second stage.
In particular, phrase boundaries and word accents (pitch accents) may serve as prosodic markers. Phrases are understood to be groupings of words which are generally spoken together within a text, that is to say without intervening pauses in speaking. Pauses in speaking are present only at the respective ends of the phrases, the phrase boundaries. Inserting such pauses at the phrase boundaries of the synthesized speech significantly increases the comprehensibility and naturalness thereof.
In stage 1 of such a two-stage approach, both the stable prediction or determination of phrase boundaries and that of accents pose problems.
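The two-stage approach described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration only: the names MarkedWord, stage1_markers, stage2_parameters and phrases are hypothetical, the punctuation rule in stage 1 is merely a placeholder for a real boundary predictor, and the 200 ms pause is an arbitrary value. It is not the method of the invention nor of any cited publication.

```python
from dataclasses import dataclass
from typing import List, Tuple

PUNCTUATION = ",.;:!?"


@dataclass
class MarkedWord:
    """A word together with its prosodic markers (hypothetical structure)."""
    text: str
    accented: bool = False        # word accent marker (left unset in this sketch)
    boundary_after: bool = False  # True if a phrase boundary follows this word


def stage1_markers(words: List[str]) -> List[MarkedWord]:
    """Stage 1: generate prosodic markers.

    Placeholder rule (an assumption): a phrase boundary follows every word
    that ends in a punctuation character.
    """
    marked = []
    for word in words:
        boundary = bool(word) and word[-1] in PUNCTUATION
        marked.append(MarkedWord(text=word.rstrip(PUNCTUATION),
                                 boundary_after=boundary))
    return marked


def stage2_parameters(marked: List[MarkedWord],
                      pause_ms: int = 200) -> List[Tuple[str, int]]:
    """Stage 2: convert markers into physical parameters.

    Only one parameter is modeled here: the pause (in milliseconds)
    inserted after each word. The default of 200 ms is arbitrary.
    """
    return [(m.text, pause_ms if m.boundary_after else 0) for m in marked]


def phrases(marked: List[MarkedWord]) -> List[List[str]]:
    """Group words into phrases, i.e. runs of words with no pause in between."""
    groups: List[List[str]] = []
    current: List[str] = []
    for m in marked:
        current.append(m.text)
        if m.boundary_after:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    words = "When the rain stopped, we went outside.".split()
    marked = stage1_markers(words)
    print(phrases(marked))
    print(stage2_parameters(marked))
```

In this sketch the difficulty discussed in this section lies entirely in stage1_markers: unrestricted text does not reliably carry punctuation at every phrase boundary, which is why stage 1 calls for a trained predictor rather than a fixed rule.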
A publication entitled “A hierarchical stochastic model for automatic prediction of prosodic boundary location” by M. Ostendorf and N. Veilleux in Computational Linguistics, 1994, discloses a method in which “Classification and Regression Trees” (CART) are used for determining phrase boundaries. Initializing such a method requires a high degree of expert knowledge. Moreover, the complexity of this method rises more than proportionally with the accuracy sought.
At the Eurospeech 1997 conference, Alan W. Black and Paul Taylor published a method entitled “Assigning phrase breaks from part-of-speech sequences”, in which the phrase boundaries are determined using a “Hidden Markov Model” (HMM). Obtaining good prediction accuracy for a phrase boundary requires a training text of considerable size. Such training texts are expensive to create, since creating them requires expert knowledge.