The present invention relates to a pitch pattern generation apparatus that defines the intonation in a speech synthesizer or the like, which converts an input sentence consisting of a character string into synthetic speech.
Generating a natural pitch pattern is very important for improving the quality of speech synthesis in a speech synthesizer or the like that converts an input sentence into speech. A conventional manner of pitch pattern generation superimposes accent components, which depend on each word, on phrase components that descend gradually over the entire utterance. For example, the phrase components are simulated by either a monotonically descending linear pattern or a hill-type pattern that first ascends and then descends linearly. The accent components, on the other hand, are simulated by a broken line. Such prior art is disclosed, for example, in "The Investigation of Prosodic Rules in Connected Speech", The Acoustical Society of Japan; Transactions of the Committee on Speech Research S78-07 (April 1978) (Reference 1).
Such a conventional pitch pattern generation technique will be described hereunder with reference to FIG. 3, which shows an example of generating a pitch pattern for "He bought a white flower", a sentence consisting of 5 words. FIG. 3(A) represents the accent components, simulated by a broken line having 5 hills. The shape of each hill is determined by the accent type, the number of morae, etc. of each word. The accent component (A) is superimposed on the phrase component, i.e. the descending straight line shown in (B), to generate the overall pitch pattern for the text shown in (C). L1 through L5 in FIG. 3 are known as stress levels. The relative strength of the stress levels of adjacent words reflects the sentence structure and is important to the naturalness of the pitch pattern. That is, if the connection between two adjacent words is weak, the subsequent word will have a larger stress level than the preceding word. Conversely, if two adjacent words have a stronger connection in meaning, the subsequent word will have a smaller stress level than the preceding word.
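The superposition described above can be illustrated by the following minimal Python sketch. It is not taken from Reference 1. The function names, the sampling of the pattern into frames, the symmetric triangular shape of each hill, and the arbitrary pitch units are all assumptions made for illustration only; in the conventional technique the shape of each hill depends on the accent type and the number of morae of the word.

```python
def phrase_component(n_frames, start, end):
    """Straight line descending from `start` to `end` over n_frames samples."""
    if n_frames < 2:
        return [start] * n_frames
    step = (end - start) / (n_frames - 1)
    return [start + step * i for i in range(n_frames)]


def accent_hill(n_frames, stress_level):
    """One broken-line hill: linear rise to `stress_level`, then linear fall to 0.

    Assumption: a symmetric triangle stands in for the accent-type-dependent shape.
    """
    if n_frames < 2:
        return [stress_level] * n_frames
    mid = (n_frames - 1) / 2
    return [stress_level * (1 - abs(i - mid) / mid) for i in range(n_frames)]


def pitch_pattern(words, phrase_start, phrase_end):
    """Superimpose the accent components on the phrase component.

    words: list of (n_frames, stress_level) pairs, one pair per word.
    """
    accent = []
    for n_frames, stress_level in words:
        accent.extend(accent_hill(n_frames, stress_level))
    phrase = phrase_component(len(accent), phrase_start, phrase_end)
    return [a + p for a, p in zip(accent, phrase)]
```

For example, pitch_pattern([(5, 1.0), (5, 0.5)], 1.0, 0.1) yields a ten-frame pattern in which the second hill is lower than the first and both hills ride on the declining phrase line.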
In the conventional pitch pattern generation technique as described in Reference 1 and the like, the number of words from the preceding word to the connection word, i.e. the word to which the preceding word is connected in meaning, is used as a measure of the connection strength of adjacent words. This number is known as the separation degree and is determined by the syntactic structure of the particular sentence. If the separation degree is large at a certain word boundary, the word preceding the boundary is connected in meaning to a word at a more remote location, so that its connection with the immediately subsequent word is very weak. On the other hand, if the preceding word is directly connected to the immediately subsequent word, the separation degree takes its minimum value of 1. At a word boundary having a larger separation degree, the stress level of the subsequent word is made larger than that of the preceding word. Conversely, at a word boundary having a smaller separation degree, the subsequent word is given a lower stress level than the preceding word.
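The rule described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the procedure of Reference 1: the representation of the syntactic structure as a list giving, for each word, the index of its connection word, the threshold of 1 that separates a "larger" from a "smaller" separation degree, and the integer values of first_level and step are all hypothetical.

```python
def separation_degrees(heads):
    """Return the separation degree at each word boundary.

    heads[i] is the index of the connection word of word i (assumed input
    format). The last word connects to nothing, so its entry is None.
    """
    degrees = []
    for i, head in enumerate(heads[:-1]):
        if head is None or head <= i or head >= len(heads):
            raise ValueError(f"word {i} must connect to a later word")
        degrees.append(head - i)  # 1 means "connects to the very next word"
    return degrees


def stress_levels(degrees, first_level=3, step=1):
    """Assign a stress level to each word from the boundary separation degrees.

    Assumption: a degree above 1 raises the next word's level by `step`,
    and a degree of 1 lowers it by `step`.
    """
    levels = [first_level]
    for degree in degrees:
        if degree > 1:
            levels.append(levels[-1] + step)  # weak connection: raise
        else:
            levels.append(levels[-1] - step)  # strong connection: lower
    return levels
```

For a hypothetical 5-word sentence with heads = [4, 2, 4, 4, None], separation_degrees returns [4, 1, 2, 1] and stress_levels then returns [3, 4, 3, 4, 3]. An error in the syntactic analysis changes the heads list and therefore propagates directly into the stress levels.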
As described above, the conventional pitch pattern generation technique determines the stress level of each word according to the strength of connection between adjacent words in the syntactic structure of the particular sentence. The accent components determined in this manner are superimposed on the phrase components, thereby generating the pitch pattern for the entire sentence.
The conventional pitch pattern generation technique is based on the premise that the syntactic structure of a sentence can be obtained correctly. However, it is not always easy to analyze the syntactic structure of a sentence accurately. As a result, errors in the syntactic analysis make the generated pitch pattern unnatural.