1. Field of the Invention
The present invention relates to a speech synthesis apparatus, and more particularly to an apparatus which performs speech synthesis by rule.
2. Description of the Related Art
Conventionally, in order to perform speech synthesis by rule, control parameters of synthetic speech are produced, and a speech waveform is produced based on the control parameters using an LSP (line spectrum pair) synthesis filter system, a formant synthesis system or a waveform editing system.
Control parameters of synthetic speech are roughly divided into phonological unit information and prosodic information. The phonological unit information is information regarding a list of phonological units used, and the prosodic information is information regarding a pitch pattern representative of intonation and accent and duration lengths representative of rhythm.
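The two kinds of control parameters might be held together as in the minimal Python sketch below. The class and field names are hypothetical. For brevity, the sketch stores a single pitch value per phonological unit, although a real pitch pattern is a contour over time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ControlParameters:
    """Hypothetical container for the control parameters of synthetic speech."""

    # Phonological unit information: the list of phonological units used.
    units: List[str]
    # Prosodic information (pitch pattern), simplified to one value in Hz per unit.
    pitch_hz: List[float]
    # Prosodic information (rhythm): one duration length in ms per unit.
    durations_ms: List[float]

    def __post_init__(self) -> None:
        # In this sketch every unit needs exactly one pitch value and one duration.
        n = len(self.units)
        if len(self.pitch_hz) != n or len(self.durations_ms) != n:
            raise ValueError("prosodic information must align with the unit list")

    def total_duration_ms(self) -> float:
        return sum(self.durations_ms)


params = ControlParameters(
    units=["a", "k", "a"],
    pitch_hz=[120.0, 118.0, 110.0],
    durations_ms=[90.0, 60.0, 110.0],
)
print(params.total_duration_ms())  # 260.0
```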
For the production of phonological unit information and prosodic information, one conventionally known method, disclosed for example in Furui, "Digital Speech Processing", p. 146, FIGS. 7 and 6 (document 1), produces the phonological unit information and the prosodic information separately from each other.
Another known method, disclosed in Takahashi et al., "Speech Synthesis Software for a Personal Computer", Collection of Papers of the 47th National Meeting of the Information Processing Society of Japan, pages 2-377 to 2-378 (document 2), produces the prosodic information first and then produces the phonological unit information based on the prosodic information. In this method, when the prosodic information is produced, the duration lengths are produced first and the pitch pattern is produced thereafter. An alternative method is also known in which the duration lengths and the pitch pattern are produced independently of each other.
Further, as a method of improving the quality of synthetic speech after the prosodic information and the phonological unit information have been produced, Japanese Patent Laid-Open Application No. Hei 4-053998, for example, proposes generating a signal for improving speech quality based on phonological unit parameters.
Conventionally, when the control parameters for speech synthesis by rule are produced, meta information about the phonological units, such as phonemic representations or devocalization, is used to produce the prosodic information, but information about the phonological units actually used for synthesis is not.
For example, in a speech synthesis apparatus which produces a speech waveform by a waveform concatenation method, the time length and the pitch frequency of the original speech differ from one actually selected phonological unit to another.
Consequently, a phonological unit actually used for synthesis is sometimes altered unnecessarily from the form in which it was collected, and this can give rise to audible distortion.
It is an object of the present invention to provide a speech synthesis apparatus which reduces a distortion of synthetic speech.
It is another object of the present invention to provide a speech synthesis apparatus which can produce synthetic speech of a high quality.
In order to attain the objects described above, according to the present invention, when synthetic speech is produced based on prosodic information and phonological unit information, the prosodic information is modified using the phonological unit information. Specifically, the duration length information, the pitch pattern information and the phonological unit information are modified with reference to one another.
In particular, according to an aspect of the present invention, there is provided a speech synthesis apparatus, comprising prosodic pattern production means for producing a prosodic pattern, phonological unit selection means for selecting phonological units based on the prosodic pattern produced by the prosodic pattern production means, and means for modifying the prosodic pattern based on the selected phonological units.
The speech synthesis apparatus is advantageous in that the prosodic information can be modified based on the phonological unit information, so that synthetic speech with reduced distortion can be obtained while taking into consideration the environments in which the phonological units were collected.
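The apparatus of this aspect can be sketched in three steps: produce a prosodic pattern, select units against it, then adjust the pattern toward the units actually selected. This minimal Python sketch is illustrative only. The flat prosody rule, the toy inventory, the relative-mismatch selection cost and the halfway blend used for modification are assumptions, not details given in this specification.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Unit:
    label: str
    pitch_hz: float     # pitch of the unit as originally collected
    duration_ms: float  # duration of the unit as originally collected


@dataclass(frozen=True)
class Target:
    label: str
    pitch_hz: float
    duration_ms: float


def produce_prosody(labels: List[str]) -> List[Target]:
    # Hypothetical prosody rule: flat 120 Hz pitch and 100 ms for every unit.
    return [Target(lab, 120.0, 100.0) for lab in labels]


def cost(t: Target, u: Unit) -> float:
    # Relative mismatch between the target prosody and the unit as collected.
    return (abs(t.pitch_hz - u.pitch_hz) / t.pitch_hz
            + abs(t.duration_ms - u.duration_ms) / t.duration_ms)


def select_units(targets: List[Target], inventory: Dict[str, List[Unit]]) -> List[Unit]:
    # Pick, for each target, the candidate unit with the lowest mismatch.
    return [min(inventory[t.label], key=lambda u: cost(t, u)) for t in targets]


def modify_prosody(targets: List[Target], units: List[Unit],
                   alpha: float = 0.5) -> List[Target]:
    # Move each target part of the way toward the selected unit's original values,
    # so that less signal modification is needed when the waveform is generated.
    return [Target(t.label,
                   (1 - alpha) * t.pitch_hz + alpha * u.pitch_hz,
                   (1 - alpha) * t.duration_ms + alpha * u.duration_ms)
            for t, u in zip(targets, units)]


inventory = {
    "a": [Unit("a", 100.0, 80.0), Unit("a", 130.0, 120.0)],
    "k": [Unit("k", 125.0, 60.0)],
}
targets = produce_prosody(["a", "k"])
units = select_units(targets, inventory)
modified = modify_prosody(targets, units)
print([(m.pitch_hz, m.duration_ms) for m in modified])  # [(125.0, 110.0), (122.5, 80.0)]
```

Shifting the targets toward the selected units reduces how far each unit must later be altered from its collected form, which is the cause of the distortion discussed above.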
According to another aspect of the present invention, there is provided a speech synthesis apparatus, comprising prosodic pattern production means for producing a prosodic pattern, phonological unit selection means for selecting phonological units based on the prosodic pattern produced by the prosodic pattern production means, and means for feeding back the phonological units selected by the phonological unit selection means to the prosodic pattern production means so that the prosodic pattern and the selected phonological units are modified repetitively.
The speech synthesis apparatus is advantageous in that, since the phonological unit information is fed back so that the modification is performed repetitively, synthetic speech with further reduced distortion can be obtained.
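The feedback loop of this aspect might be organized as follows. This Python sketch is an assumption, not the patented procedure. Each pass reselects units against the current targets and then recomputes the targets as a blend of the original rule values and the newly selected units. The loop stops when the selection no longer changes or a pass limit is reached. The inventory, the cost and the blend weight are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical unit inventory: label -> list of (pitch_hz, duration_ms) as collected.
Inventory = Dict[str, List[Tuple[float, float]]]


def select(targets: List[Tuple[float, float]], labels: List[str],
           inventory: Inventory) -> List[Tuple[float, float]]:
    def cost(t, u):
        # Relative mismatch in pitch plus relative mismatch in duration.
        return abs(t[0] - u[0]) / t[0] + abs(t[1] - u[1]) / t[1]
    return [min(inventory[lab], key=lambda u: cost(t, u))
            for t, lab in zip(targets, labels)]


def refine(labels: List[str], inventory: Inventory, alpha: float = 0.5,
           max_passes: int = 10):
    # Hypothetical rule-based prosody: 120 Hz and 100 ms for every unit.
    rule = [(120.0, 100.0) for _ in labels]
    targets = list(rule)
    previous = None
    for passes in range(1, max_passes + 1):
        units = select(targets, labels, inventory)
        if units == previous:
            break  # the selection is stable, so the targets stop changing
        # Feed the selected units back: blend the rule values with the units' values.
        targets = [((1 - alpha) * r[0] + alpha * u[0],
                    (1 - alpha) * r[1] + alpha * u[1])
                   for r, u in zip(rule, units)]
        previous = units
    return targets, units, passes


inventory = {"a": [(100.0, 80.0), (140.0, 130.0)]}
targets, units, passes = refine(["a"], inventory)
print(targets, passes)  # [(110.0, 90.0)] 2
```

Blending with the rule values, rather than with the previous targets, keeps the prosodic model from being overwritten entirely by the selected units.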
According to a further aspect of the present invention, there is provided a speech synthesis apparatus, comprising duration length production means for producing duration lengths of phonological units, pitch pattern production means for producing a pitch pattern based on the duration lengths produced by the duration length production means, and means for feeding back the pitch pattern to the duration length production means so that the phonological unit duration lengths are modified.
The speech synthesis apparatus is advantageous in that duration lengths of phonological units can be modified based on a pitch pattern and synthetic speech of a high quality can be produced.
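A minimal Python sketch of this feedback follows. The specification states only that the pitch pattern is fed back so that the duration lengths are modified. Everything else here is a hypothetical choice made for illustration: the duration rule, the declining pitch contour sampled at unit midpoints, and the rule that lengthens units whose pitch lies above the utterance mean.

```python
from typing import List

VOWELS = {"a", "i", "u", "e", "o"}


def produce_durations(labels: List[str]) -> List[float]:
    # Hypothetical rule: 100 ms for vowels, 60 ms for other phonological units.
    return [100.0 if lab in VOWELS else 60.0 for lab in labels]


def produce_pitch(durations_ms: List[float], start_hz: float = 140.0,
                  decline_hz_per_s: float = 40.0) -> List[float]:
    # Hypothetical declining contour sampled at the midpoint of each unit;
    # this is where the pitch pattern depends on the duration lengths.
    pitch, t_ms = [], 0.0
    for d in durations_ms:
        midpoint_s = (t_ms + d / 2) / 1000.0
        pitch.append(start_hz - decline_hz_per_s * midpoint_s)
        t_ms += d
    return pitch


def modify_durations(durations_ms: List[float], pitch_hz: List[float],
                     beta: float = 0.5) -> List[float]:
    # Feedback step (hypothetical rule): lengthen units whose pitch is above the
    # utterance mean and shorten those whose pitch is below it.
    mean = sum(pitch_hz) / len(pitch_hz)
    return [d * (1 + beta * (f - mean) / mean) for d, f in zip(durations_ms, pitch_hz)]


initial = produce_durations(["k", "a"])      # [60.0, 100.0]
pitch = produce_pitch(initial)
modified = modify_durations(initial, pitch)  # "k" lengthened, "a" shortened
new_pitch = produce_pitch(modified)          # regenerate the pitch from the new lengths
```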
According to a still further aspect of the present invention, there is provided a speech synthesis apparatus, comprising duration length production means for producing duration lengths of phonological units, pitch pattern production means for producing a pitch pattern, phonological unit selection means for selecting phonological units, first means for supplying the duration lengths produced by the duration length production means to the pitch pattern production means and the phonological unit selection means, second means for supplying the pitch pattern produced by the pitch pattern production means to the duration length production means and the phonological unit selection means, and third means for supplying the phonological units selected by the phonological unit selection means to the pitch pattern production means and the duration length production means, the duration lengths, the pitch pattern and the phonological units being modified by cooperative operations of the duration length production means, the pitch pattern production means and the phonological unit selection means.
The speech synthesis apparatus is advantageous in that the duration lengths and the pitch pattern of the phonological units and the phonological unit information can be modified with reference to one another, so that synthetic speech of a high quality can be produced.
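The mutual exchange among the three means can be pictured as synchronous rounds. In each round, every means recomputes its output from the latest outputs of the two means that supply it. In the Python sketch below, the routing table mirrors the first, second and third supplying means described above. The three rule functions, the two-unit inventory for a single vowel and the fixed number of rounds are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Unit = Tuple[float, float]  # (pitch_hz, duration_ms) of a unit as collected

# Where each means' output is supplied (first, second and third supplying means).
ROUTES: Dict[str, List[str]] = {
    "durations": ["pitch", "selection"],
    "pitch": ["durations", "selection"],
    "selection": ["durations", "pitch"],
}
INVENTORY: List[Unit] = [(100.0, 80.0), (140.0, 130.0)]  # hypothetical units for /a/


def suppliers(module: str) -> List[str]:
    # The means whose outputs are routed to `module`.
    return sorted(src for src, dsts in ROUTES.items() if module in dsts)


def durations_fn(inp: Dict) -> float:
    base = 100.0 + (inp["pitch"] - 120.0) / 2  # hypothetical: higher pitch, longer unit
    sel: Optional[Unit] = inp["selection"]
    return base if sel is None else (base + sel[1]) / 2


def pitch_fn(inp: Dict) -> float:
    base = 120.0 + (100.0 - inp["durations"]) / 10  # hypothetical: shorter unit, higher pitch
    sel: Optional[Unit] = inp["selection"]
    return base if sel is None else (base + sel[0]) / 2


def selection_fn(inp: Dict) -> Unit:
    p, d = inp["pitch"], inp["durations"]
    return min(INVENTORY, key=lambda u: abs(p - u[0]) / p + abs(d - u[1]) / d)


MODULES: Dict[str, Callable[[Dict], object]] = {
    "durations": durations_fn, "pitch": pitch_fn, "selection": selection_fn,
}


def run_rounds(state: Dict, rounds: int) -> Dict:
    # Each round, every means reads only the previous round's outputs of its suppliers.
    for _ in range(rounds):
        state = {name: fn({s: state[s] for s in suppliers(name)})
                 for name, fn in MODULES.items()}
    return state


final = run_rounds({"durations": 100.0, "pitch": 120.0, "selection": None}, rounds=3)
print(final)  # {'durations': 87.5, 'pitch': 110.5, 'selection': (100.0, 80.0)}
```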
According to a yet further aspect of the present invention, there is provided a speech synthesis apparatus, comprising duration length production means for producing duration lengths of phonological units, pitch pattern production means for producing a pitch pattern, phonological unit selection means for selecting phonological units, and control means for activating the duration length production means, the pitch pattern production means and the phonological unit selection means in this order and controlling the duration length production means, the pitch pattern production means and the phonological unit selection means so that at least one of the duration lengths produced by the duration length production means, the pitch pattern produced by the pitch pattern production means and the phonological units selected by the phonological unit selection means is modified by a corresponding one of the duration length production means, the pitch pattern production means and the phonological unit selection means.
The speech synthesis apparatus is advantageous in that, since the modifications to the duration lengths, the pitch pattern and the phonological unit information are determined collectively by the single control means rather than independently of one another, synthetic speech of a high quality can be produced and the amount of calculation can be reduced.
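The role of the single control means might be sketched as below. It activates the three means in the stated order and then, at one decision point, determines which results to modify. The threshold test and the halfway adjustment are hypothetical rules. For brevity, the sketch applies the adjustment inside the control function, whereas in the apparatus the modification is carried out by the corresponding production or selection means.

```python
from typing import Dict, List, Tuple

Unit = Tuple[float, float]  # (pitch_hz, duration_ms) as collected

INVENTORY: Dict[str, List[Unit]] = {  # hypothetical unit inventory
    "k": [(125.0, 55.0)],
    "a": [(100.0, 80.0), (140.0, 130.0)],
}


def durations_module(labels: List[str]) -> List[float]:
    return [100.0 if lab == "a" else 60.0 for lab in labels]  # hypothetical rule


def pitch_module(durations: List[float]) -> List[float]:
    return [120.0 for _ in durations]  # hypothetical flat pattern


def selection_module(labels: List[str], durations: List[float],
                     pitch: List[float]) -> List[Unit]:
    return [min(INVENTORY[lab], key=lambda u: abs(p - u[0]) / p + abs(d - u[1]) / d)
            for lab, d, p in zip(labels, durations, pitch)]


def control(labels: List[str], threshold: float = 0.1):
    # Activate the three means in the fixed order: durations, pitch, selection.
    durations = durations_module(labels)
    pitch = pitch_module(durations)
    units = selection_module(labels, durations, pitch)
    # Single decision point (hypothetical rule): any value whose relative mismatch
    # with its selected unit exceeds `threshold` is moved halfway toward the unit.
    for i, (unit_pitch, unit_dur) in enumerate(units):
        if abs(pitch[i] - unit_pitch) / unit_pitch > threshold:
            pitch[i] = (pitch[i] + unit_pitch) / 2
        if abs(durations[i] - unit_dur) / unit_dur > threshold:
            durations[i] = (durations[i] + unit_dur) / 2
    return durations, pitch, units


durations, pitch, units = control(["k", "a"])
print(durations, pitch)  # [60.0, 90.0] [120.0, 110.0]
```

Here only the vowel is modified, because the consonant's selected unit already lies within the threshold on both pitch and duration.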
The speech synthesis apparatus may be constructed such that it further comprises a shared information storage section, wherein the duration length production means produces duration lengths based on information stored in the shared information storage section and writes the duration lengths into the shared information storage section, the pitch pattern production means produces a pitch pattern based on the information stored in the shared information storage section and writes the pitch pattern into the shared information storage section, and the phonological unit selection means selects phonological units based on the information stored in the shared information storage section and writes the selected phonological units into the shared information storage section.
The speech synthesis apparatus is advantageous in that, since the information relating to each of these means is shared among them, the calculation time can be reduced.
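A shared information storage section behaves much like a blackboard that every means reads from and writes to. The Python sketch below assumes a simple key-value store, hypothetical duration, pitch and selection rules, and a toy inventory. On the second pass, the duration and pitch means refine their outputs using the units that the selection means wrote on the first pass, without any direct connection between the means.

```python
from typing import Dict, List, Tuple

Unit = Tuple[float, float]  # (pitch_hz, duration_ms) as collected


class SharedStore:
    """Hypothetical shared information storage section: one record per key."""

    def __init__(self, labels: List[str]) -> None:
        self._data: Dict[str, object] = {"labels": labels}

    def read(self, key: str, default=None):
        return self._data.get(key, default)

    def write(self, key: str, value) -> None:
        self._data[key] = value


INVENTORY: Dict[str, List[Unit]] = {"k": [(125.0, 55.0)],
                                    "a": [(100.0, 80.0), (140.0, 130.0)]}


def duration_means(store: SharedStore) -> None:
    labels, units = store.read("labels"), store.read("units")
    durations = [100.0 if lab == "a" else 60.0 for lab in labels]  # hypothetical rule
    if units is not None:  # refine using units written by the selection means
        durations = [(d + u[1]) / 2 for d, u in zip(durations, units)]
    store.write("durations", durations)


def pitch_means(store: SharedStore) -> None:
    durations, units = store.read("durations"), store.read("units")
    pitch = [120.0 for _ in durations]  # hypothetical flat pattern
    if units is not None:
        pitch = [(p + u[0]) / 2 for p, u in zip(pitch, units)]
    store.write("pitch", pitch)


def selection_means(store: SharedStore) -> None:
    labels, d, p = store.read("labels"), store.read("durations"), store.read("pitch")
    units = [min(INVENTORY[lab], key=lambda u: abs(pi - u[0]) / pi + abs(di - u[1]) / di)
             for lab, di, pi in zip(labels, d, p)]
    store.write("units", units)


store = SharedStore(["k", "a"])
for _ in range(2):  # the second pass sees the units written during the first
    for means in (duration_means, pitch_means, selection_means):
        means(store)
print(store.read("durations"), store.read("pitch"))  # [57.5, 90.0] [122.5, 110.0]
```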
The above and other objects, features and advantages of the present invention will become apparent from the following description and the appended claims, taken in conjunction with the accompanying drawings in which like parts or elements are denoted by like reference symbols.