The present invention relates to a speech information processing method and apparatus for setting a time series fundamental frequency (pitch pattern) in predetermined segment units upon speech synthesis or speech recognition, and a computer-readable storage medium holding a program for execution of the speech processing method.
Recently, speech synthesis apparatuses have been developed that convert an arbitrary character string into a phonological series and then convert the phonological series into synthesized speech in accordance with predetermined speech-synthesis-by-rule.
However, the synthesized speech outputted from a conventional speech synthesis apparatus sounds unnatural and mechanical in comparison with natural speech produced by a human being. One factor considered to cause this awkward-sounding result is the accuracy of the prosody generation rules that generate the accent and intonation of each phoneme, for example in a phonological series "o, X, s, e, i" of a character series "onsei". If the accuracy is low, an adequate pitch pattern cannot be generated for the phonological series, and the synthesized speech becomes unnatural and mechanical.
The present invention has been made in consideration of the above prior art, and has as its object to provide a speech information processing method and apparatus for speech synthesis that produce natural intonation by modeling the change over time in the fundamental frequency of a predetermined unit of phoneme.
To attain the foregoing object, the present invention provides a speech information processing method comprising: an input step of inputting a predetermined unit of phonological series; a generation step of generating fundamental frequencies of respective phonemes constructing the phonological series based on a segment pitch pattern model; and a speech synthesis step of synthesizing speech based on the fundamental frequencies of the respective phonemes generated at the generation step.
Further, the present invention provides a speech information processing apparatus comprising: input means for inputting a predetermined unit of phonological series; generation means for generating fundamental frequencies of respective phonemes constructing the phonological series based on a segment pitch pattern model; and speech synthesis means for synthesizing speech based on the fundamental frequencies of the respective phonemes generated by the generation means.
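The three steps recited above (input, generation, synthesis) can be illustrated with a minimal sketch. It is not the claimed implementation: the table `TOY_MODEL`, its F0 values, the function names `generate_f0` and `synthesize`, the frame length, and the sine-oscillator output are all assumptions made for illustration. In particular, the segment pitch pattern model is stood in for by a plain lookup table from each phoneme to a short F0 contour, with an F0 of 0 marking an unvoiced segment.

```python
import math

# Hypothetical stand-in for a segment pitch pattern model:
# phoneme -> F0 values (Hz) at equally spaced points inside the segment.
# All numbers are invented for illustration; 0.0 marks "unvoiced".
TOY_MODEL = {
    "o": [120.0, 125.0, 130.0],
    "X": [130.0, 128.0, 126.0],
    "s": [0.0, 0.0, 0.0],
    "e": [126.0, 122.0, 118.0],
    "i": [118.0, 112.0, 105.0],
}


def generate_f0(phonemes, model):
    """Generation step: pick an F0 contour for each phoneme from the model."""
    return [(p, list(model[p])) for p in phonemes]


def synthesize(contours, frame_sec=0.02, rate=8000):
    """Synthesis step (toy): drive a sine oscillator with the F0 contour.

    Each F0 value is held for one frame; unvoiced frames are silent.
    A real synthesizer would also need spectral and duration information.
    """
    samples = []
    phase = 0.0
    frame_len = round(frame_sec * rate)
    for _, f0_values in contours:
        for f0 in f0_values:
            for _ in range(frame_len):
                phase += 2.0 * math.pi * f0 / rate
                samples.append(math.sin(phase) if f0 > 0 else 0.0)
    return samples


# Input step: the phonological series used as the example in the background.
series = ["o", "X", "s", "e", "i"]
contours = generate_f0(series, TOY_MODEL)
samples = synthesize(contours)
```

The sketch keeps the generation step separate from the synthesis step so that the pitch model can be replaced without touching the waveform code, mirroring the separation between the generation means and the speech synthesis means.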
Further, another object of the present invention is to provide a speech information processing method and apparatus for high-accuracy speech recognition using model information obtained by modeling the change over time in the fundamental frequency of a predetermined unit of phoneme.
Further, to attain the foregoing object, the present invention provides a speech information processing method comprising: an input step of inputting speech; an extraction step of extracting a feature parameter of the speech; and a speech recognition step of recognizing the feature parameter based on a segment pitch pattern model.
Further, the present invention provides a speech information processing apparatus comprising: input means for inputting speech; extraction means for extracting a feature parameter of the speech; and speech recognition means for recognizing the feature parameter based on a segment pitch pattern model.
Other features and advantages of the present invention will be apparent from the following description taken in conjunction with the accompanying drawings, in which like reference characters designate the same or similar parts throughout the figures thereof.