1. Field of the Invention
The present invention relates to a fundamental frequency pattern generating method used in speech synthesis.
2. Description of the Related Art
In one conventional fundamental frequency pattern generating method, attention is paid to the accent type, and the fundamental frequency pattern is decided by a critically damped second-order linear system on the logarithmic frequency axis, with the start point or the vowel start point of the mora concerned as the reference, as in Japanese Laid-open Patent Application Hei5-173590. In another conventional method, the fundamental frequency of each mora is decided with attention paid to the accent type, the kind of the phonological segment, and the mora position in the word or the phrase, as in Japanese Laid-open Patent Application Hei5-88690.
According to these methods, however, variation in fundamental frequency within a mora cannot be decided accurately, or distortion is caused on the real time axis by the difference in time length among morae, so that the rhythm typified by the accent becomes unnatural.
The present invention is intended to solve the above-mentioned problems of the conventional fundamental frequency pattern generating methods.
An aspect of the present invention is a fundamental frequency pattern generating method for generating a fundamental frequency of an accent phrase,
wherein a fundamental frequency data base is referred to that stores (1) a fundamental frequency pattern obtained by standardizing a fundamental frequency pattern of at least one of the following phonological segments by a time length of the phonological segment: a first phonological segment of the accent phrase; a phonological segment where the fundamental frequency takes a maximum value; a phonological segment of an accent nucleus and a phonological segment next to the accent nucleus; and one phonological segment at an end, or (2) a fundamental frequency pattern obtained by standardizing a fundamental frequency pattern of a phoneme included in at least one of said phonological segments by a time length of the phoneme,
wherein (3) fundamental frequency patterns of all or part of the following phonological segments: the first phonological segment of the accent phrase for which the fundamental frequency is to be generated; the phonological segment where the fundamental frequency takes the maximum value in the accent phrase; the phonological segment of the accent nucleus and the phonological segment next to the accent nucleus in the accent phrase; and the phonological segment of the end of the accent phrase, or (4) a fundamental frequency pattern of each phoneme included in said phonological segments is set, and
wherein a fundamental frequency pattern between the phonological segments or between the phonemes which fundamental frequency pattern has not been set in a stage of the fundamental frequency pattern setting is interpolated by a function on a real time axis.
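The relationship between the time-standardized patterns and the interpolation on the real time axis may be sketched as follows. This is only an illustrative sketch: the data base layout (`f0_db`), the helper names, the segment durations, and the choice of a linear interpolating function are all assumptions made for the example and are not taken from this specification.

```python
from bisect import bisect_right


def denormalize(pattern, start, duration):
    """Map (normalized_time, f0_hz) pairs in [0, 1] onto the real time axis."""
    return [(start + u * duration, f0) for u, f0 in pattern]


def interpolate_linear(points, t):
    """Linearly interpolate F0 at real time t (seconds) between set points."""
    points = sorted(points)
    times = [time for time, _ in points]
    if t <= times[0]:
        return points[0][1]
    if t >= times[-1]:
        return points[-1][1]
    i = bisect_right(times, t)
    (t0, f0), (t1, f1) = points[i - 1], points[i]
    return f0 + (f1 - f0) * (t - t0) / (t1 - t0)


# Hypothetical data base: patterns standardized by the segment time length.
f0_db = {
    "first": [(0.0, 110.0), (1.0, 130.0)],
    "end": [(0.0, 100.0), (1.0, 90.0)],
}

# (start_s, duration_s) of three segments; only the first and last are set.
segments = [(0.0, 0.1), (0.1, 0.2), (0.3, 0.1)]
set_points = denormalize(f0_db["first"], *segments[0])
set_points += denormalize(f0_db["end"], *segments[2])

# The middle segment has no stored pattern, so it is interpolated in real time.
f0_mid = interpolate_linear(set_points, 0.2)
```

Because the stored patterns are stretched to the actual duration of each segment before interpolation, the interpolating function operates on seconds rather than on segment indices, which is the point of performing the interpolation on the real time axis.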
Another aspect of the present invention is a fundamental frequency pattern generating method for generating a fundamental frequency of an accent phrase,
wherein all or part of a rise reference point of the accent phrase for which the fundamental frequency is to be generated, a fall reference point generating an accent, an accent phrase end reference point deciding fundamental frequency patterns of a plurality of phonological segments including any of one phonological segment at an end of the accent phrase, and a word end reference point generating a fundamental frequency pattern of a word end are set on a time axis standardized by a time length of a phoneme included in each phonological segment,
wherein a fundamental frequency data base is referred to that stores, of fundamental frequencies extracted from fundamental frequency patterns obtained by standardizing the fundamental frequency patterns of the phonemes included in the phonological segments by time lengths of the phonemes, a fundamental frequency pattern of at least one of the rise reference point of the accent phrase, the fall reference point, the accent phrase end reference point and the word end reference point,
wherein a fundamental frequency at the set reference point is set with reference to the fundamental frequency data base, and
wherein a fundamental frequency between the reference points which fundamental frequency has not been set in a stage of the fundamental frequency setting is interpolated by a function on a real time axis or by a fundamental frequency pattern plotted on the real time axis.
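A reference point set on a time axis standardized by a phoneme length can be converted to real time and then joined to its neighbour by an interpolating function, for example as in the following sketch. The reference point names, the stored values in `reference_db`, the phoneme timings, and the raised-cosine function are assumptions chosen for illustration; the specification only requires some function on the real time axis.

```python
import math


def reference_time(phoneme_start, phoneme_duration, normalized_position):
    """Convert a position on the phoneme-normalized axis to seconds."""
    return phoneme_start + normalized_position * phoneme_duration


def cosine_interpolate(p0, p1, t):
    """Raised-cosine interpolation of F0 between two (time_s, f0_hz) points."""
    (t0, f0), (t1, f1) = p0, p1
    weight = (1.0 - math.cos(math.pi * (t - t0) / (t1 - t0))) / 2.0
    return f0 + (f1 - f0) * weight


# Hypothetical data base of F0 values (Hz) at the reference points.
reference_db = {"rise": 120.0, "fall": 160.0}

# Each reference point sits at the middle (0.5) of its phoneme in this example.
rise = (reference_time(0.0, 0.1, 0.5), reference_db["rise"])
fall = (reference_time(0.2, 0.1, 0.5), reference_db["fall"])

# F0 between the two reference points, evaluated on the real time axis.
f0_between = cosine_interpolate(rise, fall, 0.15)
```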
Still another aspect of the present invention is a fundamental frequency pattern generating method for generating a fundamental frequency of an accent phrase,
wherein a fundamental frequency data base is referred to that stores a fundamental frequency pattern obtained by standardizing a fundamental frequency pattern corresponding to a vowel portion included in at least one of the following phonological segments by a time length of the vowel included in the phonological segment: a first phonological segment of the accent phrase; a phonological segment where the fundamental frequency takes a maximum value; a phonological segment of an accent nucleus and a phonological segment next to the accent nucleus; and one phonological segment at an end or a plurality of phonological segments which are four or less phonological segments from the end,
wherein in all or part of the following phonological segments: the first phonological segment of the accent phrase for which the fundamental frequency is to be generated; the phonological segment where the fundamental frequency is the maximum value in the accent phrase; the phonological segment of the accent nucleus and the phonological segment next to the accent nucleus in the accent phrase; and the phonological segment of the end of the accent phrase, a fundamental frequency pattern for each vowel included in the phonological segments is set, and
wherein a fundamental frequency between the phonological segments for which the fundamental frequency pattern setting is not performed is interpolated by a function on a real time axis.
Still yet another aspect of the present invention is a fundamental frequency pattern generating method for generating a fundamental frequency of an accent phrase,
wherein all or part of a rise reference point of the accent phrase for which the fundamental frequency is to be generated, a fall reference point generating an accent, an accent phrase end reference point deciding a fundamental frequency pattern of an end of the accent phrase, and a word end reference point generating a fundamental frequency pattern of a word end are set on a time axis standardized by a time length of a phoneme included in each phonological segment,
wherein a fundamental frequency data base is referred to that stores, of fundamental frequencies extracted from fundamental frequency patterns obtained by standardizing fundamental frequency patterns of vowels included in the phonological segments by time lengths of the vowels, a fundamental frequency of at least one of the rise reference point of the accent phrase, the fall reference point, the accent phrase end reference point and the word end reference point,
wherein a fundamental frequency at the set reference point is set with reference to the fundamental frequency data base, and
wherein a fundamental frequency between the reference points for which the fundamental frequency setting is not performed is interpolated by a function on a real time axis or by a fundamental frequency pattern plotted on the real time axis.
A further aspect of the present invention is a fundamental frequency pattern generating method for generating a fundamental frequency of an accent phrase,
wherein a fundamental frequency pattern of each accent phrase is set with reference to a fundamental frequency data base that stores a fundamental frequency pattern standardized by a time length of each phoneme included in a phonological segment classified according to one or both of the number of phonological segments and an accent position, and
wherein a value corresponding to a phoneme or a phonological segment string for which the fundamental frequency is to be generated is obtained from a microprosody data base that stores a difference between a fundamental frequency of each phonological segment or each phoneme string standardized by a time length of the phoneme and said fundamental frequency pattern which difference is classified according to a phonological segment or a phoneme string, and the corresponding value is added to the set fundamental frequency or subtracted from the set fundamental frequency to thereby generate the fundamental frequency of the accent phrase.
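The addition of the microprosody value to the set fundamental frequency may be illustrated by the following sketch. It assumes, purely for the example, that each phoneme contour is sampled at a fixed number of points on the phoneme-normalized time axis and that the microprosody data base (`micro_db`, a hypothetical name) is keyed by a single phoneme; a negative stored difference corresponds to subtraction.

```python
def apply_microprosody(base, phonemes, micro_db):
    """Add the stored difference for each phoneme to the set F0 contour.

    base: one list of F0 samples (Hz) per phoneme, on a normalized time axis.
    micro_db: maps a phoneme to a list of differences of the same length.
    Phonemes without an entry are left unchanged.
    """
    result = []
    for phoneme, contour in zip(phonemes, base):
        diff = micro_db.get(phoneme, [0.0] * len(contour))
        result.append([f0 + d for f0, d in zip(contour, diff)])
    return result


# Hypothetical values: the consonant lowers F0 at its onset.
base = [[120.0, 125.0, 130.0], [130.0, 128.0, 126.0]]
micro_db = {"k": [-8.0, -4.0, 0.0]}
f0 = apply_microprosody(base, ["k", "a"], micro_db)
```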
An aspect of the present invention is a fundamental frequency pattern generating method for generating a fundamental frequency pattern of an accent phrase by use of a fundamental frequency data base storing a fundamental frequency pattern classified according to the number of phonological segments and an accent position,
wherein when a fundamental frequency pattern corresponding to the number of phonological segments and an accent pattern of the accent phrase for which the fundamental frequency pattern is to be generated is not stored in the fundamental frequency data base and an accent position of the accent phrase for which the fundamental frequency is to be generated is the same as or before a phonological segment position next to a phonological segment position including a peak of the fundamental frequency stored in the fundamental frequency data base,
(1) the fundamental frequency pattern stored in the fundamental frequency data base is used which has an accent position the same as the accent position of the accent phrase for which the fundamental frequency pattern is to be generated, said fundamental frequency pattern stored in the fundamental frequency data base corresponding to the number of phonological segments closest to the number of phonological segments of the accent phrase for which the fundamental frequency pattern is to be generated,
(2) a fundamental frequency pattern from a first phonological segment to a phonological segment next to an accent nucleus is generated by applying a fundamental frequency from a first phonological segment to a phonological segment next to an accent nucleus of a fundamental frequency pattern stored in the fundamental frequency data base,
(3) a fundamental frequency from a second phonological segment from the accent nucleus to a phonological segment immediately before an end of the accent phrase including a predetermined number, four or fewer, of phonological segments is generated by performing interpolation by (a) fundamental frequencies of the second phonological segment from the accent nucleus and the end of the accent phrase or (b) fundamental frequencies of the phonological segment next to the accent nucleus and the end of the accent phrase or (c) fundamental frequencies of the second phonological segment from the accent nucleus and the phonological segment immediately before the end of the accent phrase or (d) fundamental frequencies of the phonological segment next to the accent nucleus and the phonological segment immediately before the end of the accent phrase of the fundamental frequency pattern stored in the fundamental frequency data base, and
(4) a fundamental frequency of the end of the accent phrase for which the fundamental frequency pattern is to be generated is generated by applying a fundamental frequency of the end of the accent phrase of the fundamental frequency pattern stored in the fundamental frequency data base.
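Steps (1) to (4) above may be sketched as follows under several simplifying assumptions that are not part of this specification: each phonological segment is represented by a single fundamental frequency value, the end of the accent phrase is a single segment, the interpolation uses option (b) with equally spaced segments, and the data base is a hypothetical dictionary keyed by (number of segments, accent position), with accent positions counted from 1.

```python
def closest_pattern(db, n_segments, accent_pos):
    """Step (1): same accent position, closest number of segments."""
    candidates = [key for key in db if key[1] == accent_pos]
    key = min(candidates, key=lambda k: abs(k[0] - n_segments))
    return db[key]


def generate_early_accent(db, n_segments, accent_pos):
    stored = closest_pattern(db, n_segments, accent_pos)
    # Step (2): copy the first segment through the segment after the nucleus.
    head = stored[:accent_pos + 1]
    # Step (4): copy the end of the accent phrase.
    end = stored[-1]
    # Step (3), option (b): interpolate between the segment next to the
    # nucleus and the end of the accent phrase.
    n_gap = n_segments - len(head) - 1
    left = head[-1]
    gap = [left + (end - left) * (i + 1) / (n_gap + 1) for i in range(n_gap)]
    return head + gap + [end]


# Hypothetical data base: no 7-segment pattern with the accent on segment 2.
db = {
    (3, 2): [115.0, 150.0, 100.0],
    (5, 2): [110.0, 150.0, 120.0, 105.0, 95.0],
    (6, 1): [150.0, 125.0, 115.0, 108.0, 100.0, 92.0],
}
pattern = generate_early_accent(db, 7, 2)
```

In this example the 5-segment entry is selected because it is closest in length, its first three values and last value are reused unchanged, and the three segments in between are filled by interpolation.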
Another aspect of the present invention is a fundamental frequency pattern generating method for generating a fundamental frequency pattern of an accent phrase by use of a fundamental frequency data base storing a fundamental frequency pattern classified according to the number of phonological segments and an accent position,
wherein when a fundamental frequency pattern corresponding to the number of phonological segments and an accent pattern of the accent phrase for which the fundamental frequency pattern is to be generated is not stored in the fundamental frequency data base and an accent position of the accent phrase for which the fundamental frequency pattern is to be generated is after a phonological segment position next to a phonological segment position including a peak of the fundamental frequency stored in the fundamental frequency data base and before an end of the predetermined accent phrase,
(1) a fundamental frequency pattern stored in the fundamental frequency data base is used which has an accent nucleus at a second phonological segment from the peak of the fundamental frequency stored in the fundamental frequency data base or at a phonological segment thereafter and before the end of the accent phrase, said fundamental frequency pattern stored in the fundamental frequency data base corresponding to the number of phonological segments closest to the number of the phonological segments of the accent phrase for which the fundamental frequency is to be generated,
(2) a fundamental frequency pattern from a first phonological segment of the accent phrase for which the fundamental frequency is to be generated to the phonological segment including the peak of the fundamental frequency is generated by applying a fundamental frequency from a first phonological segment of the fundamental frequency pattern stored in the fundamental frequency data base to the phonological segment including the peak of the fundamental frequency,
(3) a fundamental frequency from the phonological segment next to the phonological segment including the peak of the fundamental frequency to a phonological segment immediately before the accent nucleus is generated by performing interpolation by (a) fundamental frequencies of the phonological segment including the peak of the fundamental frequency and a phonological segment including the accent nucleus or (b) fundamental frequencies of the phonological segment including the peak of the fundamental frequency and the phonological segment immediately before the phonological segment including the accent nucleus or (c) fundamental frequencies of the phonological segment next to the phonological segment including the peak of the fundamental frequency and the phonological segment including the accent nucleus or (d) fundamental frequencies of the phonological segment next to the phonological segment including the peak of the fundamental frequency and the phonological segment immediately before the phonological segment including the accent nucleus of the fundamental frequency pattern stored in the fundamental frequency data base,
(4) fundamental frequencies of the phonological segment including the accent nucleus of the accent phrase for which the fundamental frequency is to be generated and a phonological segment immediately thereafter are generated by applying fundamental frequencies of the phonological segment including the accent nucleus and a phonological segment immediately thereafter of the fundamental frequency pattern stored in the fundamental frequency data base,
(5) a fundamental frequency from a second phonological segment from the accent nucleus to a phonological segment immediately before an end of the accent phrase including a predetermined number, four or fewer, of phonological segments is generated by performing interpolation by (a) fundamental frequencies of the second phonological segment from the accent nucleus and the end of the accent phrase or (b) fundamental frequencies of the phonological segment next to the accent nucleus and the end of the accent phrase or (c) fundamental frequencies of the second phonological segment from the accent nucleus and the phonological segment immediately before the end of the accent phrase or (d) fundamental frequencies of the phonological segment next to the accent nucleus and the phonological segment immediately before the end of the accent phrase of the fundamental frequency pattern stored in the fundamental frequency data base, and
(6) a fundamental frequency pattern of the end of the accent phrase for which the fundamental frequency is to be generated is generated by applying a fundamental frequency of the phonological segment of the end of the accent phrase of the fundamental frequency pattern stored in the fundamental frequency data base.
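When the accent nucleus lies well after the peak, steps (2), (4) and (6) fix certain segments from the stored pattern and steps (3) and (5) fill the two remaining gaps. The sketch below treats the copied segments as anchors and fills every unset segment between neighbouring anchors. It assumes one fundamental frequency value per segment, a single-segment phrase end, equally spaced segments, option (a) for step (3) and option (b) for step (5); the function names and the sample values are hypothetical. Positions are counted from 1.

```python
def fill_between(anchors, n_segments):
    """Linearly fill unset segments between neighbouring anchored segments."""
    known = sorted(anchors)
    out = [None] * n_segments
    for i in known:
        out[i] = anchors[i]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            w = (i - left) / (right - left)
            out[i] = anchors[left] + (anchors[right] - anchors[left]) * w
    return out


def generate_mid_accent(stored, stored_accent, peak, n_segments, accent):
    anchors = {}
    for i in range(peak):                            # step (2): first..peak
        anchors[i] = stored[i]
    anchors[accent - 1] = stored[stored_accent - 1]  # step (4): nucleus
    anchors[accent] = stored[stored_accent]          # step (4): next segment
    anchors[n_segments - 1] = stored[-1]             # step (6): phrase end
    return fill_between(anchors, n_segments)         # steps (3) and (5)


# Hypothetical stored pattern: 6 segments, peak on segment 2, nucleus on 4.
stored = [110.0, 140.0, 135.0, 130.0, 100.0, 90.0]
pattern = generate_mid_accent(stored, 4, 2, n_segments=9, accent=6)
```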
Still another aspect of the present invention is a fundamental frequency pattern generating method for generating a fundamental frequency pattern of an accent phrase by use of a fundamental frequency data base storing a fundamental frequency pattern classified according to the number of phonological segments and an accent position,
wherein when a fundamental frequency pattern corresponding to the number of phonological segments and an accent pattern of the accent phrase for which the fundamental frequency pattern is to be generated is not stored in the fundamental frequency data base and an accent position of the accent phrase for which the fundamental frequency is to be generated is included in a phonological segment of an end of the accent phrase,
(1) the fundamental frequency pattern stored in the fundamental frequency data base is used whose accent position in the end of the accent phrase is the same as the accent position in the end of the accent phrase for which the fundamental frequency is to be generated, said fundamental frequency pattern stored in the fundamental frequency data base corresponding to the number of phonological segments closest to the number of phonological segments of the accent phrase for which the fundamental frequency is to be generated,
(2) a fundamental frequency pattern from a first phonological segment of the accent phrase for which the fundamental frequency is to be generated to a phonological segment including a peak of the fundamental frequency is generated by applying a fundamental frequency from a first phonological segment of the fundamental frequency pattern stored in the fundamental frequency data base to a phonological segment including a peak of the fundamental frequency,
(3) a fundamental frequency from a phonological segment next to the phonological segment including the peak of the fundamental frequency to a phonological segment immediately before an accent nucleus is generated by performing interpolation by (a) fundamental frequencies of the phonological segment including the peak of the fundamental frequency and a phonological segment including the accent nucleus or (b) fundamental frequencies of the phonological segment including the peak of the fundamental frequency and the phonological segment immediately before the phonological segment including the accent nucleus or (c) fundamental frequencies of a phonological segment next to the phonological segment including the peak of the fundamental frequency and the phonological segment including the accent nucleus or (d) fundamental frequencies of the phonological segment next to the phonological segment including the peak of the fundamental frequency and the phonological segment immediately before the phonological segment including the accent nucleus of the fundamental frequency pattern stored in the fundamental frequency data base, and
(4) a fundamental frequency from a phonological segment including an accent nucleus of the accent phrase for which the fundamental frequency is to be generated to a last phonological segment of the accent phrase is generated by applying a fundamental frequency from the phonological segment including the accent nucleus of the fundamental frequency pattern stored in the fundamental frequency data base to a last phonological segment of the accent phrase.
Still yet another aspect of the present invention is a fundamental frequency pattern generating method for generating a fundamental frequency pattern of an accent phrase by use of a fundamental frequency data base storing a fundamental frequency pattern classified according to the number of phonological segments and an accent position,
wherein when a fundamental frequency pattern corresponding to the number of phonological segments and an accent pattern of the accent phrase for which the fundamental frequency pattern is to be generated is not stored in the fundamental frequency data base and an accent type of the accent phrase for which the fundamental frequency is to be generated is a flat type,
(1) a fundamental frequency pattern stored in the fundamental frequency data base is used which corresponds to the number of phonological segments closest to the number of phonological segments of the accent phrase of the flat type for which the fundamental frequency is to be generated,
(2) a fundamental frequency pattern from a first phonological segment to a phonological segment including a peak of a fundamental frequency is generated by applying a fundamental frequency from a first phonological segment of the fundamental frequency pattern stored in the fundamental frequency data base to a phonological segment including a peak of the fundamental frequency,
(3) a fundamental frequency from a phonological segment next to the phonological segment including the peak of the fundamental frequency to a phonological segment of an end of the accent phrase or immediately before a last phonological segment is generated by performing interpolation by (a) fundamental frequencies of the phonological segment including the peak of the fundamental frequency and the end of the accent phrase or the last phonological segment or (b) fundamental frequencies of the phonological segment including the peak of the fundamental frequency and the phonological segment of the end of the accent phrase or immediately before the last phonological segment or (c) fundamental frequencies of the phonological segment next to the phonological segment including the peak of the fundamental frequency and the end of the accent phrase or the last phonological segment or (d) fundamental frequencies of the phonological segment next to the phonological segment including the peak of the fundamental frequency and the phonological segment of the end of the accent phrase or immediately before the last phonological segment of the fundamental frequency pattern stored in the fundamental frequency data base, and
(4) a fundamental frequency pattern of an accent phrase end or a last phonological segment of the accent phrase for which the fundamental frequency is to be generated is generated by applying a fundamental frequency of the phonological segment of the end of the accent phrase or the last phonological segment of the fundamental frequency pattern stored in the fundamental frequency data base.
A further aspect of the present invention is a fundamental frequency pattern generating method using a fundamental frequency data base storing a fundamental frequency pattern of an accent phrase, said fundamental frequency pattern being classified according to a position of the accent phrase in a sentence phrase and whether the accent phrase is situated at an end of a sentence or not.
An aspect of the present invention is a fundamental frequency pattern generating method using a fundamental frequency data base that stores a fundamental frequency pattern of an accent phrase, and using a variation data base that stores a fundamental frequency pattern variation amount for changing one or a plurality of the following characteristics: a start point; a peak; a minimum value; an accent nucleus; an accent fall; an accent phrase end; an end point; and a dynamic range of the fundamental frequency pattern stored in the fundamental frequency data base according to a position, in a sentence phrase, of the accent phrase for which the fundamental frequency is to be generated.
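The use of a variation data base may be sketched as follows. The sketch assumes, for illustration only, that the variation amounts are additive offsets in Hz, that the data base (`variation_db`, a hypothetical name) is keyed by the ordinal position of the accent phrase in the sentence phrase, and that characteristics without an entry are left unchanged.

```python
def apply_variation(characteristics, variation_db, position):
    """Shift each characteristic by the amount stored for this position."""
    delta = variation_db.get(position, {})
    return {
        name: f0 + delta.get(name, 0.0)
        for name, f0 in characteristics.items()
    }


# Hypothetical characteristics (Hz) taken from the fundamental frequency
# data base, and hypothetical variation amounts for the second accent phrase.
characteristics = {"start": 120.0, "peak": 160.0, "end": 100.0}
variation_db = {1: {}, 2: {"start": -5.0, "peak": -15.0}}

second_phrase = apply_variation(characteristics, variation_db, 2)
```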
Another aspect of the present invention is a fundamental frequency pattern generating method wherein when a fundamental frequency pattern of a sentence phrase formed by connecting a plurality of accent phrases is generated, one or a plurality of the following characteristics:
a start point; a peak; an accent nucleus; an accent fall; an accent phrase end; and an end point of a fundamental frequency pattern stored in a fundamental frequency data base that stores a fundamental frequency pattern of the accent phrase and obtained from the fundamental frequency data base are changed by use of a predetermined rule based on a position of the accent phrase in the sentence phrase.
Still another aspect of the present invention is a fundamental frequency pattern generating method wherein when a fundamental frequency pattern of a sentence phrase formed by connecting a plurality of accent phrases is generated, one or a plurality of the following characteristics:
a start point; a peak; an accent nucleus; an accent fall; an accent phrase end; and an end point of a fundamental frequency pattern obtained from a fundamental frequency data base that stores a fundamental frequency pattern of the accent phrase are changed by use of a predetermined rule based on the number of phonological segments from a predetermined position of the sentence phrase to a phonological segment immediately before a phonological segment including the characteristic for which the fundamental frequency is to be generated.
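One possible form of a rule based on the number of phonological segments is a linear lowering per segment, as in the sketch below. The linear rule, the rate `hz_per_segment`, and the sample values are assumptions introduced for the example; the specification only states that a predetermined rule is used.

```python
def apply_segment_count_rule(characteristics, segments_before_phrase,
                             hz_per_segment=2.0):
    """Lower each characteristic in proportion to the segments preceding it.

    characteristics: maps a name to (segment_index_in_phrase, f0_hz), where
    the index is counted from 0, so that segments_before_phrase + index is
    the number of segments from the start of the sentence phrase up to the
    segment immediately before the one containing the characteristic.
    """
    return {
        name: f0 - hz_per_segment * (segments_before_phrase + index)
        for name, (index, f0) in characteristics.items()
    }


# Hypothetical accent phrase that begins after four segments.
characteristics = {
    "start": (0, 120.0),
    "peak": (1, 160.0),
    "end": (4, 100.0),
}
changed = apply_segment_count_rule(characteristics, segments_before_phrase=4)
```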
Still yet another aspect of the present invention is a fundamental frequency pattern generating method for generating a fundamental frequency pattern for each accent phrase,
wherein by changing one or a plurality of the following characteristics: an accent fall; an accent phrase end; and an end point of the accent phrase for which the fundamental frequency pattern is to be generated, a difference between fundamental frequencies of the accent phrase end and the end point of the accent phrase and a fundamental frequency of a start point of an accent phrase next to the accent phrase is not more than a predetermined threshold value.
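The threshold condition at the boundary between two accent phrases may be illustrated as follows. This sketch changes only the end point and moves it the minimum distance needed to satisfy the threshold; both choices, as well as the function name, are assumptions made for the example.

```python
def limit_boundary_jump(end_f0, next_start_f0, threshold):
    """Return an end-point F0 within `threshold` Hz of the next start point."""
    diff = next_start_f0 - end_f0
    if abs(diff) <= threshold:
        return end_f0  # already within the threshold; leave unchanged
    # Move the end point just far enough towards the next start point.
    if diff > 0:
        return next_start_f0 - threshold
    return next_start_f0 + threshold


raised = limit_boundary_jump(90.0, 130.0, 20.0)
lowered = limit_boundary_jump(150.0, 100.0, 20.0)
unchanged = limit_boundary_jump(100.0, 110.0, 20.0)
```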
A further aspect of the present invention is a fundamental frequency pattern generator for generating a fundamental frequency of an accent phrase comprising:
a fundamental frequency data base storing (1) a fundamental frequency pattern obtained by standardizing a fundamental frequency pattern of at least one of the following phonological segments by a time length of the phonological segment: a first phonological segment of the accent phrase; a phonological segment where the fundamental frequency takes a maximum value; a phonological segment of an accent nucleus and a phonological segment next to the accent nucleus; and one phonological segment at an end, or (2) a fundamental frequency pattern obtained by standardizing a fundamental frequency pattern of a phoneme included in at least one of said phonological segments by a time length of the phoneme; and
a fundamental frequency pattern generating portion for setting (3) fundamental frequency patterns of all or part of the following phonological segments: the first phonological segment of the accent phrase for which the fundamental frequency is to be generated; the phonological segment where the fundamental frequency takes the maximum value in the accent phrase; the phonological segment of the accent nucleus and the phonological segment next to the accent nucleus in the accent phrase; and the phonological segment of the end of the accent phrase, or (4) a fundamental frequency pattern of each phoneme included in said phonological segments with reference to the fundamental frequency data base, said fundamental frequency pattern generating portion interpolating by a function on a real time axis a fundamental frequency pattern between the phonological segments or between the phonemes which fundamental frequency pattern has not been set in a stage of the fundamental frequency pattern setting.
A further aspect of the present invention is a fundamental frequency pattern generator for generating a fundamental frequency of an accent phrase comprising:
a fundamental frequency data base storing a fundamental frequency pattern standardized by a time length of each phoneme included in a phonological segment classified according to one or both of the number of phonological segments and an accent position;
a microprosody data base storing a difference between a fundamental frequency of each phonological segment or each phoneme string standardized by a time length of the phoneme and said fundamental frequency pattern, said difference being classified according to a phonological segment or a phoneme string; and
a fundamental frequency pattern generating portion for generating the fundamental frequency of the accent phrase by setting a fundamental frequency pattern of each accent phrase with reference to the fundamental frequency data base, obtaining a value corresponding to a phoneme or a phonological segment string for which the fundamental frequency is to be generated, and adding the corresponding value to the set fundamental frequency or subtracting the corresponding value from the set fundamental frequency.
Another aspect of the present invention is a fundamental frequency pattern generator comprising:
an accent phrase position fundamental frequency data base storing a fundamental frequency pattern of an accent phrase, said fundamental frequency pattern being classified according to a position of the accent phrase in a sentence phrase formed by connecting a plurality of accent phrases, and to whether the accent phrase is situated at an end of a sentence or not; and
a fundamental frequency pattern generating portion for setting fundamental frequency patterns of the accent phrases constituting the sentence phrase with reference to the accent phrase position fundamental frequency data base.