1. Field of the Invention
The present invention relates generally to a method of changing the pitch of a VCV (vowel-consonant-vowel) phoneme-chain waveform. It also relates to an apparatus that synthesizes a sound by changing the pitches of a plurality of VCV phoneme-chain waveforms and connecting those waveforms with each other. More particularly, it relates to a pitch changing method that changes the pitch of a VCV phoneme-chain waveform while preserving the waveform's pitch fluctuation and pitch fine structure. It further relates to a sound synthesizing apparatus that synthesizes a sound from a series of VCV phoneme-chain waveforms while preserving their pitch fluctuation and pitch fine structure.
2. Description of the Related Art
2.1 Previously Proposed Art
FIG. 1 shows a composite pitch pattern P1 of the waveform of the phrase "Yokohama city", which is pronounced "yo-ko-ha-ma-shi" in Japanese. FIGS. 2A to 2D show pitch patterns P2 to P5 of the waveforms of the VCV (vowel-consonant-vowel) phoneme chains "(y)-o-k-o", "o-h-a", "a-m-a" and "a-sh-i". These chains are obtained by dividing the phoneme series of the spoken phrase "yo-ko-ha-ma-shi".
A conventional voice synthesizing apparatus reads the character string "yokohamashi" in a text as follows. It artificially generates a character signal waveform indicating the pronunciation "yo-ko-ha-ma-shi". From that character signal waveform, it produces the composite pitch pattern P1 corresponding to the pronunciation. The apparatus also has a VCV phoneme-chain waveform storing unit in which a large number of VCV phoneme-chain waveforms, each extracted from actual speech, are stored in advance. The waveforms of the VCV phoneme chains "(y)-o-k-o", "o-h-a", "a-m-a" and "a-sh-i", which correspond to the input characters "yokohamashi", are read out from the storing unit.
Here, the pitch frequency of a pitch pattern denotes the fundamental frequency of a sound such as a voice. A sound with a high (or low) pitch frequency is perceived as a high-pitched (or low-pitched) sound. The dotted portion of each of the pitch patterns P2, P3 and P5 indicates the waveform of a voiceless consonant such as "k" or "h".

The vowel transitional portions are labelled as follows:
- P6 is a first portion of the first phoneme "o" in the VCV phoneme-chain waveform "(y)-o-k-o". It is a vowel transitional portion of that first "o".
- P7 is a portion of the second phoneme "o", common to the VCV phoneme-chain waveforms "(y)-o-k-o" and "o-h-a". It is a vowel transitional portion of that second "o".
- P8 is a portion of the phoneme "a" common to the waveforms "o-h-a" and "a-m-a". It is a vowel transitional portion of that "a".
- P9 is a portion of the phoneme "a" common to the waveforms "a-m-a" and "a-sh-i". It is a vowel transitional portion of that "a".
In a conventional voice synthesizing method, the pitch frequency changes only gradually within a vowel transitional portion. For this reason, each pair of adjacent VCV phoneme-chain waveforms is connected at the vowel transitional portions of their common vowel. This is done provided that the common vowel is neither a word-initial vowel nor a voiceless vowel. The pitch patterns P2 to P5 are then connected with each other while the pitch frequency of each is adjusted. The result is a synthesized pitch pattern that almost agrees with the composite pitch pattern P1.
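The division of the phoneme series into VCV chains, and the rule that adjacent chains are joined at their common vowel, can be sketched in Python as follows. This is a minimal illustration under two assumptions: phonemes are given as a romanized list, and the five Japanese vowels are the vowel inventory. The function names `split_into_vcv` and `junction_vowels` are hypothetical. The sketch omits the check that excludes word-initial and voiceless vowels from joining.

```python
from typing import List

VOWELS = {"a", "i", "u", "e", "o"}  # assumed vowel inventory (romanized Japanese)


def split_into_vcv(phonemes: List[str]) -> List[List[str]]:
    """Split a phoneme list into chains running from one vowel to the next.

    Adjacent chains share their boundary vowel. Phonemes before the first
    vowel (such as the initial "y" in "yo") stay with the first chain.
    Consonants after the last vowel are not handled in this sketch.
    """
    vowel_positions = [i for i, p in enumerate(phonemes) if p in VOWELS]
    if len(vowel_positions) < 2:
        return [list(phonemes)]
    chains = []
    for n, (start, end) in enumerate(zip(vowel_positions, vowel_positions[1:])):
        if n == 0:
            start = 0  # keep word-initial consonants with the first chain
        chains.append(phonemes[start:end + 1])
    return chains


def junction_vowels(chains: List[List[str]]) -> List[str]:
    """Return the common vowel at which each adjacent pair of chains is joined."""
    joints = []
    for left, right in zip(chains, chains[1:]):
        if left[-1] != right[0]:
            raise ValueError(f"chains {left} and {right} share no boundary vowel")
        joints.append(left[-1])
    return joints


if __name__ == "__main__":
    chains = split_into_vcv(["y", "o", "k", "o", "h", "a", "m", "a", "sh", "i"])
    print(["-".join(c) for c in chains])  # ['y-o-k-o', 'o-h-a', 'a-m-a', 'a-sh-i']
    print(junction_vowels(chains))        # ['o', 'a', 'a']
```

The three junction vowels correspond to the overlapped vowel transitional portions P7, P8 and P9.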
The connection of the pitch patterns, with adjustment of their pitch frequencies, is described in detail with reference to FIGS. 3A and 3B.
FIG. 3A shows a representative VCV phoneme-chain waveform divided into a plurality of time periods.
FIG. 3A illustrates how the pitch pattern of the waveform of the pronunciation "yo-ko-ha-ma-shi" is synthesized. The following processing is performed for each of the VCV phoneme-chain waveforms "(y)-o-k-o", "o-h-a", "a-m-a" and "a-sh-i".

First, a plurality of impulse actuating time-points Pt are determined at local peak points of the VCV phoneme-chain waveform. Around each impulse actuating time-point Pt, a pair of adjacent time periods is determined. A Hanning window is applied to the waveform portion covering that pair of time periods, and one pitch waveform is extracted. In this way each VCV phoneme-chain waveform is decomposed into a series of pitch waveforms, called a pitch waveform string. A representative pitch waveform is shown in FIG. 3B.

Next, the pitch waveform strings of the VCV phoneme-chain waveforms "(y)-o-k-o", "o-h-a", "a-m-a" and "a-sh-i" are connected with each other in that order. Their pitch waveforms are arranged along the composite pitch pattern P1, and the vowel transitional portions are overlapped as follows:
- the portions P7 of "(y)-o-k-o" and "o-h-a";
- the portions P8 of "o-h-a" and "a-m-a";
- the portions P9 of "a-m-a" and "a-sh-i".

The time interval between two successive pitch waveforms corresponds to the pitch period, that is, the reciprocal of the pitch frequency. Arranging the pitch waveforms along the composite pitch pattern P1 therefore means adjusting their time intervals to match the pitch frequency of the composite pitch pattern P1.
That is, the pitch of each VCV phoneme-chain waveform is changed so that its pitch frequency matches that of the composite pitch pattern P1.
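Decomposing a waveform into Hanning-windowed pitch waveforms and rearranging them along a target pitch pattern is essentially pitch-synchronous overlap-add (PSOLA). The pure-Python sketch below illustrates the conventional procedure under stated assumptions:
- the pitch marks (the impulse actuating time-points Pt) are given as sample indices;
- the window is an asymmetric Hann window spanning the two periods around each mark;
- each synthesis mark reuses the pitch waveform whose analysis mark is nearest in time, which assumes the overall duration is unchanged.

All function names are hypothetical.

```python
import math
from typing import Callable, List, Tuple

PitchWave = Tuple[int, List[float], int]  # (analysis mark, windowed samples, samples left of mark)


def extract_pitch_waveforms(x: List[float], marks: List[int]) -> List[PitchWave]:
    """Cut a Hann-windowed pitch waveform around each interior pitch mark.

    The window covers the two periods adjacent to the mark: it rises from 0 at
    the previous mark to 1.0 at the current mark and falls to 0 at the next one.
    Marks must be strictly increasing sample indices inside x.
    """
    waves = []
    for prev, cur, nxt in zip(marks, marks[1:], marks[2:]):
        left, right = cur - prev, nxt - cur
        wave = []
        for n in range(prev, nxt + 1):
            if n <= cur:
                w = 0.5 * (1.0 - math.cos(math.pi * (n - prev) / left))
            else:
                w = 0.5 * (1.0 + math.cos(math.pi * (n - cur) / right))
            wave.append(w * x[n])
        waves.append((cur, wave, left))
    return waves


def marks_from_contour(f0_hz: Callable[[float], float], fs: int,
                       start: int, end: int) -> List[int]:
    """Place synthesis marks so that each spacing equals fs / f0 (f0 given per second)."""
    marks, t = [], float(start)
    while t < end:
        marks.append(int(round(t)))
        t += fs / f0_hz(t / fs)
    return marks


def overlap_add(waves: List[PitchWave], target_marks: List[int],
                length: int) -> List[float]:
    """Add, centred at each target mark, the pitch waveform whose analysis mark is nearest."""
    y = [0.0] * length
    for t in target_marks:
        _, wave, left = min(waves, key=lambda pw: abs(pw[0] - t))
        start = t - left
        for i, v in enumerate(wave):
            if 0 <= start + i < length:
                y[start + i] += v
    return y


if __name__ == "__main__":
    fs = 8000
    x = [math.sin(2 * math.pi * 100 * n / fs) for n in range(800)]  # 100 Hz test tone
    analysis = list(range(20, 800, 80))   # marks at the tone's positive peaks
    waves = extract_pitch_waveforms(x, analysis)
    target = marks_from_contour(lambda s: 125.0, fs, 100, 740)  # flat 125 Hz pattern
    y = overlap_add(waves, target, len(x))
    print(len(waves), [b - a for a, b in zip(target, target[1:])][:3])  # 8 [64, 64, 64]
```

The spacing of the synthesis marks is set entirely by the target contour. Whatever small irregularities existed between the analysis marks are therefore not carried over to the output.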
2.2 Problems to be Solved by the Invention
However, the above pitch changing method decomposes each VCV phoneme-chain waveform into pitch waveforms and then rearranges them along the composite pitch pattern P1. As a result, the pitch fluctuation peculiar to natural speech is lost. Here, pitch fluctuation denotes a minute temporal fluctuation in the pitch frequency of a pitch pattern. For example, in each VCV phoneme-chain waveform the time interval between two adjacent impulse actuating time-points changes slightly over time. This slight variation is lost when the pitch waveforms are rearranged. Consequently, the synthesized voice produced by the conventional voice synthesizing apparatus sounds less natural.
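The loss of pitch fluctuation can be made concrete with a simple jitter measure. The sketch below makes three assumptions:
- pitch marks are given in samples;
- jitter follows the common "local jitter" definition, the mean absolute difference between consecutive periods divided by the mean period;
- the period values are invented for illustration and are not measured data.

The function names are hypothetical.

```python
from itertools import accumulate
from statistics import mean
from typing import List


def marks_from_periods(periods: List[int], start: int = 0) -> List[int]:
    """Turn a list of pitch periods (in samples) into pitch-mark positions."""
    return list(accumulate(periods, initial=start))


def local_jitter(marks: List[int]) -> float:
    """Mean absolute change between consecutive periods, divided by the mean period."""
    periods = [b - a for a, b in zip(marks, marks[1:])]
    changes = [abs(q - p) for p, q in zip(periods, periods[1:])]
    return mean(changes) / mean(periods)


# Illustrative natural periods with small cycle-to-cycle variation.
natural = marks_from_periods([80, 82, 79, 81, 80, 83, 78])
# Marks regenerated from a smooth, constant target pattern (period 64 samples).
rearranged = marks_from_periods([64] * 7)

print(round(local_jitter(natural), 4), local_jitter(rearranged))  # 0.0332 0.0
```

The second result is zero because the rearranged marks are spaced purely by the smooth target pattern.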
In addition, the pitch frequency of a voiced consonant portion in a VCV phoneme chain can be slightly lower than that of the adjacent vowel portion. For example, as shown in FIG. 1, the pitch frequency of the voiced consonant "m" in the pitch pattern P4 is lower than that of the vowel "a". Such a pitch frequency change within the structure of a voice waveform is called a pitch fine structure.

The composite pitch pattern P1, however, is generated artificially and contains no pitch fine structure. It is a general overall pitch pattern with neither pitch fluctuation nor pitch fine structure. In P1, for example, the pitch frequency of the voiced consonant "m" is not lower than that of the vowel "a". Each VCV phoneme-chain waveform is decomposed into pitch waveforms that are rearranged along P1. Therefore, even when the pitch pattern of a VCV phoneme-chain waveform has a pitch fine structure, that fine structure is lost.
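A pitch fine structure shows up as a per-phoneme difference in f0 within one VCV unit. The sketch below makes these assumptions:
- each pitch period carries one phoneme label;
- the sampling rate is 8 kHz;
- the labels, periods and the function name `f0_by_label` are illustrative and hypothetical.

It compares an "a-m-a" unit before and after its periods are replaced by those of a flat composite pattern.

```python
from typing import Dict, List

FS = 8000  # assumed sampling rate in Hz


def f0_by_label(periods: List[int], labels: List[str]) -> Dict[str, float]:
    """Mean f0 in Hz for each phoneme label, given one label per pitch period."""
    grouped: Dict[str, List[int]] = {}
    for period, label in zip(periods, labels):
        grouped.setdefault(label, []).append(period)
    return {label: FS / (sum(ps) / len(ps)) for label, ps in grouped.items()}


labels = ["a", "a", "a", "m", "m", "a", "a", "a"]
unit = [80, 80, 80, 88, 88, 80, 80, 80]   # "a-m-a" with a lower-pitched "m" (illustrative)
rearranged = [80] * len(unit)             # periods taken from a flat composite pattern

f0_unit = f0_by_label(unit, labels)
f0_flat = f0_by_label(rearranged, labels)
print(round(f0_unit["m"] / f0_unit["a"], 3), f0_flat["m"] / f0_flat["a"])  # 0.909 1.0
```

After rearrangement the ratio of consonant f0 to vowel f0 becomes 1.0, so the dip at "m" has disappeared.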
Also, people perceive a sound as high or low according to its fundamental frequency (pitch frequency), but the tone quality of the sound is not determined by the pitch frequency. Tone quality depends on the distribution of the higher harmonics contained in the sound. The pitch changing degree is the ratio of the pitch frequency of the composite pitch pattern P1 to that of the VCV phoneme-chain waveform. Suppose the pitch frequency of a VCV phoneme-chain waveform is changed greatly to arrange the waveform along P1, in other words, the pitch changing degree is high. In that case, the balance between the fundamental component and the group of higher harmonics changes greatly. As a result, the synthesized voice loses its natural quality and its tone quality is degraded.
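The pitch changing degree is a simple ratio. Its effect on the harmonic structure can be seen by counting the harmonics that fit in a fixed band. The sketch below assumes a 4 kHz analysis band (half of an assumed 8 kHz sampling rate) and uses illustrative frequencies. It only shows how the harmonic spacing changes and is not a model of perceived tone quality. The function names are hypothetical.

```python
import math


def pitch_changing_degree(target_f0: float, unit_f0: float) -> float:
    """Ratio of the composite-pattern pitch frequency to the unit's pitch frequency."""
    return target_f0 / unit_f0


def in_semitones(ratio: float) -> float:
    """Express a frequency ratio as a musical interval in semitones."""
    return 12.0 * math.log2(ratio)


def harmonics_below(f0: float, cutoff_hz: float) -> int:
    """Count the harmonics k * f0 (k >= 1) that lie at or below cutoff_hz."""
    return int(cutoff_hz // f0)


if __name__ == "__main__":
    degree = pitch_changing_degree(200.0, 100.0)   # raise a 100 Hz unit to 200 Hz
    print(degree, in_semitones(degree))            # 2.0 12.0
    print(harmonics_below(100.0, 4000.0), harmonics_below(200.0, 4000.0))  # 40 20
```

Doubling the pitch frequency halves the number of harmonics in the same band. How the energy is distributed between the fundamental and the higher harmonics therefore changes substantially for large pitch changing degrees.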