When a natural voice is received and stored as voice data, the voice waveform itself is generally stored as the voice data.
However, a voice waveform requires a high data rate. As the number of files grows, more memory space is required, and transferring the files takes longer.
Under these circumstances, an apparatus has been proposed in recent years, as disclosed in Japanese Patent Publication No. HEI 5-52520, that stores voice source data obtained by encoding (compressing) a voice waveform and, when a voice is synthesized, decodes the voice source data and synthesizes a voice waveform using voice route data in a phoneme memory. In this publication, a voice is divided into several time zones, and voice source data for pitch and power (amplitude of a voice) are specified with an absolute amplitude level for every frame of the divided time zones. That is, a plurality of frames of voice source data are correlated to each phoneme.
A technology analogous to that of the publication described above is disclosed in Japanese Patent Laid-Open Publication No. SHO 60-216395. The invention disclosed in that publication employs a data format in which one item of representative voice source data is obtained from the plurality of frames corresponding to each phoneme, and this representative voice source data is correlated to the phoneme.
Coding the data as disclosed in Japanese Patent Publication No. HEI 5-52520 described above makes it possible to reduce the data rate, and because a plurality of frames can be correlated to the time zone for one phoneme, continuity of the data in the direction of the time axis can be obtained. However, a further reduction of the data rate is required.
In contrast, because representative voice source data is correlated to each phoneme in Japanese Patent Laid-Open Publication No. SHO 60-216395, the data format employed there is more discrete than the continuous voice source data of Japanese Patent Publication No. HEI 5-52520, and this method is effective for reducing the data rate.
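The difference between the two data formats can be illustrated with a minimal sketch. All names here (`Frame`, `per_frame_format`, `representative_format`, `stored_values`) and the sample values are hypothetical, and taking the mean of the frames as the representative value is an assumption made only for illustration; the publications do not state how the representative data is derived.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Frame:
    pitch_hz: float  # absolute pitch frequency of the frame
    power: float     # absolute amplitude level of the frame


def per_frame_format(phonemes):
    """Keep every frame of every phoneme (continuous along the time axis)."""
    return [(name, list(frames)) for name, frames in phonemes]


def representative_format(phonemes):
    """Keep one representative Frame per phoneme (assumed here: the mean)."""
    return [
        (name, Frame(mean(f.pitch_hz for f in frames),
                     mean(f.power for f in frames)))
        for name, frames in phonemes
    ]


def stored_values(encoded):
    """Count stored numbers: two (pitch, power) per stored frame."""
    total = 0
    for _, item in encoded:
        frames = item if isinstance(item, list) else [item]
        total += 2 * len(frames)
    return total


# Hypothetical utterance "ka": two frames for /k/, three frames for /a/.
utterance = [
    ("k", [Frame(0.0, 10.0), Frame(0.0, 20.0)]),
    ("a", [Frame(120.0, 60.0), Frame(130.0, 80.0), Frame(125.0, 70.0)]),
]

print(stored_values(per_frame_format(utterance)))       # 10
print(stored_values(representative_format(utterance)))  # 4
```

In this sketch the per-phoneme format stores fewer values, but the change of pitch and power within each phoneme is no longer represented.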
However, parameters such as the local pattern of amplitude change in the transition from a consonant to a vowel, or the ratio between the amplitude levels of the vowels, are specific to each set of voice route data and substantially fixed for it.
For this reason, in the technology disclosed in Japanese Patent Laid-Open Publication No. SHO 60-216395, no problem occurs in the reproducibility of voice tone so long as the narrator who provides the basic voice route data is the same person as the person who provides the voice-generating data, and so long as the voice conditions under which the voice route data is made are the same as those under which the voice source data is made. However, if the persons or the conditions differ, the original amplitude patterns of the voice route data are not reflected, because the amplitude is specified as an absolute amplitude level and the voice pitch is specified as an absolute pitch frequency. Thus, the voice may be reproduced with an inappropriate voice tone.
In addition, because a voice pitch pattern tends to lag behind the syllables, the position of a local maximum or minimum of the voice pitch is generally displaced from the boundary between phonemes. For this reason, the voice pitch pattern may not be approximated well when a voice is synthesized. In this case as well, the voice may be reproduced with an inappropriate voice tone.
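The effect of a pitch peak that falls inside a phoneme can be illustrated with a small numerical sketch. It assumes, purely for illustration, that the pitch pattern is reconstructed by piecewise-linear interpolation between values taken at the phoneme boundaries; the contour values, the boundary times, and the function names are hypothetical.

```python
def interpolate(points, t):
    """Piecewise-linear interpolation through (time_ms, pitch_hz) points."""
    for (t0, p0), (t1, p1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return p0 + (p1 - p0) * (t - t0) / (t1 - t0)
    raise ValueError("t lies outside the control points")


def worst_error(contour, points):
    """Return (time_ms, error_hz) where the approximation deviates most."""
    return max(
        ((t, abs(p - interpolate(points, t))) for t, p in contour),
        key=lambda pair: pair[1],
    )


# Hypothetical pitch contour sampled every 10 ms. The phoneme boundary
# is at 60 ms, but the pitch peak (160 Hz) arrives later, at 80 ms.
pitch = [100.0, 105.0, 110.0, 115.0, 120.0, 130.0, 140.0,
         150.0, 160.0, 150.0, 140.0, 130.0, 120.0]
contour = [(10 * i, p) for i, p in enumerate(pitch)]

# Control points taken only at the phoneme boundaries (0, 60, 120 ms).
boundary_points = [(0, 100.0), (60, 140.0), (120, 120.0)]

t_worst, err = worst_error(contour, boundary_points)
print(t_worst, round(err, 2))  # 80 26.67
```

Under these assumptions the largest deviation occurs exactly at the delayed peak, because no control point exists between the phoneme boundaries to capture it.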
As described above, in Japanese Patent Laid-Open Publication No. SHO 60-216395, the voice source data depends on the particular voice route data in the phoneme memory, so voice route data for different voice tones cannot be used.