1. Field of the Invention
The present invention relates to a method for analyzing, storing and synthesizing voice sound information and an apparatus embodying such a method.
2. Description of the Related Art
Hitherto, a speech synthesis-by-rule method has been developed for generating voice sounds from character string data. In this method, feature parameters of phonemes, such as LPC, PARCOR, LSP, or mel-cepstrum parameters (hereinafter referred to simply as parameters), are stored in a phoneme file and read out in accordance with information on the character string data. The feature parameters and the driving sound source signals (i.e., an impulse series in voiced sound sections and noise in unvoiced sound sections) are expanded or compressed on the basis of a fixed rule according to the rate at which voice sounds are synthesized. By supplying these signals to a speech synthesizer, a synthesized voice is obtained.
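As an illustration of the driving sound source signals mentioned above, the following is a minimal sketch and not part of any described apparatus. The function name `excitation`, the pitch period of 80 samples, and the uniform noise range are assumptions chosen only for the example.

```python
import random


def excitation(n_samples, voiced, pitch_period=80, seed=0):
    """Return a driving sound source signal of n_samples samples.

    voiced=True  -> impulse series: a unit impulse every pitch_period samples
    voiced=False -> noise drawn uniformly from [-1, 1]
    The pitch period and the noise distribution are illustrative assumptions.
    """
    if voiced:
        return [1.0 if i % pitch_period == 0 else 0.0 for i in range(n_samples)]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


voiced_section = excitation(160, voiced=True)     # impulses at samples 0 and 80
unvoiced_section = excitation(160, voiced=False)  # noise samples
```

A synthesizer would filter such a signal with the feature parameters read from the phoneme file; that filtering step is not shown here.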
CV (consonant-vowel) phonemes, CVC (consonant-vowel-consonant) phonemes, and VCV (vowel-consonant-vowel) phonemes are commonly used as the form of phonemes for producing a synthesized voice. In particular, when long-unit phonemes, such as CVC phonemes or VCV phonemes, are used, a large amount of memory is required to store the phonemes. For this reason, a vector quantization method is effective for managing phoneme parameters efficiently.
In the vector quantization method, patterns of various parameters are determined in advance by using a clustering technique, and a code is assigned to each pattern. A table showing the correspondence between these codes and patterns is called a code book. For an input voice sound, a parameter is determined for each frame. This parameter is compared with each of the previously determined patterns, and the section of the frame is then represented by the code of the pattern having the highest similarity to the parameter. The use of this vector quantization method enables various voice sounds to be expressed by using a limited number of patterns, thus making it possible to compress data efficiently.
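The code book lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions: the code book values are hand-written rather than produced by clustering, the two-dimensional vectors are hypothetical, and "highest similarity" is taken to mean the smallest squared Euclidean distance, which the text does not specify.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames, codebook):
    """Replace each frame's parameter vector by the code (index) of the
    closest pattern in the code book."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))
        for frame in frames
    ]


# Hypothetical code book: code k corresponds to pattern codebook[k].
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]

# Hypothetical per-frame parameter vectors of an input voice sound.
frames = [[0.1, -0.2], [0.9, 1.2], [3.5, 0.4], [1.1, 0.8]]

codes = quantize(frames, codebook)
print(codes)  # [0, 1, 2, 1]
```

Only the codes need to be stored for each frame, which is where the data compression comes from; the patterns themselves are stored once in the code book.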
However, in the conventional vector quantization method, quantization is performed over all dimensions of the parameters at once. As a result, the patterns are produced in a manner that ignores the fine characteristics of the data in each individual dimension.
Parameters include power information about the intensity of a voice sound and spectrum information about the acoustic content of a voice sound. Essentially, these two types of information are completely independent of each other and should be treated separately. However, in the prior art, the two types of information are treated collectively as one vector without any distinction being made between them, and patterns are produced on this basis. In such a conventional method, when the power of a voice sound varies (for example, when the same sound "a" is voiced in a loud voice and in a soft voice), different patterns must be produced even though the sounds have the same spectrum structure. As a result, a large number of redundant patterns are stored in the code book, the capacity of the code book must be increased, and it takes a long time to search for patterns in the code book.
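The redundancy described above can be illustrated with a small sketch. All numerical values, the vector layout (power followed by two spectrum terms), and the names `spectrum_a` and `spectrum_i` are hypothetical and serve only to show the effect; the nearest-pattern search by squared Euclidean distance is likewise an assumption.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vector, patterns):
    """Return the index of the pattern closest to the given vector."""
    return min(range(len(patterns)), key=lambda k: squared_distance(vector, patterns[k]))


# Hypothetical spectrum patterns for two sounds, and two power levels.
spectrum_a = [0.8, -0.3]
spectrum_i = [-0.5, 0.6]
powers = [1.0, 5.0]  # soft voice, loud voice

# Conventional approach: power and spectrum form one vector, so the code
# book needs one pattern per (power, spectrum) combination.
joint_codebook = [[p] + s for p in powers for s in (spectrum_a, spectrum_i)]

soft_a = [1.0] + spectrum_a  # "a" voiced softly
loud_a = [5.0] + spectrum_a  # "a" voiced loudly

joint_codes = (nearest(soft_a, joint_codebook), nearest(loud_a, joint_codebook))

# Looking at the spectrum part alone, both frames match the same pattern.
spectrum_codebook = [spectrum_a, spectrum_i]
spectrum_codes = (
    nearest(soft_a[1:], spectrum_codebook),
    nearest(loud_a[1:], spectrum_codebook),
)

print(len(joint_codebook), joint_codes)        # 4 (0, 2)
print(len(spectrum_codebook), spectrum_codes)  # 2 (0, 0)
```

In this toy case the joint code book holds four patterns and assigns the soft and loud "a" different codes, whereas the spectrum part alone needs only two patterns and gives both frames the same code. The number of joint patterns grows with the product of the number of power levels and the number of spectrum shapes.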