A speech synthesis technique that uses a computer to synthesize speech employs a speech segment dictionary. This dictionary stores speech segments organized by synthesis unit, such as CV/VC or VCV. To synthesize speech, appropriate speech segments are selected from the dictionary, modified, and concatenated to generate the desired synthetic speech. FIG. 15 is a flow chart of this process.
In step S131, speech content expressed as mixed kana-kanji text or the like is input. In step S132, the input is analyzed to obtain a speech segment symbol string {p0, p1, . . . } and parameters for determining prosody. In step S133, the prosody, such as speech segment duration, fundamental frequency, and power, is determined. In the speech segment dictionary look-up step S134, speech segments {w0, w1, . . . } suited to the symbol string {p0, p1, . . . } obtained in step S132 and to the prosody determined in step S133 are retrieved from the speech segment dictionary. In step S135, the speech segments {w0, w1, . . . } retrieved in step S134 are modified and concatenated to match the prosody determined in step S133. In step S136, the result of step S135 is output as synthetic speech.
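The six steps can be sketched as a toy Python pipeline. Everything in it is an illustrative assumption, not the method of FIG. 15 itself: the two-entry dictionary, the whitespace-separated symbols standing in for kana-kanji analysis, the fixed prosody values, and the hypothetical helper names `analyze_input`, `determine_prosody`, `look_up`, and `modify_and_concatenate`. Duration matching uses crude nearest-sample resampling, which also shifts pitch; a waveform editing synthesizer would instead repeat or drop whole pitch periods.

```python
from dataclasses import dataclass

# Hypothetical toy dictionary: symbol -> waveform samples (illustration only).
SEGMENT_DICT = {
    "a": [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5],
    "ka": [0.0, 0.2, 0.9, 0.3, -0.1, -0.7, -0.9, -0.2],
}


@dataclass
class Prosody:
    duration: int  # target length in samples
    f0: float      # fundamental frequency in Hz (not used by this toy)
    power: float   # linear gain applied to the segment


def analyze_input(text):
    """S132 (toy): treat whitespace-separated tokens as the symbol string p0, p1, ..."""
    return text.split()


def determine_prosody(symbols):
    """S133 (toy): assign the same fixed prosody to every segment."""
    return [Prosody(duration=12, f0=120.0, power=0.8) for _ in symbols]


def look_up(symbols, prosodies):
    """S134 (toy): retrieve w0, w1, ... by symbol.

    A real dictionary might also choose among variants using the prosody;
    this toy ignores `prosodies`.
    """
    return [SEGMENT_DICT[p] for p in symbols]


def modify_and_concatenate(segments, prosodies):
    """S135 (toy): stretch each segment to its target duration, scale it, append."""
    out = []
    for w, pr in zip(segments, prosodies):
        n = len(w)
        # nearest-sample resampling to pr.duration samples (crude stand-in)
        stretched = [w[i * n // pr.duration] for i in range(pr.duration)]
        out.extend(pr.power * x for x in stretched)
    return out


def synthesize(text):
    symbols = analyze_input(text)                       # S131 input, S132 analysis
    prosodies = determine_prosody(symbols)              # S133
    segments = look_up(symbols, prosodies)              # S134
    return modify_and_concatenate(segments, prosodies)  # S135; caller outputs it (S136)
```

For example, `synthesize("ka a")` returns 24 samples: two segments, each stretched to the assumed 12-sample duration.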
Waveform editing is one effective method of speech synthesis. In this method, for example, waveforms are superposed and their pitch is changed in synchronism with vocal-cord vibration. The method has the advantage that synthetic speech close to a natural utterance can be generated with only a small amount of arithmetic. When such a method is used, the speech segment dictionary consists of retrieval indexes, waveform data (also called speech segment data) for the individual speech segments, and auxiliary information about the data. In this case, all speech segment data registered in the dictionary are often encoded using μ-law or ADPCM (Adaptive Differential Pulse Code Modulation).
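A minimal pitch-synchronous overlap-add (PSOLA-style) sketch illustrates both the dictionary layout and why waveform editing is cheap. `SegmentEntry` is a hypothetical record holding the three kinds of content named above, with pitch marks assumed to be the auxiliary information; the analysis period is assumed constant to keep the code short, and `psola` is an illustrative name, not a specified interface. Each grain spans two periods around a pitch mark and is Hann-windowed, and grains are re-placed at a new period spacing, so each output sample costs only multiply-adds.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentEntry:
    """Hypothetical dictionary record: retrieval key, waveform data, auxiliary info."""
    symbol: str             # retrieval index key
    samples: List[float]    # speech segment (waveform) data
    pitch_marks: List[int]  # auxiliary information: one mark per vocal-cord period


def psola(entry, period, new_period, n_out_periods):
    """Overlap-add two-period, Hann-windowed grains at a new pitch-period spacing.

    Assumes a constant analysis period `period` (in samples) around every mark.
    """
    x = entry.samples
    win = [0.5 - 0.5 * math.cos(math.pi * i / period) for i in range(2 * period)]
    out = [0.0] * (n_out_periods * new_period + 2 * period)
    marks = entry.pitch_marks
    for k in range(n_out_periods):
        # pick the analysis mark at the same relative position in the segment
        m = marks[k * len(marks) // n_out_periods]
        start = k * new_period
        for i in range(2 * period):
            j = m - period + i
            if 0 <= j < len(x):
                out[start + i] += win[i] * x[j]  # one multiply-add per sample
    return out


# Usage: a 20-sample-period tone re-spaced to 16 samples raises pitch by 20/16.
x = [math.sin(2 * math.pi * n / 20) for n in range(200)]
entry = SegmentEntry("a", x, list(range(20, 200, 20)))
y = psola(entry, period=20, new_period=16, n_out_periods=10)
```

Packing the grains more tightly than the analysis period raises the pitch; spacing them further apart lowers it. A full implementation would take time-varying periods from the pitch marks and choose the number of grains from the target duration.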
The above prior art has the following problems.
First, when all speech segment data registered in the speech segment dictionary are encoded using a scheme such as μ-law or A-law, sufficient compression efficiency cannot be obtained, because every speech segment is nonuniformly quantized with the same fixed quantization table. That table must be designed so that a minimum quality is maintained for every type of speech segment.
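The fixed-table limitation can be seen in a sketch of continuous μ-law companding with μ = 255. It uses the textbook logarithmic curve rather than the segmented approximation in ITU-T G.711, and the function names are illustrative. Every segment, whether loud or quiet, easy or hard to code, passes through the same 256-level table, so the rate stays at 8 bits per sample regardless of how much quality margin a particular segment has.

```python
import math

MU = 255.0  # standard mu-law parameter


def mu_law_encode(x):
    """Compress a sample in [-1, 1] and quantize it to an 8-bit code 0..255."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)  # companded, in [-1, 1]
    return int(round((y + 1.0) * 127.5))  # the same uniform grid for every segment


def mu_law_decode_value(code):
    """Invert the uniform grid, then expand with the inverse mu-law curve."""
    y = code / 127.5 - 1.0
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


# Fixed 256-entry table shared by all segments: decoding is one lookup per sample.
DECODE_TABLE = [mu_law_decode_value(c) for c in range(256)]
```

Relative to 16-bit linear PCM this always yields a 2:1 reduction. The code cannot spend fewer bits on segments that would tolerate coarser quantization, which is the inefficiency described above.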
Second, when all speech segment data registered in the speech segment dictionary are encoded using a scheme such as ADPCM, the amount of computation needed for decoding grows by the cost of the adaptive algorithm. This is a problem because the chief advantage of the waveform editing method, its small processing load, is lost if decoding requires a large amount of computation.
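The extra decoding cost can be made concrete with a deliberately simplified ADPCM codec: a fixed first-order predictor (the previous reconstructed sample), a 3-bit code (sign plus 2-bit magnitude), and Jayant-style multiplicative step adaptation. This is not IMA ADPCM or ITU-T G.726; the multiplier values and step limits are illustrative assumptions. Compared with the single table lookup of μ-law decoding, each decoded sample here needs a dequantizing multiply, a prediction add, and a step-size update with clamping, and samples must be decoded strictly in order because each depends on the previous state.

```python
STEP_MULT = [0.9, 0.9, 1.25, 1.75]  # illustrative step multipliers, by code magnitude
STEP_MIN, STEP_MAX = 1e-4, 1.0


def _adapt(step, mag):
    """Adaptive part: rescale the quantizer step after every sample."""
    return min(STEP_MAX, max(STEP_MIN, step * STEP_MULT[mag]))


def adpcm_encode(samples, step=0.01):
    codes, pred = [], 0.0
    for x in samples:
        diff = x - pred
        mag = min(3, int(abs(diff) / step))  # 2-bit magnitude
        sign = 1 if diff < 0 else 0
        codes.append((sign << 2) | mag)
        q = (mag + 0.5) * step               # mirror the decoder's reconstruction
        pred += -q if sign else q
        step = _adapt(step, mag)
    return codes


def adpcm_decode(codes, step=0.01):
    out, pred = [], 0.0
    for c in codes:
        sign, mag = c >> 2, c & 3
        q = (mag + 0.5) * step               # dequantize with the current step
        pred += -q if sign else q            # predictor: previous output + difference
        out.append(pred)
        step = _adapt(step, mag)             # per-sample adaptation cost
    return out
```

The adaptation lets 3 bits per sample track signals of widely varying level, but that per-sample state update is exactly the decoding overhead the paragraph above objects to.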