1. Field of the Invention
The present invention relates generally to a voice synthesizing apparatus and, more particularly, to a voice synthesizing apparatus for generating voice waveforms which simulate the tone colors of musical instruments.
2. Description of the Related Art
The basic construction of a typical voice synthesizing apparatus is explained below with reference to FIG. 3. Text data, which is received by a text data input section 1, is supplied to a text analyzing section 2. The text analyzing section 2 analyzes the input text data to extract information on various factors such as words, blocks, breaks and the beginning and end of each sentence contained in the text data. A phonetic-symbol generating section 3 converts a series of characters, which are organized into words and blocks, into a series of phonetic symbols, while a rhythmic-symbol generating section 4 generates the required rhythmic symbols by utilizing, e.g., an accent dictionary and accent rules about the words and the blocks. A synthesis-parameter generating section 5 generates a time series of synthesis parameters by interpolating individual parameters corresponding to the above series of phonetic symbols.
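By way of illustration only, the interpolation performed by the synthesis-parameter generating section 5 may be sketched as follows. The sketch assumes linear interpolation between one target parameter vector per phonetic symbol; the function name `interpolate_parameters`, the argument `steps_per_segment` and the choice of linear interpolation are hypothetical and are not taken from the apparatus described above.

```python
def interpolate_parameters(targets, steps_per_segment):
    """Expand per-symbol target vectors into a time series of parameter frames.

    targets: list of equal-length parameter vectors, one per phonetic symbol
             (assumed representation).
    steps_per_segment: number of frames generated between successive targets
                       (assumed to be constant for simplicity).
    """
    if not targets:
        return []
    series = []
    # Walk over each pair of neighbouring targets and interpolate linearly.
    for start, end in zip(targets, targets[1:]):
        for step in range(steps_per_segment):
            t = step / steps_per_segment
            series.append([a + (b - a) * t for a, b in zip(start, end)])
    # Close the series with the final target itself.
    series.append(list(targets[-1]))
    return series


if __name__ == "__main__":
    # Two hypothetical two-element parameter vectors.
    for frame in interpolate_parameters([[0.0, 10.0], [1.0, 20.0]], 4):
        print(frame)
```

In an actual apparatus the interpolation rule and the frame rate would depend on the synthesis method employed; the sketch merely shows how discrete per-symbol parameters can become a continuous time series.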
A sound-source parameter generating section 6 generates a time series of sound-source parameters concerning rhythmic information on pitch, accent, sound volume and the like and supplies it to a sound-source section 7. If the supplied parameters represent a voiced sound, the sound-source section 7 generates pulses and supplies them to a voice synthesizing section 8. In the case of an unvoiced sound, the sound-source section 7 generates white noise or the like and supplies it to the voice synthesizing section 8. Upon receiving the synthesis-parameter output from the synthesis-parameter generating section 5, the voice synthesizing section 8 generates a voice by utilizing the output from the sound-source section 7 as a drive sound source. Since the sound-source section 7 and the voice synthesizing section 8 receive the sound-source parameters and the synthesis parameters, respectively, to generate a voice, they are hereinafter collectively referred to as a synthesizing section 9.
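The behaviour of the sound-source section 7 may be sketched, under stated assumptions, as follows: a unit pulse train is produced for a voiced sound and uniformly distributed noise for an unvoiced sound. The function names, the `pitch_period` argument (expressed in samples) and the use of uniform noise as the "white noise or the like" are hypothetical choices made for illustration.

```python
import random


def pulse_train(num_samples, pitch_period):
    """Voiced source: one unit pulse every `pitch_period` samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]


def white_noise(num_samples, seed=0):
    """Unvoiced source: uniform noise in [-1, 1]; seeded so runs are repeatable."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]


def excitation(voiced, num_samples, pitch_period=80, seed=0):
    """Select the drive sound source according to the voiced/unvoiced decision."""
    if voiced:
        return pulse_train(num_samples, pitch_period)
    return white_noise(num_samples, seed)


if __name__ == "__main__":
    print(excitation(True, 10, pitch_period=4))
    print(excitation(False, 5))
```

The pitch period would in practice be derived from the pitch information carried by the sound-source parameters; it is passed in directly here to keep the sketch short.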
The synthesizing section 9 of the conventional voice synthesizer described above will be explained below in greater detail. FIG. 4 is a detailed block diagram showing the synthesizing section 9. For the sake of simplicity of explanation, it is assumed that a phonetic-parameter storing memory 14 stores the synthesis and sound-source parameters in the form of one block (frame) and the series of phonetic symbols in the form of one block (frame). The conventional voice synthesizer is provided with a pulse generator 10 as a voiced-sound source and a white-noise generator 11 as an unvoiced-sound source. In particular, since the pulse generator 10 as the voiced-sound source utilizes impulses, triangular waves or the like, the voice synthesized by the pulse generator 10 tends to sound mechanical. If a driver circuit of the type which utilizes residual waveforms (or output waveforms obtained from an input acoustic sound through the inverse filter of a synthesizing filter) is substituted for the pulse generator 10, various voices can be synthesized with improved quality.
A V/U switching section 12 is provided for effecting switching between the synthesization of a voiced sound and the synthesization of an unvoiced sound. If a fricative sound needs to be synthesized, the V/U switching section 12 provides a mixed output of the output from the pulse generator 10 and the output from the white-noise generator 11 with an appropriately varied mixing ratio. An amplitude control section 13 controls the sound volume, which is one of the sound-source parameters. A voice synthesizing filter 17 receives the synthesis parameters (representing phonetic features) and operates in response to the signal output from the amplitude control section 13 by utilizing such parameters as filter factors, thereby generating voice waveforms. Normally, voice synthesization is performed by a digital filter, and the voice synthesizing filter 17 is therefore followed by a D/A converter. A low-pass filter 18 cuts a foldover frequency component, and a voice, amplified by an amplifier 19, is output from a loudspeaker 20. A parameter transfer control section 15 transfers the required data to each of the modules described above. A clock generator 16 serves to determine the timing of parameter transfer and a sampling interval for the system.
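The signal path from the V/U switching section 12 through the amplitude control section 13 to the voice synthesizing filter 17 may be sketched as follows. The sketch is illustrative only: the text above does not specify the filter structure, so an all-pole recursive digital filter is assumed, and the names `mix_sources`, `apply_gain`, `all_pole_filter`, `voiced_ratio` and `coeffs` are hypothetical.

```python
def mix_sources(pulses, noise, voiced_ratio):
    """V/U switching: 1.0 selects pulses only, 0.0 selects noise only,
    and intermediate values give a mixed output."""
    return [voiced_ratio * p + (1.0 - voiced_ratio) * n
            for p, n in zip(pulses, noise)]


def apply_gain(signal, gain):
    """Amplitude control: scale the source signal by the sound-volume value."""
    return [gain * x for x in signal]


def all_pole_filter(source, coeffs):
    """Assumed synthesis filter: y[n] = x[n] - sum_k coeffs[k-1] * y[n-k].

    `coeffs` stands in for the filter factors supplied as synthesis parameters.
    """
    output = []
    for n, x in enumerate(source):
        acc = x
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc -= a * output[n - k]
        output.append(acc)
    return output


if __name__ == "__main__":
    pulses = [1.0, 0.0, 0.0, 0.0]
    noise = [0.0, 0.0, 0.0, 0.0]
    source = apply_gain(mix_sources(pulses, noise, 1.0), 1.0)
    print(all_pole_filter(source, [-0.5]))
```

The digital output of such a filter corresponds to the samples that would subsequently be passed to the D/A converter and the low-pass filter 18; those analog stages are outside the scope of the sketch.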
As described above, the conventional arrangement utilizes impulses, triangular waves, residual waveforms and the like as the source of a voiced sound. Accordingly, such conventional arrangements cannot be used to synthesize voices which simulate the tone colors of musical instruments. With such a conventional arrangement, it has therefore been difficult to vary the quality of the reproduced voice while maintaining the phonetic features thereof. Moreover, an apparatus capable of outputting an instrumental sound or the like in the form of clear voice information has not yet been proposed.