1. Field of the Invention
This invention relates to speech synthesizers, and more particularly, to an architecture for a speech synthesizer and a method for synthesizing speech, which allow the speech synthesizer to drive external devices in a multi-tasking manner while nonetheless keeping the software complexity and the voice concatenation simple to implement.
2. Description of Related Art
A synthesizer, broadly speaking, is a device that combines a variety of constituent items so as to form a new, composite product. Speech synthesizers are widely utilized in various systems where voice is used to output messages or data to the user, such as personal computers, mobile phones, toys, and warning systems, to name a few. A speech synthesizer is typically provided with a ROM (read-only memory) unit which stores a database of various sounds or words that can be retrieved and combined to form a stream of speech with a specific meaning. This ROM unit is typically partitioned into a number of sections, called speech sections. In one standard for speech synthesis, such speech sections are designated H4, S1, S2, . . . , Sn, and T4. Each speech section represents one of 250 basic phonic elements that can be selected and combined into the sound data of various words or phrases. Alternatively, each speech section can store the sound data of a complete word; the choice between these two schemes is merely a design decision made by the speech synthesizer designer.
The data in each speech section can be selected and synthesized into words or phrases through various speech equations (EQs), each EQ specifying a number of selected phonic elements that are combined, in the order given, to form a particular word or phrase of a specified meaning. For example, EQ = H4 + S1 + S2 + S3 + T4 may represent either a five-sound word or a five-word phrase.
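By way of illustration only, such a speech equation can be viewed as a list of indices into the ROM database that are played back in order. The following C sketch shows this lookup-and-concatenate idea; all names (rom_sections, play_sample, and so on) are hypothetical and do not belong to any actual standard or product.

    #include <stddef.h>
    #include <stdint.h>

    #define NUM_SECTIONS 250              /* basic phonic elements in the database */

    /* Hypothetical ROM layout: each speech section is a block of samples. */
    typedef struct {
        const int16_t *samples;           /* pointer into ROM */
        size_t         length;            /* number of samples */
    } SpeechSection;

    extern const SpeechSection rom_sections[NUM_SECTIONS];
    extern void play_sample(int16_t s);   /* stands in for the output path */

    /* Play one speech equation, e.g. EQ = H4 + S1 + S2 + S3 + T4 given as
     * an array of section indices: emit each section's samples in order,
     * concatenating them into one word or phrase. */
    void play_equation(const uint8_t *eq, size_t eq_len)
    {
        for (size_t i = 0; i < eq_len; i++) {
            const SpeechSection *sec = &rom_sections[eq[i]];
            for (size_t j = 0; j < sec->length; j++)
                play_sample(sec->samples[j]);
        }
    }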
The foregoing scheme of using phonic elements for the synthesizing of words allows the required memory space for the speech database to be significantly reduced as compared to the scheme of storing the sound of each complete word in the ROM unit. Moreover, it gives the designer greater flexibility and versatility in designing the speech synthesizer to provide the sound data of more complex words or phrases.
One standard for speech synthesis defines one section of speech data as the combination of a number of bytes, respectively designated H4, S1, S2, S3, and T4. This scheme is illustratively depicted in FIG. 1. Each of the bytes (H4, S1, S2, S3, T4) represents one basic constituent element of sound data and can be a single sound, a series of sounds, a piece of music, or a combination of several pieces of music.
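As a sketch only, the section layout of FIG. 1 could be expressed as a fixed five-byte record; the field names below simply mirror the byte designations and carry no meaning beyond that.

    #include <stdint.h>

    /* One section of speech data per FIG. 1: five bytes, each selecting
     * one basic constituent element of sound data (a single sound, a
     * series of sounds, or one or more pieces of music). */
    typedef struct {
        uint8_t h4;   /* H4 */
        uint8_t s1;   /* S1 */
        uint8_t s2;   /* S2 */
        uint8_t s3;   /* S3 */
        uint8_t t4;   /* T4 */
    } SpeechDataSection;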
FIG. 2 is a schematic block diagram showing a conventional speech synthesizer, as designated by the reference numeral 10, that can be used for the synthesizing of the speech data shown in FIG. 1 into digital sound data. As shown, this speech synthesizer 10 includes a memory unit 11, such as a ROM unit, and a synthesizer 12. The ROM unit 11 is used to store a database of phonic elements and various other kinds of speech data that can be selectively retrieved for synthesizing into sound data of specific meanings. When the speech synthesizer 10 receives a trigger signal 14, the corresponding phonic elements in the ROM unit 11 are retrieved and then transferred to the synthesizer 12 for synthesizing into sound data. The synthesized sound data are then converted into audible sounds by a loudspeaker 13. One benefit of this speech synthesizer is that its system architecture is quite simple to implement.
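A minimal sketch of this trigger-driven flow follows, assuming hypothetical helpers for the trigger signal 14 and the ROM-to-synthesizer-to-loudspeaker path; note that the synthesis call blocks, so the device can do nothing else while speech is being played.

    #include <stdbool.h>
    #include <stdint.h>

    extern bool    trigger_pending(void);           /* trigger signal 14 */
    extern uint8_t trigger_equation_id(void);
    extern void    synthesize_to_speaker(uint8_t);  /* ROM 11 -> synthesizer 12 -> loudspeaker 13 */

    void main_loop(void)
    {
        for (;;) {
            if (trigger_pending()) {
                /* Blocking: the whole device is busy until the sound
                 * has been played out through the loudspeaker. */
                synthesize_to_speaker(trigger_equation_id());
            }
        }
    }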
One drawback to the foregoing speech synthesizer 10, however, is that it is only capable of outputting the synthesized speech data as audible sounds through the loudspeaker 13; it is incapable of simultaneously driving external devices such as motors or light-emitting diodes (LEDs) in a multi-tasking manner.
The synthesizer 12 utilized in the speech synthesizer 10 is typically implemented as a state machine that can perform some I/O controls. One drawback to this state-machine implementation, however, is that its I/O ports can be switched to other I/O functions only at the break between two consecutive speech sections. The architecture of FIG. 2 therefore cannot meet high-quality requirements for speech synthesizers.
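The limitation can be sketched as follows, again with hypothetical helper names: while a section is playing, the I/O ports are frozen, and pending I/O updates can be applied only at the break between sections.

    #include <stddef.h>
    #include <stdint.h>

    extern void play_section(uint8_t section_id);  /* blocks for one full section */
    extern void apply_pending_io_updates(void);    /* switches I/O port states    */

    void play_equation_with_io(const uint8_t *eq, size_t eq_len)
    {
        for (size_t i = 0; i < eq_len; i++) {
            play_section(eq[i]);         /* I/O ports frozen during this call      */
            apply_pending_io_updates();  /* only legal switch point: section break */
        }
    }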
FIG. 3A is a schematic block diagram of a conventional speech synthesizer 20 with multi-tasking capability. As shown, this speech synthesizer 20 includes a memory unit 21, such as a ROM unit, a micro-controller 22, a synthesizer 23, and a digital-to-analog converter (DAC) 24, and is coupled to a loudspeaker 25. The memory unit 21 is used to store a database of phonic elements and various other kinds of speech data that can be selectively retrieved for synthesizing into sound data of specific meanings. When the speech synthesizer 20 receives a trigger signal 27, the corresponding data are retrieved from the memory unit 21 under control of the micro-controller 22 and subsequently transferred to the synthesizer 23 for synthesizing into sound data of specific meanings. The digital output from the synthesizer 23 is then converted by the DAC 24 into analog form, which the loudspeaker 25 renders as audible sound. The micro-controller 22 allows the speech synthesizer 20 to perform I/O functions with external devices such as motors or LEDs.
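The multi-tasking idea of FIG. 3A can be sketched as follows, once more with hypothetical helper names: the micro-controller services external I/O between audio samples rather than only at section breaks.

    #include <stdbool.h>
    #include <stdint.h>

    extern bool    synth_sample_ready(void);  /* synthesizer 23 has output ready */
    extern int16_t synth_read_sample(void);
    extern void    dac_write(int16_t sample); /* DAC 24 -> loudspeaker 25        */
    extern void    service_external_io(void); /* motors, LEDs, etc.              */

    void multitask_loop(void)
    {
        for (;;) {
            if (synth_sample_ready())
                dac_write(synth_read_sample());
            /* Unlike the FIG. 2 device, I/O can be serviced at any time,
             * not just at a section break. */
            service_external_io();
        }
    }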
Alternatively, as shown in FIG. 3B, the micro-controller 22 and the synthesizer 23 in the speech synthesizer 20 of FIG. 3A can be replaced by a single microprocessor 26. With this architecture, both the I/O controls and the synthesizing of speech data are performed by the microprocessor 26.
The foregoing speech synthesizer with multi-tasking capability, however, still has a drawback in its encoding. For example, voice concatenation, a technique for combining a number of separate phonic elements into a continuous stream of meaningful sounds, requires an algorithm so complex that it is very difficult to code in software. The design of such a speech synthesizer is therefore a very laborious and time-consuming job to carry out; the development period typically requires at least one month.
In conclusion, the prior art has the following drawbacks.
(1) First, with respect to the prior art of FIG. 2, although its simple system architecture makes it easy to design, it is incapable of driving external devices such as motors and LEDs in a multi-tasking manner while performing speech synthesis. Moreover, it cannot switch the output state of its I/O ports except at the break between two consecutive speech sections.
(2) Second, with respect to the prior art of FIGS. 3A-3B, its multi-tasking capability requires an algorithm so complex that the programming is very difficult to implement; the development period is therefore quite long.