1. Field of the Invention
This invention relates to a speech recognition system, a speech synthesis system, and a speech recognition and synthesis system, and more particularly to a speech recognition system, a speech synthesis system, and a speech recognition and synthesis system suitable for recognition and synthesis of acoustic signals in the audible band.
2. Description of the Related Art
For conversation between one or more apparatuses and one or more persons, a system is conventionally structured so that the apparatus recognizes a natural language (such as Japanese or English) by combining speech recognition and speech synthesis, converts words or sentences to a speech signal, and answers in a natural language. However, conventional speech recognition systems are affected by noise and disturbing sound, and the correct recognition rate is low. To solve this problem, several techniques have been proposed, as described below.
One technique avoids noise and disturbing sound by modifying the microphone. For example, a directional microphone is directed toward the sound source to reduce noise. The technique disclosed in Japanese Patent Laid-Open No. Sho 59-012500 (1984) is an example.
In another technique, noise and disturbing sound are estimated by signal processing, and the estimate is subtracted from the signal. For example, in pre-processing such as frequency conversion, the spectrum component of the background noise is subtracted to mitigate the effect of the background noise; this technique is referred to as spectrum subtraction. The technique disclosed in Japanese Patent Laid-Open No. Sho 57-212496 (1982) is an example. In yet another technique, for the case in which the noise is collected, adaptive signal processing is used to filter the noise and remove it from the mixed signal of speech and noise.
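The spectrum subtraction described above can be illustrated by a minimal sketch. It assumes that magnitude spectra have already been obtained by frequency conversion and that some frames are known to contain only background noise. The function names `estimate_noise_spectrum` and `spectrum_subtract`, the `floor` parameter, and the numeric values are hypothetical and are not taken from the cited publication.

```python
def estimate_noise_spectrum(noise_frames):
    """Average the magnitude spectra of frames assumed to contain only noise."""
    n_bins = len(noise_frames[0])
    return [sum(frame[k] for frame in noise_frames) / len(noise_frames)
            for k in range(n_bins)]


def spectrum_subtract(frame, noise_spectrum, floor=0.0):
    """Subtract the noise estimate bin by bin, clamping each result at `floor`."""
    return [max(mag - noise, floor) for mag, noise in zip(frame, noise_spectrum)]


# Hypothetical magnitude spectra with 4 frequency bins per frame.
noise_only = [[1.0, 2.0, 1.0, 0.5],
              [1.0, 2.0, 3.0, 0.5]]
noisy_speech = [1.5, 9.0, 2.5, 0.25]

noise_est = estimate_noise_spectrum(noise_only)
cleaned = spectrum_subtract(noisy_speech, noise_est)
print(cleaned)
```

Clamping at a floor prevents negative magnitudes in bins where the noise estimate exceeds the observed value; the choice of floor is a design parameter of this sketch.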
In still another technique, the disturbing sound and the speech to be recognized are mapped, using a certain map, into a space where the position of the disturbing sound is far from that of the speech, and the speech is then recognized. For example, "A noise removing device using a neuro-network model" disclosed in Japanese Patent Laid-Open No. Hei 2-15718 (1990) describes a device in which a neural network having a multilayered structure learns a map whose input is mixed speech and noise and whose output is speech without noise, and thereby removes the noise. Japanese Patent Laid-Open No. Sho 60-75898 (1985) discloses a technique in which only the harmonic components of the speech pitch are detected to prevent degradation of word recognition performance.
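As one possible reading of detecting only the harmonic components of the speech pitch, the sketch below keeps the spectrum bins nearest to integer multiples of an assumed pitch frequency and discards the rest. This is an illustrative assumption and not the method of the cited publication; the names `harmonic_bins` and `keep_harmonics`, and all numeric values, are hypothetical.

```python
def harmonic_bins(f0_hz, sample_rate_hz, n_fft, max_harmonics=10):
    """Return FFT bin indices nearest to multiples of the pitch frequency f0_hz."""
    bin_width_hz = sample_rate_hz / n_fft
    nyquist_bin = n_fft // 2
    bins = []
    for h in range(1, max_harmonics + 1):
        k = round(h * f0_hz / bin_width_hz)
        if k > nyquist_bin:
            break  # harmonics above the Nyquist frequency are not representable
        bins.append(k)
    return bins


def keep_harmonics(magnitudes, bins):
    """Zero every bin of the magnitude spectrum that is not a harmonic bin."""
    keep = set(bins)
    return [m if k in keep else 0.0 for k, m in enumerate(magnitudes)]


# Hypothetical example: 200 Hz pitch, 8 kHz sampling, 80-point transform
# (100 Hz per bin), first three harmonics only.
bins = harmonic_bins(200.0, 8000.0, 80, max_harmonics=3)
spectrum = [1.0, 1.0, 5.0, 1.0, 4.0, 1.0, 3.0, 1.0]
print(bins, keep_harmonics(spectrum, bins))
```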
From the viewpoint of the freedom in designing a speech recognizer, these techniques are categorized into a technique for removing disturbing sound by providing a sound collection method suitable for the service environment, a technique for removing disturbing sound by suitable signal processing, and a technique for separating disturbing sound from speech by selecting a suitable map.
On the other hand, as a method of transmitting a message from an apparatus to a person, a method that involves a natural language has conventionally been disclosed. As a method that does not involve a natural language, Japanese Patent Laid-Open No. Sho 54-153070 (1979) discloses an apparatus in which the time of a clock is announced by tones such as "do, re, mi, fa, . . . " that correspond one-to-one to numerals, instead of by a natural language. This apparatus involves only transmission of information, without a natural language, from an apparatus to a person; therefore, it is not a conversational system.
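A one-to-one correspondence between numerals and tones of the kind described above can be sketched as follows. The particular assignment of digits 0 to 9 to tone names, the four-digit time format, and the names `TONES`, `time_to_tones`, and `tones_to_time` are assumptions made for illustration; the cited publication's actual assignment is not given here.

```python
# Hypothetical assignment: digit d is announced by TONES[d].
TONES = ["do", "re", "mi", "fa", "sol", "la", "ti", "do'", "re'", "mi'"]


def time_to_tones(hour, minute):
    """Encode a clock time as four tones, one per digit of HHMM."""
    digits = f"{hour:02d}{minute:02d}"
    return [TONES[int(d)] for d in digits]


def tones_to_time(tones):
    """Decode four tones back to (hour, minute); possible because the map is one-to-one."""
    d = [TONES.index(t) for t in tones]
    return d[0] * 10 + d[1], d[2] * 10 + d[3]


announcement = time_to_tones(7, 35)
print(announcement)
print(tones_to_time(announcement))
```

Because each numeral has exactly one tone and each tone exactly one numeral, a listener (or a decoder such as `tones_to_time`) can recover the time without any natural language.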
Japanese Patent Laid-Open No. Sho 61-110837 (1986) discloses an apparatus that can be remotely controlled by clapping hands or whistling; this apparatus is an example of means for transmitting information from a person to an apparatus. Claim 1 of that invention recites "A method for controlling an air conditioner which is remote controllable to control ON-OFF switching of power supply and other setting remotely, wherein an air conditioner is switched to start-up or to shut-down by inputting arbitrary numbers of acoustic wave in a certain time period intermittently". The background description of that invention points out the disadvantage of a wireless remote controller, namely that the controller can be dropped or lost; however, that invention does not involve phoneme design associated with measures against noise and disturbing sound. In addition, this technique involves only one-way information transmission from a person to an apparatus; therefore, it is not a conversational system.
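The control principle recited in the quoted claim, namely switching by a number of intermittent acoustic inputs within a certain time period, can be illustrated by the following sketch. It assumes that the onset times of the acoustic pulses (for example, hand claps) have already been detected and are given in ascending order. The names `count_pulses_in_window` and `next_power_state`, the required count of three, and the two-second window are hypothetical values chosen for illustration.

```python
def count_pulses_in_window(pulse_times, window_s):
    """Count pulses that occur within window_s seconds of the first pulse."""
    if not pulse_times:
        return 0
    start = pulse_times[0]
    return sum(1 for t in pulse_times if t - start <= window_s)


def next_power_state(is_on, pulse_times, required_pulses=3, window_s=2.0):
    """Toggle the power state only when exactly the required number of pulses arrives in time."""
    if count_pulses_in_window(pulse_times, window_s) == required_pulses:
        return not is_on
    return is_on


# Three claps within two seconds toggle the power; a late third clap does not.
print(next_power_state(False, [0.0, 0.5, 1.0]))
print(next_power_state(False, [0.0, 0.5, 3.0]))
```

A counter of this kind has no means of distinguishing intended pulses from disturbing sound that happens to arrive in the same window, which is consistent with the observation above that the cited technique does not involve measures against noise.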
As described hereinabove, from the viewpoint of the freedom in designing, the design of these conventional speech recognition apparatuses is optimized from the viewpoints of:
(1) removal of disturbing sound by providing suitable sound collection,
(2) removal of disturbing sound by suitable signal processing, and
(3) separation of speech and disturbing sound by selecting a suitable map.
As described hereinabove, conversational systems have been disclosed in the field of speech recognition and speech synthesis using a natural language; however, natural conversation using a natural language is still technically difficult. In particular, recognition is very difficult in ordinary institutional and home environments because of disturbing sound and noise.
Synthesizers and recognizers that do not use a natural language have each been disclosed independently; however, these systems were not introduced to improve performance against disturbing sound and noise, and therefore a conversational system cannot be suggested by them.
In view of the above-mentioned situation, the present invention has been accomplished to realize a conversational system which, provided with measures against disturbing sound and noise, is capable of recognizing the information transmitted between one or more apparatuses and one or more persons.