1. Field of the Invention
The present invention relates to a speech recognizer and a data processor using the same.
2. Description of the Related Art
Speech recognition refers to recognizing the meaning of speech by using an electronic data processor, an electronic circuit or the like to automatically extract and determine the most basic meaning of the information contained in the speech waveform. Speaker categorization refers to identifying the speaker by extracting the identifying information contained in the speech waveform.
Equipment for automatically recognizing speech has been researched for a long time. Recently, sound data input apparatuses that allow a user to hold a dialog with a machine through speech have been realized, and further development is expected.
FIG. 22 shows the structure of a speech recognizer in the related art. This conventional speech recognizer includes a sound data input part 2201, a speech segment data generating part 2202, a speech segment data processing part 2203 and a speech recognition dictionary storing part 2204.
An input device such as a microphone is used for the sound data input part 2201.
The speech segment data generating part 2202 detects a speech segment in the speech data input from the sound data input part 2201 and generates speech segment data. The detection of the speech segment will be described in detail later.
The speech segment data processing part 2203 analyzes the speech segment data generated by the speech segment data generating part 2202 and recognizes its meaning. Conventional methods for recognizing the meaning are described specifically in "Digital Speech Process" by S. Furui (Tokai University Publishing Association), published in English translation as "Digital Speech Processing, Synthesis, and Recognition" (Marcel Dekker, 1989). The speech recognition dictionary storing part 2204 generally includes a phoneme dictionary and a word dictionary as dictionaries for speech recognition. In the speech recognition process, each phoneme is recognized based on the distance or the similarity between the short-time spectrum of the input speech and that of a reference pattern, and the meaning of the speech is identified by finding the word in the word dictionary that matches the recognized phoneme sequence.
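The two-stage matching described above (phoneme recognition by spectral distance, then word lookup) can be illustrated with a minimal sketch. It assumes Euclidean distance as the distance measure, frame-by-frame nearest-neighbour classification against one reference spectrum per phoneme, and toy three-band spectra; the dictionaries, their contents and the function names are hypothetical and are not taken from the cited reference or from any particular recognizer.

```python
import math
from itertools import groupby

# Hypothetical phoneme dictionary: each phoneme maps to a reference
# short-time spectrum (toy 3-band magnitude vectors, not real data).
PHONEME_DICTIONARY = {
    "a": (0.9, 0.3, 0.1),
    "k": (0.2, 0.8, 0.6),
    "i": (0.3, 0.2, 0.9),
}

# Hypothetical word dictionary: recognized phoneme sequence -> word.
WORD_DICTIONARY = {
    ("a", "k", "a"): "aka",
    ("a", "k", "i"): "aki",
}


def recognize_phoneme(spectrum):
    """Return the phoneme whose reference spectrum is nearest (Euclidean distance)."""
    return min(PHONEME_DICTIONARY,
               key=lambda p: math.dist(spectrum, PHONEME_DICTIONARY[p]))


def recognize_word(frames):
    """Classify each frame, merge runs of identical labels, then look up the word.

    Returns (phoneme_sequence, word); word is None when no entry matches.
    """
    labels = [recognize_phoneme(frame) for frame in frames]
    # Consecutive frames of one phoneme collapse into a single label
    # (a simplification: real phoneme segmentation is more involved).
    sequence = tuple(label for label, _ in groupby(labels))
    return sequence, WORD_DICTIONARY.get(sequence)


# Usage: four input frames, two of which fall near the reference for "a".
frames = [(0.85, 0.35, 0.1), (0.8, 0.3, 0.15), (0.25, 0.75, 0.55), (0.35, 0.25, 0.85)]
result = recognize_word(frames)
print(result)
```

The nearest-neighbour rule here stands in for "distance or similarity"; a similarity measure such as cosine similarity would instead select the reference with the largest score.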
However, conventional speech recognizers have the problem that an error in the recognition of speech data is not easy to correct.
More specifically, in speech recognition as actually performed by humans, speech that was not recognized correctly at first can later be corrected and understood from the context of the conversation, and the actions of the people who are talking are corrected accordingly. In conventional speech recognizers, however, it is not easy to correct the meaning of speech once it has been recognized wrongly. For example, in an apparatus to which commands are input by speech, if an erroneous command is input because the speech data was misrecognized, it is difficult to correct the resulting operation. Thus, the range of applications of speech recognizers is limited.