1. Field of the Invention
The present invention broadly relates to speech recognition and, more particularly, to a method and an apparatus for recognizing speech information based on a prediction concerning an object to be recognized. The invention also relates to a storage medium for storing a program implementing the above method.
2. Description of the Related Art
Speech recognition methods fall primarily into two types: the word speech-recognition method and the clause speech-recognition method. In the word speech-recognition method, an input speech waveform is analyzed, and features are extracted from the waveform to produce a feature time series. The similarity between this feature time series and each entry of a word dictionary, whose entries are represented by feature time series obtained in the same manner, is then calculated, and the word with the highest similarity is output as the recognition result. In the clause speech-recognition method, input speech is converted into phoneme strings, which are in turn converted into word strings. The word strings are parsed and converted into character strings. Logical and semantic analyses are then performed on the character strings, so that a sentence is produced and output. Further research is being conducted on methods of providing word-class information for homonyms and of converting input speech into compound nouns or into a single clause. However, such methods are very difficult to implement.
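The word speech-recognition method described above compares an input feature time series with stored feature time series and outputs the most similar word. The following is a minimal sketch of that comparison, assuming dynamic time warping (DTW) with a Euclidean frame distance as the similarity measure; the text does not specify the measure, so DTW is a swapped-in technique, and the names `dtw_distance` and `recognize_word` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one feature vector of the feature time series


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a: List[Frame], seq_b: List[Frame]) -> float:
    """Dynamic-time-warping distance between two feature time series.

    Smaller values mean the two series are more similar, even if they
    differ in length (e.g. the same word spoken faster or slower).
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed alignment paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize_word(features: List[Frame], dictionary: Dict[str, List[Frame]]) -> str:
    """Return the dictionary word whose template is most similar to the input."""
    return min(dictionary, key=lambda word: dtw_distance(features, dictionary[word]))


# Toy dictionary with one-dimensional feature vectors (illustrative values only).
dictionary = {
    "yes": [(1.0,), (2.0,), (3.0,)],
    "no": [(5.0,), (5.0,), (1.0,)],
}
print(recognize_word([(1.0,), (1.1,), (2.0,), (3.1,)], dictionary))
```

Because every dictionary entry is scored independently and only the best match is kept, a misrecognition at this stage is passed on unchanged to any later processing, which is the weakness discussed in the following paragraphs.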
In conversation, a listener usually recognizes a speaker's utterance by grasping it as a single coherent meaning. While the speaker is talking, the listener aids his or her understanding by predicting the content of the speech to some degree from the preceding topic and from common sense. Consequently, even if the speaker chooses or pronounces some words incorrectly, the listener understands the speaker without difficulty. Even when a conversation contains many homonyms, the listener can determine which word the speaker means.
In contrast, conventional speech recognition systems perform speech recognition by pattern matching. More specifically, a dictionary provided in the system is searched for candidate words that match a given portion of the input speech waveform, the candidate words are output, and the optimal word is selected from among them. With this arrangement, if recognition fails at any point during processing, all subsequent processing is corrupted.
Additionally, most conventional speech recognition systems assume that the input speech to be recognized satisfies the syntax of a particular language. Accordingly, various determinations are made within a speech recognition module, and the results are passed to another process (another module). More specifically, the speech recognition module filters (parses) the speech information so that it is uniquely determined as a system command. Language processing, i.e., checking words against a word database or a grammar database, handles not only grammatically correct speech but also unnecessary words, such as exclamations and restatements, and non-grammatical speech, such as anastrophe (inversion) and dropped particles.
However, because parsing is performed to analyze syntactic structure, all elements other than syntactic information are discarded. Moreover, even when a word is determined after parsing to be significant, neither general knowledge nor knowledge of a specific field is taken into account.
An example of a conventional speech recognition system is shown in FIG. 42. Because processing of the input speech flows in one direction only, the system continues processing in that direction even when the result produced by the speech recognition module is incorrect. For example, an input that speech recognition judges to be syntactically correct, but that the system as a whole cannot process, is nevertheless accepted and is eventually returned as an error. In other words, the speech recognition unit and the rest of the system perform processing separately rather than cooperatively, so complex processing cannot be implemented. As a result, the performance of the entire system depends heavily on the result of speech recognition.
Accordingly, it is an object of the present invention to provide an information processing apparatus and an information processing method for improving the speech recognition rate.
It is another object of the present invention to provide an information processing apparatus and an information processing method for performing speech recognition without being dependent upon a syntax structure.
In order to achieve the above objects, according to one aspect of the present invention, there is provided an information processing apparatus including a storage unit for storing prediction information concerning an object to be recognized. A recognition unit recognizes sound information based on the prediction information. A knowledge base stores knowledge concerning the type of data represented by the sound information. A prediction unit predicts sound information which is to be subsequently recognized by the recognition unit by referring to the knowledge stored in the knowledge base. An updating unit updates the prediction information stored in the storage unit based on a prediction result obtained by the prediction unit.
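The interaction among the storage unit, recognition unit, knowledge base, prediction unit and updating unit can be illustrated with a small feedback loop. The following is a minimal sketch under stated assumptions, not the apparatus itself: the class and function names (`PredictionStore`, `KnowledgeBase`, `recognize`, `predict`, `update`) are hypothetical, the acoustic front end is assumed to supply scored word hypotheses, and the knowledge base is reduced to a toy mapping from words to data types and from each type to the types expected next.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class PredictionStore:
    """Storage unit (hypothetical): the words currently expected next."""
    expected: Set[str] = field(default_factory=set)


@dataclass
class KnowledgeBase:
    """Toy knowledge base (assumed form): word -> data type, type -> next types."""
    category_of: Dict[str, str]
    next_categories: Dict[str, List[str]]

    def words_in(self, category: str) -> Set[str]:
        return {w for w, c in self.category_of.items() if c == category}


def recognize(hypotheses: List[Tuple[str, float]], store: PredictionStore) -> str:
    """Recognition unit: prefer the best-scoring hypothesis that was predicted.

    Falls back to the best overall hypothesis when nothing was predicted.
    """
    predicted = [h for h in hypotheses if h[0] in store.expected]
    pool = predicted if predicted else hypotheses
    return max(pool, key=lambda h: h[1])[0]


def predict(word: str, kb: KnowledgeBase) -> Set[str]:
    """Prediction unit: words of the data types expected after `word`."""
    category: Optional[str] = kb.category_of.get(word)
    expected: Set[str] = set()
    for nxt in kb.next_categories.get(category, []):
        expected |= kb.words_in(nxt)
    return expected


def update(store: PredictionStore, prediction: Set[str]) -> None:
    """Updating unit: replace the stored prediction information."""
    store.expected = prediction


kb = KnowledgeBase(
    category_of={"to": "preposition", "two": "number", "Tokyo": "city", "Kyoto": "city"},
    next_categories={"preposition": ["city"]},
)
store = PredictionStore(expected={"to"})

# "two" scores higher acoustically, but only the homonym "to" was predicted.
word1 = recognize([("two", 0.55), ("to", 0.45)], store)
update(store, predict(word1, kb))  # a city name is now expected
word2 = recognize([("Kyoto", 0.4), ("Tokyo", 0.6)], store)
print(word1, word2)
```

The design choice illustrated here is that recognition and prediction form a loop: each recognized word updates the prediction information that constrains the next recognition step, which mirrors how a listener uses context to resolve homonyms.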
According to another aspect of the present invention, there is provided an information processing method including a recognition step of recognizing sound information based on prediction information, a prediction step of predicting sound information to be subsequently recognized in the recognition step by checking knowledge stored in a knowledge base for storing knowledge concerning the type of data represented by sound information, and an updating step of updating the prediction information based on a prediction result obtained in the prediction step.
According to still another aspect of the present invention, there is provided a computer-readable storage medium storing a response program for controlling a computer to perform speech recognition. The program includes codes for causing the computer to perform a recognition step of recognizing sound information based on prediction information, a prediction step of predicting sound information to be subsequently recognized in the recognition step by checking knowledge stored in a knowledge base for storing knowledge concerning the type of data represented by sound information, and an updating step of updating the prediction information based on a prediction result obtained in the prediction step.
Other objects and advantages besides those discussed above shall be apparent to those skilled in the art from the description of a preferred embodiment of the invention which follows. In the description, reference is made to accompanying drawings, which form a part thereof, and which illustrate an example of the invention. Such example, however, is not exhaustive of the various embodiments of the invention, and therefore reference is made to the claims which follow the description for determining the scope of the invention.