1. Field of Invention
This invention relates to a continuous word recognition method used in a speech recognition device, and also to a recording medium on which is recorded a continuous word recognition processing program used in a speech recognition device. In the method and the program, continuous words, which are structured by a plurality of words spoken continuously with only a short interval between words, are input, the continuous words are recognition processed, and the recognition result is output.
2. Description of Related Art
Recently, electronic devices which use speech recognition technology have come to be used in various fields. One example is a clock called a sound clock. In this sound clock, a current time and an alarm time can be set by sound, and the sound clock can inform a user of the current time by sound.
This type of sound clock can be used as a toy for children in addition to being used as a daily necessity. It is desired that the cost of the device itself be as low as possible. Because of this, there is a large limitation on the CPU processing capability and memory capacity which are used. One of the problems to be solved is to have functions with high capability under these limitations.
In this type of sound clock, when the current time or an alarm time is set, for example to “a.m.”, “1 o'clock”, and “20 minutes”, generally “a.m.” is first spoken and recognized. Subsequently, “1 o'clock” is spoken and recognized. Then, “20 minutes” is spoken and recognized. Thus, the operation is performed such that each word is spoken and recognized one at a time.
However, when the content to be recognized forms a group structured by a plurality of words, speaking and recognizing each word separately is troublesome and makes the device inconvenient to use.
In order to solve this problem, it is effective to continuously speak the content which forms the group which is structured by a plurality of words and recognize the continuously spoken words as-is. However, among the words which form the group, there are words which are easily recognized and words which are not easily recognized. Therefore, it is difficult to recognize every word in the group correctly.
For example, in the case described earlier, when “a.m.”, “5 o'clock”, and “20 minutes” are continuously spoken and recognition processed, if “a.m., 9 o'clock, 20 minutes” is output as the recognition result of the device, the speaker realizes that a misrecognition has occurred. The speaker must then speak “a.m.”, “5 o'clock”, and “20 minutes” again and the recognition processing must be performed again, so there is a problem that too much time is spent until all the words are correctly recognized.
Therefore, an object of this invention is to provide a continuous word recognition method used in the speech recognition device, and also a recording medium on which is recorded a continuous word recognition processing program used in a speech recognition device, which can effectively and reliably recognize continuous words which form one grouped content which is structured by a plurality of words, and which, particularly, is extremely effective when time setting is performed.
In order to achieve the object described above, this invention provides a continuous word recognition method in a speech recognition device which has one group of contents formed by a plurality of words, inputs continuous word sounds which are continuously spoken with a short interval between words, recognition processes the continuous word sounds, and outputs the recognition result.
The method may include recognition processing all of the input continuous words, outputting the recognition result of all of the continuous words, inputting a response from a speaker showing affirmative/negative with respect to the recognition result and recognition processing the response, determining whether the response from the speaker is affirmative, confirming the recognition result as all of the continuous words when it is determined that the response is affirmative and, when it is determined that the response is negative, outputting the recognition result word by word from a first to an nth (n is a positive integer) of the words that form the continuous words, confirming the recognition result for each word by determining an affirmative or negative from the speaker with respect to the recognition result for each word, and obtaining a correct recognition result for each word.
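The overall flow described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the invention: the function name `confirm_continuous_words` and the callbacks `ask_yes_no` (outputs a result and returns whether the speaker's recognized response was affirmative) and `confirm_word` (runs the per-word confirmation for the word at a given position) are hypothetical names introduced here.

```python
from typing import Callable, List


def confirm_continuous_words(
    recognized: List[str],
    ask_yes_no: Callable[[str], bool],
    confirm_word: Callable[[int, str], str],
) -> List[str]:
    """Confirm a whole-utterance result, falling back to word-by-word checks.

    recognized   -- recognition result for all n words (assumed already computed)
    ask_yes_no   -- hypothetical callback: output a result, return True if the
                    speaker's response is recognized as affirmative
    confirm_word -- hypothetical callback: confirm the word at index i and
                    return the corrected word
    """
    # Output the result for all of the continuous words and get yes/no.
    if ask_yes_no(" ".join(recognized)):
        # Affirmative: the whole result is confirmed as-is.
        return list(recognized)
    # Negative: confirm each word in order, from the first to the nth.
    return [confirm_word(i, word) for i, word in enumerate(recognized)]
```

With an affirmative response the per-word callback is never invoked, which is the case in which the speaker finishes with a single utterance and a single "yes".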
Furthermore, a process of outputting the recognition result word by word from a first to an nth words that form the continuous words, confirming the recognition result for each word by determining an affirmative or negative from the speaker for the recognition result for each word, and obtaining a correct recognition result for each word, may include outputting a predetermined m (m is a positive integer) candidates in order, starting with a first candidate, with respect to a word which is a current processing target (defined as a recognition target word) among the first to the nth of the words that form the continuous words, inputting a response from the speaker showing affirmative/negative per output candidate and recognition processing the response, confirming the candidate as the recognition target word when the response of the speaker is determined to be affirmative, outputting a following candidate when the response of the speaker is determined to be negative, inputting the response from the speaker showing affirmative/negative with respect to the newly output candidate and recognition processing the candidate, confirming the candidate as the recognition target word when the response of the speaker is determined to be affirmative, outputting a following candidate if negative is determined, and performing this processing up to the mth candidate.
Furthermore, a request to speak the recognition target word again is output to the speaker when the response with respect to the mth candidate is negative.
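The per-word candidate processing can be sketched in the same way. The sketch assumes the recognizer has already produced a ranked candidate list for the recognition target word; the names `confirm_from_candidates` and `ask_yes_no`, the default `m=3`, and the use of `None` to signal that a re-speak request should be issued are all assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence


def confirm_from_candidates(
    candidates: Sequence[str],
    ask_yes_no: Callable[[str], bool],
    m: int = 3,
) -> Optional[str]:
    """Offer up to m ranked candidates for one recognition target word.

    Returns the first candidate the speaker accepts, or None when all of the
    first m candidates are rejected (the caller would then ask the speaker
    to say the word again).
    """
    # Output candidates in order, starting with the first candidate.
    for candidate in candidates[:m]:
        if ask_yes_no(candidate):
            # Affirmative: this candidate is confirmed for the target word.
            return candidate
        # Negative: fall through to the following candidate.
    # The mth candidate was also rejected.
    return None
```

Returning `None` rather than looping internally keeps the re-speak request, which involves new audio input and new recognition, outside this function.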
Additionally, when a word among the first to the nth (n being a positive integer) words that form the continuous words is a word which is mutually exclusive in terms of a meaning, one of two words is output as a recognition result, and when the response from the speaker showing affirmative/negative with respect to the output is negative, the other word of the two words is confirmed as a recognition result at that point.
This invention also provides a recording medium on which is recorded a continuous word recognition processing program used in a speech recognition device that has a group of contents formed by a plurality of words, inputs continuous word sounds which are continuously spoken with a short interval between words and recognition processes the continuous word sounds, and outputs the recognition result. The processing program may include a first step of recognition processing all of the input continuous words, a second step of outputting a recognition result of all of the continuous words through this recognition processing, inputting a response from a speaker showing affirmative/negative of the recognition result with respect to the output and recognition processing the response, and determining whether the response from the speaker is affirmative, and a third step of confirming the recognition result as all of the continuous words when the response of the speaker is determined to be affirmative by the determination result, and, when the response of the speaker is determined to be negative, outputting the recognition result word by word from a first to an nth (n is a positive integer) of the words that form the continuous words, and obtaining a correct recognition result per word by determining the affirmative/negative of the speaker for the recognition result for each word.
Additionally, the process of outputting the recognition result word by word from a first to an nth words that form the continuous words, confirming the recognition result for each word by determining an affirmative or negative from the speaker for the recognition result for each word, and obtaining a correct recognition result for each word in the third step, may include outputting a predetermined m (m is a positive integer) candidates in order, starting with a first candidate, with respect to a word which is a current processing target (defined as a recognition target word) among the first to the nth words that form the continuous words, inputting a response from the speaker showing affirmative/negative per output candidate and recognition processing the response, confirming the candidate as the recognition target word when the response of the speaker is determined to be affirmative, outputting a following candidate when the response of the speaker is determined to be negative, inputting the response from the speaker showing affirmative/negative with respect to the newly output candidate and recognition processing the candidate, confirming the candidate as the recognition target word when the response of the speaker is determined to be affirmative, outputting a following candidate if negative is determined, and performing this processing up to the mth candidate.
Furthermore, a request to speak the recognition target word again is output to the speaker when the response with respect to the mth candidate is negative.
Additionally, when a word among the first to the nth (n is a positive integer) words that form the continuous words is a word which is mutually exclusive in terms of a meaning, one of two words is output as a recognition result, and when the response from the speaker showing affirmative/negative with respect to the output is negative, the other word of the two words is confirmed as a recognition result at that point.
This invention is effective when it is applied to an interactive type speech recognition device which outputs the recognition result as a speaker inputs continuous word sounds which form a group which is structured by a plurality of words and which are continuously spoken with a small interval between words, and this continuous word sound is recognition processed.
First, all of the continuous words which have been input are recognition processed, the recognition result of all of the continuous words is output, a response from the speaker showing affirmative/negative of the recognition result (for example, “yes” or “no”) is input, and this “yes” or “no” is recognition processed. If the response from the speaker is determined to be affirmative (“yes”), the continuous words which have been input are confirmed by the recognition result. If the response is negative, for each word from a first to an nth (n is a positive integer) of the words that form the continuous words, the response from the speaker showing affirmative or negative is recognized in order, affirmative or negative is determined, and the recognition processing target word at that point is confirmed by the recognition result.
By so doing, the speaker can speak the continuous words all together and can easily perform a sound inputting operation. Furthermore, because the device performs speech recognition processing for all of the continuous words and the result is output, if it is correct, the recognition result of the continuous words at that point can be confirmed, and the corresponding following processing can begin. Therefore, effective processing is possible. Additionally, if the recognition result of any word among the continuous words is not correct, the recognition result of the respective words is output, affirmative/negative is determined in order for each of the words which structure the continuous words, and the recognition result for each word is confirmed so that confirmation processing of an accurate recognition result can be performed.
Additionally, with respect to one word, if the response is negative even for the last preset candidate (the mth candidate), a request to speak the recognition processing target word again is output to the speaker and recognition processing is performed on the word spoken again, so that an accurate recognition result can be confirmed.
Thus, this invention is a continuous word recognition method with both convenience and accuracy.
Furthermore, if a recognition processing target word is a mutually exclusive word, when one of two words is output as a recognition result, if the response from the speaker showing affirmative/negative is negative, the other word is confirmed as the recognition processing target word at that point.
For example, if the recognition processing target word is “a.m.”, it is selected from between “a.m.” and “p.m.”. Therefore, if the recognition result of “a.m.” is negative, “p.m.” is confirmed as the recognition result at that point. By also performing this type of processing, it is possible to effectively confirm a recognition result.
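The handling of a mutually exclusive pair such as “a.m.” and “p.m.” can be illustrated with a short sketch. The function name `confirm_exclusive_pair` and the callback `ask_yes_no` are hypothetical, and the sketch assumes the recognizer's first guess is one of the two words of the pair.

```python
from typing import Callable, Tuple


def confirm_exclusive_pair(
    first_guess: str,
    pair: Tuple[str, str],
    ask_yes_no: Callable[[str], bool],
) -> str:
    """Confirm one word out of two mutually exclusive alternatives.

    Only one yes/no response is needed: if the speaker rejects the first
    guess, the other word of the pair is confirmed without asking again.
    """
    if first_guess not in pair:
        raise ValueError("first_guess must be one of the two words in pair")
    # The alternative that is confirmed if the first guess is rejected.
    other = pair[1] if first_guess == pair[0] else pair[0]
    return first_guess if ask_yes_no(first_guess) else other
```

For a word with only two possible values, this saves one round of output and response compared with stepping through a candidate list.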