1. Field of the Invention
The present invention relates to a recognition apparatus, such as a voice recognition apparatus, a character recognition apparatus, etc., which displays a plurality of recognition candidates obtained by recognition processing and thereby permits a user to select a recognition result from among the recognition candidates.
2. Description of the Related Art
In a voice recognition apparatus, such as a large-vocabulary voice recognition system, etc., a recognition candidate which is best in recognition score is not necessarily right because of the performance limit of the voice recognition system used. For this reason, use is made of a man-machine system arranged such that high-ranking candidates which are high in recognition score are displayed on a display unit and the user selects the right candidate out of the displayed candidates.
FIG. 1 illustrates a conventional large-vocabulary voice recognition system.
A voice to be recognized is entered into a microphone 101 and then digitized by an A/D converter 102. The digitized voice is subjected to frequency analysis in a frequency analyzer 103 for conversion into a time-series pattern of parameters representing the frequency characteristic of a phoneme of the voice. The parameters include, for example, powers of frequency components, LPC coefficients resulting from linear prediction analysis and cepstrum coefficients resulting from cepstrum analysis.
In a DTW matching section 104, the time-series pattern is matched against the time-series pattern of each word stored in a template memory 105 by means of the dynamic time-warping method. The matching section carries out the comparison between the two time-series patterns while normalizing (expanding and contracting) their time bases on the basis of a dynamic programming technique. Thereby, a distance value (for example, the Euclidean distance between the parameters) between the input voice and each word stored in the template memory 105 is calculated.
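By way of illustration only, the dynamic time-warping comparison described above may be sketched as follows. The function names `dtw_distance` and `match_templates`, the representation of each pattern as a list of equal-length parameter tuples, and the use of the textbook three-way recurrence are assumptions made for this sketch; the source does not specify the recurrence used by the DTW matching section 104.

```python
import math

def dtw_distance(a, b):
    """Accumulated distance between two time-series patterns.

    a, b: lists of parameter vectors (tuples of floats), one per frame.
    The time bases are expanded/contracted by dynamic programming.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames' parameters
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def match_templates(pattern, templates):
    """Return (word, distance) pairs for every word in a template store.

    templates: dict mapping a word's character string to its pattern
    (a hypothetical stand-in for the template memory).
    """
    return [(word, dtw_distance(pattern, tpl)) for word, tpl in templates.items()]
```

In this sketch a pattern that differs from a template only by a repeated frame yields a distance of zero, which reflects the time-base normalization described in the text.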
These distance values are sorted (rearranged) in the order of distance in a distance order sorter 106, so that distance values taking predetermined high ranks beginning with the smallest and information on character trains of the corresponding word candidates are stored in a high-ranking candidate memory 107.
If word candidates which take, for example, the first to thirtieth ranks are stored in the high-ranking candidate memory 107, the distance order sorter 106 will execute such operations as indicated in steps S1 to S5 of FIG. 2.
That is, first, in step S1, the contents of the high-ranking candidate memory 107 are initialized.
Next, in step S2, a character string of a word candidate and a distance value are read from the DTW matching section 104. When the determination result in step S3 is "NO", that is, when word candidates remain to be read, the distance value is written into the 31st address area of the high-ranking candidate memory 107 in step S4.
In step S5, the distance value which has been written into the 31st address area is sequentially compared with the distance values of the word candidates already stored, in ascending order of distance, in the 1st to 30th address areas of the high-ranking candidate memory 107, so that the distance values stored in the 1st to 31st address areas are sorted.
The above steps S2 to S5 are repeated until the distance values for all the word candidates have been read from the DTW matching section 104, namely, until the determination result "YES" is obtained in step S3.
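By way of illustration only, the sorting loop of steps S1 to S5 may be sketched as follows. The function name `sort_candidates`, the parameter `n_keep`, and the use of a Python list in place of the address areas of the high-ranking candidate memory 107 are assumptions made for this sketch. Discarding the entry left in the spare area is used here as an equivalent of overwriting that area on the next pass.

```python
def sort_candidates(results, n_keep=30):
    """Keep the n_keep candidates with the smallest distance values.

    results: iterable of (character_string, distance) pairs, standing in
    for the output of the matching section.
    """
    memory = []                          # S1: initialize the candidate memory
    for word, dist in results:           # S2: read a candidate; the loop ends
                                         #     when all have been read (S3 "YES")
        memory.append((word, dist))      # S4: write into the spare last area
        # S5: compare the new entry with the already sorted entries and
        #     move it up until the memory is in ascending order of distance
        i = len(memory) - 1
        while i > 0 and memory[i - 1][1] > memory[i][1]:
            memory[i - 1], memory[i] = memory[i], memory[i - 1]
            i -= 1
        if len(memory) > n_keep:
            memory.pop()                 # drop whatever ended up in the spare area
    return memory
```

For example, with `n_keep=3` and the pairs `("a", 5)`, `("b", 2)`, `("c", 9)`, `("d", 1)` read in that order, the sketch returns the candidates `d`, `b`, `a` in that order.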
A sort termination signal is sent from the distance order sorter 106 to a display controller 108 at the termination of sort processing in the order of distance. Upon receipt of the sort termination signal the display controller 108 displays the character strings of word candidates stored in the high-ranking candidate memory 107 in the distance order and their ranking numbers on the display unit 109. In the above case, for example, the display controller 108 displays a character string of each of the word candidates stored in the 1st to 30th address areas of the high-ranking candidate memory 107 as the process in step S2 in FIG. 2. On termination of the display, the display controller 108 sends a display termination signal to a word select controller 110.
Upon receipt of the display termination signal, the word select controller 110 accepts the ranking numbers of word candidates entered by the user from a keyboard 111 or a mouse 112, which is a pointing device, and then outputs character strings of the correct words corresponding to the entered ranking numbers to another device, an application program, etc., which are not shown in particular.
FIG. 3 illustrates a first display example of word candidates displayed on the display unit 109 (see FIG. 1) in the above-described conventional voice recognition system. As can be seen from the figure, even if the user wants to specify the word "OOSAKA" corresponding to the user's utterance from among the displayed word candidates, it will be very difficult for the user to search for the object word. As described above, the problem with the first prior art is that it takes a long time for the user to search for the correct answer. This increases the psychological burden and stress imposed on the user.
Next, FIG. 4 illustrates a second display example of word candidates displayed by the display unit 109 in the above-described conventional voice recognition system. When utterances corresponding to a Japanese sentence "ANATAHA HONO YONDEIMASUKA" are entered from the microphone 101, the system of FIG. 1 performs a recognition process continuously. In the high-ranking candidate memory 107 are stored character strings of word candidates of each of the words composing the input sentence and their ranking numbers in ascending order of distance. Upon receipt of a sort termination signal indicating the termination of sorting of the input words from the distance order sorter 106, the display controller 108 displays character strings of word candidates taking the first to eighth ranks, their ranking numbers and their distance values for each word as shown in FIG. 4. Hereinafter, such a combination of word candidates displayed in the form of a table for plural words is referred to as a word candidate lattice. Here, figures within parentheses for each word candidate represent its distance value (recognition similarity). The smaller the value, the higher is the probability of being correct.
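By way of illustration only, a word candidate lattice of the kind described above may be represented as one column of candidates per input word, each column sorted in ascending order of distance. The candidate strings and distance values below are invented for this sketch and are not taken from FIG. 4; the function names `select_words` and `first_rank_words` are likewise hypothetical.

```python
# Invented example data: NOT the contents of FIG. 4.
# One column per input word; each column is sorted by ascending distance.
lattice = [
    [("ANATAHA", 10), ("ANATAGA", 14), ("ANATAWA", 19)],
    [("HOHO", 8), ("HONO", 11), ("HOMO", 20)],
    [("YONDEIMASUKA", 9), ("YONDEIMASU", 13)],
]

def select_words(lattice, ranks):
    """Return the word chosen in each column.

    ranks: one 1-based ranking number per column, as a user might enter.
    """
    return [column[rank - 1][0] for column, rank in zip(lattice, ranks)]

def first_rank_words(lattice):
    """Return the first-ranked candidate of every column."""
    return select_words(lattice, [1] * len(lattice))
```

With this invented data, the first-ranked candidates do not form the intended sentence, and the user would have to locate and select the second-ranked candidate in the middle column, for example by `select_words(lattice, [1, 2, 1])`.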
However, the problem with the second display example of FIG. 4 is that it will take longer than in the first display example of FIG. 3 for the user to search the displayed word candidate lattice for the word string, namely, the right answer of the input sentence.
The above-described problems arise not only in a voice recognition system but also in a character recognition system. FIG. 5 illustrates a conventional character recognition system.
A string of characters written by the user on an input tablet 501 is digitized first and then sequentially entered into a feature amount extractor 502 where the feature amount of each character is extracted for conversion into a feature vector pattern for each word.
A matching section 503 makes a comparison between the feature vector pattern of each of the input characters and the feature vector pattern of each of the characters stored in a template memory 504 while normalizing the size of characters. Thereby, a value of distance between each character stored in the template memory 504 and each input character is calculated, so that a plurality of character candidates are obtained for each input character.
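By way of illustration only, size-normalized matching of a character against stored templates may be sketched as follows. The source does not specify the feature amount or the normalization used; this sketch assumes, purely for illustration, that each character is a list of (x, y) points with the same number of points as each template, and that size is normalized by scaling the bounding box to a unit square. All function names are hypothetical.

```python
import math

def normalize_size(points):
    """Scale a character's points into the unit square via its bounding box."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    width = (max(xs) - min_x) or 1.0   # avoid division by zero for flat strokes
    height = (max(ys) - min_y) or 1.0
    return [((x - min_x) / width, (y - min_y) / height) for x, y in points]

def char_distance(a, b):
    """Euclidean distance between two size-normalized point lists.

    Assumes a and b contain the same number of points.
    """
    na, nb = normalize_size(a), normalize_size(b)
    return math.sqrt(sum((xa - xb) ** 2 + (ya - yb) ** 2
                         for (xa, ya), (xb, yb) in zip(na, nb)))

def rank_characters(input_points, templates):
    """Return (label, distance) pairs sorted by ascending distance.

    templates: dict mapping a character label to its point list.
    """
    scored = [(label, char_distance(input_points, pts))
              for label, pts in templates.items()]
    return sorted(scored, key=lambda pair: pair[1])
```

In this sketch the same shape written at two different sizes yields a distance of zero, and the sorted result corresponds to the plurality of character candidates obtained for one input character.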
The subsequent operations are the same as in the voice recognition system of FIG. 1.
FIG. 6 illustrates a display example of character candidates displayed by the display unit 109 (FIG. 5) in the conventional character recognition system. In this display example, a character candidate lattice, in which a plurality of character candidates for each input character are arranged in the form of a table, is displayed as in FIG. 4.
As can be seen from the display example of FIG. 6, however, the problem with the conventional character recognition system is that it takes a long time for the user to search the displayed character candidate lattice for the right answer of the input character string.