Conventional speech recognition systems generally treat the recognition result of one speech and the recognition result of another speech as completely unrelated. Dependence is seldom introduced into the display of recognition results of a plurality of speeches; as a result, the individual recognition results are merely arranged. The systems described in Patent Documents 1 and 2, for example, have a function of displaying a list of recognition results of a plurality of speeches. However, the recognition result of each speech is self-contained.
On the other hand, speech recognition is not always performed perfectly. An approach that preferentially displays more reliable recognition results, based on a certain evaluation measure, has hitherto been employed.
As the evaluation measure, the linguistic likelihood, the acoustic likelihood, or the confidence (refer to Non-patent Document 1) of a recognition result is employed. In Patent Document 3, for example, when a normalized likelihood (referred to as the “confidence”) of a recognition result as the evaluation measure exceeds a certain threshold value given in advance, the recognition result is output. Otherwise, the recognition result is discarded.
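The threshold rule and the priority display described above can be illustrated with a minimal sketch. The names `Hypothesis` and `filter_and_rank`, and the assumption that the confidence is normalized to the range 0 to 1, are hypothetical and are not taken from Patent Document 3.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One recognition result with its evaluation measure (hypothetical type)."""
    text: str
    confidence: float  # assumed to be a normalized likelihood in [0, 1]


def filter_and_rank(results: List[Hypothesis], threshold: float) -> List[Hypothesis]:
    """Discard results whose confidence does not exceed the threshold,
    then order the remaining results from most to least reliable."""
    kept = [r for r in results if r.confidence > threshold]
    return sorted(kept, key=lambda r: r.confidence, reverse=True)
```

For example, with a threshold of 0.5, results with confidences 0.9, 0.3, and 0.7 would be reduced to the two results with confidences 0.9 and 0.7, in that order. Each result is still evaluated on its own; no dependence between results is introduced.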
Patent Document 4, a publication concerning display of a recognition result, discloses, for example, a system that displays a speech waveform together with a character string representing at least a part of the content of a speech portion (a part of the character string of the recognition result is sometimes omitted for display), thereby allowing the content of the speech portion included in the waveform to be checked visually. In many cases, however, the tendency of the content of a speech cannot be grasped from a display of only the head and the end of the speech. Although it depends on the field of the speech, an important matter (identified by a nominal) tends to be spoken between the head and the end of the speech rather than at either of them. At the head of spoken language, a meaningless word (such as “well”, “Ah”, or “Yes”) often appears. At the end of spoken language in particular, utterances are often vague rather than precise. Examples of such utterances include rapid pronunciation (such as “It's so, isn't it?” instead of “It is so, isn't it?”) and word-final vowel lengthening (such as “it iis” instead of “it is”). In such cases, the approach disclosed in Patent Document 4 may fail to display a significant character string.
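The weakness of a head-and-end display can be seen in a small sketch. The function `abbreviate` and the example sentence are hypothetical; the sketch assumes that the omitted part is the middle of the word sequence, which is an illustrative simplification and not the actual rule of Patent Document 4.

```python
from typing import List


def abbreviate(words: List[str], head: int, tail: int) -> str:
    """Keep the first `head` and last `tail` words and omit the middle
    (an assumed omission rule, used here only for illustration)."""
    if len(words) <= head + tail:
        return " ".join(words)
    kept = words[:head] + ["..."] + words[len(words) - tail:]
    return " ".join(kept)


utterance = "well I would like to book a flight to Boston you know".split()
print(abbreviate(utterance, head=2, tail=2))  # prints "well I ... you know"
```

In this example the displayed string consists only of a filler at the head and a vague ending, while the nominals "flight" and "Boston" in the middle are omitted.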
Meanwhile, as processing for screen display, Patent Document 5 discloses an image/video processing and displaying method in which, when an important portion of an image is known, the image as a whole is simply reduced in size, and the important portion of the image is then superimposed on it at a display magnification rate larger than that of the image as a whole.
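The geometry of such a superimposed display can be sketched as follows. The function `layout`, its parameters, and the choice to center the enlarged portion on its position in the reduced image are assumptions made for illustration; Patent Document 5 as summarized here does not specify the placement.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


def layout(image_size: Tuple[float, float], region: Rect,
           whole_scale: float, region_scale: float) -> Tuple[Tuple[float, float], Rect]:
    """Return the size of the reduced image and the rectangle of the
    overlaid important portion, both in display coordinates."""
    if region_scale <= whole_scale:
        raise ValueError("the important portion must use the larger magnification")
    width, height = image_size
    x, y, w, h = region
    reduced = (width * whole_scale, height * whole_scale)
    # Center of the important portion inside the reduced image (assumed placement).
    cx = (x + w / 2) * whole_scale
    cy = (y + h / 2) * whole_scale
    ow, oh = w * region_scale, h * region_scale
    return reduced, (cx - ow / 2, cy - oh / 2, ow, oh)
```

For an 800 x 600 image reduced to half size, a 200 x 100 important portion at (300, 200) shown at its original size would occupy the rectangle (100, 75, 200, 100) on the 400 x 300 reduced image.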
Patent Document 1:
JP Patent Kokai Publication No. JP-A-11-338494
Patent Document 2:
JP Patent Kokai Publication No. JP-A-11-150603
Patent Document 3:
JP Patent Kokai Publication No. JP-P2003-50595A
Patent Document 4:
JP Patent Kokai Publication No. JP-P2002-297188A
Patent Document 5:
JP Patent Kokai Publication No. JP-P2004-326555A
Non-patent Document 1:
T. Schaaf, T. Kemp: Confidence measures for spontaneous speech recognition, in Proc. ICASSP 1997, Vol. 2, pp. 875 ff, Munich, April 1997
Non-patent Document 2:
Frank Wessel, Ralf Schlüter, Klaus Macherey, and Hermann Ney, “Confidence Measures for Large Vocabulary Continuous Speech Recognition,” IEEE Trans. on Speech and Audio Processing, Vol. 9, No. 3, March 2001.
Non-patent Document 3:
B. Rueber, “Obtaining confidence measures from sentence probabilities,” in Proc. 5th Eur. Conf. Speech Communication Technology 1997, Rhodes, Greece, September 1997, pp. 739-742.
Non-patent Document 4:
Tech. Rep. Interactive Systems Labs.,” ILKD, April. 1996.
Non-patent Document 5:
T. Kemp and T. Schaaf, “Estimating confidence using word lattices,” in Proc. 5th Eur. Conf. Speech Communication Technology 1997, Rhodes, Greece, September 1997, pp. 827-830.