1. Field of the Invention
The present invention relates to a dialogue generation apparatus and a dialogue generation method, each utilizing a speech recognition process.
2. Description of the Related Art
In recent years, many users have come to use various types of dialogue means, such as electronic mail, electronic chat and bulletin board systems (BBS). Electronic mail, electronic chat and bulletin board systems are text-based dialogue means that achieve an exchange of comparatively short text between users, unlike telephone and voice chat, which are voice-based dialogue means. To use the text-based dialogue means, a user operates a text input interface used as input means, such as a keyboard, or the numeric keypad or touch panel provided on a cell phone. In order to enhance the usability of text input and thereby enable users to enjoy rhythmical dialogues, text input interfaces based on speech recognition are used in some cases.
In the speech recognition process, the user's speech is converted sequentially into specific standby words from an acoustic viewpoint and a linguistic viewpoint, thereby generating language text composed of a string of standby words representing the contents of the speech. If the number of standby words is decreased, the recognition accuracy of individual words increases, but the number of recognizable words decreases. If the number of standby words is increased, the number of recognizable words increases, but the chances are greater that individual words will be recognized erroneously. Accordingly, to increase the recognition accuracy of the speech recognition process, methods have been proposed in which specific words expected to be included in the user's speech are recognized preferentially, or in which only those specific words are recognized. Known in the art are not only continuous speech recognition, which recognizes word strings such as so-called “continuous speech,” but also isolated word recognition, which recognizes short words such as operating instructions or keywords input to apparatuses. Isolated word recognition is superior to continuous speech recognition in terms of recognition accuracy of specific words.
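The two approaches mentioned above, preferential recognition of specific words and recognition of only the specific words, can be illustrated by the following minimal sketch. It is not taken from any cited apparatus. The function name pick_word, the parameters boost and restrict, and the candidate scores are all hypothetical, and the recognizer is assumed to supply one score per candidate word, with a higher score meaning a more likely word.

```python
from typing import Dict, Iterable, Optional


def pick_word(candidates: Dict[str, float],
              priority_words: Iterable[str],
              boost: float = 0.0,
              restrict: bool = False) -> Optional[str]:
    """Choose one word from scored recognition candidates.

    candidates: word -> recognizer score (hypothetical scale, higher is better).
    boost: bonus added to the score of every priority word.
    restrict: if True, only priority words may be returned.
    """
    priority = set(priority_words)
    best_word, best_score = None, float("-inf")
    for word, score in candidates.items():
        if word in priority:
            score += boost      # preferential recognition
        elif restrict:
            continue            # recognition of the specific words only
        if score > best_score:
            best_word, best_score = word, score
    return best_word


# Made-up scores for three acoustically similar candidates.
candidates = {"their": 0.40, "there": 0.35, "three": 0.25}

no_bias = pick_word(candidates, [])
biased = pick_word(candidates, ["three"], boost=0.2)
only = pick_word(candidates, ["there", "three"], restrict=True)
```

With these made-up scores, no_bias is "their", biased is "three", and only is "there". When restrict is True and no candidate is a priority word, the function returns None, which corresponds to the reduced number of recognizable words noted above.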
With the electronic mail communication apparatus disclosed in JP-A 2002-351791, a format for writing standby words in an electronic mail text is determined in advance, so that standby words can be extracted from received mail according to the format. High recognition accuracy can therefore be expected by preferentially recognizing the standby words extracted on the basis of the format. However, standby words cannot be written in the electronic mail text unless the specific format is followed. That is, since the format of dialogue is limited, the flexibility of dialogue is impaired.
With the response data output apparatus disclosed in JP-A 2006-172110, an interrogative sentence is estimated from text data on the basis of the sentence-final expression used at the end of an interrogative sentence. If the estimated interrogative sentence contains specific phrases, such as “what time” and “where,” words representing time and place are recognized preferentially according to the respective phrases. If none of the specific phrases, such as “what time” and “where,” is present in the interrogative sentence, words such as “yes” and “no” are recognized preferentially. Accordingly, with the response data output apparatus disclosed in JP-A 2006-172110, high recognition accuracy can be expected in the user's speech response to an interrogative sentence. On the other hand, the response data output apparatus does not improve the recognition accuracy of a response to sentences other than interrogative sentences, such as declarative, exclamatory, and imperative sentences.
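The rule-based selection described above can be sketched as follows, purely for illustration. The cited publication does not disclose concrete word lists or code, so the dictionary PHRASE_TO_WORDS, the list DEFAULT_WORDS, and the function priority_words are assumptions. A trailing question mark is used here as a simple English stand-in for the sentence-final expression of an interrogative sentence.

```python
from typing import List

# Hypothetical word classes; the cited publication does not list them.
PHRASE_TO_WORDS = {
    "what time": ["one o'clock", "noon", "midnight"],
    "where": ["home", "office", "station"],
}
DEFAULT_WORDS = ["yes", "no"]


def priority_words(text: str) -> List[str]:
    """Return the words to recognize preferentially in a reply to `text`."""
    sentence = text.strip().lower()
    if not sentence.endswith("?"):
        # Not estimated to be an interrogative sentence: no preference is set.
        return []
    words: List[str] = []
    for phrase, related in PHRASE_TO_WORDS.items():
        if phrase in sentence:
            words.extend(related)
    # No specific expression found: fall back to "yes" and "no".
    return words or list(DEFAULT_WORDS)


time_question = priority_words("What time shall we meet?")
plain_question = priority_words("Are you coming?")
statement = priority_words("See you tomorrow.")
```

In this sketch, time_question holds the time-related words, plain_question holds "yes" and "no", and statement is empty. The empty result for a non-interrogative sentence corresponds to the limitation noted above.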
With the speech-recognition and speech-synthesis apparatus disclosed in JP-A 2003-99089, input text is subjected to morphological analysis and only the words constituting the input text are used as standby words, so that high recognition accuracy can be expected for the standby words. However, this apparatus is configured to achieve menu selection, the acquisition of link destination information, and the like, and recognizes only the words constituting the input text. That is, the user's speech is assumed to be a single word or a string of a relatively small number of words. When reply text (return text) is input, by contrast, words not included in the input text (e.g., incoming mail) have to be recognized.
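The limitation described above can be illustrated with a minimal sketch. It assumes, for simplicity, that a lowercase regular-expression tokenizer stands in for morphological analysis. The function names and the sample mail texts are hypothetical and do not come from the cited publication.

```python
import re
from typing import List, Set


def extract_standby_words(text: str) -> Set[str]:
    """Stand-in for morphological analysis: collect lowercase word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def out_of_vocabulary(reply: str, standby: Set[str]) -> List[str]:
    """List the reply words that are not among the standby words."""
    tokens = re.findall(r"[a-z']+", reply.lower())
    return [token for token in tokens if token not in standby]


incoming = "Shall we have dinner on Friday"    # sample input text
reply = "Friday is fine, how about sushi"      # sample spoken reply

standby = extract_standby_words(incoming)
missing = out_of_vocabulary(reply, standby)
```

Here standby contains only the six words of the incoming text, and missing is ["is", "fine", "how", "about", "sushi"]. A recognizer limited to the standby words could therefore output only "friday" from this reply.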
Note that the accuracy of speech recognition is influenced by environmental factors. If the input speech contains a relatively large amount of noise, the content of the input speech may not be fully reflected in the speech recognition result. Consequently, the user has to input the speech repeatedly or give up inputting it.
The above-mentioned text-based dialogue means may be used to accomplish periodic dialogue with a family member living in a far-off location or a safety confirmation with an elderly person living alone. However, dialogues achieved by the text-based dialogue means may become flat and dull, and may not last long.