1. Field of the Invention
The present invention relates to a speech dialogue system that provides an interface between a human and a computer in the form of a dialogue using speech data.
2. Description of the Background Art
In recent years, the development of speech dialogue systems that use speech data as an interface between a human and a computer has advanced considerably.
Such a speech dialogue system is useful in a multi-media dialogue system that displays visual data, such as graphic data and image data, along with the speech data output. When the human speaker utters speech messages toward a microphone, the system recognizes these speech messages and outputs an appropriate response as speech data from a loudspeaker, so as to carry out a dialogue with the human speaker.
For example, such a speech dialogue system may be employed in a hamburger shop for taking an order from a customer. In this case, when the customer utters an order such as "Two hamburgers and three orange juices" toward the microphone, the system recognizes this speech input and outputs a synthetic speech response for making a confirmation, such as "Is it two hamburgers and three orange juices that you have just ordered?". When the customer utters "Yes" in response to this synthetic speech response, the recognized speech content is confirmed, and the shop worker is subsequently notified of it.
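The confirmation exchange described above can be sketched as follows. This is only an illustrative sketch, not the implementation of any actual system: the function names `build_confirmation` and `conventional_dialogue` are hypothetical, speech recognition and synthesis are replaced by plain strings, and it is assumed that only the reply "Yes" counts as a confirmation.

```python
def build_confirmation(recognized_order):
    """Phrase the recognized order as the system's confirmation question."""
    # Lower-case the first letter so the order reads naturally mid-sentence.
    body = recognized_order[0].lower() + recognized_order[1:]
    return f"Is it {body} that you have just ordered?"


def conventional_dialogue(recognized_order, customer_reply):
    """Return (confirmation question, order passed to the shop worker or None).

    Assumption: the order is passed on only when the reply is "Yes".
    """
    question = build_confirmation(recognized_order)
    confirmed = customer_reply.strip().lower() == "yes"
    return question, (recognized_order if confirmed else None)


question, order = conventional_dialogue(
    "Two hamburgers and three orange juices", "Yes"
)
print(question)
print(order)
```

Any reply other than "Yes" leaves the order unconfirmed in this sketch, so the customer would have to state the order again.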
In such a conventional speech dialogue system, however, in a case where the customer has uttered "Three hamburgers . . ." by mistake, the customer cannot make a correction immediately. The customer must first deny the confirming synthetic speech response from the system, such as "Is it three hamburgers . . .?", and only then make the correct speech input, such as "Two hamburgers . . .", again.
Moreover, consider a case where the customer has uttered "Two hamburgers, one coke, and one ice cream, please", and the system has erroneously recognized this speech input and outputs the synthetic speech response "Is it four potatoes, one coke, and one ice cream you have just ordered?". The customer may very well be tempted to make a correction by interrupting the synthetic speech response as soon as it reaches the portion ". . . four potatoes . . .". Even in such a case, however, in a conventional speech dialogue system the customer cannot make the correction until the output of the entire synthetic speech response is completed.
For these reasons, in a conventional speech dialogue system, the dialogue often requires a considerable amount of time and can be quite cumbersome.
In other words, in a conventional speech dialogue system, it has not been possible to carry out the reception of the speech input from the human speaker and the output of the synthetic speech response simultaneously. The human speaker can make a speech input only after the output of the entire synthetic speech response from the system has been completed. Consequently, the dialogue can be quite time consuming and inefficient, especially when the system makes an erroneous recognition of the speech input.
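The cost of this restriction can be illustrated with a small sketch based on the "four potatoes" example. It is a simplified model under stated assumptions, not a description of any actual system: the function name `earliest_correction_word` and the flag `can_interrupt` are hypothetical, time is measured in spoken words, and the customer is assumed to notice the error as soon as the erroneous phrase has been spoken.

```python
def earliest_correction_word(response, error_phrase, can_interrupt):
    """Return how many words of the response are spoken before the
    customer can begin a correction.

    can_interrupt=False models the conventional system described above,
    in which input is accepted only after the whole response has ended.
    """
    # Normalize words by stripping punctuation and case.
    words = [w.strip(".,?!").lower() for w in response.split()]
    target = [w.strip(".,?!").lower() for w in error_phrase.split()]

    # Locate the end of the erroneous phrase within the response.
    for start in range(len(words) - len(target) + 1):
        if words[start:start + len(target)] == target:
            noticed_at = start + len(target)
            break
    else:
        raise ValueError("error phrase not found in response")

    return noticed_at if can_interrupt else len(words)


response = "Is it four potatoes, one coke, and one ice cream you have just ordered?"
print(earliest_correction_word(response, "four potatoes", can_interrupt=False))
print(earliest_correction_word(response, "four potatoes", can_interrupt=True))
```

In this model the conventional system makes the customer wait through all 14 words of the response, whereas a system that could accept input during its own output would allow the correction after the fourth word.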