1. Field of the Invention
The present invention relates to a voice interactive system and a voice interactive method in which a user realizes various functions by interacting with a synthesized voice generated by the system, and more particularly to allowing such a voice interactive system to function smoothly.
2. Description of the Related Art
Recently, owing to increases in computer processing speed, growth in the information capacity that can be used, and rapid advances in voice recognition techniques, voice recognition at a word level can be put into practical use. Voice recognition is therefore becoming one of the important techniques for configuring a system. The use of such a technique enables various problems to be solved through a voice interactive interface that uses only voice, without a high-level graphical interface.
Accordingly, voice interactive systems are being developed in various fields. Such systems allow a user to solve problems in the user's intended order by means of a voice interactive interface, and take full advantage of the human ability for oral communication.
FIG. 1 shows a configuration of a conventional voice interactive system. In FIG. 1, reference numeral 1 denotes a voice input part through which a user inputs voice information. As an input medium, a microphone can be contemplated, as can file transfer of a web file or the like. The input voice information is sent to a voice recognition part 2 through a network 7 or the like. As the network 7, various connection forms such as the Internet and a WAN/LAN can be contemplated.
Reference numeral 2 denotes a voice recognition part that analyzes the contents of the input voice information. The voice recognition part 2 may also have a noise reduction function for preventing problems in the case where the input voice contains noise.
Reference numeral 6 denotes a voice output part that outputs a response to the voice input from the voice input part 1 as a synthesized voice. An output method is not limited to an output based on a voice, and the contents of a response may be displayed on a display apparatus.
Reference numeral 3 denotes a voice information mediation part, which controls a response timing among the voice input part 1, the voice output part 6, and an interaction engine 4. Reference numeral 4 denotes an interaction engine, which refers to a knowledge database 5 in accordance with the contents of the input voice to extract the most suitable contents of a response. Therefore, the performance of the interaction engine 4 directly influences the performance of the voice interactive system.
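The flow of information among the parts of FIG. 1 can be illustrated by the following minimal sketch. It is an illustration only, not the conventional system itself: every class and function name, the fallback message, and the sample database entry are hypothetical. Voice recognition is stubbed out as text normalization, and the knowledge database 5 is modeled as a plain dictionary.

```python
from dataclasses import dataclass, field
from typing import Optional


def recognize(voice_input: str) -> str:
    """Stand-in for the voice recognition part 2.

    Assumption: the input is already text, so recognition is
    reduced to trimming whitespace and lowercasing.
    """
    return voice_input.strip().lower()


@dataclass
class InteractionEngine:
    """Stand-in for the interaction engine 4 and the knowledge database 5."""
    knowledge_db: dict = field(default_factory=dict)

    def respond(self, recognized: str) -> Optional[str]:
        # Return the stored response, or None when there is no entry.
        return self.knowledge_db.get(recognized)


@dataclass
class Mediator:
    """Stand-in for the voice information mediation part 3."""
    engine: InteractionEngine
    fallback: str = "Sorry, I did not understand."

    def handle(self, voice_input: str) -> str:
        recognized = recognize(voice_input)
        response = self.engine.respond(recognized)
        # The returned text is what the voice output part 6 would synthesize.
        return response if response is not None else self.fallback


# Hypothetical sample data for illustration.
engine = InteractionEngine({"what time do you open": "We open at nine."})
mediator = Mediator(engine)
print(mediator.handle("  What time do you open "))
print(mediator.handle("Can I bring my dog"))
```

In this sketch the second utterance has no entry in the dictionary, so only the fallback response is returned.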
However, in the above-mentioned conventional voice interactive system, current voice recognition techniques have not reached a recognition precision of 100%. Therefore, the intention behind a user's utterance may not be sufficiently recognized. Furthermore, even an interaction engine 4 with high performance cannot completely handle everyday human conversation, and it is easily expected that conversation will arise whose contents cannot be handled by such an interaction engine.
In such a case, conversation that cannot be handled by the interaction engine 4 alone may be handled by allowing a third-party user, or an operator who is familiar with the contents of the conversation, to participate therein as a helper.
For example, JP 7(1995)-19239 B discloses a system that allows a user to interact with an operator (i.e., a third-party user) when detecting that a particular word is contained in an input voice.
Furthermore, JP 8(1996)-76965 A discloses a voice recognition system in which a user can request mediation of an operator when the user is at a loss as to how to use the system. JP 10(1998)-124086 A discloses a voice interactive system that allows a system supporter to respond directly to an input voice in the case where an expert system alone cannot respond thereto.
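The three conditions for involving a helper described above can be contrasted in a simplified sketch. This is an assumption-laden illustration based only on the summaries given here, not on the actual mechanisms of the cited publications. All function names, the keyword set, the request phrase, and the rule database entry are hypothetical.

```python
def keyword_trigger(recognized: str, keywords: set) -> bool:
    """Hand off when a particular word is contained in the input voice."""
    words = recognized.lower().split()
    return any(word in keywords for word in words)


def user_request_trigger(recognized: str,
                         request_phrase: str = "operator please") -> bool:
    """Hand off only when the user explicitly requests mediation.

    Assumption: the request is modeled as a fixed phrase.
    """
    return request_phrase in recognized.lower()


def no_rule_trigger(recognized: str, rule_db: dict) -> bool:
    """Hand off when the expert system has no rule for the input."""
    return recognized.lower() not in rule_db


# Hypothetical sample data for illustration.
KEYWORDS = {"complaint", "refund"}
RULE_DB = {"what time do you open": "We open at nine."}


def triggers_fired(recognized: str) -> list:
    """Return the names of the hand-off conditions that apply."""
    fired = []
    if keyword_trigger(recognized, KEYWORDS):
        fired.append("keyword")
    if user_request_trigger(recognized):
        fired.append("user_request")
    if no_rule_trigger(recognized, RULE_DB):
        fired.append("no_rule")
    return fired


print(triggers_fired("I want a refund"))
print(triggers_fired("operator please"))
print(triggers_fired("what time do you open"))
```

In this sketch, an utterance that contains none of the listed keywords never satisfies the first condition, and an utterance covered by the rule database never satisfies the third, regardless of whether the user actually needs help.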
However, with the invention disclosed in JP 7(1995)-19239 B, for example, an operator's help may also be required even when no particular word is contained in the input voice. Furthermore, it is practically difficult to extract in advance the words for all of the various cases that can be assumed. Therefore, it is difficult in practice to configure such a system.
According to the invention disclosed in JP 8(1996)-76965 A, a user cannot obtain an operator's help unless the user requests it. Therefore, even for interaction whose contents a supervising third-party user would readily judge to be unproductive, the operator's help is provided only after a considerable period of time has passed.
According to the invention disclosed in JP 10(1998)-124086 A, the help of a system supporter can be provided only for interaction that is not contained in a rule database of an expert system. If the knowledge level of the user does not match the knowledge level of the rule database, even interaction contained in the rule database cannot be conducted smoothly, which makes it difficult for the user to understand the interaction.