1. Field of the Invention
The present invention relates to an information processing apparatus furnished with a user interface adaptable, illustratively, to a computer or an electrical device, or to any variety of industrial machinery. The present invention relates more particularly to a dialog interface system that allows a human and an inanimate entity (hereinafter a "thing" or "things"), as well as a thing and a fellow thing, to converse by way of speech.
2. Description of the Related Art
Interfaces based on a remote control are pervasive in today's field of electrical appliances, of which the television and the video recorder/player may be said to be representative. Although the remote control is suited to simple functions such as changing television channels or adjusting television volume, it is unfit for more complex tasks. Indeed, it cannot be said that even engineers, let alone novices and the elderly, have completely mastered the use of electrical appliances that offer varied and advanced functions.
In the computer field, in which more sophisticated tasks are required, graphical user interfaces (hereinafter "GUI") utilizing windows and a mouse are widely employed. The specifications differ for each product, however, and there remain many inconveniences for those who use these products.
Further, because operating a GUI is itself a task with which novices and the elderly have little experience, it presents many difficulties for them. Also, because the GUI by its very nature requires a screen, it is not well-suited to the many types of electrical appliances and industrial machinery that do not otherwise require a display.
For these reasons, speech recognition and synthesis technologies, which are natural to humans and require no display, have recently received attention as a next-generation user interface for these systems. These technologies are already employed in, illustratively, car navigation systems and some computer systems.
In research regarding interfaces, moreover, many have recently taken notice of the multi-modal interface (see, for example, R. A. Bolt, "The Integrated Multi-modal Interface", (invited paper) IEICE Transactions on Information Systems, vol. J70-D, no. 11, pp. 2017-2025, November 1987; Katashi Nagao, "Multimodal Human-Computer Interaction: Agent Oriented and Real-World-Oriented", (in Japanese), Journal of the Society of Instrument and Control Engineers, vol. 35, no. 1, pp. 65-70, January 1996; Tsuneo Nitta, "From GUI Multi-modal UI (MUI)", (in Japanese), Journal of Information Processing Society of Japan, vol. 36, no. 11, pp. 1039-1046, November 1995).
This research seeks to facilitate dialog with, for example, a computer by using not only GUI-like visualization, but also other modalities such as speech and gestures.
Meanwhile, research is being conducted regarding ubiquitous computing, in which computers are connected to a variety of things found in the environments in which humans live, and which seeks, among other things, to assist human actions (see, for example, M. Weiser, "Some Computer Science Issues in Ubiquitous Computing", Communications of the ACM, vol. 36, no. 7, pp. 74-85, July, 1993; Katashi Nagao, "Real-World-Oriented Human-Computer Interaction: A Survey", (in Japanese), Systems, Control and Information, vol. 40, no. 9, pp. 385-392, September, 1996).
By way of example, computers would be embedded in, illustratively, library shelves, so as to support searching for books by way of communication with a mobile computer or by way of speech.
In speech research, moreover, the interaction between computers and humans has been actively studied in recent years (see, for example, R. Cole, L. Hirschman, L. Atlas, M. Beckman, A. Biermann, M. Bush, M. Clements, J. Cohen, O. Garcia, B. Hanson, H. Hermansky, S. Levinson, K. McKeown, N. Morgan, D. G. Novick, M. Ostendorf, S. Oviatt, P. Price, H. Silverman, J. Spitz, A. Waibel, C. Weinstein, S. Zahorian, and V. Zue, "The Challenge of Spoken Language Systems: Research Directions for the Nineties", IEEE Transactions on Speech and Audio Processing, vol. 3, no. 1, pp. 1-21, January 1995; Katunobu Itou, "Speech Dialog System", (in Japanese), The Institute of Electronics, Information and Communication Engineers, Technical Report, vol. 92, no. 127, Voices, no. SP92-38, pp. 23-30, July, 1992; Yoichi Takebayashi, "Human-Computer Dialogue using Multimedia Understanding and Synthesis Functions", (in Japanese), The Institute of Electronics, Information and Communication Engineers, Technical Report, vol. 92, no. SP92-37, pp. 15-22, July, 1992).
Systems that utilize the above-described conventional speech recognition and synthesis technologies generally presume communication between only one human and one thing.
It is predicted that the number of systems with speech recognition and synthesis functions will continue to increase. When a plurality of such systems coexist alongside humans, moreover, two kinds of error are predicted: a command issued to one system will be executed in error by a plurality of systems, and a response issued by one system to a human will be mistakenly interpreted by a different system as a command issued to that different system.
The present invention was conceived in consideration of the circumstances described above. The present invention aims to provide an interface system that is at once responsive to humans and practical and, further, that facilitates dialog by speech not only between humans and things, but also among things, free of mistaken operations, and without requiring a display.