1. Technical Field
The present invention relates to a multimodal user interface for a data or other software system. The interface finds particular application in resolving insufficient inputs to a multimodal system.
2. Description of Related Art
Man-to-machine communications is a major business opportunity. The rapid growth in the use (and processing power) of computers both in the home and in the workplace is leading to the situation where the market for "man-machine traffic" is growing fast. Machine Intelligence (MI) looks likely to provide the basis for a plethora of new service offerings, not only for the world of business but also for the domestic user of telecommunications.
In many industries, information technology (IT) systems are replacing secretaries, the word processor and E-mail, and now electronic agents are often the new personal assistant, not people. This acceptance of software will accelerate the race to develop intelligent machines.
Intelligent software is applicable in situations where the combination of human and current computer technology is either too slow, too expensive or under strain. The following examples indicate where machine intelligence is likely to have a beneficial impact in the years to come: communications filtering, telephone answering, resource management, network management and managers' assistance.
Research in human-computer interactions has mainly focused on natural language, text, speech and vision, primarily in isolation. Recently there have been a number of research projects that have concentrated on the integration of such modalities using intelligent reasoners. The rationale is that many inherent ambiguities in single modes of communication can be resolved if extra information is available. A rich source of information for recent work in this area is the book entitled Intelligent User Interfaces by Sullivan and Tyler [Addison Wesley 1991]. Among the projects reviewed in the above reference are CUBRICON from Calspan-UB Research Centre, XTRA from German Research Centre for AI and the SRI system from SRI International.
The CUBRICON system is able to use a simultaneous pointing reference and natural language reference to disambiguate one another when appropriate. It also automatically composes and generates relevant output to the user in co-ordinated multi-media. The system combines natural language, text commands, speech and simulated gestures such as pointing with a mouse. The application area is military-based maps.
The XTRA system is an intelligent multi-modal interface to expert systems that combines natural language, graphics and pointing. It acts as an intelligent agent performing a translation between the user and the expert system. The most interesting aspect of this project is how free-form pointing gestures, such as pointing with fingers at a distance from the screen, have been integrated with graphics and natural language to allow a more natural way of communication between the user and the expert system.
The SRI system combines natural language/speech with pen gestures such as circles and arrows to provide map-based tourist information about San Francisco.
At the heart of all of the above systems is a reasoner that combines the general and task-specific knowledge in its knowledge base with often vague or incomplete user requests in order to provide a complete query to the application.
To a communications company, provision of service to business and residential customers, network maintenance and fault repair are core activities of a workforce which can involve thousands of technicians every day. A fully automated system called Work Manager has been developed for managing the workforce. This is described for instance in U.S. Pat. No. 5,963,911, the content of which is herein incorporated by reference. Work Manager is capable of monitoring changes in resource and work profiles, and of reacting to them when necessary to maintain the feasibility and optimality of work schedules. An important component is the allocation algorithm called Dynamic Scheduler. The purpose of the Dynamic Scheduler is to provide the capability:
to schedule work over a long period of time,
to repair/optimise schedules,
to modify the business objectives of the scheduling algorithms, and
to provide statistics from which the schedules can be viewed and their quality assessed.
The Dynamic Scheduler is described in U.S. Pat. No. 6,578,005, the content of which is also herein incorporated by reference.
The user interface can be very important in systems like the Dynamic Scheduler, including data visualisation for the user interface. Enormous amounts of information can be produced, making the assessment of results via traditional management information systems extremely difficult. Data visualisation can summarise and organise the information produced by such systems, for instance facilitating real-time monitoring and visualisation of work schedules generated, but the sheer magnitude of information available today often makes the interface between humans and computers very important.
According to the present invention there is provided a multimodal user interface for receiving user inputs in more than one different mode, the interface comprising:
i) at least two inputs for receiving user communications in a different respective mode at each input;
ii) an output to a system responsive to user communications; and
iii) processing means for resolving ambiguity and/or conflict in user communications received at one or more of the inputs.
A user input mode can be determined primarily by the tool or device used to make the input. A mode (or modality) is a type of communications channel e.g. speech and vision are two modalities. Within the context of the embodiments of the present invention described below, there are four input modalities: keyboard, mouse, camera and microphone. Everything originating from each of these devices has the same mode e.g. the keyboard provides one mode of communication whether it is used for free text input or specific selections. However, in a different system, different modes might also encompass different usage of the same tool or device. For instance, a keyboard might be used for free text input while its tab key might be used as the equivalent of a mouse; in a different embodiment, this might actually constitute two different modes.
Preferred embodiments of the present invention allow modes with very different characteristics at the interface to be used, such as gaze tracking, keyboard inputs and voice recognition. These can take very different lengths of time to achieve an input and pose very different problems for a system in terms of accessing content in an input, to a sufficient degree of accuracy to act on it reasonably correctly.
For instance, there can be significant uncertainty in the timing of events. That is, the start and end time of mouse or camera events could be fuzzy and the relationship between the starts and ends of two events could also be fuzzy (positive or negative). Therefore the temporal relationship established on the basis of these relationships is fuzzy too.
Embodiments of the present invention can provide a particularly flexible system which can handle approximate timings of events of several different types. This property offers an important level of tolerance to the inherent variability with which multiple users operate a system.
An important aspect of embodiments of the present invention is the temporal reasoner. Such a temporal reasoner could be used with other environments than a multimodal interface for human users since there may be other requirements for measuring temporal separation to determine a relationship between events having start and end times. The temporal reasoner can be broadly expressed as follows, together with the method it carries out.
A temporal reasoner comprising:
i) means for receiving start and end time data for a pair of events,
ii) means for calculating temporal separation of the start times and the end times for said pair,
iii) means for applying a broadening function to each calculated temporal separation,
iv) means to categorise each broadened temporal separation into preselected categories, and
v) means to determine whether the pair of events is related or not related, based on the resultant categories for the broadened temporal separations.
This temporal reasoner can be used in an interface for receiving inputs having start and end times, the pair of events comprising two such inputs, the interface comprising means for measuring the start and end times of the two inputs to provide the start and end time data to the temporal reasoner. Further, said categories may usefully comprise negative, zero and positive, as described above for the multimodal interface.
A method of temporal reasoning can be described as comprising:
i) receiving start and end time data for a pair of events,
ii) calculating temporal separation of the start times and the end times for said pair,
iii) applying a broadening function to each calculated temporal separation,
iv) using rules to categorise each broadened temporal separation into preselected categories, and
v) using further rules to determine whether the pair of respective user communications is related or not related, based on the resultant categories for the broadened temporal separations.
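By way of illustration only, the five method steps above can be sketched in Python. The text does not fix the form of the broadening function, the width of the tolerance, or the rule base, so everything concrete here is an assumption: the `Event` type, the triangular "zero" membership used in `broaden`, the `width` parameter, the reading of step ii) as one start-to-start and one end-to-end separation, and the `RELATED_RULES` table are all hypothetical and are not taken from the described embodiments.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Step i): start and end time data for one event (seconds, assumed unit)."""
    start: float
    end: float


def broaden(delta: float, width: float) -> dict[str, float]:
    """Step iii): spread a crisp separation into graded category memberships.

    Assumption: a triangular 'zero' membership of half-width `width`;
    whatever is left over goes to 'negative' or 'positive' by sign.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    zero = max(0.0, 1.0 - abs(delta) / width)
    negative = 1.0 - zero if delta < 0 else 0.0
    positive = 1.0 - zero if delta > 0 else 0.0
    return {"negative": negative, "zero": zero, "positive": positive}


def categorise(memberships: dict[str, float]) -> str:
    """Step iv): pick the category with the highest membership."""
    return max(memberships, key=memberships.get)


# Step v): hypothetical rule base keyed on (start category, end category).
# Assumed rule: the pair is related when the events start or end at roughly
# the same time, or when one event spans the other (opposite signs).
RELATED_RULES = {
    ("zero", "zero"), ("zero", "negative"), ("zero", "positive"),
    ("negative", "zero"), ("positive", "zero"),
    ("negative", "positive"), ("positive", "negative"),
}


def related(a: Event, b: Event, width: float = 0.5) -> bool:
    # Step ii): temporal separation of the start times and of the end times.
    start_sep = a.start - b.start
    end_sep = a.end - b.end
    start_cat = categorise(broaden(start_sep, width))
    end_cat = categorise(broaden(end_sep, width))
    return (start_cat, end_cat) in RELATED_RULES


if __name__ == "__main__":
    speech = Event(0.0, 2.0)
    click = Event(0.1, 1.9)
    print(related(speech, click))
```

Under these assumptions, two events whose starts and ends differ by less than half the tolerance fall into the zero category on both separations and are treated as related, while an event that both starts and ends well before the other is treated as not related. A different broadening function or rule table could be substituted without changing the overall structure.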