This invention relates to a control system and method for modular, mixed-initiative human-machine interfaces. Examples of interfaces in which this system can be used include interfaces controlled by speech recognition systems and information-gathering web pages or browsers. A particular feature of the invention is a mechanism that allows the modular decomposition of mixed-initiative dialogues. In a mixed-initiative approach, the user is not constrained to answer the system's direct questions but may answer in a less rigid or less structured manner.
The problems associated with the current technology in this field are detailed in the following description. Modern event-driven user interfaces provide a rich medium for interaction. The user often has a large set of available options, which allows the user to take the initiative in communicating with the machine. In many applications, however, it can be difficult to provide a large number of options without requiring complex and expensive devices to operate such applications. For example, speech-only interfaces, especially over the telephone, are highly constrained by the inability of current devices to accurately recognise more than a few keywords and phrases at a time. As a consequence, spoken dialogues for current commercial systems are typically implemented in fixed frameworks that exert strict control over the possible flow of system and user interactions and that require each question and answer to be explicitly scripted. This explicit control of dialogue allows the range of possible inputs at any point in time to be carefully controlled and thereby allows robust and usable systems to be built.
However, without extensive application-dependent handcrafting of the dialogues, the use of simple dialogue frameworks, such as finite state networks or algorithms which fill the empty slots of frame structures, results in applications of one of two kinds. Either the application is heavily system directed and prevents the user from taking any initiative, or it is over-permissive of user initiative and is consequently subject to a greater number of recognition errors and to unpredictable system behaviour (due to the greater perplexity of the input grammars). Applications built this way are typically inflexible and expensive to develop. Dialogues can be long and tedious, as frequent users are frustrated by the need to navigate a long and immutable sequence of questions and answers, or are subjected to an infuriating number of errors and false paths.
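The trade-off described above can be illustrated with a minimal sketch of a slot-filling frame. It is not taken from the invention: the slot names, vocabularies, marker words and both function names are hypothetical, and the count of candidate (slot, word) pairs is used only as a crude stand-in for grammar perplexity.

```python
# Hypothetical slot vocabularies for a travel-booking frame (illustrative only).
SLOT_VOCAB = {
    "origin": {"london", "paris"},
    "destination": {"london", "paris", "rome"},
    "day": {"monday", "tuesday"},
}

# Hypothetical marker words that signal which slot the next value belongs to.
MARKERS = {"from": "origin", "to": "destination", "on": "day"}


def active_vocabulary(asked_slot, mixed_initiative):
    """Return the (slot, word) pairs the recogniser must consider at this turn."""
    if mixed_initiative:
        slots = SLOT_VOCAB  # every slot is open to the user
    else:
        slots = {asked_slot: SLOT_VOCAB[asked_slot]}  # only the asked slot
    return {(slot, word) for slot, words in slots.items() for word in words}


def fill_slots(frame, asked_slot, utterance, mixed_initiative):
    """Return a copy of frame with empty slots filled from the utterance."""
    filled = dict(frame)
    current = asked_slot
    for word in utterance.lower().split():
        if word in MARKERS:
            # A system-directed dialogue ignores attempts to change the topic.
            if mixed_initiative:
                current = MARKERS[word]
            continue
        if word in SLOT_VOCAB[current] and filled.get(current) is None:
            filled[current] = word
    return filled


empty = {"origin": None, "destination": None, "day": None}
said = "from london to rome on monday"
directed = fill_slots(empty, "origin", said, mixed_initiative=False)
permissive = fill_slots(empty, "origin", said, mixed_initiative=True)
```

In this sketch the system-directed turn fills only the slot that was asked for, so the remaining values must be requested in later turns, whereas the permissive turn fills all three slots at once but requires the recogniser to consider a larger set of candidates.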