Speech-enabled user interfaces are used on computing devices ranging from handheld devices to servers that handle sophisticated queries and transactions. With progress in speech recognition technology and improvements in user interface design, speech-enabled applications are increasingly used in human-computer interaction. Speech-based interfaces have thus expanded the range of uses of computer systems.
One application of speech-based user interfaces is speech-enabled call center and customer care solutions. A user can call a particular number provided by a service provider to find information such as flight schedules, landmark locations and traffic conditions. If the terms and likely user queries of the service domains are built into a speech recognition system, the system can recognize these terms and the sentences relating to the services of those domains. Based on the recognition results, the system provides the user with the requested information or performs the requested transaction. So that users can cooperate with the system to fulfil particular tasks efficiently and smoothly, the system typically provides appropriate prompts, or guides the user through a sequence of steps, as the dialogue between the user and the system proceeds. The process of exchanging information between humans and machines through human language is called dialogue, and the process of designing software that is able to carry out human-machine dialogue is called dialogue design.
Typically, directed dialogue is used, in which conversations are conducted in a rigid, computer-oriented manner: users follow computer prompts step by step and input the required items in a pre-programmed order. As an alternative to directed dialogue, Mixed Initiative Dialogue (MID) has been proposed. In an MID process, a user has the flexibility to choose, e.g., the order and/or the amount of information to convey, according to the user's language preferences and experience with the interface. It will be appreciated that MID provides a more natural way for humans and machines to communicate in language, and MID is therefore increasingly used in speech-enabled user interfaces.
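The difference between directed dialogue and MID can be illustrated with a toy slot-filling sketch in Python. This is a hypothetical illustration only, not the method of this document: the slot names, the regular-expression "parser" (standing in for a real recognizer and grammar) and the function names are all assumptions. In the directed version each prompt fills exactly one slot in a fixed order; in the mixed-initiative version each user turn may fill any subset of slots, in any order.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical slots for a flight-schedule enquiry (assumed, for illustration).
SLOTS = ["origin", "destination", "date"]

# Toy patterns standing in for a speech recognizer plus semantic parser (assumption).
PATTERNS = {
    "origin": re.compile(r"\bfrom (\w+)"),
    "destination": re.compile(r"\bto (\w+)"),
    "date": re.compile(r"\bon (\w+)"),
}


def parse(utterance: str) -> Dict[str, str]:
    """Extract whichever slots the utterance mentions, regardless of order."""
    found = {}
    for slot, pattern in PATTERNS.items():
        match = pattern.search(utterance.lower())
        if match:
            found[slot] = match.group(1)
    return found


def directed_dialogue(answers: List[str]) -> Tuple[Dict[str, str], int]:
    """System prompts for one slot per turn in a fixed order; each answer fills only that slot."""
    filled: Dict[str, str] = {}
    turns = 0
    for slot, answer in zip(SLOTS, answers):
        turns += 1
        filled[slot] = answer.strip().lower()
    return filled, turns


def mixed_initiative_dialogue(utterances: List[str]) -> Tuple[Dict[str, str], int]:
    """User may supply any subset of slots per turn; stop once every slot is filled."""
    filled: Dict[str, str] = {}
    turns = 0
    for utterance in utterances:
        turns += 1
        filled.update(parse(utterance))
        if all(slot in filled for slot in SLOTS):
            break
    return filled, turns


if __name__ == "__main__":
    print(directed_dialogue(["Boston", "Denver", "Monday"]))
    print(mixed_initiative_dialogue(["From Boston to Denver", "on Monday please"]))
```

With these inputs the directed dialogue always takes three turns, while the mixed-initiative dialogue completes in two because the first utterance supplies two slots at once; a user could equally have given all three in a single sentence or started with the date.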
Apart from a high recognition rate and robust performance of the speech recognition engine, dialogue design is also an important consideration for speech-based applications, and a number of factors are taken into account when designing a speech-enabled user interface. Typically, to build a speech-enabled interface for an application, domain content and operational information are compiled in the form of grammars that describe the sublanguage used by humans to operate the computer system. The purpose of the grammars is to narrow the scope of the recognition task so that the recognition engine can achieve a higher recognition rate. Separately, dialogue software is typically designed to instruct the recognition engine to perform speech recognition using the grammars when user utterances are received. It will be appreciated that dialogue design should handle not only the dialogue logic but also abnormal user behaviours such as poor-quality speech, out-of-domain queries and timeouts. Typically, to find a design that meets user requirements, different user interface prototypes are implemented and put through usability tests with a number of users, which is time-consuming. Dialogue design is therefore not a trivial task, and design environments and tools, especially for rapid development and testing, are needed to reduce the effort involved.
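How a grammar narrows the recognition task can be sketched in Python as follows. This is a deliberately simplified, hypothetical illustration: the word-slot grammar, the n-best hypotheses and their scores are invented for the example, and a real recognition engine applies the grammar during decoding rather than filtering an n-best list afterwards. The sketch expands the grammar into the finite set of sentences it accepts, then keeps the highest-scoring recognizer hypothesis that falls inside that set, returning None for an out-of-grammar utterance so that the dialogue logic can re-prompt.

```python
from itertools import product
from typing import List, Optional, Set, Tuple

# Hypothetical domain grammar: each position lists the words allowed there.
GRAMMAR = [
    ["show", "list"],
    ["flights"],
    ["from", "to"],
    ["boston", "denver", "london"],
]


def expand(grammar: List[List[str]]) -> Set[str]:
    """Enumerate every sentence the grammar accepts."""
    return {" ".join(words) for words in product(*grammar)}


def best_in_grammar(nbest: List[Tuple[str, float]], accepted: Set[str]) -> Optional[str]:
    """Return the highest-scoring hypothesis the grammar accepts, or None if none fits."""
    for text, _score in sorted(nbest, key=lambda hyp: hyp[1], reverse=True):
        if text in accepted:
            return text
    return None  # out-of-grammar: the dialogue logic should handle this, e.g. by re-prompting


if __name__ == "__main__":
    accepted = expand(GRAMMAR)
    # Invented n-best list: the top acoustic hypothesis is not a valid domain sentence.
    nbest = [
        ("show fights to boston", 0.62),
        ("show flights to boston", 0.58),
        ("snow flights to austin", 0.41),
    ]
    print(len(accepted), best_in_grammar(nbest, accepted))
```

Because the grammar admits only 12 sentences, the acoustically stronger but meaningless hypothesis is rejected in favour of a valid domain query, which is the sense in which a grammar raises the effective recognition rate.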
It has been appreciated that developing complex dialogue systems is relatively difficult. A developer typically needs to consider many details of the dialogue process. Further, the developer typically needs to anticipate the different inputs users may give and plan in advance, e.g. via hard-coding, the action to be taken for each of those inputs.
Moreover, application domains and the content provided by services typically change frequently. These changes require corresponding changes to grammars and dialogue logic, so maintenance of speech-enabled applications is another consideration. Automatic or semi-automatic processes can be significantly useful for this maintenance work.
In ‘Towards the Automatic Generation of Mixed-Initiative Dialogue Systems from Web Content’, published by Joseph et al. in Eurospeech 03 (MIT Laboratory for Computer Science; Corporation for National Research Initiatives), a number of approaches are introduced for creating a dialogue system and for providing a dialogue management and response planning strategy that adapts to on-line content, thereby improving interaction with a user. However, the parsed data is organized into subcategories based on numeric values, and the paper does not describe any analysis of the grammars produced by a speaker during speech.
In ‘Large-scale software integration for spoken language and multimodal dialog systems’, published by Gerd et al. in Natural Language Engineering 10 (3/4): 283-305, © 2004 Cambridge University Press (German Research Center for Artificial Intelligence), a framework for large-scale software integration, resulting from the realization of various natural language and multimodal dialog systems, is introduced. The approach relies on a distributed component model that enables flexible re-use and extension of existing software modules. However, the paper does not analyse the grammars produced by a speaker during speech.
US publication no. 2006/0069547 describes a method that focuses on creating grammars for alphanumeric concepts from inputs such as regular expressions. The grammar is parsed and generated by rules together with prefix optimization, so the scope of the grammars is very narrow. In particular, this document focuses only on generating grammars from alphanumeric concepts, which are a specific form of expression used in language. It will be appreciated that this form of generation can at best be regarded only as a process of converting alphanumeric expressions into grammars. Furthermore, this document provides no teaching on dialogue generation based on the generated grammars.
US publication no. 2006/0085192 describes a system for conducting a user dialog via a speech-based user interface. The system includes an auditory prompt module for generating a sequence of goal-directed auditory prompts based on pre-determined user-oriented tasks. However, the system does not analyse the grammars produced by a user during speech.
Hence, there exists a need for a dialogue system for fully mixed initiative dialogue (FMID) interaction between a human and a machine, and for a method of executing such an FMID interaction, that seek to address at least one of the above problems.