The present invention generally relates to a machine interface allowing a user to interface to a machine by entering into a dialogue with the machine.
A method of interfacing the user with a machine using speech recognition has been developed in the prior art and is termed a spoken dialogue system (SDS). A spoken dialogue system interacts with a user by using speech recognition to recognise what the user says and, in some cases, by analysing the utterance further to understand what it means. The system can communicate information back to the user, for example to indicate what state it is in or to give information the user has asked for. This may be done, for example, using a display, speech synthesis, or playback of recorded speech.
The creation of an SDS is often very expensive because each system is bespoke and built by specialists. It is therefore preferable to have a reusable SDS which can be used for many applications; such a system can be termed a generic SDS. One such prior art SDS is a voice browser which executes a mark-up language such as VoXML or VoiceXML. However, because a voice browser is very general, its functionality is limited.
A generic SDS requires an instruction set which determines its behaviour. VoXML and VoiceXML are examples of such specifications, which tell a voice browser what to do. A more specialised generic SDS can offer more functionality but still requires some form of specification that it can execute.
In the prior art, an increasing number of documents are marked up with a special language that the reader of the document normally does not see directly. Examples of such documents are HTML documents, whose mark-up is used by browsers for purposes such as layout, and documents marked up in TranScribe III, a mark-up language used for Canon manuals. In the case of TranScribe III, the mark-up is used for typesetting by printers and for supporting translators when translating manuals. Other mark-up languages include SGML and XML. All mark-up languages have the following characteristics in common:
1. They contain information about the structure or presentation of the document which can be, for example, used for determining the layout of the document or for searching for information in the document.
2. They are normally not directly visible to the reader of the document but have an effect on aspects such as layout or search functions.
In generic spoken dialogue systems there are special and different types of mark-up languages such as VoXML and VoiceXML. These allow a developer who wants to make a dialogue system to use a “reusable” generic dialogue system and hence just write a dialogue specification without doing any programming, or doing only a small part of the development work. They do not, however, serve to mark up a text which is intended to be readable by humans.
Automatic conversion of text documents, e.g. in a mark-up language, to an SDS specification to provide a spoken dialogue system interface between the information content of the document and the user is known in the prior art.
The present invention provides a machine interface in which passages comprising discrete information segments in a document are identified and converted into separate dialogue instruction sets. Scores, or more specifically probabilities, for words appearing in each passage of the text are determined and stored. The scores can then be used to determine a combined score for words input by a user in order to try to identify a passage which relates to the user's input. For the passage identified as having the highest combined score, the corresponding dialogue instruction set can be executed by the dialogue system.
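By way of illustration only, the scoring and selection described above can be sketched in Python. The sketch assumes that each word score is an add-one (Laplace) smoothed unigram probability of the word within a passage, and that the combined score is the sum of the log-probabilities of the user's words, a naive-Bayes style combination; the function names and example passages are hypothetical and are not taken from the invention.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_word_scores(passages: dict[str, str]) -> dict[str, dict[str, float]]:
    """Return, for each passage, a log-probability for every vocabulary word.

    Assumption: add-one smoothed unigram estimates, so a word that never
    occurs in a passage still receives a small non-zero probability.
    """
    counts = {pid: Counter(tokenize(text)) for pid, text in passages.items()}
    vocab = set().union(*counts.values())
    scores = {}
    for pid, c in counts.items():
        denominator = sum(c.values()) + len(vocab)
        scores[pid] = {w: math.log((c[w] + 1) / denominator) for w in vocab}
    return scores


def combined_score(passage_scores: dict[str, float], words: list[str]) -> float:
    """Sum the log-probabilities of the user's words; unknown words are skipped."""
    return sum(passage_scores[w] for w in words if w in passage_scores)


def best_passage(word_scores: dict[str, dict[str, float]], user_input: str) -> str:
    """Return the identifier of the passage with the highest combined score."""
    words = tokenize(user_input)
    return max(word_scores, key=lambda pid: combined_score(word_scores[pid], words))


passages = {
    "load_paper": "Open the paper tray and load the paper face down.",
    "replace_toner": "Open the front cover and remove the toner cartridge.",
}
scores = build_word_scores(passages)
print(best_passage(scores, "How do I load paper?"))  # load_paper
```

Summing logarithms rather than multiplying raw probabilities is a common design choice that avoids numerical underflow when the user's input contains many words.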
In this way, large documents which are not suitable for straightforward conversion into a dialogue specification can be converted segment by segment, allowing the user to enter into a dialogue with the system in order to identify the dialogue specification segment which should be executed.
In the prior art, when a document consists of many parts which can easily be scanned visually but which would take far too long to read out one by one, an automatically generated dialogue system provides an inferior interface between the information content of the document and the user. Although the original document may have an index to help the reader find what they want, the index is not applicable after conversion to a dialogue specification. The index of a manual, for example, refers the user to page numbers, but the dialogue system is likely to ignore the page breaks of the manual, so the page numbers have no meaning within the dialogue specification.
The present invention is particularly suited to a spoken dialogue system. However, the present invention is not limited to the use of speech as a means of interfacing a machine and the user. Any means by which the user can enter words into and receive words from a machine is encompassed within the scope of the present invention. For example, the user input may be speech but the output may be a display or vice versa. Alternatively, simple keyboard input and a display response can be used.
The present invention is applicable to any form of document, not just a marked-up document. The passages of text can be identified using any distinctive characteristics of the document such as page breaks, headings, fonts, paragraphs or indents. With the proliferation of marked-up text, the present invention is particularly suited to such texts, since the mark-up is likely to greatly facilitate the identification of specific pieces of information.
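For a marked-up document, passage identification can be as simple as walking the element tree and collecting the text under each heading. The following Python sketch assumes a hypothetical XML schema with `<manual>`, `<section>`, `<title>` and `<p>` elements; a real mark-up language such as TranScribe III would use its own tags, and the function name is illustrative.

```python
import xml.etree.ElementTree as ET


def split_into_passages(xml_text: str) -> dict[str, str]:
    """Map each <section> title to the joined text of its <p> children.

    The element names are hypothetical placeholders for whatever tags the
    document's mark-up language actually uses to delimit passages.
    """
    root = ET.fromstring(xml_text)
    passages = {}
    for section in root.iter("section"):
        title = (section.findtext("title") or "").strip()
        # itertext() also collects text inside inline tags such as <b>.
        paragraphs = ("".join(p.itertext()).strip() for p in section.findall("p"))
        passages[title] = " ".join(paragraphs)
    return passages


doc = """<manual>
  <section><title>Loading paper</title>
    <p>Open the paper tray.</p><p>Load the paper <b>face down</b>.</p>
  </section>
  <section><title>Replacing the toner</title>
    <p>Open the front cover.</p>
  </section>
</manual>"""
print(split_into_passages(doc))
```

For an unmarked document the same role would be played by heuristics over page breaks, fonts or indents, which this sketch does not attempt.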
The passages in the text document can represent a procedure in, for example, a machine manual, or they can comprise logically distinct segments of information.
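Where a passage describes a procedure, one possible way to convert it into a dialogue instruction set is a linear sequence of states, one per step, which the user advances by saying "next" or repeats by saying "repeat". The Python sketch below is an assumption made for illustration: the `DialogueState` class, both functions and the reply words are hypothetical and do not represent a format defined by the invention or by VoiceXML.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DialogueState:
    """One state of a hypothetical dialogue instruction set."""
    prompt: str
    next_state: Optional[int]  # index of the following state; None ends the dialogue


def procedure_to_dialogue(title: str, steps: list[str]) -> list[DialogueState]:
    """Turn a procedure passage into an introduction followed by one state per step."""
    states = [DialogueState(f"This explains how to {title}. Say next to begin.",
                            1 if steps else None)]
    for i, step in enumerate(steps, start=1):
        states.append(DialogueState(f"Step {i}: {step}",
                                    i + 1 if i < len(steps) else None))
    return states


def run_dialogue(states: list[DialogueState], replies: list[str]) -> list[str]:
    """Output each prompt; 'repeat' repeats it, 'next' advances, anything else stops."""
    spoken = []
    reply_iter = iter(replies)
    index: Optional[int] = 0
    while index is not None:
        spoken.append(states[index].prompt)
        reply = next(reply_iter, "stop")  # running out of replies ends the dialogue
        if reply == "repeat":
            continue
        if reply != "next":
            break
        index = states[index].next_state
    return spoken
```

A passage comprising logically distinct information rather than a procedure might instead become a single state which simply reads the passage out.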
In one embodiment of the present invention, instead of simply executing the dialogue instruction set which has the highest combined score, the interface first confirms that the identified dialogue instruction set represents the information the user is seeking by generating a user prompt which corresponds to the passage and hence to the dialogue instruction set. If the user responds positively to the prompt, the dialogue instruction set is executed. If the user responds negatively, the score for the passage, and hence for the dialogue instruction set, is reduced, and a prompt for the dialogue instruction set having the next highest combined score is output to the user. This process is repeated until a dialogue instruction set is executed or the user enters a different instruction or request, in which case the dialogue system restarts the search for the dialogue instruction set having the highest score.
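The confirmation loop can be sketched as follows. The sketch assumes that a rejection reduces the passage's score by a fixed `penalty`, that the user's replies arrive through a caller-supplied `ask` function, and that the loop gives up after `max_prompts` prompts so that repeated rejections cannot cycle forever; these parameters, the reply words and the returned tuples are hypothetical choices, not details stated in the invention.

```python
from typing import Callable, Optional


def confirm_passage(
    scores: dict[str, float],
    prompts: dict[str, str],
    ask: Callable[[str], str],
    penalty: float = 10.0,
    max_prompts: int = 5,
) -> tuple[str, Optional[str]]:
    """Offer passages in score order until the user accepts one or changes request.

    Returns ("execute", passage_id), ("new_request", utterance) or ("give_up", None).
    """
    scores = dict(scores)  # work on a copy so the caller's scores are not modified
    for _ in range(max_prompts):
        best = max(scores, key=scores.get)
        utterance = ask(prompts[best])
        reply = utterance.strip().lower()
        if reply == "yes":
            return ("execute", best)
        if reply == "no":
            scores[best] -= penalty  # reduce the rejected passage's score
            continue
        return ("new_request", utterance)  # the caller restarts the search
    return ("give_up", None)


replies = iter(["no", "yes"])
result = confirm_passage(
    {"load_paper": -5.0, "replace_toner": -6.0},
    {"load_paper": "Do you want to load paper?",
     "replace_toner": "Do you want to replace the toner?"},
    lambda prompt: next(replies),
)
print(result)  # ('execute', 'replace_toner')
```

Reducing the score, rather than deleting the passage, leaves open the possibility that a rejected passage is offered again later if all alternatives score even lower.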
In order to simplify the search procedure for an appropriate dialogue instruction set, in one embodiment, non-information-bearing words input by a user, such as “the”, “a”, “how”, “do” and “I”, are ignored when determining the combined scores. Thus only those words are selected which are distinctive and which will help in the identification of a dialogue instruction set.
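A minimal Python sketch of this filtering step is given below. The stop-word list contains only the example words quoted above; a practical system would presumably use a much longer list, and the function name is hypothetical.

```python
import re

# Only the example words quoted in the text; a practical list would be longer.
STOP_WORDS = frozenset({"the", "a", "how", "do", "i"})


def content_words(utterance: str) -> list[str]:
    """Return the information-bearing words of an utterance, in their original order."""
    return [w for w in re.findall(r"[a-z']+", utterance.lower()) if w not in STOP_WORDS]


print(content_words("How do I load the paper?"))  # ['load', 'paper']
```

Only the words returned by such a filter would then contribute to the combined score of each passage.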
The present invention is applicable both to a machine interface in which the dialogue instruction sets and word scores have been predetermined, and to a separate configuration machine for generating the dialogue instruction sets and word scores. In accordance with the present invention, the dialogue instruction sets and their corresponding word scores can be generated separately from the interface and provided to it.
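As an illustration of this separation, the configuration machine might serialise its output in a neutral format which the interface loads at start-up. The sketch assumes JSON as that format and represents each dialogue instruction set as an opaque string; both assumptions, and the function names, are hypothetical.

```python
import json


def export_configuration(instruction_sets: dict[str, str],
                         word_scores: dict[str, dict[str, float]]) -> str:
    """Configuration side: serialise instruction sets and word scores as JSON text."""
    return json.dumps({"instruction_sets": instruction_sets, "word_scores": word_scores},
                      indent=2, sort_keys=True)


def load_configuration(text: str) -> tuple[dict[str, str], dict[str, dict[str, float]]]:
    """Interface side: recover the instruction sets and their word scores."""
    data = json.loads(text)
    return data["instruction_sets"], data["word_scores"]


# Placeholder instruction set; a real one might be a VoiceXML document.
sets = {"load_paper": "<vxml><form><block>Open the paper tray.</block></form></vxml>"}
scores = {"load_paper": {"paper": -1.3, "tray": -2.0}}
config_text = export_configuration(sets, scores)
print(load_configuration(config_text) == (sets, scores))  # True
```

The resulting text could then be carried to the interface on any of the carrier media mentioned below.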
The interface of the present invention can be implemented on any programmable machine controlled by a processor by providing a suitable processor implementable instruction code to the machine. The present invention thus encompasses a processor implementable instruction code for controlling a processor to implement the method and a carrier medium for carrying the processor implementable instruction code, e.g. a storage medium such as a floppy disc, CD ROM, random access memory, read-only memory, or magnetic tape device, and a carrier signal such as an electrical signal carrying the code over a network such as the Internet.