The present invention pertains to a method and an arrangement for conversion between data representation formats, said data comprising sound or text information. The invention specifically provides for a method and arrangement means for word and sound processing.
Natural language understanding has been the topic of research since the first days of Artificial Intelligence. The present invention is primarily intended for understanding spontaneous utterances, in written or spoken form, within a limited domain.
One current approach to this problem is to model a dialog flow for each operation that can be performed in a specific system, dividing each dialog into modes. For each mode, the valid inputs and their consequences are listed. For example, the Philips SpeechMania© 99 product has been demonstrated with a pizza ordering application, in which the user goes through dialog modes such as selecting pizza toppings. A disadvantage of this type of technology is that the system understands only the utterances expected in a given mode. If a user changes his drink order while he is expected to select pizza toppings, the system may fail to understand this. The degree to which the system ‘understands’ the utterances in this kind of interaction is limited: each mode and the utterances valid therein must be anticipated by the developers and directly related to the action the system takes in response to user input.
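The limitation of mode-based dialogs can be illustrated with a minimal sketch. The mode names, inputs and action strings below are hypothetical and do not reflect how SpeechMania actually represents its dialogs; the point is only that an input valid in one mode is rejected in another.

```python
# Hypothetical mode table for a pizza-ordering dialog: each mode lists the
# inputs it accepts and the action taken for each of them.
MODES: dict[str, dict[str, str]] = {
    "select_drink": {"cola": "set_drink:cola", "water": "set_drink:water"},
    "select_topping": {"ham": "add_topping:ham", "olives": "add_topping:olives"},
}


def handle(mode: str, utterance: str) -> str:
    """Return the action for an utterance, or 'not_understood' if the
    utterance is not among the inputs anticipated for the current mode."""
    return MODES[mode].get(utterance.strip().lower(), "not_understood")


print(handle("select_topping", "olives"))  # add_topping:olives
print(handle("select_topping", "water"))   # not_understood (only valid in select_drink)
```

Changing the drink order while in the topping mode fails, even though the system knows the word "water" in another mode.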
Other speech recognition systems, such as those using the Java™ Speech Grammar Format (JSGF), attach tags to an (often handwritten) grammar. The tags normally carry some semantic meaning, independent of how that meaning was expressed in words. Words or phrases of no semantic interest to the application (politeness phrases, articles, etc.) are ignored. Such an application then needs only a very simple parsing of the tags in order to act on the speech input. This approach requires manual adaptation of the application and the grammar so that they work together, and such a system cannot be said to ‘understand’ the spoken utterances.
More advanced approaches to natural language understanding make use of formalisms developed within the fields of linguistics and computational linguistics. One currently popular formalism is Head-Driven Phrase Structure Grammar (HPSG), which associates groups of lexical features with words, resulting in a grammar structure that can be used for parsing general natural language. Many of these linguistic formalisms could be used to perform some of the steps described in the present invention, but they require much more work to be integrated into a complete language understanding interface to an application, as well as substantial adaptation to new domains.
Some speech recognition systems use word spotting. This entails listening for certain key words and ignoring the rest of the spoken utterance. This may simplify the parsing component of a system, but does not allow the system to understand the details of the user's utterance.
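Word spotting, applied here to already-recognized text for simplicity, can be sketched as follows. The keyword list is hypothetical. The example shows why details are lost: two utterances with different meanings produce the same set of spotted words.

```python
import re

# Hypothetical keyword list; every other word in the utterance is ignored.
KEYWORDS = {"pizza", "cola", "cancel"}


def spot(utterance: str) -> list[str]:
    """Return the key words found in the utterance, in order of appearance."""
    tokens = re.findall(r"[a-z]+", utterance.lower())
    return [t for t in tokens if t in KEYWORDS]


# Both utterances yield the same result, although they mean different things.
print(spot("I want a pizza and a cola"))     # ['pizza', 'cola']
print(spot("No pizza, just a cola please"))  # ['pizza', 'cola']
```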
No commercial applications use the same grammar for both natural language generation and natural language understanding. Most current applications either understand a very simple subset of natural language, or require substantial manpower to adapt the natural language understanding system to a given application.
The present invention sets forth a method and an arrangement for word and sound processing. It solves the problem of simple natural language understanding, allowing users to interact with machines in natural language, for instance in spoken or written form, by giving commands and asking questions. Additionally, the method provides language independence by transforming the user's linguistic utterances into a semantic form which is independent of the original language. This form may later be converted into another human language, resulting in a simple translation. Furthermore, the present invention can automate the process of adapting domain-specific natural language understanding to a given application.
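The language independence described above can be sketched with a toy example: an utterance in one language is mapped to a language-neutral (function, object) pair, which is then expressed in another language. The lexicons, the symbols `turn_on`/`lamp`, and the single verb-plus-object pattern are all hypothetical simplifications, not the grammar the invention uses.

```python
# Hypothetical lexicons mapping language-neutral symbols to surface phrases.
LEXICON = {
    "en": {"turn_on": "turn on", "turn_off": "turn off",
           "lamp": "the lamp", "radio": "the radio"},
    "sv": {"turn_on": "slå på", "turn_off": "stäng av",
           "lamp": "lampan", "radio": "radion"},
}
FUNCTIONS = ("turn_on", "turn_off")  # language-neutral function symbols
OBJECTS = ("lamp", "radio")          # language-neutral object symbols


def understand(text: str, lang: str) -> tuple[str, str]:
    """Map a 'verb object' command to a language-independent (function, object) pair."""
    words = LEXICON[lang]
    text = text.lower().strip()
    for func in FUNCTIONS:
        prefix = words[func] + " "
        if text.startswith(prefix):
            rest = text[len(prefix):]
            for obj in OBJECTS:
                if rest == words[obj]:
                    return func, obj
    raise ValueError(f"not understood: {text!r}")


def generate(logic: tuple[str, str], lang: str) -> str:
    """Express a (function, object) pair in the given language."""
    func, obj = logic
    words = LEXICON[lang]
    return f"{words[func]} {words[obj]}"


logic = understand("Turn on the lamp", "en")  # ('turn_on', 'lamp')
print(generate(logic, "sv"))                  # slå på lampan
```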
The present invention does not attempt to solve the highly complex problem of general natural language understanding, but rather the understanding of a limited subset of natural language: utterances which can be straightforwardly mapped to the domain of one or several data models or computer applications.
To achieve these aims and objectives, the present invention provides a method for conversion between data representation formats, where the data includes sound or text information. The data representation formats are text, sound, words, phrases, and logic, and the conversions are between text or sound and words, between words and phrases, and between phrases and logic, in either direction. In the text format, the data is a string of characters; in the sound format, a digital representation of an acoustic waveform; in the word format, a reference to a data structure containing information about a word; in the phrase format, a tree-like representation of the grammatical structure of a phrase, whose leaf nodes reference the meanings of the constituent words together with conjunction information; and in the logic format, references to functions, objects and attributes in an underlying data model. The method includes the steps of converting text to words by using characters which separate words, converting words to text by concatenating the spelling of the constituent words, converting sound to words by providing a continuous speech recognition system, converting words to sound by providing a speech synthesis system, converting words to phrases by parsing, converting phrases to words by traversing the tree-like representation, preferably from left to right, and converting each leaf node to a word, converting phrases to logic by resolving or binding verb phrases to functions and noun phrases to objects in the underlying data model, and converting logic to phrases by using knowledge of the grammar of the language used to create a phrase expressing the same semantics as the original logic format.
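The word, phrase and logic formats, together with the two text conversions, might be represented as below. This is a minimal sketch under stated assumptions: the field names (`category`, `label`, `attributes`), the toy lexicon and the set of separator characters are hypothetical, and the sound format is omitted because it requires an external recognizer or synthesizer.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Word:
    """Word format: a reference to a lexicon entry describing one word."""
    spelling: str
    category: str  # hypothetical field, e.g. "verb", "det", "noun"


@dataclass
class Phrase:
    """Phrase format: a grammatical tree whose leaves are Word references."""
    label: str  # e.g. "VP" or "NP"
    children: list[Phrase | Word] = field(default_factory=list)


@dataclass
class Logic:
    """Logic format: references to a function, objects and attributes in a data model."""
    function: str
    objects: list[str] = field(default_factory=list)
    attributes: dict[str, str] = field(default_factory=dict)


# Hypothetical toy lexicon; a real system would hold much richer entries.
LEXICON = {w.spelling: w for w in (Word("open", "verb"), Word("the", "det"), Word("door", "noun"))}


def text_to_words(text: str) -> list[Word]:
    """Text -> words: split on word-separating characters, then look up each token."""
    tokens = [t for t in re.split(r"[\s,.;:!?]+", text.lower()) if t]
    return [LEXICON[t] for t in tokens]  # raises KeyError for unknown words


def words_to_text(words: list[Word]) -> str:
    """Words -> text: concatenate the spellings of the constituent words."""
    return " ".join(w.spelling for w in words)


words = text_to_words("Open the door!")
print(words_to_text(words))  # open the door

# The same utterance in the phrase and logic formats ("door_1" is a hypothetical object id).
tree = Phrase("VP", [LEXICON["open"], Phrase("NP", [LEXICON["the"], LEXICON["door"]])])
logic = Logic("open", ["door_1"])
```

Lower-casing during tokenization is a simplification; it means the round trip does not preserve the original capitalization or punctuation.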
The method further comprises:
converting text to words by using characters which separate words;
converting words to text by concatenating the spelling of the constituent words;
converting sound to words by providing a continuous speech recognition system;
converting words to sound by providing a speech synthesis system;
converting words to phrases by parsing;
converting phrases to words by traversing said tree-like representation, preferably from left to right, and converting each leaf node to a word;
converting phrases to logic by resolving or binding verb phrases to functions and noun phrases to objects in said underlying data model;
converting logic to phrases by using knowledge of the grammar of the language used to create a phrase expressing the same semantics as the original logic form; and thus providing a computer word processing and sound processing means.
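The four steps that involve the phrase format can be sketched with a toy grammar of two rules (VP → verb NP, NP → det noun). Everything here is a hypothetical illustration: the lexicon, the verb-to-function and noun-to-object bindings, and the grammar itself. A practical system would use a real parser. Note that parsing and generation apply the same two rules, in opposite directions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    spelling: str
    category: str  # "verb", "det" or "noun"


@dataclass
class Phrase:
    label: str      # "VP" or "NP"
    children: list  # Phrase or Word nodes, ordered left to right


# Hypothetical lexicon and bindings into an underlying data model.
LEXICON = {s: Word(s, c) for s, c in [("open", "verb"), ("close", "verb"),
                                       ("the", "det"), ("door", "noun"), ("window", "noun")]}
FUNCTIONS = {"open": "open_device", "close": "close_device"}  # verb -> function
OBJECTS = {"door": "door_1", "window": "window_1"}            # noun -> object


def parse(words: list[Word]) -> Phrase:
    """Words -> phrase, using the toy rules VP -> verb NP and NP -> det noun."""
    if [w.category for w in words] != ["verb", "det", "noun"]:
        raise ValueError("utterance not covered by the toy grammar")
    return Phrase("VP", [words[0], Phrase("NP", words[1:])])


def phrase_to_words(phrase: Phrase) -> list[Word]:
    """Phrase -> words: traverse the tree left to right, emitting each leaf."""
    words: list[Word] = []
    for child in phrase.children:
        if isinstance(child, Word):
            words.append(child)
        else:
            words.extend(phrase_to_words(child))
    return words


def phrase_to_logic(phrase: Phrase) -> tuple[str, str]:
    """Phrase -> logic: bind the verb to a function and the noun phrase to an object."""
    verb, noun_phrase = phrase.children
    noun = noun_phrase.children[-1]
    return FUNCTIONS[verb.spelling], OBJECTS[noun.spelling]


def logic_to_phrase(logic: tuple[str, str]) -> Phrase:
    """Logic -> phrase: invert the bindings and apply the same grammar rules."""
    function, obj = logic
    verb = next(v for v, f in FUNCTIONS.items() if f == function)
    noun = next(n for n, o in OBJECTS.items() if o == obj)
    return Phrase("VP", [LEXICON[verb], Phrase("NP", [LEXICON["the"], LEXICON[noun]])])


tree = parse([LEXICON[t] for t in "open the door".split()])
logic = phrase_to_logic(tree)                        # ('open_device', 'door_1')
back = phrase_to_words(logic_to_phrase(logic))
print(" ".join(w.spelling for w in back))            # open the door
```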
The invention also includes an arrangement for accomplishing the conversion. It comprises means for converting text to words, words to text, sound to words, words to sound, words to phrases, phrases to words, phrases to logic and logic to phrases, typically in the form of software or a combination of hardware and software.
The arrangement further comprises:
converting means from text to words, which uses characters which separate words;
converting means from words to text, which concatenates the spelling of the constituent words;
converting means from sound to words, providing a continuous speech recognition system;
converting means from words to sound, providing a speech synthesis system;
converting means from words to phrases, which uses parsing;
converting means from phrases to words, which traverses said tree-like representation, preferably from left to right, and converts each leaf node to a word;
converting means from phrases to logic, which resolves or binds verb phrases to functions and noun phrases to objects in said underlying data model; and
converting means from logic to phrases, which uses knowledge of the grammar of the language used to create a phrase expressing the same semantics as the original logic form.
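One way to picture the arrangement is as a set of converting means, each an edge between two formats, composed automatically into a chain (for instance text → words → phrases → logic). This is a hypothetical sketch: the registry, the breadth-first chain search and the tuple/dict stand-ins for phrases and logic are assumptions made for illustration, and the sound means are left out because they would wrap an external speech recognizer and synthesizer.

```python
from __future__ import annotations

from collections import deque
from typing import Any, Callable

# Registry of converting means, keyed by (source format, target format).
CONVERTERS: dict[tuple[str, str], Callable[[Any], Any]] = {}


def means(src: str, dst: str):
    """Decorator registering a function as the converting means from src to dst."""
    def register(fn):
        CONVERTERS[(src, dst)] = fn
        return fn
    return register


def find_chain(src: str, dst: str) -> list[tuple[str, str]]:
    """Breadth-first search for the shortest chain of converting means."""
    prev: dict[str, str | None] = {src: None}
    queue = deque([src])
    while queue:
        fmt = queue.popleft()
        if fmt == dst:
            break
        for a, b in CONVERTERS:
            if a == fmt and b not in prev:
                prev[b] = fmt
                queue.append(b)
    if dst not in prev:
        raise LookupError(f"no conversion from {src} to {dst}")
    chain, node = [], dst
    while prev[node] is not None:
        chain.append((prev[node], node))
        node = prev[node]
    return chain[::-1]


def convert(data: Any, src: str, dst: str) -> Any:
    """Apply each converting means along the chain in turn."""
    for edge in find_chain(src, dst):
        data = CONVERTERS[edge](data)
    return data


@means("text", "words")
def text_to_words(text):
    return text.split()

@means("words", "text")
def words_to_text(words):
    return " ".join(words)

@means("words", "phrases")
def words_to_phrases(words):
    verb, det, noun = words  # toy grammar: VP -> verb NP, NP -> det noun
    return ("VP", verb, ("NP", det, noun))

@means("phrases", "words")
def phrases_to_words(tree):
    _label, *children = tree  # left-to-right traversal of the leaves
    out = []
    for child in children:
        out.extend(phrases_to_words(child) if isinstance(child, tuple) else [child])
    return out

@means("phrases", "logic")
def phrases_to_logic(tree):
    _, verb, (_, _, noun) = tree
    return {"function": verb, "object": noun}

@means("logic", "phrases")
def logic_to_phrases(logic):
    return ("VP", logic["function"], ("NP", "the", logic["object"]))


print(convert("open the door", "text", "logic"))  # {'function': 'open', 'object': 'door'}
```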
Further embodiments of the present invention are set out in the attached dependent claims. The arrangement is also able to provide the embodiments relating to the method.