Priority is claimed on Japanese Patent Application No. 2006-329493, filed Dec. 6, 2006, the contents of which are incorporated herein by reference.
1. Field Of The Invention
The present invention relates to a language understanding apparatus, a language understanding method, and a computer program.
2. Description Of The Related Art
Recently, there have been attempts to construct systems that understand natural language. Such a system, if designed to handle only a very limited set of operations, can be constructed relatively easily by using Voice Extensible Markup Language (VoiceXML) or the like, whereas a system designed to understand somewhat more complicated dialogues and language faces various problems.
Conventionally, syntactic analysis (parsing) is used in understanding natural language. However, existing parsing software still has insufficient accuracy for use in a dialogue system, and it is difficult to make it work satisfactorily for the specific domain to be handled without troublesome adjustment. In addition, a semantic representation generator, which generates a semantic representation from the result of parsing, takes time to construct and has low reusability. Without parsing, however, a dialogue system cannot go beyond the level achievable with a mere enumeration of keywords.
Natural language understanding and the two types of systems for achieving it are described below.
FIG. 1 is a highly simplified diagram illustrating natural language understanding. In the current natural language processing paradigm, natural language understanding is generally the function of converting a natural language representation into a task representation that a computer can process. That is, natural language understanding consists of processing a natural language representation, which contains various ambiguities, to express the user's intent as an unambiguous semantic representation (process α in FIG. 1), and generating from that semantic representation a task representation sufficient to execute a process that fulfills the user's request (process β in FIG. 1). In the present specification, process α is called “semantic representation generation” and process β is called “task representation generation”.
A dialogue system does not actually convert a language representation into a task representation in a single step. Instead, it gradually builds up semantic representations while generating and executing auxiliary task representations to acquire supplemental information, e.g., an utterance requesting confirmation or clarification, and generates the main task representation once sufficient information and confidence have been obtained. A semantic representation can thus be regarded as an internal representation for language processing, whereas a task representation is an internal representation for task processing.
For example, in a robot dialogue a task representation is a target state (goal) given as input to a task planner, whereas in a database search dialogue it is an SQL statement.
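As a minimal sketch of process β for a database search dialogue, the following Python code maps a semantic frame such as weather(Tokyo, tomorrow) to a parameterized SQL statement and runs it against an in-memory SQLite database. The `Frame` class, the `to_sql` function, and the `weather` table schema (one column per slot) are hypothetical assumptions made for illustration; a real system would follow its own database design.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical semantic representation: a concept name plus slot values."""
    concept: str
    slots: dict


def to_sql(frame: Frame) -> tuple:
    """Process beta sketch: turn a semantic frame into a parameterized SQL query.

    Assumes (hypothetically) that the table is named after the concept and that
    each slot name is a column. Slot values are passed as query parameters.
    """
    columns = sorted(frame.slots)  # fixed order keeps the SQL deterministic
    where = " AND ".join(f"{c} = ?" for c in columns)
    sql = f"SELECT * FROM {frame.concept} WHERE {where}"
    return sql, tuple(frame.slots[c] for c in columns)


# Output of process alpha for "weather in Tokyo tomorrow".
frame = Frame("weather", {"place": "Tokyo", "date": "tomorrow"})
sql, params = to_sql(frame)
# sql    -> "SELECT * FROM weather WHERE date = ? AND place = ?"
# params -> ("tomorrow", "Tokyo")

# Toy database standing in for the task back end.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (place TEXT, date TEXT, forecast TEXT)")
conn.execute("INSERT INTO weather VALUES ('Tokyo', 'tomorrow', 'sunny')")
rows = conn.execute(sql, params).fetchall()
```

Interpolating the table and column names is acceptable here only because they come from the system's own domain vocabulary rather than from the user's utterance; the user-derived values go through `?` parameters.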
Systems for natural language understanding can be broadly classified into two types according to how they generate a semantic representation. The first, herein called a template system, extracts semantics through surface matching of keywords or parameterized representation patterns against an input utterance. The second, herein called a parsing system, parses an input utterance using a classifier or the like trained on grammatical rules and statistical data, and obtains a semantic representation from the resulting semantic tree through recursive procedures.
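The following Python sketch illustrates the template system: slots are filled purely by surface matching against a hypothetical keyword lexicon and one hypothetical parameterized pattern ("weather in/for <place>"), with no syntactic analysis at all. The lexicon, the pattern, and `template_understand` are assumptions made for illustration.

```python
import re

# Hypothetical domain lexicon: surface keyword -> (slot name, symbol).
LEXICON = {
    "tokyo": ("place", "Tokyo"),
    "osaka": ("place", "Osaka"),
    "today": ("date", "today"),
    "tomorrow": ("date", "tomorrow"),
}

# Hypothetical parameterized representation pattern: "weather in/for <place>".
PATTERN = re.compile(r"weather (?:in|for) (?P<place>tokyo|osaka)", re.IGNORECASE)


def template_understand(utterance: str) -> dict:
    """Fill weather-domain slots by surface matching only (no parsing)."""
    slots = {}
    match = PATTERN.search(utterance)
    if match:
        slots["place"] = LEXICON[match.group("place").lower()][1]
    # Keyword spotting: the first keyword found for a slot wins.
    for token in re.findall(r"\w+", utterance.lower()):
        if token in LEXICON:
            slot, symbol = LEXICON[token]
            slots.setdefault(slot, symbol)
    return slots


print(template_understand("weather in Tokyo tomorrow"))
# {'place': 'Tokyo', 'date': 'tomorrow'}
print(template_understand("weather in Tokyo and Osaka tomorrow"))
# {'place': 'Tokyo', 'date': 'tomorrow'}  ("Osaka" is silently dropped)
```

Because matching proceeds keyword by keyword, the coordinated utterance loses "Osaka": a flat slot table cannot express that one date applies to two places, which is the kind of limitation a keyword enumeration runs into.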
The template system is often used in spoken dialogue systems because it is easy to construct and maintain and yields a more robust system than the parsing system. However, the template system can handle only simple representations that can be reduced to an enumeration of keywords. The parsing system directly handles syntactic recursion and semantic composition, and can therefore handle complex and diverse language representations, but it is not easy to construct and maintain and generally has lower robustness. Further, because the parsing system presupposes compositionality, it is weak at understanding non-compositional representations.
There is a technology that first attempts parsing with a constraint-relaxation approach and generates a semantic representation through a bottom-up recursive process based on the parsing result, and, if the attempt fails, generates a semantic representation with a pattern-matching-based, semantics-driven approach using knowledge of pre-given keyword patterns (see, for example, “A Robust Spoken Dialogue System Based on Understanding Mechanism of Human Being”, by Mikio Yamamoto, Toshihiko Itoh, Masaru Hidano and Seiichi Nakagawa, Transactions of Information Processing Society of Japan, Vol. 36, No. 4, pp. 471-482, April 1995: Non-patent Document 1). The constraint-relaxation approach performs analysis according to a grammar that parses only grammatical (grammatically correct) sentences and, if the analysis fails, relaxes the constraints to recover from the error.
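The control flow of such a two-stage approach can be sketched in Python as follows: a strict toy grammar is tried first, then a relaxed version of it (here tolerating an omitted preposition and a fronted date, as loose English analogues of particle omission and inversion), and finally keyword spotting as the last resort. The regular-expression "grammars" and all function names are hypothetical simplifications; they illustrate only the staging and do not reproduce the actual method of Non-patent Document 1.

```python
import re
from typing import Optional

PLACE = r"(?P<place>Tokyo|Osaka)"
DATE = r"(?P<date>today|tomorrow)"

# Stage 1: a toy "grammar" that accepts only the fully grammatical form.
STRICT = [re.compile(rf"weather in {PLACE} {DATE}")]

# Stage 2: the same grammar with constraints relaxed: the preposition may be
# omitted and the date may be moved to the front.
RELAXED = [
    re.compile(rf"weather (?:in )?{PLACE} {DATE}"),
    re.compile(rf"{DATE} weather (?:in )?{PLACE}"),
]

# Stage 3: keyword knowledge used as the last resort.
KEYWORDS = {"Tokyo": "place", "Osaka": "place", "today": "date", "tomorrow": "date"}


def match_grammar(patterns, utterance: str) -> Optional[dict]:
    """Return slot values if the whole utterance matches one of the patterns."""
    for pattern in patterns:
        m = pattern.fullmatch(utterance)
        if m:
            return m.groupdict()
    return None


def match_keywords(utterance: str) -> Optional[dict]:
    """Spot known keywords anywhere in the utterance, ignoring structure."""
    slots = {}
    for token in utterance.split():
        if token in KEYWORDS:
            slots.setdefault(KEYWORDS[token], token)
    return slots or None


STAGES = (
    ("strict", lambda u: match_grammar(STRICT, u)),
    ("relaxed", lambda u: match_grammar(RELAXED, u)),
    ("keyword", match_keywords),
)


def understand(utterance: str):
    """Try each stage in order and report which one succeeded."""
    for name, stage in STAGES:
        slots = stage(utterance)
        if slots:
            return name, slots
    return "failed", None


for u in ["weather in Tokyo tomorrow", "weather Tokyo tomorrow",
          "tomorrow weather in Osaka", "um Tokyo weather please tomorrow"]:
    print(u, "->", understand(u))
# stages used: strict, relaxed, relaxed, keyword
```

Note that the developer must maintain the strict grammar, its relaxations, and the keyword list separately, which mirrors the maintenance burden discussed below.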
The parsing system and the template system used in conventional natural language understanding have the following problems.
First, the problems of the parsing system will be described.
FIG. 2 is a diagram illustrating a general parsing system and corresponds to process α in FIG. 1. As shown in FIG. 2, when the parsing system is used, a developer must prepare at least two modules: (1) a parser that generates a semantic tree from a natural language representation and (2) a semantic representation generator that generates an internal representation, or semantic representation, representing the semantics of the original natural language. While a semantic frame is used as the semantic representation in FIG. 2, other semantic representation styles, such as predicate logic and semantic networks, can be used as well.
For the parser, the first module of the parsing system, there are two options: using an existing natural language parser or creating a new parser.
Because creating a parser from scratch is costly, this option is not very practical unless a special function that general parsers lack is needed. Using a publicly available parser also has problems.
Execution of parsing naturally requires not only a parsing program but also a grammar suited to the target language and domain. Some parsers come with extensive grammars, so using such a parser as is yields parsing results of a certain quality. However, these bundled grammars are created from written-language corpora, such as news articles, and have poor parsing accuracy in dialogue systems that handle spoken language, particularly in spoken dialogue systems. Moreover, to reduce the burden of creating a semantic representation generator, there is often a need to handle a phrase consisting of a domain-specific proper noun, a plurality of function words and/or content words as a single word. While adding a proper noun is relatively simple, handling such a phrase is not easy.
It is possible to write a grammar oneself, but describing grammatical rules while eliminating interference between rules and unintended behavior is not easy. Recently, therefore, the mainstream approach has been automatic grammar acquisition and machine learning of dependency likelihoods instead of manual grammar description. However, because these approaches require large corpora with accurate, fine-grained annotation as well as linguistic expertise, they are not practical options for constructing a domain-limited dialogue system.
The semantic representation generator, the second module of the parsing system, must be constructed according to the domain and the semantic representation style used. In general, a semantic representation generator is constructed as a collection of recursive procedures that takes the semantic tree output by the parser as input and outputs a semantic representation as the result of a recursive process.
The most significant reason why a semantic representation generator is difficult to construct is the inconsistency between the structure of a domain concept and the syntactic structure. This problem is likely to occur when an existing general-purpose parser is used. For example, suppose that in a weather forecast domain a concept “weather” is defined by a place and a date. Then the language representation “weather in Tokyo tomorrow” may be represented by the semantic representation “weather(Tokyo, tomorrow)”, where “Tokyo” and “tomorrow” are symbols indicating the place “Tokyo” and the date “tomorrow”, respectively.
In this semantic representation, “Tokyo” and “tomorrow” are both governed by the concept “weather”. However, the syntactic structure is not necessarily of the same form as this semantic structure. For “weather in Tokyo tomorrow”, dependency analysis may produce either of two results, one in which “Tokyo” depends on “tomorrow” and one in which “Tokyo” depends on “weather”, and it is difficult to know in advance which will be output.
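The mismatch can be made concrete with a short Python sketch. It encodes the two possible dependency trees for “weather in Tokyo tomorrow” and applies a naive recursive semantic representation generator that assumes every argument of a concept is a direct dependent of the concept's head word. The `Node` class, the `SLOT_OF` table, and `generate` are hypothetical illustrations, not part of any particular parser's API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A dependency-tree node: a word and the nodes that depend on it."""
    word: str
    deps: List["Node"] = field(default_factory=list)


# Hypothetical domain knowledge: which slot of "weather" a symbol can fill.
SLOT_OF = {"Tokyo": "place", "tomorrow": "date"}


def generate(node: Node):
    """Naive recursive generator for the concept weather(place, date).

    A concept head builds its frame from the values of its *direct*
    dependents; any other word simply denotes itself and ignores its own
    dependents. This silently assumes syntax mirrors the domain concept.
    """
    if node.word == "weather":
        args = {}
        for dep in node.deps:
            value = generate(dep)  # recursive call on each dependent
            slot = SLOT_OF.get(value)
            if slot:
                args[slot] = value
        return ("weather", args.get("place"), args.get("date"))
    return node.word


# Parse A: "Tokyo" and "tomorrow" both depend on "weather".
tree_a = Node("weather", [Node("Tokyo"), Node("tomorrow")])
# Parse B: "Tokyo" depends on "tomorrow", which depends on "weather".
tree_b = Node("weather", [Node("tomorrow", [Node("Tokyo")])])

print(generate(tree_a))  # ('weather', 'Tokyo', 'tomorrow')
print(generate(tree_b))  # ('weather', None, 'tomorrow')
```

With the first tree the generator yields weather(Tokyo, tomorrow); with the second, “Tokyo” hangs under “tomorrow” and is silently dropped, so the same utterance receives an incomplete semantic representation depending on which parse happens to be output.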
One solution to this difficulty is to correct the grammar manually or to adjust the parser through learning. As mentioned previously, however, grammatical rules are difficult to adjust manually, and training a parser requires a vast amount of data. Creating such data requires not only collecting it but also additional work such as annotation, which is very troublesome and time-consuming. Another solution is to make the semantic representation generator produce the semantic representation “weather(Tokyo, tomorrow)” from either structure. Generally, this approach requires difficult and tedious programming to construct the semantic representation generator. Further, the resulting generator contains many exception-handling processes, which lowers its reusability in other domains.
“weather in Tokyo tomorrow” may also be expressed with a different word order, such as “tomorrow's weather in Tokyo”. If extra phrases are inserted, as in “weather in Tokyo area tomorrow” or “weather in the vicinity of Tokyo tomorrow”, these phrases should not interfere with semantic representation generation. For the representation “weather in Tokyo and Osaka tomorrow”, even if “tomorrow” is analyzed as depending only on “Tokyo”, the semantic representation {weather(Tokyo, tomorrow), weather(Osaka, tomorrow)} or weather({Tokyo, Osaka}, tomorrow) should be generated, based on the interpretation that “tomorrow” applies to “Osaka” as well.
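Converting between the two acceptable forms above is itself straightforward once the coordination has been identified; the hard part is identifying it from a possibly incorrect parse. As a minimal sketch, assuming set-valued arguments are represented by Python sets, the hypothetical function `distribute` expands weather({Tokyo, Osaka}, tomorrow) into {weather(Tokyo, tomorrow), weather(Osaka, tomorrow)} by taking the Cartesian product of the argument values.

```python
from itertools import product


def distribute(concept, *args):
    """Expand set-valued arguments into a set of frames with atomic arguments.

    Each frame is a tuple (concept, arg1, arg2, ...); a non-set argument is
    treated as a one-element set so every position can be iterated uniformly.
    """
    choices = [a if isinstance(a, (set, frozenset)) else {a} for a in args]
    return {(concept, *combo) for combo in product(*choices)}


frames = distribute("weather", {"Tokyo", "Osaka"}, "tomorrow")
# frames == {("weather", "Tokyo", "tomorrow"), ("weather", "Osaka", "tomorrow")}
```

This covers only the final rewriting step; deciding that “tomorrow” should be shared by both places when the parser attached it to “Tokyo” alone still requires additional logic in the generator.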
The foregoing problems must be handled even when a correct parsing result is obtained, and making the generator robust against frequently occurring parse errors requires still further effort.
According to the technology of Non-patent Document 1, it seems that the parsing system can perform highly accurate semantic interpretation of complex representations within the coverage of the applied grammar. However, this technology cannot overcome the aforementioned problems. For example, the approach of Non-patent Document 1 requires the system developer to describe a parsing grammar and rules for generating semantic representations in addition to keyword patterns. Moreover, keyword patterns are used only as a last resort, and parallel (coordinated) structures and recursive structures cannot be handled with them. That is, although the technology of Non-patent Document 1 tries to increase robustness while maintaining parsing accuracy by using the parsing system and the template system in two stages, it cannot achieve both the ability to understand complex representations and robustness at the same time. In other words, while robust understanding of simple representations is possible, robust understanding of complex representations is limited to tolerating phenomena such as particle omission or word-order inversion.
The present invention has been made in consideration of the foregoing situation, and it is an object of the invention to provide a language understanding apparatus, a language understanding method and a computer program that are capable of robustly understanding complicated natural language representations and can reduce the workload of a system developer.