The present invention relates to machine understanding of textual or speech inputs. More specifically, the present invention relates to the task of information extraction in the machine understanding process.
Natural language user interfaces attempt to allow a user to operate a computer simply by inputting commands or directions in a natural language, which can make computers easier to use. Such interfaces (spoken language interfaces in particular) are sometimes among the only practicable interfaces, as opposed to traditional input devices such as keyboards and mice. For example, a spoken language interface may be the only practicable interface in hands-busy or eyes-busy applications (such as when the user is driving), for people with disabilities, or where the device must be very small in order to be usable (such as cell phones and personal digital assistants, or PDAs). In natural language interfaces, the user speaks or otherwise interacts with the computer (which can be a PDA, a desktop computer, a telephone, etc.) and asks the computer to carry out certain actions. In order to operate properly, the computer must understand the intentions that the user has expressed. The process of attempting to understand what the user has expressed is commonly referred to as natural language understanding (NLU) or, if the input modality is speech, spoken language understanding (SLU).
An important step in the understanding process involves extracting fragments of information from the utterance (or textual input) and associating these fragments with the concepts in the task which the user is attempting to have the computer perform. This step in the process is commonly referred to as information extraction.
Take as an example the user input sentence (which the user may speak, type, or handwrite) “Schedule a meeting with John Smith on Saturday”. An information extraction process should identify the task requested by the user as one dealing with meetings (as opposed to emails, for example). The information extraction process should also associate the phrase “John Smith” with the concept of “meeting attendee” and the word “Saturday” with the concept of “meeting day”.
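The desired result for this example can be illustrated with a minimal Python sketch. It is only an illustration of the target output, not the method of the present invention: the pattern, the function name `extract`, and the dictionary layout are hypothetical, and a single hand-written pattern of this kind covers only one phrasing of one task.

```python
import re
from typing import Optional

# Hypothetical hand-written pattern covering a single phrasing of the
# "meeting" task. All names here are illustrative assumptions.
SCHEDULE_PATTERN = re.compile(
    r"^schedule a meeting with (?P<attendee>.+?) on (?P<day>\w+)$",
    re.IGNORECASE,
)


def extract(utterance: str) -> Optional[dict]:
    """Return the task and its filled concepts, or None if nothing matches."""
    match = SCHEDULE_PATTERN.match(utterance.strip())
    if match is None:
        return None
    return {
        "task": "meeting",
        "meeting attendee": match.group("attendee"),
        "meeting day": match.group("day"),
    }


frame = extract("Schedule a meeting with John Smith on Saturday")
print(frame)
```

Here the phrase “John Smith” is associated with “meeting attendee” and the word “Saturday” with “meeting day”. An input that does not fit the pattern, such as a request to send an email, yields no result at all.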
Current approaches used for information extraction require handwritten grammars, usually context free grammars (CFGs). Development of a CFG requires both domain expertise and expertise in grammar authoring. It is an iterative and time consuming process in which grammars are written using a combination of knowledge and data, and then tested and refined using test data. Thus, current approaches tend to be not only time consuming but also quite costly.