The present invention deals with natural language processing. More specifically, the present invention relates to an interface and associated object model for providing integrated natural language processing services to an application.
Natural language processing involves the processing of a natural language input. A natural language input is generally language used by a person (as opposed to a computer language or other artificial language), including all of the idioms, assumptions and implications of an utterance in that natural language. Natural language processing implemented by a computer is typically an attempt to determine the meaning of a natural language input such that the natural language input can be “understood” and/or acted on by the computer.
In many prior natural language processing systems, natural language processing features have been provided by individual natural language processing components. For instance, such individual components have included language detection, which attempts to detect the language in which a natural language input is provided; word breaking, which attempts to identify individual words in a natural language input; spell checking; grammar checking; etc. In conventional systems, each of these components is provided separately. This leads to a number of difficulties.
For instance, a word-breaking component must provide its output in a format that is consistent with the input expected by the spell checking component. Similarly, the spell checking component must provide its output in a format acceptable to any downstream processing components. Of course, since these individual natural language processing components are often provided by different vendors, they often do not work seamlessly with one another without substantial reformatting of the processing results so that they are suitable for the next downstream component.
Similarly, prior natural language processing systems have been configured to provide natural language processing features in a single language only. Therefore, if a developer was working on a multi-language application, the developer would likely need to send a natural language input (to be processed) first to a language detection component, and then call the appropriate natural language processing components, given the identified language.
Further, many of the natural language processing components required a lexicon. In order to add a lexicon, prior natural language processing systems often required the lexicon to be added separately to multiple individual components. The lexicon might have a different format for different components, or be incompatible between components. In other words, a lexicon could not be added once and used across multiple components or features.
All of these difficulties, presented as a result of providing individual natural language processing components, have required application developers to have extensive knowledge about a wide variety of individual components provided by different vendors. This has rendered the development of applications that utilize natural language processing very difficult, costly, and time consuming, and has hindered the widespread dissemination of natural language processing in computing.