1. Field of the Invention
The present invention relates to a system and method for compiling rules created by machine learning programs and, more particularly, to a method for compiling rules into weighted finite-state transducers.
2. Introduction
Many problems in Natural Language Processing (NLP) can be modeled as classification tasks, either at the word level or at the sentence level. For example, part-of-speech tagging, named-entity identification, supertagging (associating each word with a label that represents syntactic information about the word given its context in a sentence), and word sense disambiguation are tasks that have been modeled as classification problems at the word level. In addition, some problems require classifying an entire sentence or document into one of a set of categories. These problems are loosely characterized as semantic classification and have been used in many practical applications, including call routing and text classification.
Most of these problems have been addressed in isolation, assuming unambiguous (one-best) input. In NLP applications, however, modules are typically chained together, and each module introduces some amount of error. To alleviate the errors introduced by a module, the module typically provides multiple weighted solutions (ideally as a packed representation) that serve as input to the next module. For example, a speech recognizer provides a lattice of possible recognition outputs that is then to be annotated with part-of-speech tags and named entities.
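To make the idea of a packed representation concrete, the following Python sketch stores a small recognition lattice as an acyclic weighted automaton. It is an illustration only, not the method of the invention: the words, the costs, and the names `LATTICE`, `count_paths`, and `best_path` are hypothetical. The sketch also assumes that costs are negative log probabilities, combined by addition along a path and by minimum across paths (the tropical semiring). Eight candidate transcriptions share six arcs, and a single dynamic-programming pass recovers the one-best hypothesis, which is all that a pipeline restricted to unambiguous input would pass on.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lattice: state -> list of (next_state, word, cost).
# Costs are assumed negative log probabilities; lower is better.
# States are numbered so every arc goes from a lower to a higher state,
# so iterating over sorted state numbers is a valid topological order.
Arc = Tuple[int, str, float]
LATTICE: Dict[int, List[Arc]] = {
    0: [(1, "flights", 0.4), (1, "flight", 0.9)],
    1: [(2, "to", 0.1), (2, "two", 1.6)],
    2: [(3, "boston", 0.3), (3, "austin", 1.2)],
    3: [],
}
START, FINAL = 0, 3


def count_paths(lattice: Dict[int, List[Arc]], start: int, final: int) -> int:
    """Count the distinct hypotheses packed into the lattice."""
    counts = {state: 0 for state in lattice}
    counts[start] = 1
    for state in sorted(lattice):
        for nxt, _word, _cost in lattice[state]:
            counts[nxt] += counts[state]
    return counts[final]


def best_path(
    lattice: Dict[int, List[Arc]], start: int, final: int
) -> Tuple[float, List[str]]:
    """Return the lowest-cost word sequence (tropical semiring: add, then min)."""
    best: Dict[int, Tuple[float, List[str]]] = {
        state: (float("inf"), []) for state in lattice
    }
    best[start] = (0.0, [])
    for state in sorted(lattice):
        cost_so_far, words = best[state]
        if cost_so_far == float("inf"):
            continue  # state not reachable from start
        for nxt, word, cost in lattice[state]:
            candidate = cost_so_far + cost
            if candidate < best[nxt][0]:
                best[nxt] = (candidate, words + [word])
    return best[final]


if __name__ == "__main__":
    print(count_paths(LATTICE, START, FINAL))
    print(best_path(LATTICE, START, FINAL))
```

A downstream annotator that receives only the output of `best_path` loses the seven alternatives; passing the whole lattice lets it reconsider them at the cost of handling a weighted graph rather than a single string.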