1. Technical Field
The present invention relates to natural language understanding and, in particular, to name entity extraction. Still more particularly, the present invention provides a method, apparatus, and program for name entity extraction using language models (N-gram and/or finite-state language models).
2. Description of Related Art
The task of name entity extraction is to retrieve name entities from text documents or natural language utterances. Much research has been done in this area. Existing techniques for name entity extraction include, for example, rule-based methods, decision tree approaches, hidden Markov model (HMM) techniques, maximum entropy approaches, and finite-state transducers.
Most existing name entity extraction methods rely on a model that is trained or developed from a large amount of manually annotated text data. These techniques have several problems. A sufficient amount of text data must be collected, which is expensive in both time and money. Data annotation also requires a substantial amount of time and human effort. A good model usually requires consistently annotated training data, which is a luxury in practice, especially when multiple people annotate simultaneously and when the annotation process spans a long period of time. Furthermore, the collected text data is usually domain-dependent, and so is the trained model, which cannot be reused or easily applied to other application domains. These problems make the development work very expensive and the development process very long, even for a simple natural language application such as a mutual fund trading system or an air travel system.
Therefore, it would be advantageous to provide an improved name entity extraction technique that provides accurate results without excessive language model training effort.