1. Field of the Invention
The present invention relates to named-entity detection, and more particularly to an apparatus and method for detecting a named-entity based on a gradual learning technique for voice recognition or language processing.
2. Description of Related Art
In general, a named-entity refers to a classifiable word or series of words, such as, for example, the name of a person, an organization, a song, a broadcast, or a location.
For example, in the sentence “Could you play the Lord of the Rings?”, “Lord of the Rings” is a named-entity.
Named-entities are frequently found in daily life. In conversations about traffic information, about 74% of user utterances correspond to named-entities, as do about 44% of broadcast utterances.
In particular, named-entity detection is important in the field of knowledge learning related to spoken languages, and a large number of algorithms for named-entity detection have been proposed.
The most basic method for named-entity detection is based on a dictionary.
Here, a number of named-entities are pre-stored. A word or group of words that is a potential named-entity is extracted from an input sentence and compared with the pre-stored named-entities.
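The dictionary-based comparison described above can be illustrated with a minimal sketch. It is not taken from any cited reference: the function name `find_named_entities`, the `max_len` limit, the lowercase normalization, and the longest-match-first strategy are all assumptions made for illustration only.

```python
def find_named_entities(sentence, dictionary, max_len=5):
    """Return dictionary entries found in the sentence, longest match first.

    Hypothetical helper: `dictionary` is assumed to be a set of
    lowercase named-entity strings that were stored in advance.
    """
    # Split on whitespace and drop punctuation attached to each token.
    tokens = [tok.strip("?.!,") for tok in sentence.split()]
    tokens = [tok for tok in tokens if tok]

    matches = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate word group starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate.lower() in dictionary:
                matches.append(candidate)
                i += n  # continue after the matched word group
                matched = True
                break
        if not matched:
            i += 1
    return matches


# Assumed example dictionary containing a single pre-stored named-entity.
DICTIONARY = {"lord of the rings"}

result = find_named_entities("Could you play the Lord of the Rings?", DICTIONARY)
print(result)
```

In this sketch, only word groups that already appear in the stored set can be returned; an input containing a title that was never stored yields an empty list.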
However, named-entities have the characteristics of an open class, i.e., new named-entities are created and existing ones disappear over time. Therefore, the conventional dictionary-based method cannot fully process named-entities, which change frequently.
In an attempt to solve this problem, methods of detecting named-entities based on statistical techniques have been proposed. For example, U.S. Pat. No. 6,052,682 discloses a method of recognizing and classifying named-entities based on “uni-grams” and “bi-grams” by using a multi-step hidden Markov model.
However, the method disclosed in the above patent requires, for learning, a corpus to which a large number of tags have been attached. This makes it difficult to reflect various colloquialisms, such as abbreviations. In addition, re-learning is necessary.