Concept knowledge consists of a group of terms (words) describing a category of events, ideas, objects, actions, or intentions. For example, concept knowledge relevant to “to have in mind/plan” includes terms such as “plan”, “hope”, “want”, “prepare”, “mean”, “will”, “figure”, “think”, “require”, and “long for”, which describe an intention to do something. Concept knowledge is widely used in applications such as information retrieval, natural language processing, machine translation, and thesaurus construction.
When people retrieve information in a particular field, their concept knowledge of that field directly affects their search behavior. A person with relatively complete concept knowledge can construct proper search queries, prepare multiple synonyms for the query terms, adjust the search strategy when a first attempt is unsuccessful, and correctly identify relevant retrieved information. Thus, when retrieving information, experts in a particular field achieve higher success rates than those with little concept knowledge of the field.
At present, the Internet has become a worldwide information source and a principal business tool. The most direct and convenient way to search for information on the Internet is to use a search engine. Most existing search engines employ keyword-based search systems. Concept knowledge can help users find the desired information among a great deal of varied data, and can help them construct, modify, and improve search queries and automatically expand their searches.
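As a minimal sketch of how concept knowledge might expand a keyword query, the following Python example replaces each query term with the list of terms that share a concept with it. The concept table, the concept label "intend", and the function name expand_query are hypothetical and chosen only for illustration; no particular search engine's interface is implied.

```python
# Hypothetical concept knowledge: a concept label mapped to its member terms.
CONCEPTS = {
    "intend": ["plan", "hope", "want", "prepare"],
}


def expand_query(query, concepts):
    """Return, for each query term, the term followed by its concept mates."""
    expanded = []
    for term in query.lower().split():
        alternatives = [term]
        for members in concepts.values():
            if term in members:
                # Add the other members of the concept, avoiding duplicates.
                alternatives.extend(m for m in members if m not in alternatives)
        expanded.append(alternatives)
    return expanded


if __name__ == "__main__":
    # Each inner list could be joined with OR by a keyword-based engine.
    print(expand_query("plan trip", CONCEPTS))
```

A term that belongs to no concept, such as "trip" here, is passed through unchanged, so the expansion never narrows the original query.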
In addition, in natural language processing (NLP), statistical methods determine the likelihood of word features (e.g., word combinations) from the frequencies of those features in a training corpus. However, due to the nature of language itself, many word combinations appear with low frequency in a given corpus or do not appear at all; this is known as the data sparseness problem in statistical natural language processing. When the frequency of a word does not warrant reliable maximum likelihood estimation, the word's probability can be computed as a weighted sum of the probabilities of words that are similar to it. In such a case, the “most similar” words in the concept knowledge may be used to estimate the probability of a word combination that appears with low frequency or does not appear at all.
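The similarity-weighted estimate described above can be sketched as follows. This is an illustrative example only: the bigram counts and similarity scores are invented toy data, the function names are hypothetical, and normalizing the weights by their sum is one common choice assumed here rather than something the text prescribes.

```python
from collections import Counter

# Toy bigram counts (invented data, for illustration only).
BIGRAM_COUNTS = Counter({
    ("drink", "tea"): 4,
    ("drink", "water"): 6,
    ("sip", "tea"): 3,
    ("sip", "water"): 1,
})

# Hypothetical similarity scores between an unseen word and its neighbors,
# as might be derived from concept knowledge.
SIMILAR_WORDS = {"gulp": {"drink": 0.75, "sip": 0.25}}


def mle(first, second):
    """Maximum likelihood estimate of P(second | first) from the counts."""
    total = sum(c for (w1, _), c in BIGRAM_COUNTS.items() if w1 == first)
    return BIGRAM_COUNTS[(first, second)] / total if total else 0.0


def similarity_estimate(first, second):
    """Weighted sum of the MLEs of words similar to `first`."""
    neighbors = SIMILAR_WORDS.get(first, {})
    norm = sum(neighbors.values())
    if norm == 0:
        return 0.0
    return sum(sim * mle(n, second) for n, sim in neighbors.items()) / norm


if __name__ == "__main__":
    print(mle("gulp", "water"))                  # unseen pair: MLE is zero
    print(similarity_estimate("gulp", "water"))  # borrowed from neighbors
```

Because "gulp" never occurs in the toy counts, its maximum likelihood estimate is zero, while the similarity-based estimate borrows probability mass from "drink" and "sip" in proportion to their similarity scores.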
Furthermore, in machine translation, for example in a corpus-based machine translation system, if a word W needs to be translated, a possible translation for W may be selected from the set of similar words in the same concept knowledge that appear in the same context.
Because concept knowledge has such wide applications, efforts have been made to develop methods for acquiring it. At present, the method for extracting terms belonging to a particular concept is to use a lexical knowledge base, such as WordNet. However, such lexical knowledge bases are typically designed for general purposes. For a particular application domain, the concept knowledge they provide is not sufficiently detailed, and thus cannot satisfy the particular requirements of practical applications.