1. Field of the Invention
Embodiments of the invention generally relate to the field of natural language processing and may have various applications in areas such as electronic dictionaries, syntactic analysis, automated abstracting, machine translation, control systems, information search (including on the Internet), data retrieval, computer-aided learning, spell-checking systems, the Semantic Web, expert systems, speech recognition and synthesis, and others.
2. Description of the Related Art
The ability to understand, speak, and write one or more languages is an integral part of human development and is essential for interacting and communicating within a society. Various language analysis approaches have been used to dissect a given language and analyze its linguistic structure in order to understand the meaning of a word or a sentence in that language, to extract information from the word or sentence, and, if necessary, to translate it into another language or to synthesize another sentence that expresses the same semantic meaning in some natural or artificial language.
Complex natural language texts and constructs can be analyzed and translated from one language into another. Most natural language processing systems involve the use of electronic dictionaries, syntactic analysis, automated abstracting, machine translation, information search, etc., and all of these applications require a linguistic morphological component. This morphological component may contain, among other things, a morphological model (e.g., word inflexion rules and word formation rules) and a morphological dictionary.
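To make the two parts of such a morphological component concrete, the following is a minimal sketch under assumptions that are not taken from this disclosure: the morphological dictionary maps each lemma to a stem and a paradigm identifier, and the morphological model maps each paradigm to a list of (ending, grammatical features) inflexion rules. All names (PARADIGMS, DICTIONARY, generate_forms, analyze) and the English sample data are hypothetical and serve only as illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical morphological model: each inflexion paradigm is a list of
# (ending, grammatical features) rules that are appended to a stem.
PARADIGMS: Dict[str, List[Tuple[str, str]]] = {
    "noun_s": [("", "sg"), ("s", "pl")],
    "verb_regular": [("", "inf"), ("s", "3sg.pres"), ("ed", "past"), ("ing", "ger")],
}

# Hypothetical morphological dictionary: lemma -> (stem, paradigm id).
DICTIONARY: Dict[str, Tuple[str, str]] = {
    "book": ("book", "noun_s"),
    "walk": ("walk", "verb_regular"),
}


def generate_forms(lemma: str) -> List[Tuple[str, str]]:
    """Generate every word form of a lemma by applying its paradigm's rules."""
    stem, paradigm = DICTIONARY[lemma]
    return [(stem + ending, features) for ending, features in PARADIGMS[paradigm]]


def analyze(word: str) -> List[Tuple[str, str]]:
    """Return the (lemma, features) pairs whose generated form equals `word`."""
    return [
        (lemma, features)
        for lemma in DICTIONARY
        for form, features in generate_forms(lemma)
        if form == word
    ]
```

For example, `generate_forms("walk")` yields the four forms "walk", "walks", "walked", and "walking" with their features, and `analyze("books")` returns `[("book", "pl")]`. Analysis here simply regenerates all forms, which keeps the sketch short; a practical system would index endings instead.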
Except for isolating languages (e.g., Chinese), descriptions of the morphological structure of most natural languages that have word formation are usually available, whereas the realization of a morphological model, and the use of such a model to construct a morphological dictionary, may vary. Known morphological models are oversimplified and differ in the accuracy and completeness of their morphological descriptions, and prior morphological dictionaries are usually not exhaustive.
For example, some morphological models may consider only the possible word endings (e.g., suffixes and other affixes) in a language and may not include any inflexion rules at all. Such morphological models can only be used in data retrieval or search systems that do not need an exhaustive morphological dictionary. These morphological models, however, often produce many errors, such as incorrect words and incorrect word forms, during language analysis, and they generally cannot handle languages with internal inflexion or stem alternation. As for morphological dictionaries, prior morphological dictionaries often do not cover all possible inflexion rules or word endings. This is partly because creating a morphological dictionary that stores all possible word forms is a huge task, and such a dictionary is often extremely inefficient in real-time language analysis.
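The limitation of an ending-only model can be illustrated with a minimal sketch in English. The ending list, the alternation table, and the function names strip_ending and lemmatize are hypothetical and are not part of this disclosure; the sketch only shows why removing endings, without inflexion rules or a dictionary, both produces wrong words and misses internal inflexion.

```python
from typing import Dict, Tuple

# Hypothetical ending-only model: a list of endings, with no inflexion rules
# and no dictionary of valid stems. "es" is listed before "s" on purpose.
ENDINGS: Tuple[str, ...] = ("ing", "ed", "es", "s")


def strip_ending(word: str) -> str:
    """Remove the first listed ending that matches, if any, keeping a non-empty stem."""
    for ending in ENDINGS:
        if word.endswith(ending) and len(word) > len(ending):
            return word[: -len(ending)]
    return word


# Hypothetical dictionary entries that record internal inflexion explicitly.
ALTERNATIONS: Dict[str, str] = {"sang": "sing", "geese": "goose", "mice": "mouse"}


def lemmatize(word: str) -> str:
    """Consult the explicit alternation table before falling back to ending removal."""
    return ALTERNATIONS.get(word, strip_ending(word))
```

With this sketch, `strip_ending("walked")` correctly gives "walk", but `strip_ending("sing")` gives the wrong word "s", `strip_ending("bus")` gives "bu", and `strip_ending("sang")` returns "sang" unchanged because the vowel change is internal. The alternation table in `lemmatize` repairs only the listed internal-inflexion cases; over-stripping such as "bus" still requires a dictionary of valid stems.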
Accordingly, there exists a need for a method and system for creating an effective morphological model and generating a natural language dictionary.