Natural language processing (NLP) systems attempt to reproduce human interpretation of language. NLP methods assume that grammatical patterns and the conceptual relationships between words can be articulated scientifically. NLP systems require determining the ontological relations among the words or terms in a document. With respect to NLP systems, ontology refers to the explicit specification of how the objects in a phrase are represented and of the relationships between them. In general, ontological relations include hypernym (is-a) relations and meronym (part-whole) relations between two terms. Ontological relations are very important for NLP applications such as question answering, information retrieval, dialogue systems, semantic inference, machine translation, and similar applications.
Traditionally, prior art methods for obtaining lexico-syntactic patterns from spoken utterances apply open-domain syntactic analysis techniques. These methods, however, do not work for manual data, that is, documents or written text whose content consists of specific types of data with set relationships. Prior studies have observed that a set of lexico-syntactic patterns indicates hypernym relations between noun phrases (NPs). Examples of such patterns are "such NP as {NP,}* {(or|and)} NP" and "NP {,} including {NP,}* {(or|and)} NP". Such an approach may successfully extract hypernym relations, but it generally cannot extract part-whole relations because the meronymic context is ambiguous: in "cat's paw" the possessive marks a part of the cat, whereas in "cat's dinner" it does not.
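The two hypernym patterns quoted above can be illustrated with a small regular-expression sketch. It assumes, hypothetically, that noun phrases have already been chunked and are written in square brackets with space-separated tokens; the names `SUCH_AS`, `INCLUDING`, and `hypernym_pairs` are illustrative only and do not come from any prior art system. Each match yields (hyponym, hypernym) pairs.

```python
import re

# Assumed input convention (hypothetical): noun phrases are pre-chunked and
# written as [bracketed text], with tokens separated by single spaces.
NP = r"\[[^\]]+\]"
# "{NP,}* {(or|and)} NP": zero or more NPs (comma optional), optional or/and, final NP.
NP_LIST = rf"(?:{NP}(?: ,)? )*(?:(?:or|and) )?{NP}"

# "such NP as {NP,}* {(or|and)} NP": the hypernym sits between "such" and "as".
SUCH_AS = re.compile(rf"such ({NP}) as ({NP_LIST})")
# "NP {,} including {NP,}* {(or|and)} NP": the hypernym precedes "including".
INCLUDING = re.compile(rf"({NP})(?: ,)? including ({NP_LIST})")


def _phrases(span: str) -> list[str]:
    """Return the text inside every [bracketed] noun phrase in span."""
    return re.findall(r"\[([^\]]+)\]", span)


def hypernym_pairs(sentence: str) -> list[tuple[str, str]]:
    """Return (hyponym, hypernym) pairs matched by the two patterns."""
    pairs = []
    for pattern in (SUCH_AS, INCLUDING):
        for m in pattern.finditer(sentence):
            hypernym = _phrases(m.group(1))[0]
            for hyponym in _phrases(m.group(2)):
                pairs.append((hyponym, hypernym))
    return pairs
```

For example, `hypernym_pairs("[common law countries] , including [Canada] and [England]")` returns `[("Canada", "common law countries"), ("England", "common law countries")]`. Because a sketch like this matches surface strings only, it has no means of telling a part from a non-part in constructions such as "cat's paw" versus "cat's dinner".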
Other methods have used hypernym relations as semantic constraints to extract part-whole relations. Such techniques have generally achieved some success in terms of precision and recall, but only for three types of patterns: "Y verb X," "Y's X," and "X of Y," where X and Y are nouns. Another prior art method combines coordinate term identification with dependency path methods to automatically find hypernym relations in large document collections, such as a large news corpus. In this technique, a sample dependency path, "N:PCOMP-N:PREP, such as, PREP:MOD:N", is equivalent to the pattern "NPY such as NPX" used in other methods. These dependency paths resemble lexico-syntactic patterns but also cover long-distance dependencies.
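A hypothetical sketch of the constrained approach for part-whole relations: candidate pairs are taken from the three surface patterns, treating X as the part and Y as the whole, and a candidate is kept only if the semantic classes (hypernyms) of its two nouns form an allowed pair. The `HYPERNYM` table, the `ALLOWED` constraint set, and the verb list used for "Y verb X" are illustrative assumptions standing in for a lexical resource and learned constraints, and the patterns match single words rather than parsed noun phrases.

```python
import re

# Hypothetical hypernym lookup: maps a noun to a semantic class, standing in
# for a lexical resource. Not taken from any real system.
HYPERNYM = {
    "paw": "body_part",
    "dinner": "meal",
    "cat": "animal",
    "door": "artifact_part",
    "car": "vehicle",
}

# Hypothetical semantic constraints: (part class, whole class) pairs allowed
# to form a part-whole relation.
ALLOWED = {("body_part", "animal"), ("artifact_part", "vehicle")}

# Hypothetical verb list for the "Y verb X" pattern.
VERBS = r"(?:has|have|contains|includes)"
ARTICLE = r"(?:a |an |the )?"

# Each pattern captures X (the part) and Y (the whole) as single words.
PATTERNS = [
    re.compile(rf"\b(?P<Y>\w+) {VERBS} {ARTICLE}(?P<X>\w+)\b"),  # Y verb X
    re.compile(r"\b(?P<Y>\w+)'s (?P<X>\w+)\b"),                  # Y's X
    re.compile(rf"\b(?P<X>\w+) of {ARTICLE}(?P<Y>\w+)\b"),       # X of Y
]


def part_whole_pairs(text: str) -> list[tuple[str, str]]:
    """Return (part, whole) pairs whose semantic classes pass the constraints."""
    pairs = []
    for pattern in PATTERNS:
        for m in pattern.finditer(text.lower()):
            part, whole = m.group("X"), m.group("Y")
            classes = (HYPERNYM.get(part), HYPERNYM.get(whole))
            if classes in ALLOWED:  # the hypernym-based semantic constraint
                pairs.append((part, whole))
    return pairs
```

On the text "The cat's paw was hurt. The cat's dinner was late.", both possessives match the "Y's X" pattern, but only ("paw", "cat") survives, because a meal is not an allowed part of an animal in the assumed constraint set. This is how a hypernym-based constraint can resolve the ambiguity of the possessive pattern.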
These prior art approaches share a common limitation: they use only complete lexico-syntactic patterns and do not use partial or generalized patterns in addition to complete ones. Furthermore, these prior art technologies are generally applied to data sources such as news and encyclopedias, in which there is no known set of terms. They generally do not make use of the terms available in certain documents, such as manuals.
In general, present ontological determination systems do not attempt to identify both hypernym and part-whole relations from documents or from any type of manual text. In addition, the toolkits (e.g., part-of-speech taggers and parsers) and resources used in these systems are not designed for manual data.