Conventional technologies for generating a knowledge graph from relations extracted from structured and unstructured documents are described below.
Firstly, there is a technology that models knowledge in the form of an ontology. This conventional technology proposes several centrality computation methods and shows how each of them characterizes the core contents and structure of the ontology, ranging from simple degree centrality to the more complex eigenvector centrality. It thus presents a method of statistically analyzing the characteristics of the network structure of an ontology used as a knowledge base.
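The difference between the two ends of that range can be seen on a small graph. The following is a minimal sketch, assuming a hypothetical toy ontology whose concepts are joined by undirected relations; degree centrality counts direct neighbours, while eigenvector centrality (here computed by power iteration, an assumed choice of solver) also rewards being connected to other central concepts.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical toy ontology: concepts joined by undirected relations.
EDGES = [
    ("Person", "Scientist"),
    ("Person", "Organization"),
    ("Scientist", "Organization"),
    ("Scientist", "Theory"),
    ("Theory", "Publication"),
]


def adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from (concept, concept) edges."""
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree_centrality(adj: Dict[str, Set[str]]) -> Dict[str, float]:
    """Neighbour count divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def eigenvector_centrality(adj: Dict[str, Set[str]], iterations: int = 200,
                           tol: float = 1e-9) -> Dict[str, float]:
    """Power iteration on (A + I): a node scores highly if its neighbours do.

    Adding the identity keeps the iteration from oscillating on bipartite
    graphs without changing the leading eigenvector.
    """
    scores = {v: 1.0 for v in adj}
    for _ in range(iterations):
        new = {v: scores[v] + sum(scores[u] for u in adj[v]) for v in adj}
        top = max(new.values())
        new = {v: s / top for v, s in new.items()}  # scale so the top node is 1.0
        converged = all(abs(new[v] - scores[v]) < tol for v in adj)
        scores = new
        if converged:
            break
    return scores


adj = adjacency(EDGES)
print(degree_centrality(adj)["Scientist"])  # 0.75
print(eigenvector_centrality(adj))
```

"Theory" and "Person" have the same degree, but their eigenvector scores differ because "Person" sits in a denser neighbourhood, which is the kind of structural distinction such statistical analysis is meant to surface.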
Secondly, there is a technology for extracting property information of ontology instances by using a domain knowledge hierarchy. This technology is an algorithm that extracts properties of ontology instances from structured information already present in web documents. In particular, the property extraction algorithm is improved by a domain knowledge hierarchy composed of property information, which raises the quality of the extraction results. This conventional technology thus extracts knowledge-base information from structured documents.
Thirdly, there is a technology used by computers to develop an ontology from text written in natural language. In this technology, text data is received; syntax and meaningful words are extracted from the text by grammatical analysis of the received data; and, for each meaningful word of the text, the definition sentence of the word is looked up in an electronic dictionary. Further, the syntax and meaningful words of each definition sentence are extracted, basic vocabulary graphs of the definition are generated from that syntax and those meaningful words, and at least two of the basic vocabulary graphs are integrated as a function of the syntax of the text to produce at least one semantic graph of the text. In this text ontology development technology, relations between words expressed in documents are not extracted directly; instead, relations among words are represented through a procedure of integrating word-level graphs. Also, the vocabulary similarity of entities is used when integrating the extracted relations into the knowledge graph.
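Integrating word-level graphs by vocabulary similarity can be sketched as follows. This is a hedged illustration, not the cited technology's method: each graph is assumed to be a list of (head word, relation, tail word) triples, and "vocabulary similarity" is approximated by the `difflib.SequenceMatcher` string ratio with an assumed threshold of 0.8; the helper names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head word, relation, tail word)


def similar(a: str, b: str, threshold: float) -> bool:
    """Surface-string similarity; an assumed stand-in for 'vocabulary similarity'."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def merge_word_graphs(graphs: Iterable[Iterable[Triple]],
                      threshold: float = 0.8) -> Set[Triple]:
    """Integrate word-level graphs by collapsing lexically similar nodes."""
    canonical: List[str] = []  # first-seen label for each merged node

    def resolve(label: str) -> str:
        for c in canonical:
            if similar(label, c, threshold):
                return c
        canonical.append(label)
        return label

    merged: Set[Triple] = set()
    for graph in graphs:
        for head, rel, tail in graph:
            merged.add((resolve(head), rel, resolve(tail)))
    return merged


# Two hypothetical definition graphs that mention "bank" / "banks".
g1 = [("bank", "is_a", "institution")]
g2 = [("banks", "hold", "deposits")]
print(merge_word_graphs([g1, g2]))
```

Because only surface strings are compared, a third graph such as `[("bank", "part_of", "river")]` would be attached to the same "bank" node as the financial sense, which illustrates why integration based only on vocabulary similarity can conflate distinct entities.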
Fourthly, there is a technology for semi-automatically building a knowledge base for an encyclopedia question-answering system. In designing the knowledge-base structure, a concept-centered systematic template is designed based on the contents of the encyclopedia, and important fact information relating to head words is automatically extracted from the summary information and body text of the encyclopedia, so that the knowledge base of the question-answering system is built semi-automatically. The knowledge-base structure is designed from templates for the respective head words and their related properties, and the head words, together with their property names and property values, are extracted from the summary information of the encyclopedia. Property names and property values for the head words are also extracted from the body text of the encyclopedia on the basis of dependency relations in a phrase-unit token sequence derived from sentence analysis, and the structured and unstructured information extracted for each head word is stored in the corresponding template and property of the knowledge base. In this way, the knowledge base is built. In this conventional technology, various attribute values for an entry name are extracted from the entry text and the summary information of the encyclopedia.
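The summary-information step can be illustrated with a small sketch. It assumes, hypothetically, that an entry's summary box is a set of `name: value` lines and that each concept-centered template is simply a list of expected property names; the dependency-based extraction from body text is not covered here.

```python
from typing import Dict, List, Optional

# Hypothetical concept-centred templates: the property names each concept expects.
TEMPLATES: Dict[str, List[str]] = {
    "Person": ["born", "died", "occupation"],
    "City": ["country", "population"],
}


def fill_template(head_word: str, concept: str, summary: str) -> Dict[str, object]:
    """Copy 'name: value' lines of an entry's summary box into its template.

    Properties the template does not define are ignored; template slots with
    no matching line stay None.
    """
    slots: Dict[str, Optional[str]] = {name: None for name in TEMPLATES[concept]}
    for line in summary.splitlines():
        name, sep, value = line.partition(":")
        key = name.strip().lower()
        if sep and key in slots:
            slots[key] = value.strip()
    return {"head_word": head_word, "concept": concept, "properties": slots}


entry = fill_template("Jane Doe", "Person",
                      "Born: 1901\nOccupation: chemist\nSpouse: John Doe")
print(entry["properties"])  # {'born': '1901', 'died': None, 'occupation': 'chemist'}
```

The template decides which properties are kept, so "Spouse" is dropped; this reflects the dependence on predefined templates and structured summary information that such an approach entails.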
Fifthly, there is a technology for automatically building an ontology from unstructured web documents. This technology extracts relations between concepts from unstructured documents and builds the ontology automatically. Ontology instances, each composed of a relation between concepts, are automatically extracted from unstructured web documents on the Internet and from diverse database information by automatic pattern learning and automatic pattern extension. As a result, the cost of building and managing the ontology is reduced, and the information extraction performance for ontology building is continuously improved.
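Automatic pattern learning and extension is commonly realized as bootstrapping; the sketch below shows one generic form of it and is an assumption, not the cited technology's actual procedure. Starting from hypothetical seed concept pairs, the text between a known pair is learned as a pattern, the patterns are applied to find new pairs, and the new pairs in turn yield new patterns.

```python
import re
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def learn_patterns(sentences: Iterable[str], pairs: Set[Pair]) -> Set[str]:
    """Take the text between a known (x, y) pair as a candidate pattern."""
    patterns: Set[str] = set()
    for s in sentences:
        for x, y in pairs:
            i, j = s.find(x), s.find(y)
            if i != -1 and j > i + len(x):
                patterns.add(s[i + len(x):j])
    return patterns


def apply_patterns(sentences: Iterable[str], patterns: Set[str]) -> Set[Pair]:
    """Match 'Capitalised <pattern> Capitalised' to propose new pairs."""
    found: Set[Pair] = set()
    for s in sentences:
        for p in patterns:
            regex = r"([A-Z]\w+)" + re.escape(p) + r"([A-Z]\w+)"
            for m in re.finditer(regex, s):
                found.add((m.group(1), m.group(2)))
    return found


def bootstrap(sentences: List[str], seeds: Set[Pair],
              rounds: int = 2) -> Tuple[Set[Pair], Set[str]]:
    """Alternate pattern learning and pattern application for a few rounds."""
    pairs, patterns = set(seeds), set()
    for _ in range(rounds):
        patterns |= learn_patterns(sentences, pairs)
        pairs |= apply_patterns(sentences, patterns)
    return pairs, patterns


# Hypothetical unstructured text.
SENTENCES = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Berlin, capital of Germany, is large.",
    "Rome, capital of Italy, is old.",
]
pairs, patterns = bootstrap(SENTENCES, {("Paris", "France")})
print(sorted(pairs))  # [('Berlin', 'Germany'), ('Paris', 'France'), ('Rome', 'Italy')]
```

The pair ("Rome", "Italy") is reachable only through the pattern ", capital of ", which is itself learned from a pair found in the first round; this is the "extension" step that lets extraction keep improving without new seeds.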
Sixthly, there is a technology that generates a knowledge graph by extracting relations between terms and assigns a probability value to each relation. Knowledge graphs and structured digital abstracts (SDAs) provide digitized abstracts of texts. Terms for knowledge graphs and their relations are extracted automatically, and various methods and systems for forming and visualizing knowledge graphs are provided. These graphs and abstracts can be used, in a limited but useful way, in various application systems, such as semantic search in an electronic medical record search system, specialized search in a specific domain such as news, economics, or history, and general Internet search. This conventional technology represents relations extracted between entities in text as a graph structure.
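A knowledge graph with a probability on each relation can be sketched as below. The data structure, the hypothetical placeholder terms, and the rule for combining repeated extractions of the same relation (noisy-OR, which assumes the extractions err independently) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject term, relation, object term)


class ProbabilisticKG:
    """Knowledge graph whose edges carry a probability of being correct."""

    def __init__(self) -> None:
        self.prob: Dict[Triple, float] = {}

    def add(self, subj: str, rel: str, obj: str, p: float) -> None:
        """Record one extraction; repeated evidence is combined by noisy-OR.

        Noisy-OR assumes extractions err independently: the relation is taken
        to be false only if every extraction that produced it is wrong.
        """
        key = (subj, rel, obj)
        prior = self.prob.get(key, 0.0)
        self.prob[key] = 1.0 - (1.0 - prior) * (1.0 - p)

    def relations_of(self, subj: str, min_p: float = 0.5) -> List[Tuple[str, str, float]]:
        """Outgoing relations of `subj` with probability >= min_p, most likely first."""
        hits = [(rel, obj, p) for (s, rel, obj), p in self.prob.items()
                if s == subj and p >= min_p]
        return sorted(hits, key=lambda h: h[2], reverse=True)


kg = ProbabilisticKG()
kg.add("drug_A", "treats", "condition_X", 0.6)  # hypothetical extractions
kg.add("drug_A", "treats", "condition_X", 0.5)
kg.add("drug_A", "causes", "condition_Y", 0.3)
print(kg.relations_of("drug_A"))
```

Two moderately confident extractions (0.6 and 0.5) combine to 0.8, while a single weak one (0.3) falls below the query threshold; a search application could use such a threshold to trade recall for precision.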
Seventhly, there is a technology for building a domain expert ontology to interpret a policy. A phrase determined to relate to the policy is received as input, and indefinite terms are identified in the phrase. An Internet search is conducted using the indefinite terms extracted from the phrase, and latent substitute terms for the indefinite terms are extracted during the search. A context-specific ontology for the indefinite terms is generated based on the frequency of the latent substitute terms. The policy is interpreted by accessing the domain expert ontology to interpret the indefinite terms: the meaning of each indefinite term is determined by mapping it to the latent substitute terms included in the context ontology, and the policy is generated based on the interpretation of the indefinite terms obtained from the ontology.
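The frequency-based step can be sketched as follows, under several assumptions: the Internet search is replaced by a hypothetical dictionary of result snippets, the set of candidate substitute terms is given in advance, and interpretation simply replaces each indefinite term with its most frequent substitute. All names are illustrative.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Set


def build_context_ontology(indefinite_terms: Iterable[str],
                           snippets_by_term: Dict[str, List[str]],
                           candidates: Set[str],
                           top_k: int = 2) -> Dict[str, List[str]]:
    """Rank latent substitute terms for each indefinite term by frequency."""
    ontology: Dict[str, List[str]] = {}
    for term in indefinite_terms:
        counts: Counter = Counter()
        for snippet in snippets_by_term.get(term, []):
            words = re.findall(r"[a-z]+", snippet.lower())
            counts.update(w for w in words if w in candidates)
        ontology[term] = [w for w, _ in counts.most_common(top_k)]
    return ontology


def interpret(phrase: str, ontology: Dict[str, List[str]]) -> str:
    """Replace each indefinite term with its most frequent substitute, if any."""
    for term, substitutes in ontology.items():
        if substitutes:
            phrase = phrase.replace(term, substitutes[0])
    return phrase


# Stand-in for text returned by an Internet search on the indefinite term.
snippets = {
    "sensitive data": [
        "sensitive data includes passwords and financial records",
        "examples: passwords, medical records",
        "always protect passwords",
    ]
}
onto = build_context_ontology(["sensitive data"], snippets,
                              {"passwords", "records", "financial", "medical"})
print(onto)  # {'sensitive data': ['passwords', 'records']}
print(interpret("Users must not share sensitive data.", onto))  # Users must not share passwords.
```

Keeping a ranked list rather than a single substitute leaves room for context-dependent choices; the replacement rule used here is the simplest possible one.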
The conventional ontology-building technologies described above are problematic in that they limit the target text to a specific domain, limit the text to structured information such as an encyclopedia entry or a table in a webpage, merely integrate the extracted relations on the basis of the vocabulary similarity of individual entities, or only statistically analyze the graph structure of the ontology.