1. Technical Field
The present invention relates to an apparatus and method for searching for information based on the contents of WIKIPEDIA® (hereinafter, “Wikipedia” is used as a short form for “WIKIPEDIA®”).
2. Description of the Related Art
Wikipedia is a collaboratively edited online encyclopedia, and its high-quality contents are growing rapidly worldwide. Information extracted from Wikipedia is used in a variety of knowledge service applications.
YAGO (Yet Another Great Ontology) is a large knowledge base built by standardizing the entity information, entity category information, and info-box information of Wikipedia. DBpedia is a knowledge base built by standardizing the info-box information included in each Wikipedia entity. The NAGA system is a question answering service that answers a user's natural language question by extracting answers from the YAGO knowledge base. The WATSON question answering system answers a user's natural language question by analyzing not only Wikipedia but also numerous other texts. The WATSON question answering system extracts answers from the full text of Wikipedia, but uses structured or semi-structured information as constraint information when extracting an answer. The related patent publication “Providing answers to questions using multiple models to score candidate answers (US 20130007055 A1)” teaches only the use of Wikipedia semi-structured information to generate candidate answers and does not teach detailed methods.
The problems of Wikipedia-based question answering systems can be divided into two kinds.
First, converting Wikipedia contents into a knowledge base causes ambiguity and information loss. Wikipedia's semi-structured information (entity information, category information, info-box information, and document structure) can be converted to structured knowledge relatively easily. However, when Wikipedia full text is standardized and converted to structured knowledge, natural language expressions must be mapped to the standardized classes, properties, and instances of the knowledge base, and this mapping can be ambiguous, resulting in distortion and loss of information. A question answering system built on a knowledge base standardized from Wikipedia can therefore use only part of the information in Wikipedia.
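The first problem can be illustrated with a minimal sketch. The schema, property names, and sentence annotations below are hypothetical toy data chosen for illustration; they are not taken from YAGO, DBpedia, or any actual converter. Info-box fields map one-to-one onto schema properties, while full-text sentences are kept only when exactly one schema property fits.

```python
# Hypothetical toy schema: info-box field -> standardized property name.
SCHEMA = {"birth_place": "birthPlace", "born": "birthYear"}

# Hypothetical info-box for one entity.
INFOBOX = {"birth_place": "London", "born": "1815"}

# Hypothetical full-text sentences, each paired with the schema properties
# that a converter might consider for it (hand-annotated for this sketch).
FULLTEXT = [
    ("She is often regarded as the first computer programmer.", []),
    ("She worked with Charles Babbage on the Analytical Engine.",
     ["collaborator", "knownFor"]),
]


def infobox_to_triples(entity, infobox):
    """Convert info-box fields to (subject, property, object) triples."""
    return [(entity, SCHEMA[field], value)
            for field, value in infobox.items() if field in SCHEMA]


def classify_sentences(sentences):
    """Label each sentence by how many schema properties could represent it."""
    result = {"converted": [], "ambiguous": [], "lost": []}
    for sentence, candidates in sentences:
        if len(candidates) == 1:
            result["converted"].append(sentence)
        elif candidates:
            result["ambiguous"].append(sentence)  # several properties compete
        else:
            result["lost"].append(sentence)       # nothing in the schema fits
    return result


triples = infobox_to_triples("Ada_Lovelace", INFOBOX)
report = classify_sentences(FULLTEXT)
```

In this toy case both info-box fields become triples, whereas neither full-text sentence is converted cleanly: one has no matching property and the other has two competing ones.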
Second, because the names of classes, properties, instances, and the like are standardized when stored in a knowledge base built from Wikipedia contents, ambiguity within the knowledge itself is reduced. However, a natural language question answering service that uses such a knowledge base can extract answers only when the words in the natural language question are converted precisely to the class names, property names, and instance names of the knowledge base. This conversion of natural language expressions into standardized knowledge base expressions introduces a further ambiguity, which degrades the performance of the question answering system.
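The second problem can be sketched in the same way. The lexicon and property names below are hypothetical and serve only to show the two failure modes: a question word that maps to several knowledge base properties, and a question word that is absent from the lexicon even though the knowledge base holds the answer.

```python
# Hypothetical lexicon: question word -> candidate knowledge base properties.
LEXICON = {
    "wrote": ["author"],
    "founded": ["foundedBy", "foundingDate", "foundingLocation"],
}


def candidate_properties(question):
    """Collect candidate properties for every lexicon word in the question."""
    words = question.lower().rstrip("?").split()
    return {word: LEXICON[word] for word in words if word in LEXICON}


def resolve(question):
    """Return the property name if exactly one candidate exists, else None."""
    candidates = candidate_properties(question)
    properties = [p for props in candidates.values() for p in props]
    return properties[0] if len(properties) == 1 else None


unambiguous = resolve("Who wrote Hamlet?")     # one candidate property
ambiguous = resolve("Who founded Wikipedia?")  # three candidates compete
unmapped = resolve("Who penned Hamlet?")       # "penned" is not in the lexicon
```

Only the first question resolves to a property. The other two return no answer although the knowledge base may contain the relevant fact, which is the performance loss described above.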
Prior Art: KR Patent Publication No. 1020110026039