1. Field of the Invention
The present invention relates to an apparatus and a method for question answering based on answer trustworthiness. More particularly, the present invention relates to a question answering apparatus that ranks answers by considering the trustworthiness of answer candidates in various respects, such as quality, source, and answer extraction strategy, and to a method thereof.
2. Description of the Related Art
The World Wide Web provides a large collection of interconnected information sources (in various formats including text, images, and media content) relating to almost every imaginable subject. As the web grows, a user's ability to search the collection and identify content related to a given subject becomes increasingly important, and a plurality of search service providers currently exist to meet this need.
In general, the search service provider publishes a web page through which a user can submit a question indicating what the user is interested in. The search service provider then generally prepares a list of links to web pages or sites considered to be related to the question, in the form of a “search result” page, and transmits it to the user.
Question answering generally involves the following steps.
First, a previously prepared index or database of web pages or sites is searched using one or more search words extracted from the question, and a list of hits is prepared. A hit is generally a target page or site identified as including the search words or as otherwise related to the question, or a reference to such a target page or site. Subsequently, the hits are ranked according to a predetermined criterion, and the best result (according to the criterion) is placed at the most conspicuous position, for example, the top of the list.
The ranked list of hits is generally transmitted to the user in the form of a “result” page (alternatively, a series of interconnected pages) including a list of links to the hit pages or sites. Other features such as sponsored links or advertisements may also be included in the result page.
The ranking of the hits is often an important factor in whether the user's search succeeds or fails. Occasionally, a question returns so many hits that the user cannot examine all of them within a reasonable time. When the first several links the user follows do not lead to relevant content, relevant content may still appear much lower in the list, but the user often abandons the search and the search service provider.
Therefore, in order to maximize the possibility that relevant content will be placed at a conspicuous position, search service providers have developed increasingly sophisticated page ranking criteria and methods. These include methods for extracting an answer to a question on the basis of the trustworthiness of a web page or web text.
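The search and ranking steps described above can be illustrated with a minimal sketch. It is not taken from any particular search service: the function names, the stop-word list, and the ranking criterion (number of matched search words) are assumptions chosen only for illustration.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page ids containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def search(index, question,
           stop_words=frozenset({"what", "is", "the", "of", "a"})):
    """Return hit page ids, ranked by how many search words each contains."""
    # Extract search words from the question (hypothetical stop-word filter).
    words = [w for w in question.lower().split() if w not in stop_words]
    hits = defaultdict(int)
    for word in words:
        for page_id in index.get(word, ()):
            hits[page_id] += 1
    # Best result first; ties are broken by page id for a stable order.
    return sorted(hits, key=lambda p: (-hits[p], p))

# Illustrative data only.
pages = {
    "p1": "the capital of france is paris",
    "p2": "paris is a city in france",
    "p3": "berlin is the capital of germany",
}
index = build_index(pages)
ranked = search(index, "what is the capital of france")
```

Here the page matching both search words is placed at the top of the list; a real service would apply a far richer ranking criterion.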
In one method for extracting the answer on the basis of trustworthiness, user evaluations of various pages or sites are integrated into a text search system. Herein, the user evaluations may include an evaluation by the user who asks the question and evaluations by other users whom that user has selected as members of his or her ‘trust network’. In addition, the user may configure the trust network from social network data indicating the relationships between the user and other users. This method measures the trustworthiness of a text by using the text relevance evaluations of the users included in the trust network, and improves search performance on the basis of the measured trustworthiness. Because it relies on users' manual evaluations of text trustworthiness, it is very high in accuracy. However, it has the disadvantage of requiring a great deal of effort and time.
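As a rough illustration of the trust network idea, the sketch below averages the ratings given to a page by the asking user and by that user's direct contacts. The data layout, the function names, and the use of a simple mean are assumptions; the method described above does not specify how the evaluations are combined.

```python
def trust_network(social_graph, user):
    """The asking user plus the users they are directly connected to."""
    return {user} | set(social_graph.get(user, ()))

def page_trust(ratings, network, page_id):
    """Mean rating of a page among network members; None if nobody rated it."""
    scores = [ratings[u][page_id] for u in network
              if page_id in ratings.get(u, {})]
    return sum(scores) / len(scores) if scores else None

# Illustrative data only: ratings are manual evaluations in [0, 1].
social_graph = {"alice": ["bob", "carol"]}
ratings = {
    "alice": {"p1": 1.0},
    "bob": {"p1": 0.5, "p2": 0.0},
    "dave": {"p2": 1.0},  # not in alice's trust network, so ignored
}
network = trust_network(social_graph, "alice")
p1_trust = page_trust(ratings, network, "p1")
```

Every rating in this sketch is entered by hand, which reflects why such a method is accurate but costly in effort and time.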
In addition, another method for extracting the answer on the basis of trustworthiness proposes an approach in which similarity-based relevance ranking is combined with quality ranking in both centralized and distributed text search environments. Six quality evaluation features are used: currency, availability, information-to-noise ratio, authority, popularity, and cohesiveness. In centralized search, search effectiveness is remarkably improved by using the currency, availability, information-to-noise ratio, and page cohesiveness metrics. In site selection, the availability, information-to-noise ratio, popularity, and cohesiveness metrics play an important role in improving performance. In information fusion, the popularity metric plays an important role. Accordingly, using the quality evaluation features helps improve performance in both centralized and distributed text search. However, since this method increases search performance by measuring only the trustworthiness of the text source, it is limited in how much it can improve search performance.
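One simple way to combine a similarity score with the six quality features is a weighted linear blend, sketched below. The blend, the `alpha` parameter, the weights, and all numeric values are assumptions for illustration; the method described above is not stated to use this particular formula.

```python
QUALITY_FEATURES = ("currency", "availability", "information_to_noise",
                    "authority", "popularity", "cohesiveness")

def combined_score(similarity, quality, weights, alpha=0.5):
    """Blend query similarity with a weighted average of quality features.

    All inputs are assumed to be normalized to [0, 1]; alpha is a
    hypothetical mixing parameter.
    """
    total = sum(weights.get(f, 0.0) for f in QUALITY_FEATURES)
    weighted = sum(weights.get(f, 0.0) * quality.get(f, 0.0)
                   for f in QUALITY_FEATURES)
    quality_score = weighted / total if total else 0.0
    return alpha * similarity + (1 - alpha) * quality_score

def rank_documents(docs, weights, alpha=0.5):
    """Return document ids ordered by combined score, best first."""
    return sorted(
        docs,
        key=lambda d: -combined_score(docs[d]["similarity"],
                                      docs[d]["quality"], weights, alpha))

# Illustrative data only.
weights = {f: 1.0 for f in QUALITY_FEATURES}
docs = {
    "A": {"similarity": 0.8, "quality": {f: 0.2 for f in QUALITY_FEATURES}},
    "B": {"similarity": 0.7, "quality": {f: 1.0 for f in QUALITY_FEATURES}},
}
ranking = rank_documents(docs, weights)
```

In this example document B overtakes document A despite a lower similarity score, because its source quality is higher. Setting a feature's weight to zero removes it from the blend, which loosely mirrors the observation that different metrics matter for different tasks.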
In addition, yet another method for extracting the answer on the basis of trustworthiness proposes a framework that uses non-textual features such as click count in order to measure textual quality. This method measures the trustworthiness of an answer by using thirteen non-textual features, such as click count, answer adaptation rate, and answer length, in order to improve the performance of a community-based question answering service, and its experimental results indicate an improvement in performance.
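A score based on non-textual features could, for example, be computed with a logistic function over the feature values, as in the sketch below. Only three of the thirteen features are shown, and the choice of a logistic model, the feature keys, and the weights are all assumptions made for illustration rather than details of the framework described above.

```python
import math

def answer_quality(features, weights, bias=0.0):
    """Logistic score in (0, 1) computed from non-textual features."""
    z = bias + sum(weights[name] * features.get(name, 0.0)
                   for name in weights)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights; in practice they would be learned from labeled data.
weights = {
    "click_count": 0.02,
    "answer_adaptation_rate": 2.0,
    "answer_length": 0.005,
}
popular = {"click_count": 120, "answer_adaptation_rate": 0.8,
           "answer_length": 300}
obscure = {"click_count": 3, "answer_adaptation_rate": 0.1,
           "answer_length": 40}

popular_score = answer_quality(popular, weights)
obscure_score = answer_quality(obscure, weights)
```

Note that nothing in this score depends on the text of the answer itself, only on how users interacted with it.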
However, the above-mentioned known methods use only manual evaluation of textual trustworthiness, use only automatically calculated text trustworthiness, or use only non-textual features such as the click count and the answer adaptation rate. That is, since the known methods are based on the relatedness between keywords of the question inputted by the user and keywords of the text, the trustworthiness of the answer itself is not considered.