In conventional information retrieval systems, most users follow a well-known pattern: the user submits an initial query, expressed in natural language, as keywords, as a database query, or in a similar form. This query is then used to search a database or other knowledge base repository, typically returning a wide range of results.
Different approaches have been taken to information retrieval and search within large database repositories. Typically, the process of information retrieval is triggered by a query entered by a user. In many scenarios it is easier and more convenient for a human user to type (or speak) a sentence in natural language than to use a formal syntax, such as a formal query language referring to concepts in an ontology, when searching for content or asking a question. Learning a formal syntax takes effort and practice; if that initial hurdle can be removed, the content of a knowledge base becomes much more accessible. Queries, in this context, formally capture the user's information needs and are aimed at retrieving a set of results that match the query, ordered by relevance. In most cases, the user input is a string of natural language text, which is used to execute keyword queries against a database and retrieve a listing of items that are indexed in the repository with the same keywords.
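As an illustration of the keyword query described above, the following is a minimal sketch in Python, not a description of any particular system. The names `build_index` and `keyword_search`, the sample items, and the ranking rule (number of matched query terms) are assumptions made only for this example.

```python
from collections import defaultdict


def build_index(items):
    """Map each keyword to the set of item ids indexed with that keyword."""
    index = defaultdict(set)
    for item_id, keywords in items.items():
        for keyword in keywords:
            index[keyword.lower()].add(item_id)
    return index


def keyword_search(index, query):
    """Return item ids that share at least one keyword with the query,
    ordered by the number of matched query terms (ties broken by id)."""
    scores = defaultdict(int)
    for term in set(query.lower().split()):
        for item_id in index.get(term, ()):
            scores[item_id] += 1
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical repository: each item is indexed with a few keywords.
items = {
    "doc1": ["jaguar", "car", "engine"],
    "doc2": ["jaguar", "cat", "habitat"],
    "doc3": ["automobile", "engine"],
}

index = build_index(items)
results = keyword_search(index, "jaguar engine")
print(results)  # ['doc1', 'doc2', 'doc3']
```

In this sketch the query "jaguar engine" also returns the item about the animal, and a query for "car" would miss the item indexed only under "automobile", because matching is done on the literal keywords alone.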
Two important performance measures for information retrieval systems are “precision” and “recall”. Given a particular query, a set of items in the repository, and a priori knowledge of relevance, so that each item is known to be either relevant or non-relevant for the query, “precision” is the ratio of the number of relevant items in the set of query results to the total number of items in that set, while “recall” is the ratio of the number of relevant items in the set of results to the total number of relevant items in the repository.
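The two ratios defined above can be computed directly from the set of returned items and the set of items known to be relevant. The following Python sketch is illustrative only; the function name `precision_recall` and the sample identifiers are hypothetical, and returning 0.0 for an empty set is a convention chosen here to avoid division by zero.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for a single query.

    retrieved: items returned by the system for the query
    relevant:  items in the repository known a priori to be relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant items that were returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 items returned, 5 relevant items in the repository,
# 2 of which appear among the returned items.
returned_items = ["a", "b", "c", "d"]
relevant_items = ["a", "c", "e", "f", "g"]

precision, recall = precision_recall(returned_items, relevant_items)
print(precision, recall)  # 0.5 0.4
```

Here two of the four returned items are relevant (precision 0.5), and two of the five relevant items were returned (recall 0.4).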
Generally, there is a trade-off between recall and precision: increasing precision tends to reduce recall and, conversely, increasing recall tends to reduce precision. Many keyword-based systems do not reach 40% on both measures, because ambiguous words in the query may produce erroneous results and because different ways of referring to the items in the database may cause relevant documents not to appear in the results.
Ontology-powered approaches and semantic technologies have enabled more precise results, because they enable a better “understanding” of the user's needs. The filtering and selection of results is particularly relevant in systems with a high volume of information, in which users retrieve too many results and the relevant documents become difficult to find.