Various approaches have been taken to information retrieval and search within large database repositories. Typically, the process of information retrieval is triggered by a query entered by a user. Queries, in this context, formally capture the user's information needs, and are aimed at retrieving a set of results that match the query, ordered by relevance. In most cases, the user input is a string of natural language text, which is executed as a keyword query against the database to retrieve a list of items that are indexed in the repository with the same keywords.
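The keyword lookup described above can be sketched as follows. This is only an illustrative sketch and not a description of any particular system: the item identifiers, the keywords, the function names `build_index` and `keyword_query`, and the choice to rank items by the number of matched query keywords are all assumptions made for the example.

```python
from collections import defaultdict


def build_index(items):
    """Map each keyword to the set of item ids indexed under it."""
    index = defaultdict(set)
    for item_id, keywords in items.items():
        for keyword in keywords:
            index[keyword.lower()].add(item_id)
    return index


def keyword_query(index, query):
    """Return ids of items sharing at least one keyword with the query.

    Items are ordered by the number of matched query keywords (an assumed,
    deliberately simple relevance score), with ties broken by item id.
    """
    scores = defaultdict(int)
    for term in query.lower().split():
        for item_id in index.get(term, ()):
            scores[item_id] += 1
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical repository: each item is indexed with a few keywords.
ITEMS = {
    "doc1": ["jaguar", "car", "engine"],
    "doc2": ["jaguar", "cat", "habitat"],
    "doc3": ["automobile", "engine"],
}

INDEX = build_index(ITEMS)
RESULTS = keyword_query(INDEX, "jaguar car")
print(RESULTS)
```

In this toy repository the query "jaguar car" returns doc1 first, since it matches both keywords, followed by doc2, which matches only "jaguar".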
Two important performance measures for information retrieval systems are “precision” and “recall”. Given a particular query, a set of items in the repository, and a priori knowledge of relevancy, so that each item is known to be either relevant or non-relevant for the given query, “precision” measures the ratio between the number of relevant items included in the set of query results and the total number of items in the set of results, while “recall” measures the ratio between the number of relevant items in the set of results and the total number of relevant items in the repository.
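As a minimal sketch of these two ratios, the following computes precision and recall from a set of results and a set of items known to be relevant. The function name `precision_recall`, the item identifiers, and the convention of returning 0.0 when a denominator is empty are assumptions made for the example.

```python
def precision_recall(results, relevant):
    """Compute (precision, recall) for one query.

    results  -- ids of the items returned for the query
    relevant -- ids of all items in the repository known to be relevant
    """
    results = set(results)
    relevant = set(relevant)
    hits = len(results & relevant)  # relevant items that were returned

    # Assumed convention: an empty denominator yields 0.0.
    precision = hits / len(results) if results else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 items returned, 5 relevant items in the repository,
# 2 of which appear in the results.
P, R = precision_recall(
    results=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5", "d6", "d7"],
)
print(P, R)
```

Here two of the four returned items are relevant, giving a precision of 2/4 = 0.5, and two of the five relevant items were returned, giving a recall of 2/5 = 0.4.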
Generally, there is a trade-off between recall and precision, so that increasing precision tends to reduce recall and, in turn, increasing recall tends to reduce precision. Many keyword-based systems do not reach 40% for both measures, because ambiguous words in a query might produce erroneous results, and because different ways of referring to the items in the database might cause relevant documents not to appear in the results.
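One way to see the trade-off is to vary how many ranked results are returned and recompute both measures at each cutoff. The sketch below does this on made-up data: the ranked list, the relevant set, and the helper name `precision_recall_at` are hypothetical, and a result cutoff is only one assumed mechanism by which the trade-off can arise.

```python
def precision_recall_at(ranked, relevant, k):
    """Precision and recall when only the top-k ranked items are returned."""
    returned = set(ranked[:k])
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranking for one query; d1..d4 are the relevant items.
RANKED = ["d1", "d2", "d7", "d3", "d8", "d9", "d10", "d4"]
RELEVANT = {"d1", "d2", "d3", "d4"}

# Evaluate at increasingly permissive cutoffs.
CURVE = {k: precision_recall_at(RANKED, RELEVANT, k) for k in (2, 4, 8)}
for k, (p, r) in CURVE.items():
    print(f"top {k}: precision={p:.2f} recall={r:.2f}")
```

With this toy ranking, returning only the top 2 items gives precision 1.00 and recall 0.50, while returning all 8 gives precision 0.50 and recall 1.00: retrieving more items recovers more of the relevant ones at the cost of admitting more non-relevant ones.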