The progress of information technology (IT) and the development of computer systems have resulted in the creation of very large knowledge bases containing thousands of documents and electronic files. Computers collect, store, sort and quickly retrieve references to documents contained within such repositories. The exploitation of such repositories has commonly been based on indexing techniques that allow keyword-based searches for retrieving documents upon request. In response to a query formulated from a given set of keywords, a search engine generally provides a ranked output list of documents.
Information retrieval (IR) techniques have been employed in a wide variety of situations where a user needs precise and quick access to reference documents. One example of such a situation is help-desk (e.g., information booths) or hotline services, which are organised to provide quick and effective technical support to customers of computer and other products. Indeed, the daily work of help-desk analysts providing such services is often supported by sophisticated IT systems containing tens of thousands of problem-solving documents covering all aspects of the products concerned. When a customer raises a problem, a help-desk analyst examines it and provides a quick solution that meets the customer's particular concern. To achieve this, the help-desk analyst often abstracts the problem into a few keywords. However, IR techniques based on keyword searches usually return far too many documents, and only a few of the listed documents turn out to be of any real use to the help-desk analysts, a factor that inevitably jeopardises the effectiveness of the services rendered to the customer. In most cases, first-line agents of the help-desk services have very little time to find an effective and practical solution to a single customer's problem, and traditional keyword-based search techniques produce too much noise and return too many documents to be easily exploited by the first-level help-desk staff.
Moreover, handling the references and documents retrieved by traditional keyword-based search techniques requires professional skill and wide experience from technicians, who must recognise, among the many references cited, the particular document that would be useful for solving the customer's problem. The need for such experience and professional skill is a further difficulty in developing help-desk services, which are notorious for high staff turnover.
Therefore, it can be seen that in the area of help-desk services, there is a particular need for improving searching techniques to enhance the relevance of the documents and references retrieved from a document repository.
However, whilst the techniques to be described below are particularly suited to this area, they nevertheless address the general problem of improving access to a document collection stored in a database. Therefore, application of the techniques in other areas is not excluded.
Finally, because there is an increasing number of situations where document repositories need to be continuously updated, it is highly desirable that repository updates which introduce additional documents be processed automatically, without the need for any manual intervention or human inspection of the documents.