1. Technical Field
The present invention relates to a method and apparatus for processing queries and is suitable particularly, but not exclusively, for inputting queries to, and receiving information from, a database.
2. Related Art
The gathering and disseminating of information now forms a vital part of many business processes. These activities usually fall into three parts:
(i) collect, filter and sort raw information; (ii) understand the information and make recommendations based thereon; and (iii) explain and communicate these recommendations to people.
The issues involved in the first stage, that of information gathering, have received a great deal of attention, and continue to be the subject of considerable research around the world. In comparison, the third stage, where information is communicated to others, has received relatively little attention. As the transfer of information is often a continuation of significant efforts in the early stages of information dissemination, there is a significant impetus to build on these efforts so that any preliminary work is not wasted. It is vital that the results of the third stage are presented in clear and accessible formats so that recipients can derive maximum benefit from the information.
Traditionally, information has been presented through reports. Recent technological developments such as the Internet and intranets have made it much easier to distribute such sources of information, but this accessibility brings problems associated with the increased volume of information. There is now such a huge amount of material available that it is difficult to know what is relevant and accurate. Search engines provide a means of retrieving documents that contain particular keywords, or a predetermined combination of keywords, but search results do not include any real measure of how the content of a retrieved document relates to the keywords. This is mainly a result of the way that documents (which may be books, articles, WWW pages, videos, presentation slides, etc.) are conceived. These documents often address specific issues or questions, and are typically written for a specific audience. Thus the context of a document may be vastly different from the context of interest to the initiator of a search, despite an overlap of keywords.
There are several systems available that attempt to manage information available from these data sources, and software agents in particular are known to manage information in various predetermined ways. Each agent generally comprises functionality to perform a task or tasks on behalf of an entity (human or machine-based) in an autonomous manner, together with local data, or means to access data, to support the task or tasks. For instance, an information agent might select documents of relevance to a topic or user. A general comprehensive review of agent-based technology is given by Hyacinth S. Nwana, “Software Agents: An Overview” in the Knowledge Engineering Review journal, Vol. 11, No. 3, pages 205–244.
In the Applicant's co-pending international Patent Application Number WO96/23265, there is described a software agent particularly for use in information management. The agent, known as “JASPER”, is associated with a user's Internet browser and alerts the user to documents of interest to them. To do that, JASPER uses a keyword set for the user concerned. However, by using clustering techniques, JASPER can extend the keyword set to pick up documents that would not have been located otherwise.
There are also tools known for processing the information itself, such as the PROSUM information summariser described in the Applicant's co-pending European Patent Application Number 97302616.4. This summarises information in accordance with a user's particular interest rather than simply in accordance with the content of the document. Hence a user looking at the results of a search and reading the summary produced by PROSUM will be alerted to a document in which the user's interest is represented by only a reference within the document, the document being principally about something else. Such documents tend not to be picked up by more conventional search tools.
The Applicant's co-pending Patent Application Number WO99/21108 teaches a system that retrieves objects, such as documents, based on the keyword extraction disclosed in JASPER; these objects are automatically stored in a database and entered against a user's or a project's profile. The relationship between any documents so retrieved may be estimated based on criteria such as keyword occurrence, and this estimate is displayed graphically to the user. Access to these documents is a function of information supplied in the respective profiles such that, for example, groups of personnel may automatically be informed of information according to their project group, grade or a security rating as specified in the profiles.
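By way of illustration only, an estimate of the relationship between two retrieved documents based on keyword occurrence might be sketched as follows. The cited application is not stated to use this measure: the Jaccard overlap used here, and the function name `keyword_overlap`, are assumptions chosen purely for the example.

```python
def keyword_overlap(doc_a_keywords, doc_b_keywords):
    """Return the Jaccard overlap (0.0 to 1.0) of two keyword collections.

    Illustrative assumption: relatedness is the fraction of distinct
    keywords that the two documents share.
    """
    a, b = set(doc_a_keywords), set(doc_b_keywords)
    if not a and not b:
        return 0.0  # two empty keyword sets are treated as unrelated
    return len(a & b) / len(a | b)


# Two documents sharing two of four distinct keywords.
score = keyword_overlap(
    ["agent", "search", "profile"],
    ["agent", "search", "database"],
)
```

Such a score could then be mapped to, for example, a distance between document icons in a graphical display.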
Search engines that are used to search for documents include Yahoo, Alta Vista and Ask Jeeves, among others. The first two, Yahoo and Alta Vista, search a list of keywords accompanying their index of documents for keywords that match the keywords input by the user. The search is purely a function of keyword ‘hits’, although at least some of these engines stem the query terms to their root forms before matching. The third system, Ask Jeeves, allows users to phrase queries in natural language rather than entering keywords, and the subsequent search proceeds based on keyword occurrence in the same way as described with respect to the Yahoo engine. Another type of search system is provided by Whatis, which retrieves a text entry in response to a search query; in this system the search is performed on single keywords, and the system displays dictionary entries that correspond to the keywords. The first three search facilities discussed above, Yahoo, Alta Vista and Ask Jeeves, present a user with exactly the problem described above: the user does not know how relevant a retrieved document is to the query. The fourth search facility, Whatis, provides a link to single data entries, thus functioning as an electronic version of a paper dictionary, and its use is extremely limited.
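The keyword-hit style of search described above, including stemming of terms, can be sketched as follows. This is a minimal illustration under stated assumptions and is not the implementation of any of the named engines: the helper names `naive_stem` and `keyword_hits`, the crude suffix-stripping rule and the sample index are all hypothetical.

```python
def naive_stem(word):
    """Crude suffix stripping; an illustrative stand-in for a real stemmer."""
    word = word.lower()
    for suffix in ("ing", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def keyword_hits(query, index):
    """Rank documents purely by the number of stemmed keyword matches.

    index maps a document id to the list of keywords stored for it.
    Returns (doc_id, hits) pairs, most hits first, ties broken by id.
    """
    query_stems = {naive_stem(w) for w in query.split()}
    results = []
    for doc_id, keywords in index.items():
        doc_stems = {naive_stem(k) for k in keywords}
        hits = len(query_stems & doc_stems)
        if hits:
            results.append((doc_id, hits))
    results.sort(key=lambda r: (-r[1], r[0]))
    return results


# Hypothetical index: doc2 concerns engine oil, not search engines.
sample_index = {
    "doc1": ["searching", "engines"],
    "doc2": ["engine", "oil"],
    "doc3": ["cooking"],
}
ranked = keyword_hits("search engine", sample_index)
```

In this sketch doc2 is returned for the query "search engine" solely because it shares one keyword, which illustrates the point made above: a hit count carries no measure of whether the context of the document matches the context of the query.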
U.S. Pat. No. 5,404,295 describes a storage and retrieval system for retrieving selected passages in documents, database entries and the like. These selected passages (subdivisions) are linked to one or more annotations by pointers, and the annotations are stored in a database for querying. Incoming queries are examined against the annotations, in order to identify one or more annotations relevant thereto. When one or more annotations have been identified, the subdivisions relating thereto are retrieved and presented to the user. The presentation and chaining together of subdivisions is implicit in the way in which the annotations and pointers thereto are constructed.
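The arrangement of annotations linked by pointers to subdivisions can be sketched as follows. This is an illustrative sketch only: the names `Annotation` and `retrieve` are hypothetical, and the simple word-overlap test used to match a query against an annotation is an assumption standing in for whatever matching the cited patent actually employs.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    text: str              # the annotation that queries are examined against
    subdivision_ids: list  # pointers to the passages this annotation describes


def retrieve(query, annotations, subdivisions):
    """Return the subdivisions pointed to by annotations matching the query.

    Matching here is plain word overlap (an assumption for illustration).
    Subdivisions are returned once each, in the order first encountered.
    """
    query_words = set(query.lower().split())
    ordered_ids = []
    for annotation in annotations:
        if query_words & set(annotation.text.lower().split()):
            for sid in annotation.subdivision_ids:
                if sid not in ordered_ids:
                    ordered_ids.append(sid)
    return [subdivisions[sid] for sid in ordered_ids]


# Hypothetical stored passages and their annotations.
passages = {
    "s1": "Passage about battery life.",
    "s2": "Passage about charging.",
}
notes = [
    Annotation("battery lifetime", ["s1"]),
    Annotation("charging the battery", ["s2", "s1"]),
]
found = retrieve("battery", notes, passages)
```

Note that the order in which subdivisions are presented follows directly from how the annotations and their pointers were constructed, which corresponds to the implicit chaining described above.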