There is an increasingly large amount of information stored electronically. In addition, the data held in different databases is increasingly being made available to casual searchers. For example, the Internet, which comprises a large number of different servers storing diverse information, is continually expanding, both because more and more data is being written to Internet servers and because more and more organisations are connecting their databases to the Internet and thus making the information contained in them available via the Internet.
In order to enable users to sift through this mass of information and find a relevant document amongst the vast sea of irrelevant documents, much effort is being expended amongst the Information Technology community to research and develop searching methods and tools to tame the resulting, so-called “data-overload information-poverty” problem.
Most search tools used for finding electronic documents on the Internet are keyword-based, and these tend to return an unmanageably large number of hits for any reasonably general query comprising one or merely a few keywords (at least where these are reasonably common words). Even more sophisticated search tools employing refined categorisations of document contents, etc., tend to return a large number of “hits” for any fairly general query because they nonetheless remain fundamentally keyword-based search tools.
One of the reasons for this problem is simply the vast amount of data which a computer is able to process very quickly in order to produce a very large list of hits. Compare the performance of a typical Internet search engine, which is likely to produce tens of thousands of results for a simple query such as “Hercules”, with that of a human librarian, who would typically produce at most two or three “hits”. However, prior to carrying out the “search”, a human librarian would probably check whether the reader meant the legendary Greek hero or something entirely different (e.g. a commercial organisation with that name, a computer program, etc.), and the human librarian would almost certainly find something of relevance to the reader.
It is therefore clear that a human librarian can often outperform an Internet search engine because he or she is able to ask intelligent questions of the person requesting the search and thus to exclude large sections of the overall “library” of documents which might be classified as “hits” on the basis of a simple keyword search alone. Such an ability is beyond the capacity of current computers to emulate with any great success. Alternative technical solutions are therefore required to enable computers to improve on their searching capability or, more precisely, on their ability to assist a user/requester in finding one or two documents which are especially relevant or of interest to the user/requester from amongst the large number of possible documents typically found using a simple keyword-based search. Such solutions should take advantage of the technical strengths of computers whilst seeking to overcome their weaknesses (in particular their lack of intelligence).
U.S. Pat. No. 6,526,440 describes a system whereby the results of a search are re-ranked according to the frequency with which the returned documents are cited by other documents. In other words, this document describes a method of re-ranking documents based on meta-information (i.e. information which is about the documents) rather than simply relying on the information contained within the documents.
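The general idea of re-ranking by citation frequency can be illustrated with a minimal sketch. It is not taken from the cited patent: the function name, the input shapes and the tie-breaking rule are assumptions made purely for illustration.

```python
from collections import Counter


def rerank_by_citations(results, citations):
    """Re-rank search results by how often each result is cited.

    results   -- document ids in their original keyword-search order
    citations -- iterable of (citing_doc, cited_doc) pairs (assumed input shape)
    """
    # Count how many times each document appears as the target of a citation.
    counts = Counter(cited for _, cited in citations)
    # sorted() is stable, so documents with equal counts keep their original order.
    return sorted(results, key=lambda doc: -counts[doc])


if __name__ == "__main__":
    hits = ["a", "b", "c"]
    links = [("x", "c"), ("y", "c"), ("z", "b")]
    print(rerank_by_citations(hits, links))
```

Here the ordering depends only on the citation pairs, i.e. on information about the documents, and not on the text of the documents themselves.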
Vivisimo has produced a search engine called “Clusty”, currently available at http://clusty.com, in which the results of any particular search are clustered together into related categories.
WO 01/46870 filed by Amazon.com describes a system for placing the results of a search into corresponding categories (each result having been pre-assigned to a particular category, e.g. book, CD, etc.) and for determining the order in which to present the different categories to the user in accordance with various rules (e.g. by calculating the ratio of the number of results from a particular category to the number of items in that category and ranking the categories according to the value of this ratio).
U.S. Pat. No. 6,385,602 describes a system similar to the Clusty search engine described above in which, after a search has been carried out, the results are clustered and, based on the clustering, dynamic categories are defined and used for presenting the results to the user.
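The ratio-based ordering rule mentioned in the example above might be sketched as follows. This is an illustrative reading of that one rule only; the function name, the data structures and the handling of unknown or empty categories are assumptions and are not drawn from WO 01/46870.

```python
from collections import Counter


def rank_categories(result_categories, category_sizes):
    """Order categories by (results in category) / (items in category), highest first.

    result_categories -- the pre-assigned category of each search result
    category_sizes    -- total number of items held in each category (assumed known)
    """
    hits = Counter(result_categories)
    # Skip categories with no known size to avoid dividing by zero (an assumption).
    ratios = {
        cat: hits[cat] / category_sizes[cat]
        for cat in hits
        if category_sizes.get(cat, 0) > 0
    }
    return sorted(ratios, key=lambda cat: ratios[cat], reverse=True)


if __name__ == "__main__":
    found = ["book", "book", "cd", "book", "cd"]
    sizes = {"book": 1000, "cd": 100}
    print(rank_categories(found, sizes))
```

In this example the "cd" category is presented first: although it contains fewer results than "book", those results make up a larger fraction of the category (2/100 against 3/1000).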
US 2003/0088553 describes a system in which a first database stores a predefined set of categories, a second database stores a set of “anticipated search terms” and mappings to one or more of the predefined categories, and a third database stores mappings between the categories and various internet web-sites (i.e. the web-sites are pre-categorised into one or more of the pre-defined categories). A search then proceeds by assigning an input search query to a category and then retrieving all of the web-sites (or links or titles thereto) pre-categorised into the respective category(ies) corresponding to the input search query. Note that this activity represents the entirety of the search process. At no stage is a keyword-style search carried out, nor are the results of such a search then categorised into a plurality of separate categories. Rather, the result of the search is simply the sum of the web-sites categorised as belonging to whichever category(ies) the input search query is matched to.
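The lookup-only character of such a search can be made concrete with a small sketch. The three dictionaries below stand in for the three databases; their contents, the site names and the exact-match rule for search terms are all hypothetical and are not taken from US 2003/0088553.

```python
# First "database": the predefined set of categories (hypothetical contents).
CATEGORIES = {"mythology", "animals", "cars"}

# Second "database": anticipated search terms mapped to categories.
TERM_TO_CATEGORIES = {
    "hercules": ["mythology"],
    "jaguar": ["animals", "cars"],
}

# Third "database": categories mapped to pre-categorised web-sites.
CATEGORY_TO_SITES = {
    "mythology": ["myths.example.org"],
    "animals": ["zoo.example.org", "wildlife.example.org"],
    "cars": ["motors.example.org"],
}


def category_lookup(query):
    """Return every site pre-categorised under the categories matched by the query."""
    # Assumed matching rule: the whole query must equal an anticipated term.
    categories = TERM_TO_CATEGORIES.get(query.strip().lower(), [])
    sites = []
    for category in categories:
        if category not in CATEGORIES:
            continue
        for site in CATEGORY_TO_SITES.get(category, []):
            if site not in sites:  # keep first occurrence, preserve order
                sites.append(site)
    return sites


if __name__ == "__main__":
    print(category_lookup("Jaguar"))
```

No document text is ever inspected in this sketch: a query that does not match an anticipated term returns nothing, and a query that does match returns whole categories of sites.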
EP 1 217 542 describes a system in which a mobile communications device (e.g. a mobile telephone) includes a personalised ontology which is used to help a user to identify favourite services by storing these (or links to them) in corresponding nodes of the personalised ontology. The description is somewhat unclear about exactly how a search is carried out, but it appears (especially from FIG. 6) that it proceeds by first looking for results for the search request in the personalised ontology and, if this fails, using a general search engine to look for appropriate results (see items 616, 618 and 620). There is no discussion of how the results of a search are displayed to the user, or of whether the results are categorised according to the personalised ontology before being displayed.
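The apparent two-stage flow (personalised ontology first, general search engine second) might look roughly like the sketch below. Since the cited description is itself unclear on the details, everything here is an assumption made for illustration: the representation of the ontology as a dictionary, the substring matching rule and the function names.

```python
def find_services(query, ontology, general_search):
    """Look in the personalised ontology first; fall back to a general search.

    ontology       -- dict mapping node names to lists of stored services (assumed shape)
    general_search -- callable taking the query and returning a list of results
    """
    needle = query.strip().lower()
    # Assumed matching rule: a node matches if the query appears in its name.
    local_hits = [
        service
        for node, services in ontology.items()
        if needle in node.lower()
        for service in services
    ]
    if local_hits:
        return local_hits
    # Nothing found locally, so hand the query to the general search engine.
    return general_search(query)


if __name__ == "__main__":
    favourites = {"Weather": ["weather-svc"], "News": ["news-svc"]}
    web = lambda q: ["web:" + q]  # stand-in for a general search engine
    print(find_services("weather", favourites, web))
    print(find_services("trains", favourites, web))
```

In this sketch the ontology is consulted only to locate stored services; the fallback results are returned as they come, without being categorised against the ontology.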