1. Field of the Invention
The present invention relates generally to information retrieval, and more particularly to a system and method for adjusting search results based on the relative expertise between a searcher and the creator/s and/or contributor/s of a document.
2. Description of Related Art
With the proliferation of corporate networks and the Internet, an ever-increasing amount of information is being made available in electronic form. Such information includes documents, graphics, video, audio, or the like. While corporate information is typically well indexed and stored in corporate databases within a corporate network, information on the Internet is generally highly disorganized.
Searchers looking for information typically make use of an information retrieval system. In corporate networks, such an information retrieval system typically consists of document management software, such as Applicant's QUANTUM™ suite, or iManage Inc.'s INFORITE™ or WORKSITE™ products. Information retrieval from the Internet, however, is typically undertaken using a search engine, such as YAHOO™ or GOOGLE™.
Generally speaking, these information retrieval systems extract keywords from each document in a network. Such keywords typically contain no semantic or syntactic information. For each document, each keyword is then indexed into a searchable data structure with a link back to the document itself. To search the network, a user supplies the information retrieval system with a query containing one or more search terms, which may be separated by Boolean operators, such as “AND” or “OR.” These search terms can be further expanded through the use of a thesaurus. In response to the query, which might have been expanded, the information retrieval system attempts to locate information, such as documents, that matches the searcher-supplied (or expanded) keywords. In doing so, the information retrieval system searches through its databases to locate documents that contain at least one keyword that matches one of the search terms in the query (or its expanded version). The information retrieval system then presents the searcher with a list of document records for the documents located. The list is typically sorted based on document ranking, where each document is ranked according to the number of keyword-to-search-term matches in that document relative to those for the other located documents. An example of a search engine that uses such a technique, where document relevancy is based solely on the content of the document, is INTELISEEK™. However, most documents retrieved in response to such a query have been found to be irrelevant.
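By way of illustration only, the conventional keyword technique described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any named product: the function names (extract_keywords, build_index, search), the tokenization rule, and the sample documents are all hypothetical, and the query is treated as a simple “OR” of its search terms with no thesaurus expansion.

```python
import re
from collections import Counter, defaultdict


def extract_keywords(text):
    """Split text into lowercase word tokens, carrying no syntactic or semantic information."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each keyword to the documents containing it, with an occurrence count per document."""
    index = defaultdict(Counter)
    for doc_id, text in documents.items():
        for keyword in extract_keywords(text):
            index[keyword][doc_id] += 1
    return index


def search(index, search_terms):
    """Return (doc_id, score) pairs for documents matching at least one search term.

    The score is the total number of keyword-to-search-term matches in the document.
    """
    scores = Counter()
    for term in search_terms:
        for doc_id, count in index.get(term.lower(), {}).items():
            scores[doc_id] += count
    # Highest score first; ties are broken by document id so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample repository (assumption, for illustration only).
documents = {
    "doc_a": "Brain scan techniques: a brain scan primer",
    "doc_b": "Scan your documents quickly",
    "doc_c": "Gardening tips",
}

index = build_index(documents)
results = search(index, ["brain", "scan"])
print(results)
```

In this sketch the ranking depends only on the content of each document, so any two searchers submitting the same search terms receive the same ordered list.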
In an attempt to improve precision, a number of advanced information retrieval techniques have been developed. These techniques include syntactic processing, natural language processing, semantic processing, or the like. Details of such techniques can be found in U.S. Pat. Nos. 5,933,822; 6,182,068; 6,311,194; and 6,199,067, all of which are incorporated herein by reference.
However, even these advanced information retrieval techniques have not been able to reach the level of precision required by today's corporations. In fact, a recent survey found that forty-four percent of users say that they are frustrated with search engine results. See Internet Usage High, Satisfaction low: Web Navigation Frustrate Many Consumers, Berrier Associates, sponsored by Realnames Corporation (April 2000).
In addition, other advanced techniques have also proven to lack adequate precision. For example, GOOGLE™ and WISENUT™ rank document relevancy as a function of a network of links pointing to the document, while methods based on Salton's work (such as ORACLE™ text) rank document relevancy as a function of the number of relevant documents within the repository.
This lack of precision is at least partially caused by current information retrieval systems not taking the personal profiles of the document creator, searcher, and any contributors into account. In other words, when trying to assess the relevancy of documents within a network, most information retrieval systems ignore the searcher who performs the query, i.e., most information retrieval systems adopt a one-size-fits-all approach. For example, when a neurologist and a high school student both perform a search for “brain AND scan,” an identical list of located documents is presented to both the neurologist and the high school student. However, the neurologist is interested in high-level documents containing detailed descriptions of brain scanning techniques, while the student is only interested in basic information on brain scans for a school project. As can be seen, a document query that does not take the searcher into account can retrieve irrelevant and imprecise results.
Moreover, not only should the profession of a searcher affect a search result, but so should the expertise of the searcher within the search domain. For example, a medical doctor who is a recognized world expert would certainly assign different relevancy scores to the returned documents than, say, an intern. This means that information retrieval systems should be highly dynamic and consider the current expertise level of the searcher and/or creator/s at the time of the query.
In addition, the current lack of precision is at least partially caused by the treatment of documents as static entities. Current information retrieval techniques typically do not take into account the dynamic nature of documents. For example, after creation, documents may be commented on, printed, viewed, copied, etc. To this end, document relevancy should consider the activity around a document.
Therefore, a need exists in the art for a system and method for retrieving information that can yield a significant improvement in precision over that attainable through conventional information retrieval systems. Moreover, such a system and method should preferably personalize information retrieval based on user expertise.