1. Technical Field
The present invention relates to search engines and in particular to document retrieval with search engines on a data processing system. Still more particularly, the present invention relates to a method and system for implementing precise document retrieval via a search engine that provides contextual and weighted keyword searches.
The present invention relates to search engines and in particular to user interaction with search engines on a data processing system. Still more particularly, the present invention relates to a method and system for dynamic, real-time upgrading of the knowledge database of search engines by tracking user interactions.
The present invention relates to electronic databases and in particular to maintenance of content on electronic databases. Still more particularly, the present invention relates to a method and system for automated maintenance of an electronic database via a point system.
2. Description of the Related Art
Computer-based (electronic) search engines are well known in the art. Typically a search engine comprises a software application that is executed on a data processing system and includes a user interface and a database (or repository) of information that is stored on a memory component associated with or accessible by the data processing system. The information (data, documents, articles, etc.) within the database is made accessible to a user via the user interface and input/output (I/O) devices of the data processing system. Utilizing the I/O devices (e.g., keyboard, mouse, and/or touch pad), the user enters a search query in the user interface and submits the query to the search engine/application for processing. The search engine parses the query (if a phrase) into search keys and uses those keys to access a set of keywords associated with various articles which can be retrieved.
When the query is received, the search engine compares the search terms (i.e., keywords or tags identified from the query) with the keywords within articles in the database and determines whether the entered search terms match (or are found within) any of the documents within the database. When a match is found, a list of the documents that match the search terms is returned/outputted to the user. The returned list is typically presented without any structure.
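The conventional flow described above (parse the query into search keys, compare the keys with the keywords of each article, return an unstructured list of matches) can be illustrated with a minimal sketch. The names `parse_query` and `search`, the dictionary layout of the database, and the choice to treat an article as a match when any one search key appears among its keywords are assumptions made for illustration only; they are not taken from any particular search engine.

```python
def parse_query(query: str) -> list[str]:
    """Split a query phrase into lowercase search keys (hypothetical helper)."""
    return query.lower().split()


def search(query: str, database: dict[str, list[str]]) -> list[str]:
    """Return IDs of articles whose keywords contain at least one search key.

    `database` maps an article ID to its list of associated keywords.
    The result is a flat list in database order, with no ranking or grouping.
    """
    keys = set(parse_query(query))
    results = []
    for article_id, keywords in database.items():
        article_keywords = {k.lower() for k in keywords}
        if keys & article_keywords:  # any-key match (an assumption)
            results.append(article_id)
    return results


# Illustrative data only.
database = {
    "a1": ["computer", "memory"],
    "a2": ["garden", "soil"],
    "a3": ["Computer", "network"],
}
print(search("computer network", database))
```

Because the sketch returns every article that shares a keyword with the query and applies no ordering, it also illustrates why such result lists carry no structure for the user.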
Typically, a user's query is in the form of a text string, which may comprise a combination of letters and/or numbers. Other characters may also be possible in more complex search engines. The items searched and returned from the database are articles (or electronic documents) or links associated with specific articles (or electronic documents). Some search engines are designed to provide exact matches of the text string in the query or near matches of the text string that are found in these articles. Traditionally, search engines returned any article containing a keyword (or phrase) that is found anywhere (i.e., at any level/section) within the document. However, many advanced search engines return articles which match at a specific level of the document (e.g., title or abstract, etc.). For example, a search for “computer” may return thousands of articles containing the exact term, computer, or variations of that term, such as computing and compute, or known synonyms such as processors, data processing systems, etc. More modern search engines typically narrow the result of a search query to include only those articles in which the search term appears at or above a certain level of the article, so that the returned articles at least have some relation to the search term provided. Despite this narrowing of searches based on the level at which the search is conducted, however, search engines are unable to always return items of relevance to the specific context in which the user is interested. Also, a specific term may not be tagged with an associated keyword to yield a hit within the database being searched by the search engine.
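The difference between matching anywhere in a document and matching only at a specific level can be illustrated with a short sketch. The article layout (`title`, `abstract`, and `body` fields), the function names, and the use of exact whole-word matching are all assumptions for illustration; real search engines may define levels and matching rules differently.

```python
def matches_at_level(term: str, article: dict, levels: tuple[str, ...]) -> bool:
    """True if `term` appears as a whole word in any of the given sections."""
    term = term.lower()
    return any(term in article.get(level, "").lower().split() for level in levels)


def search_by_level(term: str, articles: list[dict],
                    levels: tuple[str, ...] = ("title", "abstract")) -> list[int]:
    """Return IDs of articles that contain `term` at one of the given levels."""
    return [a["id"] for a in articles if matches_at_level(term, a, levels)]


# Illustrative data only.
articles = [
    {"id": 1, "title": "Computer architecture", "abstract": "Cache design", "body": "..."},
    {"id": 2, "title": "Gardening", "abstract": "Soil tips", "body": "I used a computer to plan"},
]
narrow = search_by_level("computer", articles)
broad = search_by_level("computer", articles, levels=("title", "abstract", "body"))
```

In this sketch the narrow search skips the gardening article, whose only mention of the term is incidental text in the body, while the broad search returns both articles. Neither variant captures the context in which the user is interested, which is the limitation noted above.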
There are several other limitations with present search engines and the methods by which databases being accessed by these search engines provide data to the requester. For example, current search engines, particularly those with access to a large amount of information such as the Internet, often return a long list of search results containing many noise documents. “Junk results” often greatly outnumber the results that a user is interested in, and the results of interest are occasionally embedded deep within the list of provided results. From the user's perspective, the results of such searches often are too large to investigate, while at the same time the results do not contain enough relevant documents to be useful. Most users are only willing to look at the first few tens of results. Further, no assistance is given by the search engine to help the user understand how the documents relate to each other, or to the task that the user is trying to accomplish. Thus, the results of such searches are often unusable because hundreds of articles are returned but the type of information presented in each article is unspecified.
Because of the nature of results received from the publicly accessible (e.g., Internet) search engines, which provide general, non-specific, and unorganized information about many different topics, commercial databases and associated search engines have been designed for utilization within a specific subject area or industry. Searches conducted by these search engines within these commercial databases often provide an arrangement of information designed to train individuals working in that particular industry. As with publicly accessible databases, however, the number of documents in the commercially available databases continually increases by many orders of magnitude, and the number of irrelevant hits to each search has increased. As the size of the database of archived documents grows, search tools are needed that provide initially more accurate result sets (i.e., return of only a number of truly relevant documents to the user). The notion of “relevant” should only include the very best documents since there may be tens of thousands of slightly relevant documents.
Another limitation with existing search engines involves the inability of the search engine to improve its search features for a particular topic/search term that is searched for by multiple users on a recurring basis. This limitation is very visible with Internet-based search engines. Typically, Internet-based search engines provide a user with a list of potentially relevant articles. The user is left to work back and forth between the articles using browser navigation capabilities and bookmarking useful articles. This procedure is often tedious and inefficient. When the user finishes the browser session and closes the window, the results of the user's search and interactions with the search results (bookmarks, etc.) are discarded. Any comments or thoughts the user might have had about the article are lost unless recorded in some other application or medium. Thus, when a next (or the same) user later completes the same search, that next user is forced to go through the tedious tasks of working back and forth through the list of articles and bookmarking useful articles. For example, if a first user's query of a term receives 200 hits, only 5 of which are relevant in the particular context desired by the user, a next user with the same query in the same context will also receive the same 200 hits and will have to manually work through the 200 hits to find the 5 relevant documents, even though the 5 documents were previously identified by the first user.
Notably, depending on the search engine, the same search made at another time may not provide the same result set and may actually exclude articles of relevance to the topic being researched that are not identified by the search term. Although some search engines provide a user with a way to enter textual feedback, this feedback must be manually processed by system administrative personnel. Also, there is no way to maintain and update a database of information utilized by the search engine without substantial manual effort by system administrative personnel. Thus, there is currently no way for a user to provide context information that will enable the search engine to provide more accurate/relevant results. There is also no way for the search engine to utilize the results of a prior search and the efforts of one user to narrow the results to include truly relevant data so that the effort required by a later user requesting the same information is substantially reduced.
The present invention recognizes the limitations of present search engines in providing accurate/relevant results to searches being requested by users, and the invention provides a method and system for improving the relevance of results obtained by a search engine by utilizing a combination of search terms along with user-entered context information and user-selected keywords. The invention further provides a method and system for utilizing previous user searches to provide more accurate results to later user-entered queries. These and other benefits are provided by the invention described herein.
The present invention recognizes the limitations of present search engines in providing more efficient and complete solutions to user-entered search inquiries, and the invention provides a method and system for constantly improving a search engine's result-producing facility by learning as each user refines his/her search and creates a solution within a user-specified context. The invention further provides a method and system for tracking user interactions with the search engine that are still being refined by a specific user and saving completed solutions to the database for access by other users. Also, the invention recognizes that it would be beneficial to automatically purge the database of information that is rarely utilized or obsolete based on a history of user interactions with the search engine. These and other benefits are provided by the invention described herein.
The present invention recognizes the limitations of manually maintaining an extensive database of information as is currently done, and the invention provides an automated method for maintaining databases utilizing a point system. The invention further provides a method and system by which users of a database are encouraged to add valuable information to the database as the database is being utilized by the users. These and other benefits are provided by the invention described herein.