The rapid growth of the World Wide Web and of the information available through it has led to a problem of information retrieval. As a result, many search engines have emerged to retrieve information according to the needs of users on the web, Google and Yahoo being examples of such search engines. Most search engines provide a facility to search for information based on keywords.
Users are most often concerned with their area of expertise or domain of work when they search for information. While they tend to use keywords that are common in their domain, the same keywords may be used with a different meaning in a different context or domain. For example, a hiring consultant might want to search for all resumes (curricula vitae) relevant to a profile and would include the keyword “resume” in the search query. But “resume” can also mean “to begin again after a pause”. There could also be many pages on the web seeking “resumes” from potential candidates. Even though such pages use the keyword in the context of the same domain, a user who is searching for resumes would not be interested in browsing through them.
Most search engines provide advanced options to make the search query more refined and specific to the needs of the user. To tap the full potential of a search engine, the user must be aware of its complete search language and the various options it provides. A typical user does not have the time or patience to read and understand the available options and most often falls back on basic keyword search. Besides, the options are provided in different formats by different search engines, and some options are not common across search engines.
Most search engines search indexed documents from the web. Due to the large volume of documents on the web, it is impossible for any one search engine to index all available documents. Hence, different search engines may have different sets of indexed documents. While every search engine might have documents that are relevant to a user, different search engines may show different results for the same query. Besides, each search engine has its own search and ranking strategy. This makes it difficult for users to choose the right search engine for their needs.
Various metasearch engines have emerged which act as a bridge between users and other search engines. They save the effort of accessing multiple search engines and interpreting the results of each one separately. If the user were to search in each of the search engines individually, the user would also encounter many repeated results. Metasearch engines help solve such problems. However, the user still has to go through a number of documents to figure out which ones are important and which are not.
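The removal of repeated results by a metasearch engine can be illustrated with a minimal sketch. The sketch is an assumption made for illustration only and does not describe any particular metasearch engine: the function names normalize_url and merge_results, the engine labels, and the normalization rules (lower-casing the host, dropping a leading "www." and a trailing slash) are all hypothetical.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Reduce a URL to a comparison key (hypothetical normalization rules)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    query = "?" + parts.query if parts.query else ""
    return host + path + query


def merge_results(results_by_engine):
    """Merge per-engine result lists, keeping one entry per normalized URL.

    results_by_engine maps an engine label to a list of result URLs.
    Each merged entry records which engines returned the document.
    """
    merged = {}
    for engine, urls in results_by_engine.items():
        for url in urls:
            key = normalize_url(url)
            entry = merged.setdefault(key, {"url": url, "engines": []})
            if engine not in entry["engines"]:
                entry["engines"].append(engine)
    return list(merged.values())


if __name__ == "__main__":
    # Illustrative data only; the engines and URLs are placeholders.
    results = {
        "engine_a": ["http://www.example.com/cv/", "http://example.org/jobs"],
        "engine_b": ["http://example.com/cv"],
    }
    for entry in merge_results(results):
        print(entry["url"], entry["engines"])
```

Even with duplicates merged in this way, every remaining document must still be inspected by the user, which is the limitation noted above.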
U.S. Pat. No. 6,363,376 discloses a method of querying multiple career websites and posting job seeker information to multiple career websites from a single interface. However, the invention disclosed does not address the problem of searching generic search engines to retrieve domain specific information. Furthermore, the query used to search the career websites is provided by the user. There is no intelligence or filter mechanism to ensure that the job postings being applied for actually match the candidate in all respects.
Therefore, it would be advantageous to have a tool that can act as a metasearch engine and at the same time capture intelligence relating to specific domains, so that domain specific searches can be performed without the user knowing the advanced search techniques available in various search engines.
Attempts have been made in a number of ways to provide domain specific results based on user queries.
U.S. Pat. No. 6,920,448 discloses a method of performing a domain-specific metasearch and obtaining search results, where a metasearch engine is provided to fetch results from various search engines and display them to the user. The obtained search results are further processed by a data mining module to form clusters of related documents, which are displayed to the user. The patent further discloses categorization mechanisms whereby clustered documents are categorized according to a pre-defined set of categories. Furthermore, the patent discloses a mechanism to manually specify or automatically generate stop word lists, the words of which can be stripped from search documents before processing by the data mining tool. The method disclosed employs a search result refining process rather than a query optimization process. As a result, a number of irrelevant results might be retrieved and unnecessary processing might be performed at the client side.
U.S. Pat. No. 6,513,031 discloses a method of refining a query to find results in a specific search area, where the search area is identified using a combination of query analysis, including the number of query terms and natural language parsing, and user profile information. Furthermore, the search area is also identified by comparing the search query to text found in pages on a network. The invention disclosed involves a lot of interaction with the user and requires user input at multiple stages to identify a search area based on previously visited websites, possible topics, user profile and scope clues. The invention in U.S. Pat. No. 6,513,031 is intended more for an average user, to help that user identify an area of interest. A professional user who wants to retrieve domain specific results from various sources would not want to spend time providing inputs every time to decide on an interest area.
U.S. Pat. No. 6,411,950 discloses a method and apparatus for dynamic search result refinement by refining a vague text-search query into a more specific text-search query. The method includes generating a phrase list to refine a query into a more specific query. The phrase list is constructed using queries from multiple users. Also, the phrase list may be restricted to phrases that are statistically significant, the significance being determined by the number of times a particular phrase appears in the received user queries. The method as disclosed requires infrastructure to collect and analyze queries from various users and index them to arrive at a refinement strategy to be presented to the user for selection. The user still has to select from a list of possible options for the query terms provided in order to refine the results.
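The general idea of a frequency-based phrase list can be illustrated with a toy sketch. It is not the method of U.S. Pat. No. 6,411,950; the function names build_phrase_list and suggest_refinements, the min_count threshold, and the rule that a refinement must contain the vague query as a word are assumptions made for illustration only.

```python
from collections import Counter


def build_phrase_list(user_queries, min_count=2):
    """Keep phrases seen at least min_count times in collected queries.

    min_count is a hypothetical stand-in for a significance threshold.
    """
    counts = Counter(q.strip().lower() for q in user_queries if q.strip())
    return [phrase for phrase, n in counts.most_common() if n >= min_count]


def suggest_refinements(vague_query, phrase_list):
    """Offer longer phrases that contain the vague query as a word."""
    term = vague_query.strip().lower()
    return [p for p in phrase_list if term in p.split() and p != term]


if __name__ == "__main__":
    # Illustrative queries only.
    collected = [
        "java developer resume", "Java developer resume",
        "resume", "resume",
        "resume template", "resume template", "resume template",
        "cover letter",
    ]
    phrases = build_phrase_list(collected)
    print(suggest_refinements("resume", phrases))
```

In a scheme of this kind the output is only a list of candidate refinements, so the final choice remains with the user.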
The present invention provides a method and a computer program product to assist users in performing domain specific searches across multiple search engines, without the user knowing any advanced search techniques of any of the search engines being searched. The present invention provides a mechanism whereby domain intelligence is hidden from the user and the user does not have to provide inputs to refine the query every time a query is entered. The embedded intelligence ensures that the query returns relevant results and that no further processing needs to be performed by the user. Furthermore, the search intelligence can be shared by users with other users.
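One way such hidden domain intelligence could be applied is sketched below, under stated assumptions. The sketch is not the claimed implementation: the DomainProfile structure, the engine labels engine_a and engine_b, and their operator syntax are hypothetical and do not correspond to the documented syntax of any real search engine.

```python
from dataclasses import dataclass, field


@dataclass
class DomainProfile:
    """Hypothetical container for shareable domain search intelligence."""
    name: str
    include_terms: list = field(default_factory=list)
    exclude_terms: list = field(default_factory=list)


# Hypothetical per-engine operator syntax; real engines differ.
ENGINE_SYNTAX = {
    "engine_a": {"exclude": "-{term}", "phrase": '"{term}"'},
    "engine_b": {"exclude": "NOT {term}", "phrase": '"{term}"'},
}


def _format_term(term, syntax):
    """Quote multi-word terms using the engine's phrase syntax."""
    return syntax["phrase"].format(term=term) if " " in term else term


def build_query(user_query, profile, engine):
    """Expand a plain keyword query with the profile's domain terms."""
    syntax = ENGINE_SYNTAX[engine]
    parts = [user_query.strip()]
    parts += [_format_term(t, syntax) for t in profile.include_terms]
    parts += [
        syntax["exclude"].format(term=_format_term(t, syntax))
        for t in profile.exclude_terms
    ]
    return " ".join(parts)


if __name__ == "__main__":
    # Illustrative profile for the hiring consultant example.
    recruiting = DomainProfile(
        name="recruiting",
        include_terms=["curriculum vitae"],
        exclude_terms=["submit your resume", "jobs"],
    )
    for engine in ENGINE_SYNTAX:
        print(engine, "->", build_query("java developer resume", recruiting, engine))
```

In this sketch the user supplies only the plain keywords, while the profile, which could be authored once and shared with other users, carries the domain terms and the engine specific operators.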