The present invention relates generally to a method, system and computer program for searching in a computer network, and more particularly to searching a multiplicity of web sites or a directory accessed via the Internet or an intranet.
The Yahoo™ search engine, the Google™ search engine and other search engines are currently known to search the Internet. When performing searches over the Internet, or any other type of network such as a corporate intranet, a user may need to search a number of repositories. The search engine requires the user to fill out a form with search terms and other criteria, or at least to type search terms into a search-specific entry field. A similar sequence of user interactions is required to initiate a Lightweight Directory Access Protocol (LDAP) directory search or a search via the Google™ search engine, i.e. first loading a web page and then filling in a form to generate an LDAP directory search request or a Google™ search request. The search engine takes the search criteria and formulates a search string. The search engine compares the search string to the search engine's database of keyword indices and any matches are returned to the user as a “hit list”.
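The matching step described above can be illustrated with a minimal sketch. It is not the implementation of any particular search engine: the function names, the form field names, the example URLs and the rule that every term must match are all assumptions made for illustration only.

```python
def formulate_search_string(form_fields):
    """Join the words of every form field into one lower-case search string."""
    terms = []
    for value in form_fields.values():
        terms.extend(value.lower().split())
    return " ".join(terms)


def find_hits(search_string, keyword_index):
    """Return the sorted URLs that are indexed under every term (assumed AND rule)."""
    terms = search_string.split()
    if not terms:
        return []
    # Copy the first posting set so the index itself is never modified.
    hits = set(keyword_index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= keyword_index.get(term, set())
    return sorted(hits)


# Hypothetical database of keyword indices: keyword -> set of document URLs.
KEYWORD_INDEX = {
    "toaster": {"http://example.com/a", "http://example.com/b"},
    "manual": {"http://example.com/b", "http://example.com/c"},
}

# Hypothetical form contents typed in by the user.
query = formulate_search_string({"keywords": "Toaster", "title": "manual"})
hit_list = find_hits(query, KEYWORD_INDEX)
print(hit_list)
```

In this sketch an empty hit list corresponds to an unsuccessful search; a real engine would also rank the hits, which is omitted here.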
To create a database of keyword indices, search engines use software robots or software spiders to crawl the Internet. Each software robot has its own strategy for crawling the Internet, but generally each software robot starts from a predetermined list of historical Uniform Resource Locators (URLs) and from this list locates a document. The software robot may parse the entire document, only the title of the document, or only the first paragraph. The parsed information is indexed and stored in a database of keyword indices.
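The indexing step can be sketched as follows, under stated assumptions. Fetching documents over the network is omitted; the documents are held in a hypothetical in-memory dictionary keyed by URL, and the names `extract_text`, `build_keyword_index` and the `mode` parameter are illustrative rather than taken from any actual software robot.

```python
import re
from collections import defaultdict


def extract_text(document, mode="full"):
    """Select the part of the document to parse: title, first paragraph, or all."""
    if mode == "title":
        return document["title"]
    if mode == "first_paragraph":
        # Assumption: paragraphs are separated by a blank line.
        return document["body"].split("\n\n", 1)[0]
    return document["title"] + " " + document["body"]


def build_keyword_index(documents, mode="full"):
    """Map each keyword to the set of URLs whose parsed text contains it."""
    index = defaultdict(set)
    for url, document in documents.items():
        text = extract_text(document, mode).lower()
        for word in re.findall(r"[a-z0-9]+", text):
            index[word].add(url)
    return dict(index)


# Hypothetical documents located from the robot's starting list of URLs.
DOCUMENTS = {
    "http://example.com/a": {
        "title": "Toaster manual",
        "body": "How to use it.\n\nCleaning tips.",
    },
    "http://example.com/b": {
        "title": "Kettle guide",
        "body": "Boil water safely.\n\nToaster comparison.",
    },
}

title_index = build_keyword_index(DOCUMENTS, mode="title")
full_index = build_keyword_index(DOCUMENTS, mode="full")
print(sorted(full_index["toaster"]))
```

The choice of `mode` shows why two robots can produce different indices for the same documents: here the second document is indexed under "toaster" only when the entire document is parsed.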
In some cases, a search engine responds to a search by displaying a list of successful hits. In other cases, an unsuccessful hit may be returned and the user may be given the option of selecting a link to another web site in which the user may search further in order to delve deeper into the search engine's document repository. This step requires the user to re-enter his or her search query and perform a subsequent search.
An example of the above can be found by using the Google™ search engine. If an unsuccessful hit (for example, an HTTP 404 error) or a successful hit (for example, a list of web pages) is returned, the user is given the option of “searching within results”. When the user clicks on this link, a further search page is presented, allowing the user to re-enter his or her search query and search within the returned web pages. Alternatively, the user can access another search engine and re-enter his or her search query into the search box to try to locate the results that he or she requires.
The above task, when performed repeatedly over a number of search engines or within the same search engine, can become repetitive and tedious, requiring repeated entry of a search query in a number of different formats. Often when performing an advanced search, a varying number of parameters are required to be entered, for example, case sensitivity, keywords only, title only, body only, Boolean operations and the maximum number of documents to be returned.
Current web browsers provide a mechanism in which the web browser will remember the last word that a user has typed into the search input box. As a user types in the first few letters of the word, the web browser will begin to match the letters with a word that was previously entered into the input box. For example, if a user had previously typed in the word “toaster”, the web browser would begin to pattern match (pre-fill the input box) as the user types “t”, followed by “o”, followed by “a”. The user is then able to select the word “toaster” if that is the word for which the user is looking.
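The pre-fill behaviour can be sketched as a simple prefix match against previously entered words. This is only an illustration, not the mechanism of any actual web browser; the function name `suggest` and the contents of `history` are assumptions.

```python
def suggest(prefix, previous_entries):
    """Return previously entered words that start with the letters typed so far."""
    prefix = prefix.lower()
    if not prefix:
        return []
    return [entry for entry in previous_entries if entry.lower().startswith(prefix)]


# Hypothetical words previously typed into the search input box.
history = ["toaster", "toast rack", "kettle"]

# The candidate list is re-evaluated as each further letter is typed.
for typed in ("t", "to", "toa", "toaste"):
    print(typed, suggest(typed, history))
```

Note that the match only saves keystrokes within one input box; it does not carry a query from one search engine's form to another.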
WO 02/091241 discloses a system and a method for a distributed real-time search mechanism in a network. Network nodes operating as consumer or requesting nodes generate search requests. Nodes operating as hubs are configured to route the search requests in the network. Communication between nodes in the network may use a query routing protocol. The common query protocol is implemented as a server side protocol and is used to enable business to business services. The user expresses a search query in an appropriate format for each web site, including the selection of various options, for example, the language of the search. Some systems, for example subscription based repositories, require authentication to be performed; normal search services cannot obtain access to such systems and therefore cannot return the requested search results.
An object of the present invention is to provide a system, method and program product to facilitate searching through a multiplicity of web sites or a directory accessed via the Internet or an intranet.