The present invention relates generally to the search and retrieval of desired information on a data network, and more particularly relates to the search and retrieval of desired information on the Internet.
The rapid growth of the Internet has created a global computer network that contains enormous amounts of information. The distribution of information throughout the Internet is supported by services such as FTP, News, Gopher, and the World Wide Web (or the Web as it is referred to hereafter). The Web is the best known distribution service, and the amount of information available on Web pages expands and changes daily.
Because of the enormous amount of information on the Internet, it is becoming increasingly difficult to effectively and efficiently search and retrieve information that is of particular interest. One common way to search and retrieve information from the Internet is by utilizing a commercial search engine, such as Excite or Alta Vista, to generate a list of Internet sites containing the desired information, and then retrieving the desired information from the identified Internet sites. While this method may work for its intended purpose, the user must interact with the search engine and relevant Internet sites at the time the search and retrieval is performed. In addition, if the user is connected to the Internet with a relatively slow connection, retrieving search results may consume the resources of the user's computer for substantial periods of time, thereby preventing the computer from being used for any other tasks during the retrieval process.
To help a user search the Internet more effectively and efficiently, intelligent searching applications have emerged in which Internet searching is more automated. Generally speaking, intelligent searching applications, sometimes referred to as intelligent search agents, bots, or simply agents, are programs that gather information on a regular schedule without requiring the user's immediate presence. Using parameters provided by the user, an intelligent searching application searches all or some part of the Internet, gathers the information of interest, and presents it to the user on a daily or other periodic basis. For example, a user may create a profile that is utilized by an intelligent searching application to locate information on the Internet that fits the profile. The intelligent searching application may also manage the search in the background, so that the user is free to perform other tasks. An exemplary intelligent search application according to the prior art is a shareware program called Web Bandit. More information about Web Bandit may be found at http://www.jwsg.com, the contents of which are hereby incorporated by reference as of the filing date of the present disclosure. More information about intelligent searching applications in general may be found at http://www.botspot.com, the contents of which are hereby incorporated by reference as of the filing date of the present disclosure. Although intelligent searching applications may simplify Internet searching for the user, they may degrade the performance of the user's computer. For example, while the intelligent searching application is operating in the background, the performance of other applications on the user's computer may decline. In addition, the user must still download the searched information in order to access it at a desktop computer.
Although there may be performance degradation during a searching operation, there is more significant performance degradation during the retrieval, or downloading, of information from the Internet. During the retrieval of information from the Internet, many of the resources of a desktop computer (e.g., processor, input/output, system bus, and hard disk) are in use. When large amounts of information, such as video files, are being downloaded from the Internet, the resources of a desktop computer may be completely consumed.
Although an intelligent searching application may consume resources of the computer on which the search is initiated, in most network environments, such as a typical corporate enterprise network, the internal operation within one desktop computer does not affect the operation of other desktop computers. Moreover, it is common in a corporate enterprise network that the network resources are not at substantial utilization levels at all times. For example, desktop computers and enterprise servers are not in constant use throughout a workday and typically experience lower levels of use during non-work hours (e.g., nights and weekends). Because the computer resources of the network are not at substantial utilization levels at all times, a substantial amount of computing capacity goes unused.
In view of the enormous amount of information that is available on the Internet, the emergence of intelligent searching applications, and the resource distribution and consumption patterns within typical enterprise networks, it would be desirable to provide a method for searching the Internet and retrieving the results of the search that increases the speed and efficiency with which searches and retrievals are performed while also distributing the consumption of computer resources.
In a preferred embodiment, a method for searching the Internet is provided that includes generating search criteria for an Internet search utilizing a first search agent that is resident on a first computer, distributing search tasks related to the Internet search to other search agents that are resident on other computers, utilizing the other search agents to perform the distributed search tasks, and then reporting the results of each search task back to the first search agent. In a preferred embodiment, the other search agents also retrieve the results of their distributed search tasks, so that the search results may be more easily accessed by the person who initiated the search. In an embodiment, the computers that host the search agents have connections to a common intranet, and the search tasks are distributed only to search agents that have been identified as being available to support Internet searching. The distribution of search tasks may include the use of intelligent algorithms that optimize the quality of the search while minimizing the impact on the search agent computers and the intranet to which they are connected.
In a preferred embodiment, Internet search agents (ISAs) are applications, residing on computers such as desktop computers, that are available to manage Internet searches. The ISAs generate search criteria that may include, for example, the type of information that should be searched for (e.g., as identified by keywords), where the search should be conducted (e.g., specific Web sites), the timing of the search (e.g., immediate, nighttime, weekends, etc.), and/or the frequency of the search (e.g., hourly, daily, weekly, etc.).
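The search criteria described above can be pictured as a small record held by an ISA. The following Python sketch is illustrative only; the names `SearchCriteria`, `Timing`, and `Frequency`, and the choice to represent frequency as a repeat interval, are assumptions and are not part of the described method.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import timedelta
from enum import Enum


class Timing(Enum):
    # When the search should start (example values taken from the text).
    IMMEDIATE = "immediate"
    NIGHTTIME = "nighttime"
    WEEKENDS = "weekends"


class Frequency(Enum):
    # How often the search repeats; each value is the repeat interval.
    HOURLY = timedelta(hours=1)
    DAILY = timedelta(days=1)
    WEEKLY = timedelta(weeks=1)


@dataclass(frozen=True)
class SearchCriteria:
    keywords: tuple[str, ...]            # type of information to search for
    sites: tuple[str, ...] = ()          # where to search; empty means anywhere
    timing: Timing = Timing.IMMEDIATE    # when the search should run
    frequency: Frequency | None = None   # None means a one-off search

    def __post_init__(self) -> None:
        # A search with no keywords has nothing to look for.
        if not self.keywords:
            raise ValueError("at least one keyword is required")

    def repeat_interval(self) -> timedelta | None:
        # Interval between repeated runs, or None for a one-off search.
        return self.frequency.value if self.frequency else None
```

For example, `SearchCriteria(keywords=("router", "firmware"), timing=Timing.NIGHTTIME, frequency=Frequency.DAILY)` describes a nightly search repeated once a day.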
After the search criteria are generated, an initiating ISA identifies whether or not there are any other ISAs present on the network. The initiating ISA also determines the extent to which the identified ISAs are available to support the search that is to be performed according to the search criteria.
Once other ISAs have been identified, the initiating ISA negotiates with the available ISAs to determine the extent to which each ISA may support the search that is to be conducted. The initiating ISA may negotiate with multiple supporting ISAs in series or in parallel, depending on the particular configuration and needs of the initiating ISA.
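The serial and parallel negotiation options can be sketched as follows, assuming (hypothetically) that each supporting ISA answers a request with the number of search-task units it is willing to accept. `SupportingISA`, `offer`, and the unit-based capacity model are illustrative assumptions; a real ISA would be queried over the intranet rather than called in-process.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class SupportingISA:
    # Hypothetical in-process stand-in for a remote ISA.
    name: str
    spare_units: int          # search-task units this ISA is willing to take on
    available: bool = True    # whether it supports Internet searching at all

    def offer(self, requested: int) -> int:
        # Reply to a negotiation request with the number of units it can accept.
        return min(requested, self.spare_units) if self.available else 0


def negotiate_serial(isas: list[SupportingISA], needed: int) -> dict[str, int]:
    # Ask one ISA at a time for whatever is still unallocated; stop once covered.
    commitments: dict[str, int] = {}
    for isa in isas:
        if needed <= 0:
            break
        granted = isa.offer(needed)
        if granted:
            commitments[isa.name] = granted
            needed -= granted
    return commitments


def negotiate_parallel(isas: list[SupportingISA], needed: int) -> dict[str, int]:
    # Ask every ISA for the full amount at once, then accept offers in list
    # order until the need is covered; later offers may be trimmed or dropped.
    requested = needed
    with ThreadPoolExecutor() as pool:
        offers = list(pool.map(lambda isa: isa.offer(requested), isas))
    commitments: dict[str, int] = {}
    for isa, granted in zip(isas, offers):
        take = min(granted, needed)
        if take:
            commitments[isa.name] = take
            needed -= take
    return commitments
```

Serial negotiation avoids asking more ISAs than necessary, while parallel negotiation accepts some over-asking in exchange for lower latency when many ISAs are present.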
Before, during, or after the negotiation between the initiating ISA and the supporting ISAs, the initiating ISA breaks the present search down into multiple search tasks that may be performed separately by the supporting ISAs. Once supporting ISA availability has been negotiated and search tasks have been identified, the search tasks are distributed to the supporting ISAs. The distribution of search tasks among the ISAs may involve varying levels of complexity. In a preferred embodiment that incorporates a simple distribution approach, search tasks may be distributed on a pure availability basis. That is, the initiating ISA distributes as much of the search as possible to the first available ISA and any remaining portions of the search are distributed to the next available ISAs, until the entire search is distributed.
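A minimal sketch of the pure-availability distribution follows, assuming that the search is split into one task per (keyword, site) pair and that each supporting ISA has already agreed to a task count during negotiation. The function names and the decomposition rule are hypothetical; tasks that no supporting ISA can take are returned as retained by the initiating ISA.

```python
from __future__ import annotations

from itertools import product

Task = tuple[str, str]  # (keyword, site)


def split_search(keywords: list[str], sites: list[str]) -> list[Task]:
    # Hypothetical decomposition: one independent task per (keyword, site) pair.
    return list(product(keywords, sites))


def distribute_by_availability(
    tasks: list[Task],
    capacities: list[tuple[str, int]],
) -> tuple[dict[str, list[Task]], list[Task]]:
    # `capacities` lists supporting ISAs in the order they became available,
    # with the number of tasks each agreed to take. The first ISA is filled
    # as far as possible, then the next, until every task is placed.
    assignments: dict[str, list[Task]] = {}
    start = 0
    for name, capacity in capacities:
        if start >= len(tasks):
            break
        if capacity <= 0:
            continue  # an ISA offering nothing receives nothing
        chunk = tasks[start:start + capacity]
        assignments[name] = chunk
        start += len(chunk)
    # Anything left over stays with the initiating ISA.
    return assignments, tasks[start:]
```

Splitting by (keyword, site) keeps each task independent, which is what allows the supporting ISAs to work without coordinating with one another.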
Once the search tasks have been distributed, the search tasks are performed by the supporting ISAs. If the initiating ISA has retained any search tasks, the retained search tasks are also performed. In accordance with a preferred embodiment, the ISAs perform their search tasks independently of each other utilizing their own computer resources. In addition, the ISAs may perform their search tasks simultaneously with each other or at different times from each other. Search results generated by the supporting ISAs are reported back to the initiating ISA.
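The independent execution of search tasks and the reporting of results back to the initiating ISA might be sketched with a thread pool, as below. Threads here merely stand in for separate computers, and `search_fn` is a hypothetical placeholder for whatever actually queries the Internet; neither is specified by the description.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable

Task = tuple[str, str]                   # (keyword, site)
SearchFn = Callable[[Task], list[str]]   # returns URLs matching one task


def run_isa(name: str, tasks: list[Task], search_fn: SearchFn) -> tuple[str, dict[Task, list[str]]]:
    # Each supporting ISA works through its own tasks independently.
    return name, {task: search_fn(task) for task in tasks}


def perform_distributed_search(
    assignments: dict[str, list[Task]],
    search_fn: SearchFn,
) -> dict[str, dict[Task, list[str]]]:
    # Run every ISA's share concurrently and collect each report at the
    # initiating ISA as soon as that ISA finishes.
    reports: dict[str, dict[Task, list[str]]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(assignments))) as pool:
        futures = [
            pool.submit(run_isa, name, tasks, search_fn)
            for name, tasks in assignments.items()
        ]
        for future in as_completed(futures):
            name, results = future.result()
            reports[name] = results
    return reports
```

Because the tasks do not depend on one another, a scheduler could just as well delay some ISAs' shares to nighttime or weekends instead of running them all at once.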
In accordance with a preferred embodiment, information identified by the search results is retrieved by the supporting ISAs, so that the search information can be more quickly and easily accessed by the user. That is, search results, including the actual Web pages, video files, graphics files, etc., are downloaded from the Internet and stored by the supporting ISAs so that the information can be accessed by the person who initiated the search without having to connect to the source Internet server. In an embodiment, supporting ISAs may be selected at least partially based on the speed with which the supporting ISAs can download information from the Internet. The downloaded information may be pre-scanned for viruses and/or security risks before it is accessed by the user.
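Selecting supporting ISAs by download speed and holding pre-scanned copies locally could look like the following sketch. The throughput threshold, the `max_isas` limit, and the `is_safe` pre-scan callback are assumptions for illustration; the callback stands in for a virus or security scanner and does not represent any particular product's API.

```python
from __future__ import annotations

from typing import Callable


def select_by_download_speed(
    measured_mbps: dict[str, float],
    min_mbps: float,
    max_isas: int,
) -> list[str]:
    # Rank candidate ISAs by measured download throughput, fastest first,
    # drop any below the threshold, and keep at most `max_isas` of them.
    eligible = [name for name, speed in measured_mbps.items() if speed >= min_mbps]
    eligible.sort(key=lambda name: measured_mbps[name], reverse=True)
    return eligible[:max_isas]


def store_if_clean(
    store: dict[str, bytes],
    url: str,
    content: bytes,
    is_safe: Callable[[bytes], bool],
) -> bool:
    # Keep a local copy only if the pre-scan hook accepts it, so the user can
    # later open the content without contacting the source server.
    if not is_safe(content):
        return False
    store[url] = content
    return True
```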
In another embodiment, distributed Internet searching may be coordinated through a master ISA that resides on a central server. The master ISA may distribute Internet searches to supporting ISAs throughout an organization. The master ISA implementation works well in a distributed network architecture in which many users access the master ISA via a virtual private network (VPN). A master ISA may also be integrated with a corporate HTTP proxy so that Web sites fitting search criteria and/or commonly searched Web sites may be stored locally.
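One possible shape for the master ISA's proxy-side storage is sketched below. `MasterProxyCache`, the injected `fetch` function, and the rule that a page counts as commonly searched after a fixed number of requests are all hypothetical; the description states only that sites fitting search criteria and/or commonly searched sites may be stored locally.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable
from urllib.parse import urlparse


class MasterProxyCache:
    # Hypothetical cache a master ISA might keep alongside an HTTP proxy.
    # A page is stored locally if its host is named in current search
    # criteria, or once it has been requested `popular_after` times.

    def __init__(self, fetch: Callable[[str], bytes], criteria_sites: set[str], popular_after: int = 3) -> None:
        self._fetch = fetch                  # performs the real upstream request
        self._criteria_sites = criteria_sites
        self._popular_after = popular_after
        self._hits: Counter[str] = Counter()
        self._cache: dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        if url in self._cache:
            return self._cache[url]          # served locally, no upstream request
        self._hits[url] += 1
        body = self._fetch(url)
        host = urlparse(url).hostname or ""
        if host in self._criteria_sites or self._hits[url] >= self._popular_after:
            self._cache[url] = body
        return body
```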
An advantage of distributed searching is that search tasks may be distributed in a manner that minimizes the impact on an intranet and its associated computers. Further, users that are remotely connected to an intranet may access the resources of the intranet by distributing search tasks to ISAs that may have, for example, more computing power and/or greater bandwidth availability. An advantage of implementing distributed searching via ISAs is that the ISAs function automatically in the background, requiring little or no action by the user after the initial setup.