1. Field of the Invention
The present invention relates to network search engines and, more particularly, to a method for collecting, compiling, and updating data in a database for use by a search engine on a computer network.
2. Description of the Related Art
Compiling information from a computer network, particularly the Internet, is a difficult task because of the size and the characteristics of the network. The amount of data grows very rapidly and, because the network sites are continuously changing, the compiled information can be considered volatile. In addition, there are a number of different information-bearing file types including, for example, image files, document files, audio files, and video files, which makes discovering information more difficult still.
The current, most widely used generic solution for compiling information from a network is known as crawling and involves simply visiting a set of known sites, analyzing and indexing the content of those sites, and checking the sites for other site references. Any new site references are added to the set of known sites. Crawling is considered to be a “brute force” method.
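By way of illustration only, the crawling loop described above can be sketched as follows. The sketch assumes a hypothetical in-memory table of site references (`LINKS`) in place of real network access, and the names `crawl` and `fetch_links` are illustrative and do not refer to any particular crawler implementation.

```python
from collections import deque


def crawl(seed_sites, fetch_links):
    """Visit every site reachable from seed_sites and return the visit order.

    fetch_links is a caller-supplied function (an assumption of this sketch)
    that returns the site references found on a given site.
    """
    seeds = list(dict.fromkeys(seed_sites))  # drop duplicate seeds, keep order
    known = set(seeds)                       # the set of known sites
    queue = deque(seeds)                     # known sites not yet visited
    visited = []
    while queue:
        site = queue.popleft()
        visited.append(site)                 # stand-in for analyzing and indexing
        for ref in fetch_links(site):        # check the site for other site references
            if ref not in known:             # any new reference joins the known set
                known.add(ref)
                queue.append(ref)
    return visited


# Hypothetical toy network: each site maps to the sites it references.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
}

order = crawl(["a.example"], lambda site: LINKS.get(site, []))
print(order)
```

The set of known sites grows only through references found on sites already visited, which is why the approach is characterized as brute force.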
Crawling has a number of drawbacks. First, if the crawler has only a single processing node, there is only a single point of entry from which the network is examined, so the examination cannot be expanded efficiently. Second, expanding such a system to increase the number of entry points necessitates adding processing nodes, rendering the system more error- and attack-prone. The increase in the number of processing nodes also necessitates some kind of information exchange among the processors in order to avoid multiple processors visiting the same site simultaneously.
Third, crawling is not capable of discovering secluded islands, that is, sites or groups of sites that are not referenced externally, rendering them invisible to the crawler.
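The secluded-island limitation can be illustrated with a small sketch, again under the assumption of a hypothetical in-memory table of site references rather than a real network. The names `reachable` and `LINKS` and the example sites are illustrative only.

```python
def reachable(seeds, links):
    """Return the set of sites a crawler can reach by following references."""
    seen = set(seeds)
    stack = list(seeds)
    while stack:
        site = stack.pop()
        for ref in links.get(site, []):
            if ref not in seen:
                seen.add(ref)
                stack.append(ref)
    return seen


# Hypothetical toy network. x.example and y.example reference each other and
# even reference a.example, but no site outside the island references them.
LINKS = {
    "a.example": ["b.example"],
    "b.example": ["a.example"],
    "x.example": ["y.example", "a.example"],
    "y.example": ["x.example"],
}

found = reachable(["a.example"], LINKS)
missed = set(LINKS) - found
print(sorted(found), sorted(missed))
```

In this toy network the crawler starting from `a.example` never learns of `x.example` or `y.example`, because references are followed only outward from sites that are already known.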
The output of such a crawler is merely an excerpt of the data available on the network. That excerpt can be used to create search and recommendation systems that harness the compiled data to provide value-added services.