The Internet is a global system of computers that are linked together to facilitate communication between computers. These computers can be accessed by users so as to download and display informational pages therefrom. The ease and low cost of retrieving Internet pages have led to several problems in controlling access to inappropriate information, such as pornography. Several solutions to this problem have been proposed, including rating systems similar to those used for rating movies, so that a parent or employer can control access to Internet servers, or pages, that have a particular rating. In addition to rating schemes, others have developed databases that contain the uniform resource locator (URL) addresses of sites to be blocked. These databases are integrated into network computer systems and Internet firewalls so that a person wishing to access the Internet first has their URL request matched against the database of blocked sites and is denied access to any URL found in the database. One such system is described in U.S. Pat. No. 5,678,041.
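The database matching described above can be illustrated with a minimal sketch. It is not taken from U.S. Pat. No. 5,678,041; the function name `is_blocked`, the set `BLOCKED_SITES`, and the example hosts are all hypothetical, and the sketch assumes that an entry blocks either a whole host or a path prefix on that host.

```python
from urllib.parse import urlsplit

# Hypothetical blocked-site database; the entries are placeholders.
BLOCKED_SITES = {"blocked.example", "adult.example/gallery"}


def is_blocked(url, blocked=BLOCKED_SITES):
    """Return True if the requested URL matches an entry in the database."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # An entry consisting of a bare host blocks every page on that host.
    if host in blocked:
        return True
    # Otherwise test each path prefix in turn: host/a, host/a/b, ...
    prefix = host
    for segment in parts.path.split("/"):
        if not segment:
            continue
        prefix = f"{prefix}/{segment}"
        if prefix in blocked:
            return True
    return False
```

A firewall built along these lines would call `is_blocked` on each outgoing request and refuse the connection when it returns `True`. A URL that is absent from the set is always allowed, whatever its content.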
Such systems rely on the completeness of the database of blocked sites, and, since new servers and URLs are added to the Internet on a daily basis, these databases cannot provide a complete list of sites that should be blocked. An improvement to the system described in U.S. Pat. No. 5,678,041 is presented in EP1318468, which describes a system distributed between a central “database factory”, arranged to perform URL categorization and to store the results of the categorization in a central database, and many local “access systems”, each of which is associated with a given LAN and can connect to the database factory. Each access system is loaded with a copy of, and updates to, the categorization data from the database factory, and additionally includes a so-called filter module, which can perform a certain amount of processing in relation to uncategorized URLs. Uncategorized URLs are URLs that are requested by a client on the LAN and that are not listed in the categorization data downloaded from the database factory.
When a local access system receives a URL request from a client machine on its LAN, it can often identify a category for that URL on the basis of the categorization data received from the database factory. However, if a category cannot be obtained from that data, the local filter module performs some local processing, such as text processing, in order to identify whether or not it is safe for the client to access the URL. The output of the local processing is stored in association with the uncategorized URL. For each accessed URL, the filter module also stores a counter, which is incremented whenever that URL is requested. At certain times (e.g. at a set time, at a random time, every n units of time, or when a given URL has been requested a specified number of times) the database factory requests the uncategorized URL and any associated processed data from an access system. Since the database factory collects data from different access systems and downloads its output to each of the access systems, any given access system can benefit from knowledge obtained via URL requests received from access systems connected to unrelated LANs.
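The request flow described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of EP1318468: the class `AccessSystem`, its method names, the keyword list `FLAGGED_WORDS`, and the threshold of three requests are all hypothetical, and a simple keyword test stands in for the unspecified local text processing.

```python
from dataclasses import dataclass, field

# Hypothetical keyword list standing in for the local text processing.
FLAGGED_WORDS = {"casino"}


@dataclass
class AccessSystem:
    # Categorization data downloaded from the database factory: URL -> category.
    categories: dict = field(default_factory=dict)
    # Output of local processing for uncategorized URLs: URL -> result.
    uncategorized: dict = field(default_factory=dict)
    # Per-URL request counters.
    counters: dict = field(default_factory=dict)
    # Assumed reporting trigger: number of requests before a URL is handed over.
    report_threshold: int = 3

    def local_filter(self, page_text):
        """Placeholder text processing: flag pages containing a listed word."""
        words = set(page_text.lower().split())
        return "flagged" if words & FLAGGED_WORDS else "allowed"

    def handle_request(self, url, page_text=""):
        """Count the request, then categorize from downloaded data or locally."""
        self.counters[url] = self.counters.get(url, 0) + 1
        if url in self.categories:
            return self.categories[url]
        result = self.local_filter(page_text)
        self.uncategorized[url] = result
        return result

    def collect(self):
        """Hand over uncategorized URLs that have reached the request threshold."""
        due = {
            url: (result, self.counters[url])
            for url, result in self.uncategorized.items()
            if self.counters[url] >= self.report_threshold
        }
        for url in due:
            del self.uncategorized[url]
        return due

    def load_update(self, update):
        """Apply a categorization update received from the database factory."""
        self.categories.update(update)
```

In this sketch the database factory would call `collect` on each access system, categorize the returned URLs centrally, and push the result to every access system through `load_update`. Only the request-count trigger is modeled; the timed and random triggers mentioned above would call `collect` on a schedule instead.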
One feature that is common to all known Internet access control systems is the trigger for URL analysis, namely the receipt of a request for access to a given URL. Analysis of a URL is thus triggered by a direct request for access to a computer on the Internet corresponding to that URL.