A number of systems and methods have been proposed or created to control access to remote resources publicly available to users of other computers through the collection of networks known as the Internet. The collection of all such publicly available resources, linked together using files written in Hypertext Mark-up Language ("HTML"), and including other types of files as well, is known as the World Wide Web ("WWW"). Some systems that have been proposed or created to control access to remote resources include a database of information about WWW resources and a program that allows or blocks access by a user to resources based on information stored in the database. A program that controls access to remote resources will be termed a filter. A filter may be embodied in software running on the client machine or on a separate machine.
In one embodiment, the database or permissions derived from the database are stored local to the filter, either on the same machine or within a common firewall. Such a system is described in U.S. patent application Ser. No. 08/469,342, "System and Method for Database Access Control", filed on Jun. 6, 1995. In another embodiment, the filter sends messages to a remote server that serves messages containing information from the database, and the filter uses that information to determine whether to permit access to a particular resource. In either case, the filtering code may be part of a browser, part of a proxy server, or a separate program that determines whether to allow or deny access to resources. These scenarios are under discussion by groups formed by the WWW Consortium and the Internet Engineering Task Force.
Resources are requested in the Hypertext Transfer Protocol ("http") by means of a name such as a Uniform Resource Identifier ("URI") referring to a resource at the destination machine, or a Uniform Resource Locator ("URL"), which contains both a URI and the domain name or IP address of the remote site at which the resource named by the URI is stored. A database of information about resources may refer to resources by means of such names and addresses or by expressions.
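As an illustration of the two parts of a URL described above, the following minimal sketch separates the site address from the identifier of the resource at that site. The function name `split_url` and the example address are hypothetical and are not taken from any system described here; the sketch relies only on Python's standard `urllib.parse` module.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return (host, resource_id) for a URL (hypothetical helper)."""
    parts = urlsplit(url)
    # Domain name or IP address of the remote site.
    host = parts.hostname
    # Identifier of the resource at that site: the path plus any query string.
    resource_id = parts.path or "/"
    if parts.query:
        resource_id += "?" + parts.query
    return host, resource_id


host, resource_id = split_url("http://www.example.com/docs/index.html")
print(host, resource_id)
```

A database keyed on the full URL treats both parts as significant, which is one reason two names that differ only in the host part are stored as unrelated entries.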
However, URLs are not unique identifiers for resources. Distinct URLs can name the same resource in the sense that clients requesting these URLs will receive identical resources in response, and repeated requests for a single URL may result in the client's receiving different resources at different times. The following situations describe some ways in which such naming ambiguities occur.
To start, it is necessary to describe how distinct URLs can name the same resource. This can happen in several ways. First, it can happen because different domain names are mapped by a Domain Name Server to the same physical server. Second, it can happen because a server knows that different path names at its site are aliases for the same resource. Third, it can happen when identical copies of the resource are stored, or mirrored, at distinct sites with different URLs. Finally, it can happen indirectly as follows. When a protocol such as http is initiated, the information transmitted in the protocol can include protocol status information, resource information, and/or a resource, as well as other fields. Information about a resource can include specific data such as the content type or last modification date but also can include data such as a different URL for the resource. Status information can include a response code indicating that a request for a resource should be redirected to another URL. Thus, the following scenarios are possible. First, when the client requests the resource named by a URL, the remote server may return a redirection code and a new URL, and the client may then request the new URL separately. A second possibility is for the remote server to return a resource along with a new URL; in this case, there is no guarantee that the URL is a correct name for the resource, in the sense that a separate request for that URL is not guaranteed to produce a response with the identical resource. A third possibility is that when a client requests a URL from a remote server, the remote server sends a request for a different URL to another server and forwards the response back to the client.
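The first redirection scenario and the mirroring case can be sketched as follows. The table `FAKE_WEB` is a hypothetical in-memory stand-in for remote servers, and the URLs, status codes, and function name `fetch` are assumptions made only for illustration; no real network request is performed.

```python
# Hypothetical stand-in for remote servers: URL -> (status, payload).
# A 301/302 status carries a new URL; a 200 status carries the resource.
FAKE_WEB = {
    "http://a.example/old": (301, "http://a.example/new"),
    "http://a.example/new": (200, b"<html>hello</html>"),
    "http://mirror.example/new": (200, b"<html>hello</html>"),
}


def fetch(url, max_hops=5):
    """Follow redirections; return (list of URLs seen, resource bytes)."""
    seen = []
    for _ in range(max_hops + 1):
        seen.append(url)
        status, payload = FAKE_WEB[url]
        if status in (301, 302):
            url = payload  # the client requests the new URL separately
            continue
        return seen, payload
    raise RuntimeError("too many redirections")


seen, resource = fetch("http://a.example/old")
mirror_seen, mirror_resource = fetch("http://mirror.example/new")
# Three distinct URLs name the same resource, but each client sees at most two.
print(seen, mirror_seen, resource == mirror_resource)
```

In this sketch the client that requests the old URL learns of one alias through the redirection, but nothing in either response reveals the mirrored copy.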
Redirections are commonly used because the resource has moved, because it was requested by a method such as an image map where the requested URL includes keywords that encode information that the remote server uses to compute a URL to return, because it was requested via a Common Gateway Interface command which executes on the remote machine to determine what resource to return, or because the server uses redirections to facilitate collection of data on request behavior of individual users. The resource returned may also be computed on the fly from the information in the request.
Furthermore, requests for the same URL may result in distinct responses at different times, either because the resource itself has changed or because the remote server chooses to send back different resources or different redirection URLs at different times. When a request is made for a resource, the response may include a modification date, but in general the modification date is not guaranteed to be updated when changes are made to the file. The content of a file is often summarized by a checksum. A checksum or message digest is a number that is calculated from the resource such that identical resources are guaranteed to have the same checksum, and distinct resources are unlikely to have the same checksum. A number of such procedures exist in the literature. An example is the Message Digest 5 ("MD5") checksum procedure, which also has the feature that given a number, it is difficult to create a resource with that number as its checksum. This particular procedure is well known in the art, and discussed in Applied Cryptography: Protocols, Algorithms, and Source Code in C, by Bruce Schneier, Wiley Publishing, 1994, ISBN 0-471-59756-2. For practical purposes, it is ordinarily assumed that files are identical if and only if the checksums are identical.
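A minimal sketch of computing such a checksum is given below, using the MD5 implementation in Python's standard `hashlib` module. The function name `checksum` and the sample byte strings are hypothetical; MD5 is used only because it is the procedure named above.

```python
import hashlib


def checksum(resource: bytes) -> str:
    """Return the MD5 digest of a resource as a hexadecimal string."""
    return hashlib.md5(resource).hexdigest()


original = b"<html>hello</html>"
mirrored = b"<html>hello</html>"   # identical copy obtained under another URL
modified = b"<html>hello!</html>"  # same URL, changed content

# Identical resources always produce the same checksum.
print(checksum(original) == checksum(mirrored))
# Distinct resources are unlikely to produce the same checksum.
print(checksum(original) == checksum(modified))
```

Comparing checksums in this way detects a change to a resource even when the modification date reported by the server has not been updated.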
In the above situations, the server of the resources may have knowledge of the relationships between URLs, but the client and user of the client do not have a priori knowledge of the relationships. For a given request, the client may see multiple URLs through redirections, but will not generally see all possible URLs for the same resource. The client may or may not show the user the new URL and the user of a client may not be aware of the existence of multiple URLs for the same resource. Thus, a filter that functions as a rater by rating resources using software for storing ratings in a database based on URLs may cause a rating to be stored for one or several of these URLs but not for all URLs naming the same resource, and the filter may not know of the existence of these other URLs.
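The consequence for a URL-keyed ratings database can be sketched as follows. The dictionary `ratings`, the functions `rate` and `is_allowed`, the rating label, and the policy of permitting unrated URLs are all assumptions made for illustration and do not describe any particular filter.

```python
# Hypothetical ratings database keyed by URL.
ratings = {}


def rate(url, rating):
    """Store a rating under the one URL the rater happened to see."""
    ratings[url] = rating


def is_allowed(url, blocked=frozenset({"restricted"})):
    """Allow access unless this exact URL carries a blocked rating.

    Assumed policy: a URL with no stored rating is allowed.
    """
    rating = ratings.get(url)  # None when this URL was never rated
    return rating not in blocked


rate("http://a.example/new", "restricted")
print(is_allowed("http://a.example/new"))       # the rated URL is blocked
print(is_allowed("http://mirror.example/new"))  # an unrated alias is not
```

Because the lookup is by name only, a rating stored under one URL has no effect on requests made under another URL for the same resource.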
A proposal has been made to assign a unique permanent name called a Uniform Resource Name, or URN, to each resource. In this case, servers would translate a URN into a URL that would specify a specific copy of this resource. Distinct requests could result in the same URN being translated into distinct URLs, depending, for example, on the physical location of the client. However, this approach would not eliminate all the sources of ambiguities described above.
The above naming problems can also occur in other situations, such as systems using databases of keywords, annotations for resources, quality ratings, or categorizations of resources.