An important factor behind the rapid growth in the number of people and businesses connecting to the Internet is the wealth of information it contains and makes available to practically anyone who has a telephone connection and a personal computer. This strength, however, leads to problems when an information or service provider that uses the Internet as its communications medium wishes to control access to the information.
The information accessible from the Internet is stored on servers, which form part of the Internet infrastructure. The information is accessed by clients, which are controlled by users or customers and are typically connected to, but not part of, the Internet. Normally, the clients connect to the Internet only for a relatively short time using, for example, a dial-up modem connection across a telephone line or an Ethernet adaptor connected to an Ethernet cable.
While communications and information transfer between Internet clients and servers rely on the well-established TCP/IP protocols, higher-level, dedicated protocols are employed to access certain types of information specific to one of the many services available on the Internet. Different services support different formats of information and allow different types of operation on the information.
For example, a Gopher client allows retrieval and display of predominantly text-based information, an FTP (File Transfer Protocol) client supports the transfer between a server and a client of binary or ASCII files, and a World Wide Web (or simply a Web) client can retrieve and display mixed text and graphical information, as well as sounds, movies (usually encoded via MPEG), virtual ‘worlds’, and any other data type for which an appropriate ‘viewer’ (‘helper’) application or ‘plug-in’ is available.
The Web employs the HyperText Transfer Protocol, or HTTP, to support access by a Web browser to information on a Web server. Of course, when transmitted across the Internet, the HTTP information is encapsulated in the TCP/IP protocols. The information retrieved by the Web browser is typically an HTML (HyperText Markup Language) file, which is interpreted by the browser and displayed appropriately on a display screen as a Web page of information.
The Web browser specifies the information it wishes to retrieve using a URL (Uniform Resource Locator) of the form:
(http://Internet server name/server directory/file name)
Typically, “http” indicates that the URL points to a Web page of information. The Internet translates the Internet server name into a physical network location. The server directory is the location on the server of the file and the file name is that of the file in the directory, which contains or generates the required information.
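By way of illustration only, the decomposition of a URL into the parts described above can be sketched in Python using the standard urllib.parse module. The function name split_url, the host name www.example.com, and the path are hypothetical placeholders chosen for the example and do not come from any particular server.

```python
from urllib.parse import urlsplit
import posixpath


def split_url(url):
    """Split a URL into the parts named in the text above.

    The dictionary keys are illustrative labels, not standard terms.
    """
    parts = urlsplit(url)
    # The path holds both the server directory and the file name.
    server_directory, file_name = posixpath.split(parts.path)
    return {
        "scheme": parts.scheme,             # e.g. "http"
        "server_name": parts.hostname,      # Internet server name
        "server_directory": server_directory,
        "file_name": file_name,
    }


# Placeholder URL, assumed purely for illustration.
example = split_url("http://www.example.com/docs/reports/index.html")
```

In this sketch the translation of the server name into a network location is not performed; only the textual structure of the URL is examined.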
FIG. 1 is a diagram illustrating the general form of a typical graphical user interface display 100 provided by a Web browser, for example the Netscape™ Navigator Web browser or the Microsoft Internet Explorer™ Web browser. The display 100 includes several main areas: an options area 104 providing user options for controlling and configuring the browser, a Web page display area 108 for displaying a Web page, a location box 112 for displaying the location, or URL, of the displayed Web page, and a status box 116 which displays information concerning the status of Web page retrieval.
Also illustrated on the screen is a pointer 120, the position of which can be manually controlled by a user using a computer mouse, roller-ball or equivalent pointing device. The user interacts with the browser by positioning the pointer appropriately on the screen and selecting available options or functions provided by the browser or displayed on the Web page by, for example, ‘clicking’ a mouse button.
An HTML file comprises ASCII text, which includes embedded HTML tags. In general, the HTML tags are used to identify the nature and the structure of the Web page, and to identify HyperText links (hyperlinks) and their associated URLs.
Display capabilities of a Web browser typically determine the appearance of the HTML file on the screen in dependence upon the HTML tags. In general, a hyperlink provides a pointer to another file or Internet resource. Sometimes, a hyperlink can also point to a different location in a currently-displayed Web page. Within an HTML file, hyperlinks are identified by their syntax, for example:
<A HREF="(URL)">(anchor-text)</A>
Typically, the < . . . > structure identifies the HTML tags. The syntax typically includes a URL, which points to the other file, resource or location, and an anchor definition. In this case, the anchor is defined as a piece of text. In a Web page, typically a hyperlink is represented graphically on screen by the anchor. The anchor can be a piece of highlighted text or an image, for example a push-button or icon image. Where, for example, the anchor is non-textual, the underlying syntax usually also specifies a respective anchor image file location, which may be on the same or on a different server, as follows:
<A HREF="(URL)"> <IMG SRC="(URL)"> </A>
Here, IMG SRC specifies the location of the image file for the anchor. The effect of a user selecting a hyperlink, by moving a pointer over the anchor and clicking, say, the mouse button, is normally that the Web browser attempts to retrieve a new Web page corresponding to the indicated URL.
However, sometimes a URL refers to a software process rather than to a Web page per se. In some browsers, for example Netscape Navigator™, when the pointer merely moves over a hyperlink anchor, the browser can be arranged to display the underlying URL in the status box of the display screen, irrespective of whether the user selects the hyperlink or not. Thus, a user can normally see the URL of any hyperlink in a Web page.
HTML files sometimes also include references to other files, for example, graphics files, which are retrieved by the browser and displayed as part of the Web page typically to enhance visual impact. Each reference comprises an appropriate HTML tag and a URL. In practice, the browser retrieves the requested Web page first and then retrieves other files referenced in this way by the Web page. Often, therefore, the textual portions of a Web page appear before the graphical portions.
A user is able to view the ASCII text source code of an HTML file using source code viewing facilities provided by some browsers. Thus, a user is able to view the URLs for any hyperlink or other imported file.
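The ease with which URLs can be recovered from HTML source code can be illustrated with a minimal Python sketch based on the standard html.parser module. The class name UrlCollector, the function collect_urls, and the sample page with its example.com addresses are all hypothetical and are assumed only for the purpose of the example.

```python
from html.parser import HTMLParser


class UrlCollector(HTMLParser):
    """Collect hyperlink URLs (A HREF) and image URLs (IMG SRC)."""

    def __init__(self):
        super().__init__()
        self.hyperlinks = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.hyperlinks.append(attributes["href"])
        elif tag == "img" and attributes.get("src"):
            self.images.append(attributes["src"])


def collect_urls(html_source):
    """Return (hyperlink URLs, image URLs) found in the HTML text."""
    collector = UrlCollector()
    collector.feed(html_source)
    collector.close()
    return collector.hyperlinks, collector.images


# Hypothetical page containing one textual and one image anchor.
SAMPLE_PAGE = """<HTML><BODY>
<A HREF="http://www.example.com/docs/page2.html">Next page</A>
<A HREF="http://www.example.com/docs/page3.html"><IMG SRC="http://images.example.com/button.gif"></A>
</BODY></HTML>"""

hyperlinks, images = collect_urls(SAMPLE_PAGE)
```

Any user with access to the source text can perform an equivalent extraction by eye, which is why the URLs embedded in a Web page cannot normally be regarded as hidden.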
Generally, a user can retrieve a Web page using several methods which are supported by most browsers: by manually entering the URL into the location box 112, by selecting a Bookmark (the stored URL of a previously-accessed Web page), or by selecting a hyperlink in a displayed Web page 108. The first two methods potentially allow a user to access any Web page or other resource file at any time.
The third method requires the user to first access a Web page that incorporates a hyperlink to the required Web page or image file before that Web page or image file can be retrieved. In certain circumstances, it would be desirable to limit access to the third method only.
Since, however, a user can normally see any URL embedded in an HTML file and can access a Web page by entering the respective URL directly into a browser, under normal circumstances a service provider has little control over which Web pages are accessed and how they are accessed.
Many servers are arranged to address this problem by employing access tables, which include table entries controlling which users can access which pages.
An alternative measure, which is widely used, is to employ user identification and password protection to protect certain files on the server. Both measures are open to some degree to “spoofing” by unauthorized persons who have been known to masquerade as an authorized user by, for example, intercepting and cracking passwords for these protected files. A further disadvantage of both measures is the management overhead of keeping access tables or password files up-to-date, particularly where large numbers of users and/or pages are involved, or where the authorized user population changes regularly. Also, even if Web page access is controlled using access tables or password protection, a service provider normally has no control over the order in which an authorized user can access the Web pages once the URLs are known.
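The access-table measure, and its inability to control the order of access, can be illustrated with a minimal Python sketch. The table contents, the user names, the page paths, and the function name may_access are hypothetical; real servers implement access tables in server-specific ways that are not described here.

```python
# Hypothetical access table: each page path maps to the set of users
# permitted to retrieve it.
ACCESS_TABLE = {
    "/members/index.html": {"alice", "bob"},
    "/members/report.html": {"alice"},
}


def may_access(user, page):
    """Return True if the table lists the user against the page."""
    return user in ACCESS_TABLE.get(page, set())


# The check considers only who is asking and which page is requested.
# It keeps no record of which pages the user has already visited, so an
# authorized user can request report.html directly without first
# visiting index.html.
direct_request_allowed = may_access("alice", "/members/report.html")
unlisted_user_allowed = may_access("bob", "/members/report.html")
```

The sketch also hints at the management overhead noted above, since every change in the user population or in the set of pages requires the table to be edited.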
There have been some attempts to address these issues. For example, in PCT Patent Application No. 98/32066 to McGee, the contents of which are herein incorporated by this reference in their entirety, an Internet server employs a session manager that intercepts all incoming requests from clients for Web pages. Each request incorporates a token, which the session manager compares with tokens stored in a session database. Each token in the database has a corresponding real URL. When a token in the database is found that matches the received token, the corresponding real URL is used to retrieve the associated Web page(s). One drawback of using tokens, each of which has a corresponding real URL, is that a token may be stolen or otherwise used by an unauthorized party to access the contents of the Web page associated with the token, so that the contents are potentially compromised whenever the token is compromised. This means that Web pages with sensitive data may be subject to unauthorized access without the Web server ever knowing that unauthorized access has occurred.
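The general token-lookup arrangement described above, and the drawback identified with it, can be illustrated with a minimal Python sketch. This is not the implementation of the cited application: the class name SessionManager, its methods, the use of the standard secrets module to generate tokens, and the placeholder URL are all assumptions made for the example.

```python
import secrets


class SessionManager:
    """Illustrative token-to-URL lookup; all names are hypothetical."""

    def __init__(self):
        # Session database: token -> corresponding real URL.
        self._session_db = {}

    def issue_token(self, real_url):
        # Token generation method is an assumption for this sketch.
        token = secrets.token_urlsafe(16)
        self._session_db[token] = real_url
        return token

    def resolve(self, token):
        # Return the real URL for a matching token, or None if no
        # stored token matches. The identity of the requester is not
        # examined at any point.
        return self._session_db.get(token)


manager = SessionManager()
page_url = "http://www.example.com/private/report.html"  # placeholder
token = manager.issue_token(page_url)

# Any party presenting the token obtains the real URL, whether or not
# that party is the client to whom the token was issued.
resolved_url = manager.resolve(token)
```

Because the lookup depends solely on possession of the token, a copied token resolves exactly as the original does, and nothing in the lookup itself reveals that the request came from an unauthorized party.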