The present invention relates to information servers and particularly, but not exclusively, to Internet servers and methods of controlling an Internet server.
An important factor which has led to a rapid growth in people and businesses connecting to the Internet is the wealth of information it contains and makes available to practically anyone who has a telephone connection and a personal computer. This strength, however, leads to problems when an information or service provider, which uses the Internet as its communications medium, wishes to control how its information can be accessed.
The information accessible from the Internet is stored on servers which form part of the Internet infrastructure. The information is accessed by clients (which are controlled by users or customers) which are typically connected to, but which are not part of, the Internet. Normally, the clients only connect to the Internet for a relatively short time using, for example, a dial-up modem connection across a telephone line.
While communications and information transfer between Internet clients and servers rely on the well-established TCP/IP protocols, higher-level, dedicated protocols are employed to access certain types of information specific to one of the many services available on the Internet. Different services support different formats of information and allow different types of operation on the information. For example, a Gopher client allows retrieval and display of predominantly text-based information, an FTP (File Transfer Protocol) client supports the transfer between a server and a client of binary or ASCII files and a World Wide Web (or simply a Web) client can retrieve and display mixed text and graphical information, as well as sounds, movies (usually encoded via MPEG), virtual ‘worlds’, and any other data type for which an appropriate ‘viewer’ (‘helper’) application or ‘plug-in’ is available.
The following description concentrates on the Internet Web service for the purpose of explanation only. The concepts described, however, are more broadly applicable to other Internet services and to other information services available from different communications networks.
FIG. 1 illustrates an example of an Internet connection serving a plurality of clients 100 connected via a local area network 110 to a workstation 120. The workstation 120 is connected via a router 130 and a modem (or ISDN interface) 140 to an Internet connection provider 150. A connection originates from a Web client, for example a Web browser, which is a software process typically residing on a personal computer (PC) or workstation. Using the connection, for example, client 100a can retrieve public information from any Internet server.
In the following description the term Internet server means a physical computing platform which is attached to the Internet, whereas the term Web server means a software process which resides and runs on the physical Internet server to provide the Internet server with Web server functionality. The term server on its own can mean either a Web or an Internet server depending on the context, although the distinction is rarely of significance for the purposes of the following description.
The Web employs a protocol called http (HyperText Transfer Protocol) to support access by a Web browser of information on a Web server. Of course, when transmitted across the Internet, the http information is wrapped in the TCP/IP protocol. The information retrieved by the Web browser is typically an HTML (HyperText Markup Language) file which is interpreted by the browser and displayed appropriately on a display screen as a Web page of information.
The Web browser specifies the information it wishes to retrieve using a URL (Uniform Resource Locator) of the form:
http://Internet server name/server directory/file name,
where “http” indicates that the URL points to a Web page of information. The Internet server name is translated into a physical network location by the Internet. The server directory is the location on the server of the file and the file name is that of the file in the directory which contains or generates the required information.
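The three parts of the URL form given above can be illustrated with a minimal sketch using Python's standard urllib.parse module. The function name split_url and the address www.example.com are placeholders chosen for illustration only; they do not come from the description.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Split a URL into (scheme, server name, server directory, file name)."""
    parts = urlsplit(url)
    # Everything before the last "/" of the path is treated as the directory,
    # everything after it as the file name.
    directory, _, file_name = parts.path.rpartition("/")
    return parts.scheme, parts.netloc, directory, file_name


# Placeholder address, used only to show the decomposition.
print(split_url("http://www.example.com/docs/index.html"))
```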
FIG. 2 is a diagram which illustrates the general form of a typical graphical user interface display provided by a Web browser, for example the Netscape (TM) Navigator Web browser. The display includes several main areas: an options area 200 providing the user-options for controlling and configuring the browser, a Web page display area 210 for displaying a Web page, a location box 220 for displaying the location, or URL, of the displayed Web page, and a status box 230 which displays information concerning the status of Web page retrieval. Also illustrated on the screen is a pointer 240, the position of which can be controlled by a user using a computer mouse, roller-ball or equivalent pointing device. The user interacts with the browser by positioning the pointer appropriately on the screen and selecting available options or functions provided by the browser or displayed on the Web page by, for example, ‘clicking’ a mouse button.
An HTML file comprises ASCII text which includes embedded HTML tags. In general, the HTML tags are used to identify the nature and the structure of the Web page, and to identify HyperText links (hyperlinks), which are described in more detail below, and their associated URLs.
The display capabilities of a Web browser determine the appearance of the HTML file on the screen in dependence upon the HTML tags. HTML can in general identify:
the title of the file;
the hierarchical structure of the file with header levels and section names;
bulleted, numbered, and nested lists;
insertion points for graphics;
special emphasis for keywords or phrases;
pre-formatted areas of the file; and
hyperlinks and associated URLs.
In general, a hyperlink provides a pointer to another file or Internet resource. A hyperlink can sometimes also point to a different location in a currently-displayed Web page. Within an HTML file, hyperlinks are identified by their syntax, for example:
<A HREF="{URL}">{anchor-text}</A>
where the <...> structure identifies the HTML tags.
The syntax typically includes a URL, which points to the other file, resource or location, and an anchor definition. In this case, the anchor is defined as a piece of text. In a Web page, typically a hyperlink is represented graphically on screen by the anchor. The anchor can be a piece of highlighted text or an image, for example a push-button or icon image. Where, for example, the anchor is non-textual, the underlying syntax usually also specifies a respective anchor image file location, which may be on the same or on a different server, as follows:
<A HREF="{URL}"><IMG SRC="{URL}"></A>
where IMG SRC specifies the location of the image file for the anchor.
The effect of a user selecting a hyperlink, by moving a pointer over the anchor and clicking, say, the mouse button, is normally that the Web browser attempts to retrieve for display as a new Web page the file indicated by the URL. However, sometimes a URL refers to a software process rather than to a Web page per se, as described in more detail below.
In some browsers, for example Netscape (TM) Navigator, when the pointer merely moves over a hyperlink anchor, the browser can be arranged to display the underlying URL in the status box of the display screen, irrespective of whether the user selects the hyperlink or not. Thus, a user can normally see the URL of any hyperlink in a Web page.
HTML files sometimes also include references to other files, for example, graphics files, which are retrieved by the browser and displayed as part of the Web page typically to enhance visual impact. Each reference comprises an appropriate HTML tag and a URL. In practice, the browser retrieves the requested Web page first and then retrieves other files referenced in this way by the Web page. Often, therefore, the textual portions of a Web page appear before the graphical portions.
A user is able to view the ASCII text source code of an HTML file using source code viewing facilities provided by some browsers. Thus, a user is able to view the URLs for any hyperlink or other imported file.
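To illustrate how readily the URLs embedded in an HTML file can be recovered from its source, the following is a minimal sketch using Python's standard html.parser module. The class name ReferenceCollector, the function collect_references and the file names in the usage example are hypothetical and serve only as an illustration.

```python
from html.parser import HTMLParser


class ReferenceCollector(HTMLParser):
    """Collect the URLs carried by <A HREF=...> and <IMG SRC=...> tags."""

    def __init__(self):
        super().__init__()
        self.references = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        wanted = {"a": "href", "img": "src"}.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.references.append((tag, value))


def collect_references(html_text):
    """Return a list of (tag, url) pairs found in the HTML text."""
    collector = ReferenceCollector()
    collector.feed(html_text)
    collector.close()
    return collector.references


# Hypothetical page fragment with an image anchor.
print(collect_references('<A HREF="next.html"><IMG SRC="button.gif"></A>'))
```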
Generally, a user can retrieve a Web page using several methods which are supported by most browsers: by manually entering the URL into the location box, by selecting a Bookmark (the stored URL of a previously-accessed Web page), or by selecting a hyperlink in a displayed Web page. The first two methods potentially allow a user to access any Web page or other resource file at any time. The third method requires the user to first access a Web page which incorporates a hyperlink to the required Web page or image file before that Web page or image file can be retrieved.
In certain circumstances, it would be desirable to limit access to the third method only.
Since, however, a user can normally see any URL embedded in an HTML file and can access a Web page by entering the respective URL directly into a browser, under normal circumstances a service provider has little control over which Web pages are accessed and how they are accessed.
Many servers are arranged to address this problem by employing access tables which include table entries controlling which users can access which pages. An alternative measure, which is widely used, is to employ user identification and password protection to protect certain files on the server. Both measures are open to some degree to “spoofing” by unauthorised persons who have been known to masquerade as an authorised user by, for example, intercepting and cracking passwords for these protected files. A further disadvantage of both measures is the management overhead of keeping access tables or password files up-to-date, particularly where large numbers of users and/or pages are involved, or where the authorised user population changes regularly.
Also, even if Web page access is controlled using access tables or password protection, a service provider normally has no control over the order in which an authorised user can access the Web pages once the URLs are known.
In accordance with one aspect, the present invention provides an information server comprising:
means for receiving a request from a client for an item of information, said item of information including at least one reference to a further item of information;
means for modifying the item of information by replacing the or at least one reference by a token;
means for storing the or each token and each respective reference in storage means; and
means for returning to the client the modified item of information.
An advantage of this aspect of the invention is that the client is not provided with the actual reference information, such as a URL, for the further item(s) of information. Thus, the client would not know the name or location on the information server of the further item(s) of information.
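The modifying and storing steps of this aspect might be sketched as follows. This is an illustration under stated assumptions, not the implementation described in the embodiments: the function name tokenise_page, the use of a plain dictionary as the storage means, the regular expression (which only handles double-quoted HREF and SRC attributes) and the hexadecimal token format are all choices made for the example.

```python
import re
import secrets

# Assumption: references appear as double-quoted HREF="..." or SRC="..." values.
REFERENCE = re.compile(r'\b(HREF|SRC)="([^"]*)"', re.IGNORECASE)


def tokenise_page(html_text, store):
    """Replace each reference in html_text by a fresh token.

    The token-to-reference pairs are recorded in store (a dict standing in
    for the storage means) and the modified text is returned.
    """

    def swap(match):
        token = secrets.token_hex(8)  # any unguessable string would do here
        store[token] = match.group(2)  # remember which reference it replaces
        return f'{match.group(1)}="{token}"'

    return REFERENCE.sub(swap, html_text)


store = {}
modified = tokenise_page('<A HREF="page2.html">Next</A>', store)
print(modified)  # the original file name no longer appears
print(store)     # the server-side mapping from token to reference
```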
A token preferably comprises a series of digits or other characters. Preferably, a token has a form from which no information about the reference or the respective information item can be derived. In the case of digits, the token may be, for example, generated by a random number generator each time an information item is requested. For a suitably long token number length, therefore, the chances of obtaining the same token more than once for a particular reference are relatively low.
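A token of the kind described, namely a series of random digits carrying no information about the reference, could for example be produced as in the sketch below. The function name make_token and the default length of 16 digits are assumptions for illustration; Python's secrets module is used here as one possible random source.

```python
import secrets
import string


def make_token(length=16):
    """Return a string of `length` random decimal digits."""
    return "".join(secrets.choice(string.digits) for _ in range(length))


# A new token is drawn on every call, so the same reference would normally
# receive a different token each time the page is requested.
print(make_token())
```

With 16 decimal digits there are 10^16 possible tokens, so a repeat for any particular reference is unlikely, although not impossible.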
In the following description, it is assumed that any requested item of information includes at least one reference to a further item of information.
In accordance with a second aspect, the present invention provides an information server comprising:
means for receiving a request from a client for an item of information, the request including a token indicative of the item of information required;
means for comparing the token with one or more stored tokens to find a matching stored token, each stored token being associated with a corresponding reference to an item of information; and
means for returning to the client, in dependence upon finding a matching stored token, a respective corresponding item of information.
Thus, the information server only returns items of information that can be requested validly by a client, on the basis of a previously-requested item of information. An advantage of this aspect is that the information server can control the order in which items of information can be requested and returned.
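The comparing and returning steps of the second aspect can be sketched as a simple lookup. The function name resolve_request, the dictionaries standing in for the token store and for the server's files, and the sample token and file name are all hypothetical.

```python
def resolve_request(token, store, documents):
    """Return the item of information for a token, or None if no match.

    store maps stored tokens to references; documents maps references to
    the items of information themselves.
    """
    reference = store.get(token)
    if reference is None:
        return None  # no matching stored token: nothing is returned
    return documents.get(reference)


# Hypothetical data for illustration.
store = {"4821907733": "page2.html"}
documents = {"page2.html": "<HTML>second page</HTML>"}

print(resolve_request("4821907733", store, documents))
print(resolve_request("0000000000", store, documents))
```

A request that names a file directly, rather than a token issued earlier, finds no match in the store and so yields nothing.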
In preferred embodiments, where there are a plurality of tokens and respective references stored in association with the information server, and a request includes a valid token, the information server includes means to remove from the store the remaining tokens and respective references. Thus, once one from a selection of available tokens is requested, the remaining, unrequested tokens are removed and thereafter are thus not available for request.
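The removal of unrequested tokens might look like the following sketch. The function name redeem and the sample tokens are hypothetical. The text above does not say whether the matched token itself is retained, so keeping it in the store is an assumption of this example.

```python
def redeem(token, store):
    """Return the reference for a valid token and discard the other tokens."""
    reference = store.get(token)
    if reference is None:
        return None  # invalid token: the store is left unchanged
    # Remove every remaining (unrequested) token and its reference.
    # Assumption: the matched token itself is kept.
    for other in [t for t in store if t != token]:
        del store[other]
    return reference


# Three tokens issued for the three hyperlinks of one hypothetical page.
store = {"111": "a.html", "222": "b.html", "333": "c.html"}
print(redeem("222", store))
print(store)  # only the requested token remains
```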
The information server preferably includes means to store with each token and its respective reference the identity of a valid client. Also, in embodiments where multiple clients have access to the information server, the information server includes means to derive from a request for an item of information the identity of the client. Accordingly, a request for an item of information from a particular client is processed by the information server with respect only to tokens and their respective references having a valid identity.
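One way to associate each token with a client identity is to key the store on the pair of client identity and token, as in the sketch below. The function names issue and lookup, and the use of a network address as the client identity, are assumptions for illustration; how the identity is derived from a request is not specified in this passage.

```python
def issue(store, client_id, token, reference):
    """Record a token and its reference against one client's identity."""
    store[(client_id, token)] = reference


def lookup(store, client_id, token):
    """Return the reference only if the token was issued to this client."""
    return store.get((client_id, token))


store = {}
# Hypothetical client identity (here an address) and token.
issue(store, "192.0.2.10", "73920184", "page2.html")

print(lookup(store, "192.0.2.10", "73920184"))  # the issuing client succeeds
print(lookup(store, "192.0.2.99", "73920184"))  # another client finds nothing
```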
In a particularly advantageous form, the information server has Web server functionality. Then, a reference may comprise or incorporate a URL. The URL may be part of a hyperlink or, alternatively, it may refer to a further resource, for example an image file intended for display as part of a Web page.
An embodiment of the present invention will now be described, by way of example only, with reference to the accompanying drawings of which: