The present invention relates to a method of monitoring the usage of a website to provide an output indicative of a current distribution of clients across monitored entities of the site (for example, across all the pages of the site).
As used herein, the term “website” is to be understood as including any collection of files each downloadable over a network for display at a client machine and between which a user at that client can move by following hyperlinks embedded in the files; in particular, whilst the origin of the term “website” stems from the “worldwide web” application based on the HTML mark-up language and the HTTP protocol, the present invention is not restricted to such specific standards and the term “website” as used herein is to be read more broadly. Furthermore, the present invention can be applied to any website regardless of whether it is an Internet, extranet or intranet site.
FIG. 1 of the accompanying drawings shows a well known arrangement of a website 10 accessible over a network 14 by web clients 13 (each typically a web browser running on a PC).
The website 10 is made up of a collection of pages (P1-P6 in the present example) that take the form of HTML files held on a web server 12. Upon a web client 13 requesting a file (identified by its URL) over the network 14 using the HTTP protocol, the server 12 retrieves the file and sends it to the requesting client for display.
The pages making up the website 10 are normally arranged in a hierarchy with the page P1 at the head of this hierarchy being termed the “home page” of the site. In the present case, the home page P1 has links to three “second level” pages P2 to P4, and second-level page P2 has links to two third-level pages P5 and P6. It will be appreciated that this example is very limited in terms of the number of pages and links between pages.
Associated with each HTML page file, there will normally be image files (and increasingly also sound and video files) which will be automatically loaded into the client with the page file. Furthermore, a page may be divided up into a plurality of frames, as defined by a frame definition file, into each of which content files can be independently loaded.
It is well known to collect usage data for a website (such as the website 10 of FIG. 1) by noting each time each individual page of the site is requested (often called a “hit”) during the course of a day. Such data may then be analysed to produce basic statistical data such as the number of overall hits on the website by day/month/multiple months, and the number of hits for each page by day/month/multiple months. Collecting additional data associated with each hit (file request) can provide further useful data; for example, noting the origin of each file request permits the identification of the most productive “portal” providing a hyperlink to the website.
Another useful type of information that can be collected is the behaviour and preferences of users. The collection of this type of information requires each requesting client (or associated user) to be identifiable at least during the course of a session of interaction with the website. There are several ways of doing this, one of the most well known being the use of “cookies” that at the request of the website are stored by clients and supplied back to the site with every file request; “cookies” permit the usage of the site by individual clients to be tracked across multiple sessions of interaction. Another method of tracking website usage by individual clients, at least during a single session of interaction, is to attach a client identifier to every URL contained in pages served to each client, the identifier being allocated when the first page request is received during a session of interaction; with this arrangement, the identifier is automatically returned by the client with every file request (the identifier being stripped off the URL path information before the file is retrieved and then added onto every URL in that file as it is downloaded).
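By way of illustration only, the URL-based identifier scheme described above might be sketched as follows. The convention of carrying the identifier as a leading path segment of the form `/~<id>/`, and the function names `split_identifier`, `allocate_identifier` and `tag_links`, are assumptions made for this sketch and are not taken from the description; the regular-expression rewriting of `href` attributes is a simplification that a practical server would replace with a proper HTML parser.

```python
import re
import uuid

# Assumed convention (hypothetical): the client identifier travels as the
# first path segment, e.g. "/~3f2a.../p2.html".
_ID_PATH = re.compile(r"^/~([0-9a-f]+)(/.*)$")
# Site-relative links only: a leading "/" that is not followed by another "/".
_SITE_LINK = re.compile(r'href="(/(?!/)[^"]*)"')


def split_identifier(path):
    """Strip the identifier off a requested path.

    Returns (client_id, bare_path); client_id is None when the request
    carries no identifier (i.e. the first request of a session).
    """
    match = _ID_PATH.match(path)
    if match:
        return match.group(1), match.group(2)
    return None, path


def allocate_identifier():
    """Allocate a fresh identifier for a client's first page request."""
    return uuid.uuid4().hex


def tag_links(html, client_id):
    """Add the identifier onto every site-relative URL in a page being served."""
    return _SITE_LINK.sub(
        lambda m: f'href="/~{client_id}{m.group(1)}"', html
    )
```

On a first request, `split_identifier` returns no identifier, so the server would call `allocate_identifier` and pass the result to `tag_links` before sending the page; every later request from that client then carries the identifier back in its path.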
Tracking how particular users navigate a site is useful in determining which groups of topics are of common interest to particular groups of users; this is not only of interest for customer behaviour analysis on commercial sites but also permits a degree of predictive serving of files into caches to improve speed of service to the client. Where clients are uniquely identified across multiple sessions (whether by use of cookies, by use of a logon procedure involving user identification, or in some other manner), it is possible to carry out detailed behavioural analyses and to provide a measure of personalised services to the user. By way of example, International Application WO97/26729 describes an automated collaborative filtering application for use with world wide web advertising.
None of the above web-usage monitoring techniques provides a view of what is happening on a website at a particular point in time and it is an object of the present invention to provide such information.
According to one aspect of the present invention, there is provided a method of monitoring the usage of a website having a plurality of monitored entities each constituted by a file downloadable to a web client or by a logical or sequential combination of such files, the method involving the steps of:
(a) associating an identifier with a web client visiting the website which identifier is provided to the site by the web client with each file request from that client;
(b) monitoring which files are requested by web clients visiting the site and storing currency information that indicates, or permits a determination of, for each web client, which monitored entity or entities requested by that client are still current, at a particular point in time, for said client in terms of not having been superseded by a file or files subsequently requested by that web client;
(c) generating from said currency information an output indicating, for said particular point in time, a current distribution of web clients across said monitored entities by reference to which of said monitored entities are then current for said clients.
Although step (c) could be carried out off-line, it is likely to be much more useful to effect step (c) on-line to produce a continually updated near real-time indication of the current distribution of clients across the monitored entities.
Because the HTTP protocol is a stateless protocol, it is possible for a web client to cease to be interested in a website without the latter being aware that the client has moved on; in this case, it would be incorrect to continue to consider that client as having a current monitored entity on the website. In order to minimise this possible source of error, the “current” status of a monitored entity associated with a particular client is cancelled when the time elapsed since a request from that client has exceeded a predetermined cut-off value. In fact, the website may be provided with an indication that a particular client has ceased to be interested in the site (for example, through a log-off procedure or by ensuring that the site is involved whenever an off-site link is activated from one of its own pages); in such cases, this indication is used to ensure that there are no “current” monitored entities associated with the client concerned.
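The cut-off and explicit-departure handling described above might, purely as a sketch, be arranged as follows. The class name `ExpiringPresence`, its method names and the 300-second cut-off used in the usage note are hypothetical; the description does not specify a value for the cut-off. The current time is passed in explicitly so that the behaviour is easy to follow.

```python
class ExpiringPresence:
    """Per-client record of the last page requested and when it was requested."""

    def __init__(self, cutoff_seconds):
        # Assumed: a single predetermined cut-off applies to all clients.
        self.cutoff = cutoff_seconds
        self.last = {}  # client id -> (page, time of last request in seconds)

    def record_request(self, client_id, page, now):
        """Note a request; it supersedes whatever was current for the client."""
        self.last[client_id] = (page, now)

    def leave(self, client_id):
        """Explicit indication (log-off, off-site link) that the client has gone."""
        self.last.pop(client_id, None)

    def current_pages(self, now):
        """Clients whose last request is not older than the cut-off, with their page."""
        return {
            client: page
            for client, (page, when) in self.last.items()
            if now - when <= self.cutoff
        }
```

For example, with a cut-off of 300 seconds, a client whose last request was made at time 0 is still counted at time 300 but is no longer counted at time 301, whereas a client that has called `leave` is dropped immediately.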
In one embodiment, the monitored entities are individual files corresponding to respective pages of the website. In this case, the currency information can comprise, for each client, a client data item including an indication of the last preceding page file requested by that client; step (c) then involves determining whether the last preceding page file is a monitored entity.
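A minimal sketch of this page-per-entity embodiment is given below, assuming that each request has already been resolved to a client identifier and a page name. The class name `UsageMonitor` and its methods are illustrative names only and are not taken from the description.

```python
from collections import Counter


class UsageMonitor:
    """Keeps, for each client, the last page file requested (the client data item)."""

    def __init__(self, monitored):
        self.monitored = set(monitored)  # page files treated as monitored entities
        self.last_page = {}              # client id -> last page file requested

    def record_request(self, client_id, page):
        # Step (b): the newly requested page supersedes the previous one.
        self.last_page[client_id] = page

    def distribution(self):
        # Step (c): count clients per monitored entity, ignoring clients
        # whose last page is not a monitored entity.
        return Counter(
            page for page in self.last_page.values() if page in self.monitored
        )
```

With pages P1, P2 and P5 monitored, two clients on P1 and one of them then moving to P2 would yield one client on each of P1 and P2; a client whose last page is the unmonitored P3 does not appear in the output.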
In another embodiment, at least one monitored entity is defined in terms of a combination of a particular frame-definition file and a predetermined file serving as a source file for a frame defined by the frame-definition file. In this case, the currency information can comprise, for each client, a client data item including a list of the last preceding files requested by that client; step (c) then involves determining from the list whether said at least one monitored entity is current for that client which is taken to be so when both the particular frame definition file and the predetermined file are current.
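One possible way of evaluating such a combined entity from a client's request list is sketched below. The sketch assumes, as a simplification not stated in the description, that the site operator supplies a mapping `slot_of` from each file to the place it is displayed in (either `"top"` for a file occupying the whole window, such as a frame-definition file, or the name of a frame), and that a file is superseded by the next file requested for the same place. All names and file names here are hypothetical.

```python
def current_files(history, slot_of):
    """Replay a client's request list and return {slot: file} for the files
    that have not yet been superseded."""
    current = {}
    for requested in history:
        slot = slot_of[requested]
        if slot == "top":
            # A new top-level file supersedes everything previously displayed.
            current = {}
        current[slot] = requested
    return current


def frame_entity_is_current(history, slot_of, frame_def, source):
    """True when both the frame-definition file and the given source file
    are current for the client."""
    current = current_files(history, slot_of)
    return (
        current.get("top") == frame_def
        and current.get(slot_of[source]) == source
    )
```

Under these assumptions, a client that has requested a frame-definition file followed by a source file for one of its frames is counted against the combined entity until either the frame is loaded with a different file or a different top-level file is requested.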
In a further embodiment, at least one monitored entity is defined in terms of a sequential combination of first and second predetermined files in that order. In this case, the currency information can comprise, for each client, a client data item including a list of the last preceding files requested by that client; step (c) then involves determining from the list whether said at least one monitored entity is current for that client which is taken as being so when the first predetermined file has been superseded by the second file and the latter is current.
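For the sequential case, a simple sketch is possible if it is assumed that each client's list holds page files in order of request, so that each file supersedes the one before it. The function names below are illustrative only.

```python
def sequence_entity_is_current(history, first, second):
    """True when the client's two most recent requests were `first` and then
    `second`, i.e. `first` was superseded by `second`, which is still current."""
    return tuple(history[-2:]) == (first, second)


def clients_on_sequence(histories, first, second):
    """Count the clients for which the sequential entity is current.

    `histories` maps a client id to that client's ordered request list.
    """
    return sum(
        1
        for history in histories.values()
        if sequence_entity_is_current(history, first, second)
    )
```

Taking the pages of FIG. 1 as an example, a client whose list ends with P2 followed by P5 is counted against the entity "P2 then P5", whereas a client that reached P5 from some other page, or that has since moved on from P5, is not.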
Preferably, the output generated in step (c) takes the form of a graphical display of the structure of the website including representations of the monitored entities visually indicating the relative magnitudes of the number of clients currently associated with each entity. Alternatively, the output generated in step (c) takes the form of a histogram indicating the number of users for each monitored entity.
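The histogram form of output could, for instance, be rendered as plain text along the following lines. This is a sketch only: the function name `render_histogram`, the use of `#` characters and the scaling of the longest bar to a fixed width are presentation choices assumed here rather than taken from the description.

```python
def render_histogram(counts, width=40):
    """Render {entity: number of clients} as one text bar per entity.

    Bars are scaled so that the busiest entity has a bar of `width` characters.
    """
    peak = max(counts.values(), default=0)
    lines = []
    for entity in sorted(counts):
        n = counts[entity]
        bar_length = n * width // peak if peak else 0
        lines.append(f"{entity:<6} {'#' * bar_length} {n}")
    return "\n".join(lines)
```

Calling `print(render_histogram(counts))` on each refresh of the distribution would give a continually updated view of where clients currently are.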
According to another aspect of the present invention, there is provided a method of monitoring the usage of a website involving the steps of:
associating an identifier with a client visiting the website which identifier is provided to the site by the client with each page request from that client;
at each request by a client for a page of the website, at least where that page is different from a page currently being browsed by the client:
generating and storing a current-presence indication indicating that the client, as represented by the client's identifier, is currently browsing that page, and
removing any prior current-presence indication for that client indicating the client's presence at a different page,
generating from said current-presence indications an output indicating the current distribution of clients across the pages of the website.
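The steps of this second aspect might be sketched as follows, with the current-presence indications held as a set of client identifiers per page and updated incrementally at each request. The class name `PresenceBoard` and its methods are hypothetical names chosen for the sketch.

```python
from collections import defaultdict


class PresenceBoard:
    """Holds current-presence indications: which clients are at which page."""

    def __init__(self):
        self.page_of = {}                   # client id -> page currently browsed
        self.clients_at = defaultdict(set)  # page -> ids of clients present there

    def on_page_request(self, client_id, page):
        previous = self.page_of.get(client_id)
        if previous == page:
            return  # same page requested again: nothing to update
        if previous is not None:
            # Remove the prior current-presence indication for this client.
            self.clients_at[previous].discard(client_id)
        # Generate and store the new current-presence indication.
        self.clients_at[page].add(client_id)
        self.page_of[client_id] = page

    def distribution(self):
        """Number of clients currently at each page (empty pages omitted)."""
        return {
            page: len(clients)
            for page, clients in self.clients_at.items()
            if clients
        }
```

Because each request touches at most two pages, the distribution can be read at any moment without re-scanning every client's record.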