Companies that rely on internet and/or intranet web servers as a key point of contact for their employees, customers and/or potential customers (collectively the "System-Users") need timely and accurate information about the interests and needs of their System-Users. Achieving this goal is a difficult challenge for companies with high volume multi-web server usage because the volume of web server usage data generated can be enormous. Thus, for these companies, a web usage analysis system must be able to process and store large amounts of web server usage data in an efficient manner or it will quickly become overwhelmed.
Programs exist today to analyze web server usage. The starting point for all these programs is the web server log. When the System-User of a web server makes a request for certain content, one or more entries are written in a log record, which records the System-User's request. The log entries for one request can be extensive because they contain the data for all pictures and graphical representations included in the content request. After the log entries are generated, they are stored sequentially in a log on the web server. The existing analysis programs perform their usage analysis function by retrieving these log records and preparing various reports based upon the usage as reflected in the log records. These programs may, or may not, store the data in databases prior to generating usage reports.
The available programs, however, have many limitations that reduce their usefulness to companies with high volume multi-web server usage. Because they process web server usage data inefficiently, the programs are not fast enough to provide daily reporting. The reasons for such inefficiencies include the absence of a process for: (1) filtering the data from the server; (2) reducing database size by creating summary data and deleting the details; and (3) minimizing the data that is stored through the use of reference tables.
A further limitation of these statistical programs is that they cannot generate reports over extended time periods. This precludes management from viewing usage information over various time periods such as weeks, months and quarters. In addition, management would also like to perform timely queries of usage data based on various date parameters. Existing packages provide limited capability to do this because of the manner in which these programs store and maintain usage data.
A further limitation of these programs is that they contain no method of identifying the user. Many companies would like to identify the user of the web server so that they can match this user information with other information that is maintained relating to the user.
Finally, the existing systems all require that a company devote significant human resources to schedule and operate the process of recording, collecting and analyzing web server usage from multiple web servers.
It is thus an object of the present invention to provide an efficient system for collecting, filtering, analyzing, and reporting web server usage data from one or more web servers.
It is a further object of the invention to provide an automated daily or periodic process by which web server usage data can be collected from multiple servers from one or more physical locations (whose identity can change on a daily or periodic basis) and loaded into databases that can be used to produce various reports.
It is a further object of the invention to minimize the data handled by the automated daily process by filtering unneeded usage data at the web server.
It is a further object of the invention to provide efficient querying by compacting the details in the database tables through the use of reference tables and minimizing the size of the database by generating summary data so that the details can be deleted periodically.
It is a further object of the invention to provide daily, weekly, monthly, and quarterly reports.
It is a further object of the invention to provide ad-hoc reports which permit users to specify their own date parameters.
It is a further object of the invention to identify the System-Users by decrypting the System-User's cookie, which allows the system to link the user with other information maintained on the system about the user to produce useful reports.
It is a further object of the invention to provide an automated process that requires minimal human intervention (other than setting system parameters) to perform the collection, analysis and reporting of web server usage.
The above and other objects of the present invention are realized through a system which uses software installed at each web server to filter the server usage data by removing unneeded file types such as pictures, which can reduce the size of web server usage files significantly, by 75 percent or more in some cases. The system allows the cookie decryption algorithm to be installed as part of the web server filtering program so that the system can identify the System-User.
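The server-side filtering step can be illustrated with a minimal sketch. It assumes log lines in the Common Log Format, with the request in a quoted field; the extension list, the regular expression, and the function names `is_needed` and `filter_log` are hypothetical and chosen only for illustration. The source names pictures as an example of an unneeded file type but does not enumerate the types or specify the log format.

```python
import re
from urllib.parse import urlsplit

# Hypothetical set of "unneeded" file types; the source only names pictures.
UNNEEDED_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".ico"}

# Assumed log layout: the request appears in a quoted field such as
# "GET /index.html HTTP/1.0" (as in the Common Log Format).
REQUEST_RE = re.compile(r'"(?:[A-Z]+) (\S+)[^"]*"')


def is_needed(log_line):
    """Return True if the log line should be kept for usage analysis."""
    match = REQUEST_RE.search(log_line)
    if match is None:
        return True  # keep lines that cannot be parsed rather than lose data
    path = urlsplit(match.group(1)).path.lower()
    dot = path.rfind(".")
    # Only treat the dot as an extension if it is in the last path segment.
    ext = path[dot:] if dot > path.rfind("/") else ""
    return ext not in UNNEEDED_EXTENSIONS


def filter_log(lines):
    """Drop log lines that request unneeded file types."""
    return [line for line in lines if is_needed(line)]
```

In this sketch a page request is kept while the picture requests generated by the same page are dropped, which is where the reduction in file size would come from.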
On a daily or scheduled basis, the collection process collects the filtered usage records from all the servers on the system, further processes the usage data, and then transfers the data to the Analysis Server where the data is loaded into a relational database. The collection processes can be set to run automatically based on the time of day set for the process to run.
To support efficient querying and minimize the use of system resources, the Main Table in the database is kept compact by: (1) creating summary data so that the details in the Main Table can be deleted automatically on a periodic basis and (2) using reference tables so that only unique identifier values are stored in the Main Table. The system uses the reference tables to link the unique identifier value in the Main Table with the actual information stored in the reference tables. Summary Tables are created from the Main Table to support weekly, monthly and quarterly reporting of web site usage. The identity of the System-User is linked with other information maintained about the user to support various reports related to the System-User.
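The two compaction techniques described above, reference tables and summary data, can be sketched as follows. This is an illustration under stated assumptions, not the schema of the invention: SQLite stands in for the relational database, and every table name, column name, and function name (`page_ref`, `main`, `monthly_summary`, `load_hits`, `summarize_and_purge`, `monthly_report`) is hypothetical.

```python
import sqlite3


def load_hits(hits):
    """Load (date 'YYYY-MM-DD', user_id, url) tuples into an in-memory database.

    Each distinct URL is stored once in a reference table; the main table
    holds only the integer identifier that points at it.
    """
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE page_ref (page_id INTEGER PRIMARY KEY, url TEXT UNIQUE);
        CREATE TABLE main (
            hit_date TEXT,
            user_id  TEXT,
            page_id  INTEGER REFERENCES page_ref(page_id)
        );
        CREATE TABLE monthly_summary (month TEXT, page_id INTEGER, hits INTEGER);
    """)
    for hit_date, user_id, url in hits:
        # Insert the URL only if it is not already in the reference table.
        db.execute("INSERT OR IGNORE INTO page_ref (url) VALUES (?)", (url,))
        page_id = db.execute(
            "SELECT page_id FROM page_ref WHERE url = ?", (url,)
        ).fetchone()[0]
        db.execute("INSERT INTO main VALUES (?, ?, ?)", (hit_date, user_id, page_id))
    return db


def summarize_and_purge(db):
    """Roll detail rows up into monthly counts, then delete the details."""
    db.execute("""
        INSERT INTO monthly_summary (month, page_id, hits)
        SELECT substr(hit_date, 1, 7), page_id, COUNT(*)
        FROM main
        GROUP BY substr(hit_date, 1, 7), page_id
    """)
    db.execute("DELETE FROM main")
    db.commit()


def monthly_report(db):
    """Join the summary back to the reference table to recover the URLs."""
    return db.execute("""
        SELECT s.month, r.url, s.hits
        FROM monthly_summary s
        JOIN page_ref r ON r.page_id = s.page_id
        ORDER BY s.month, r.url
    """).fetchall()
```

After `summarize_and_purge` runs, the report can still be produced from the summary and reference tables even though the detail rows are gone, which is the property that allows the main table to be trimmed periodically.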
Other than setting system parameters and loading certain non-usage company and System-User information, the invention may be run without human intervention except for periodic maintenance.
The foregoing features and advantages of the instant invention may be more fully appreciated by reference to a specific embodiment thereof, as described hereinbelow in conjunction with the following figures of which: