Companies that rely on internet and/or intranet web servers as a key point of contact for their employees, customers and/or potential customers (collectively the “System-Users”) need timely and accurate information about the interests and needs of their System-Users. Achieving this goal is a difficult challenge for companies with high volume multi-web server usage because the volume of web server usage data generated can be enormous. Thus, for these companies, a web usage analysis system must be able to process and store large amounts of web server usage data in an efficient manner or it will quickly become overwhelmed.
Programs exist today to analyze web server usage. The starting point for all these programs is the web server log. When the System-User of a web server makes a request for certain content, the web server writes one or more log entries that record the System-User's request. The log entries for one request can be extensive because they cover every picture and graphical representation included in the requested content. After the log entries are generated, they are stored sequentially in a log on the web server. The existing analysis programs perform their usage analysis function by retrieving these log records and preparing various reports based upon the usage reflected in the log records. These programs may or may not store the data in databases prior to generating usage reports.
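The way a single content request expands into several log entries can be illustrated with a minimal sketch. It assumes the entries follow the NCSA Common Log Format, which the text above does not specify; the names `LOG_PATTERN`, `LogEntry`, `parse_entry` and the sample lines are hypothetical and serve only as illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumption: entries use the NCSA Common Log Format. The text does not name a format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


class LogEntry(NamedTuple):
    host: str
    user: str
    timestamp: str
    method: str
    path: str
    status: int
    size: int


def parse_entry(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None if the line does not match the assumed format."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    # A "-" in the size field means no body was sent; treat it as zero bytes.
    size = 0 if match["size"] == "-" else int(match["size"])
    return LogEntry(
        match["host"], match["user"], match["timestamp"],
        match["method"], match["path"], int(match["status"]), size,
    )


# Hypothetical sample: one page request that also produces entries for two images.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /products.html HTTP/1.0" 200 2326',
    '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /images/logo.gif HTTP/1.0" 200 4120',
    '192.0.2.10 - - [10/Oct/2000:13:55:37 -0700] "GET /images/banner.jpg HTTP/1.0" 200 15300',
]

entries = [e for e in (parse_entry(line) for line in SAMPLE_LOG) if e is not None]
```

In this sample, one visit to a page yields three sequential log entries, which suggests how quickly the log can grow on a busy server.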
The available programs, however, have many limitations that make them of little use to companies with high volume multi-web server usage. The programs do not process the usage data fast enough to provide daily reporting because they process web server usage data inefficiently. The reasons for such inefficiencies include the absence of a process for: (1) filtering the data from the server; (2) reducing database size by creating summary data and deleting the details; and (3) minimizing the data that is stored through the use of reference tables.
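The three missing processes named above can be sketched together in a few lines. This is only an illustration under stated assumptions, not a description of any particular system: the image suffix list, the `ReferenceTable` class, the `summarize` function and the sample requests are all hypothetical.

```python
from collections import Counter

# Assumption: image requests are the entries to filter out, identified by suffix.
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")


def is_page_request(path):
    """(1) Filtering: keep only requests that are not for image files."""
    return not path.lower().endswith(IMAGE_SUFFIXES)


class ReferenceTable:
    """(3) Reference table: store each distinct value once and refer to it by a small id."""

    def __init__(self):
        self._ids = {}

    def id_for(self, value):
        # Assign the next id on first sight; return the existing id afterwards.
        return self._ids.setdefault(value, len(self._ids) + 1)

    def __len__(self):
        return len(self._ids)


def summarize(requests):
    """(2) Summary data: count requests per (date, page id); details are not retained."""
    pages = ReferenceTable()
    counts = Counter()
    for date, path in requests:
        if not is_page_request(path):
            continue
        counts[(date, pages.id_for(path))] += 1
    return pages, counts


# Hypothetical detail records as (date, requested path) pairs.
requests = [
    ("2000-10-10", "/products.html"),
    ("2000-10-10", "/images/logo.gif"),
    ("2000-10-10", "/products.html"),
    ("2000-10-11", "/index.html"),
]

pages, counts = summarize(requests)
```

Here four detail records reduce to two summary rows and a two-entry reference table, which hints at why the absence of these steps leaves a database large and slow to process.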
A further limitation of these statistical programs is that they cannot generate reports over extended time periods. This precludes management from viewing usage information over various time periods such as weeks, months and quarters. Management would also like to perform timely queries of usage data based on various date parameters. Existing packages provide only limited capability to do this because of the manner in which they store and maintain usage data.
A further limitation of these programs is that they contain no method of identifying the user of the web server. Many companies would like to identify the user so that they can match this user information with other information that is maintained about the user.
Finally, the existing systems all require that a company devote significant human resources to schedule and operate the process of recording, collecting and analyzing web server usage from multiple web servers.