The present application relates to compiling and reporting data associated with activity on a network server and more particularly to a method and apparatus for configuring web site traffic analysis programs by, for instance, enhancing categorization of web pages into traffic reporting groups.
Programs for analyzing traffic on a network server, such as a worldwide web server, are known in the art. One such prior art program is described in U.S. patent application Ser. No. 09/240,208, filed Jan. 29, 1999, for a Method and Apparatus for Evaluating Visitors to a Web Server, which is incorporated herein by reference for all purposes. NetIQ Corporation owns this application and also owns the present application. In these prior art systems, the program typically runs on the web server that is being monitored. Data is compiled, and reports are generated on demand—or are delivered from time to time via email—to display information about web server activity, such as the most popular page by number of visits, peak hours of website activity, most popular entry page, etc.
Analyzing activity on a worldwide web server from a different location on a global computer network (“Internet”) is also known in the art. In a conventional implementation, a provider of remote web-site activity analysis (“service provider”) generates JavaScript code that is distributed to each subscriber to the service. The subscriber copies the code into each web-site page that is to be monitored. When a visitor to the subscriber's web site loads one of the web-site pages into his or her computer, the JavaScript code collects information, including time of day, visitor domain, page visited, etc. The code then calls a server operated by the service provider, also located on the Internet, and transmits the collected information thereto as a URL parameter value. Information is also transmitted in a known manner via a cookie. Each subscriber has a password to access a page on the service provider's server. This page includes a set of tables that summarize, in real time, activity on the subscriber's web site.
The basic mechanism of such services is that each tracked web-site page contains some JavaScript in it that requests a 1×1 image from the service provider's server. Other information is sent along with that request, including a cookie that uniquely identifies the visitor. Upon receipt of the request, applicants' service records the hit and stages it for full accounting. This is a proven method for tracking web site usage.
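The request described above can be illustrated with a minimal sketch. It is written in Python for brevity rather than in the JavaScript a page tag would actually use, and the endpoint address, the parameter names (page, ref, ts), the cookie name (vid), and both function names are hypothetical; none of them come from the service described here.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical address of the service provider's 1x1 image.
BEACON_ENDPOINT = "https://stats.example.com/pixel.gif"


def build_beacon_request(page, referrer, visitor_id, timestamp):
    """Page-tag side: encode the collected fields as URL parameters and
    carry the visitor identifier in a cookie header."""
    params = {"page": page, "ref": referrer, "ts": timestamp}
    url = BEACON_ENDPOINT + "?" + urlencode(params)
    cookie_header = "vid=" + visitor_id
    return url, cookie_header


def record_hit(request_url, cookie_header):
    """Provider side: recover the fields from the image request so the
    hit can be staged for later accounting."""
    query = urlsplit(request_url).query
    # keep_blank_values retains fields such as an empty referrer.
    fields = parse_qs(query, keep_blank_values=True)
    hit = {key: values[0] for key, values in fields.items()}
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    hit["visitor"] = cookie["vid"].value
    return hit
```

In this sketch the image itself carries no information; the query string and the cookie that accompany the request are what the provider records.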
The above-described arrangement for monitoring web server activity by operating a program on the web server itself, or by a service provider over the Internet, is generally known in the art. Examples of the information analyzed include technical data, such as most popular pages, referring URLs, total number of visitors, browser application used, IP addresses of visitors, time and date of web pages visited, returning visitors, etc.
Many, if not most, companies and organizations maintain sites on the worldwide web for informational and commerce purposes. Each site comprises multiple pages with varying functions and content.
The operator of each site is interested in knowing how the people who visit it are using the site. Who is coming, where did they come from, what are they looking at, how long did they stay—all are questions that the operator might ask. This curiosity begot the class of tools known as web server log analyzers described above.
Conventional versions of these analyzers report only on the raw data, giving the number of times each particular URL was downloaded. As web sites become more sophisticated, so does the need for more sophisticated analysis. In particular, it becomes important to interpret the meaning of the downloaded pages and report on it, rather than merely the name of the page.
For instance, there might be a hundred pages, all with different URLs, that all pertain to the customer service function. It may be desirable to report the traffic patterns to the Customer Service area of the site, rather than to each of the constituent pages. Knowing that visitors are spending more time in Customer Service than in the Catalog area of the site could help an organization redesign its site.
Other kinds of page classifications are possible: which pages constitute the “shopping cart” of a site, which pages should be filtered out as “noise” in the analysis, which pages indicate a particular advertising campaign that brought visitors to the site, etc.
Known systems for implementing data traffic analysis for hosted web pages, particularly the Log Analyzer software product sold by the assignee of the present invention, allow a user to configure the program to recognize particular pages or groups of pages as having “special meaning.” One example of this would be to categorize a page as “representing a view of a shoe product” or “this set of pages are in our Tech Support area.” The method typically used by conventional web data analysis systems for categorizing pages is based on textual pattern recognition of the URL. For example, “all URLs containing the substring ‘service’ should be grouped into the Customer Service category.” As users' needs grow in complexity, so do the means of recognizing URLs.
This complexity has naturally led to the use of regular expressions, which are, in effect, tiny algorithms for pattern matching. The user configures these product and content group patterns in applicant's Log Analyzer product by typing in a string to match against every Uniform Resource Locator (URL) that is seen in the log file. For example, one might say, “any URL that matches the expression ‘/catalog/shoes/*.htm’ is a shoe product.” In another example, a regular expression for the above condition for URLs in the Customer Service category would be “.*service.*\.html”. A URL is only one type of request tracked, and others can be contemplated, such as requests for video files, PDF files, applications such as Flash-based presentations, etc.
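The pattern-based grouping described above can be sketched as follows. This is an illustrative example only, not the Log Analyzer implementation: the category names, the “Uncategorized” fallback, the first-match-wins ordering, and the function names are assumptions. The shoe expression is also rewritten from its wildcard form ‘/catalog/shoes/*.htm’ into regular-expression syntax, which is an interpretation of the intent rather than something stated above.

```python
import re
from collections import Counter

# Hypothetical configuration: (category name, compiled pattern) pairs.
CATEGORY_PATTERNS = [
    ("Shoe Products", re.compile(r"/catalog/shoes/.*\.htm")),
    ("Customer Service", re.compile(r".*service.*\.html")),
]


def categorize(url):
    """Return the first category whose pattern matches the whole URL."""
    for name, pattern in CATEGORY_PATTERNS:
        if pattern.fullmatch(url):
            return name
    return "Uncategorized"


def traffic_by_category(urls):
    """Count logged requests per category instead of per individual URL."""
    return Counter(categorize(url) for url in urls)
```

Given a log containing /catalog/shoes/boot.htm, /service/contact.html, /customer-service/faq.html, and /about.html, this sketch reports one Shoe Products hit, two Customer Service hits, and one Uncategorized hit. A page such as /services-overview/pricing.html would also be counted under Customer Service, which shows how easily a loosely written pattern can group pages that do not belong together.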
Creating such regular expressions requires detailed knowledge of which specific URLs are contained in the web site. The method of matching URLs against a set of defined patterns yields the desired analytic results, but the accuracy of those results depends on the correctness of the configured patterns. Though this process might be reasonable for the IT engineer or web master, it is most likely beyond the capabilities of the manager or marketing person, because the patterns can be cryptic and complicated and correct configuration is therefore difficult. The necessity for correct configuration, and its inherent complexity, suggests that a simpler method for specifying and verifying configurations would be welcome.
Accordingly, the need still remains for a way to more easily configure web site traffic tracking programs that overcomes the complications of methods taught in the prior art.