Traffic from user terminals to an e-commerce, e-banking or online shopping website, for example, is affected not only by seasonal and other long-term fluctuations, but also by business-related events such as promotions. That is, traffic to a website fluctuates when an event of some sort causes user behavior to depart from normal conditions. For example, a company may experience a sharp increase in traffic to its website when a promotion is held in connection with services provided by the website, after which traffic may gradually subside.
A business that operates a website, such as the company above, can run the website efficiently in line with the amount of traffic received by forecasting medium-term fluctuations in traffic caused by business-related events such as promotions, in addition to seasonal and other long-term fluctuations. Techniques for forecasting fluctuations in traffic are thus extremely useful.
Generally, demand forecasting is performed by applying to the future the regularity derived from past fluctuations in traffic, together with the regularity with which that regularity itself changes. To forecast fluctuations resulting from events, past events need to be analyzed and the characteristics of the resulting fluctuations in traffic extracted. For example, in a prior invention by the present applicant (see JP 2006-268529A), the temporal fluctuation patterns of traffic caused by past events are saved together with past event information in a database called an event characteristics model, and utilized in forecasting. Generally, temporal fluctuations in traffic to a website are obtained from the access log of the website.
Traffic to a website is represented, for example, by the page view count, the session count, or the session start count. The page view count is the number of times the main pages constituting a website are viewed from user terminals. A session is a sequence of consecutive accesses from the same user terminal (same host). That is, a series of accesses from the same user terminal is treated as one session. Note that where the interval between two accesses from the same user terminal is greater than a fixed time period (e.g., 30 min), the accesses after the interval are treated as belonging to a different session from the accesses before it.
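The session rule described above can be illustrated with a minimal sketch. It assumes that each access is available as a (host, timestamp) pair and uses the 30 min interval given as an example; the function name `split_into_sessions` and the sample data are hypothetical and are not taken from any cited document.

```python
from datetime import datetime, timedelta

# Assumption: the 30 min interval given as an example in the text.
SESSION_TIMEOUT = timedelta(minutes=30)


def split_into_sessions(accesses, timeout=SESSION_TIMEOUT):
    """Group (host, timestamp) accesses into sessions.

    Returns a list of (host, [timestamps]) tuples, one per session,
    ordered by the time at which each session started.
    """
    sessions = []
    latest = {}  # host -> index in `sessions` of that host's most recent session
    for host, ts in sorted(accesses, key=lambda a: a[1]):
        idx = latest.get(host)
        if idx is not None and ts - sessions[idx][1][-1] <= timeout:
            # Interval is within the fixed period: same session continues.
            sessions[idx][1].append(ts)
        else:
            # First access from this host, or interval exceeds the fixed
            # period: a different session starts.
            sessions.append((host, [ts]))
            latest[host] = len(sessions) - 1
    return sessions


# Hypothetical sample accesses for illustration only.
accesses = [
    ("host-a", datetime(2006, 1, 1, 9, 0)),
    ("host-a", datetime(2006, 1, 1, 9, 10)),
    ("host-b", datetime(2006, 1, 1, 9, 15)),
    ("host-a", datetime(2006, 1, 1, 10, 0)),
]
sessions = split_into_sessions(accesses)
```

In this sample, the access from host-a at 10:00 follows its previous access by 50 min, so it is treated as a different session, giving three sessions in total.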
At present, the access log commonly saved for a website is a log of communication using HTTP (HyperText Transfer Protocol). Generally, the HTTP log records information relating to one access per line, and a session ID identifying the session to which the access belongs is sometimes assigned to each line. The session count is the number of unique sessions within a prescribed period, and can be acquired, for example, by counting the number of unique session IDs appearing in the HTTP log within that period. In contrast, the session start count is the number of sessions newly started within a prescribed period.
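The difference between the session count and the session start count can be shown with a minimal sketch. It assumes that each log line has already been parsed into a (timestamp, session ID) pair and that the log also covers accesses before the prescribed period, so that the first access of each session is known; the function name `session_counts` and the sample records are hypothetical.

```python
from datetime import datetime


def session_counts(records, period_start, period_end):
    """Return (session_count, session_start_count) for the half-open
    period [period_start, period_end).

    `records` is an iterable of (timestamp, session_id) pairs,
    one per access (an assumed, already-parsed log format).
    """
    first_seen = {}  # session_id -> timestamp of its earliest access
    active = set()   # session IDs with at least one access in the period
    for ts, sid in records:
        if sid not in first_seen or ts < first_seen[sid]:
            first_seen[sid] = ts
        if period_start <= ts < period_end:
            active.add(sid)
    # A session is newly started in the period if its first access falls in it.
    started = sum(
        1 for ts in first_seen.values() if period_start <= ts < period_end
    )
    return len(active), started


# Hypothetical sample records for illustration only.
records = [
    (datetime(2006, 1, 1, 9, 50), "s1"),
    (datetime(2006, 1, 1, 10, 5), "s1"),
    (datetime(2006, 1, 1, 10, 10), "s2"),
    (datetime(2006, 1, 1, 10, 40), "s2"),
    (datetime(2006, 1, 1, 11, 5), "s3"),
]
session_count, session_start_count = session_counts(
    records,
    datetime(2006, 1, 1, 10, 0),
    datetime(2006, 1, 1, 11, 0),
)
```

In this sample, sessions s1 and s2 both have accesses between 10:00 and 11:00, so the session count is 2, but only s2 begins within that period, so the session start count is 1.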
At present, the kind of event that occurred is not saved in the access log of a website. In the foregoing prior invention, event information concerning the website is therefore input separately from an external source. That is, while the access log of a website can easily be accumulated using the web server function, there is no means for recording, in association with the access log, information about the events that cause characteristic fluctuations in accesses to the website. Further, no conventional system estimates event type from the log accumulated in a web server. Thus, even if the access log for a website can be obtained, performance data showing fluctuations in traffic quite often cannot be utilized in forecasting because the corresponding event implementation history cannot be obtained.
On the other hand, there already exist numerous anomaly detection methods that analyze the log and detect access fluctuation anomalies (e.g., see non-patent document 1 below). Non-patent document 1 describes three anomaly detection methods. These methods stop at detecting anomalous fluctuations in traffic, and do not identify the cause of anomalies.
Non-patent document 1: Yamanishi, K., Takeuchi, J., Maruyama, Y., “Three Methods of Statistical Anomaly Detection” (in Japanese), IPSJ Magazine, vol. 46, no. 1, pp. 34-40, published on Jan. 15, 2005.
Systems that forecast shifts in anomalous values in addition to detecting anomalies have also been disclosed (e.g., see JP 2005-196675A). JP 2005-196675A describes a process that involves calculating anomalous values for the number of recorded events from the log for a network device or the like, and forecasting subsequent shifts in the anomalous values based on Bayesian inference. An “event” in JP 2005-196675A is a parameter in the log containing a specific item, such as HTTP port probe or Smurf attack, for example. This is different from an event in the present invention. An event in the present invention indicates the cause of a characteristic fluctuation in traffic represented in the log data (e.g., promotions, website advertising, TV commercials, street campaigns). Despite the same word “event” being used, JP 2005-196675A does not describe a process for estimating the type of event that causes a change in user behavior.
Conventionally, there have been numerous commercial products that analyze website logs (e.g., see non-patent document 2 below). Non-patent document 2 introduces log analysis tools such as Urchin and SiteCatalyst. These log analysis tools function to aggregate and visualize changes in the number of visitors due to advertising, the probability of users who view certain pages making a purchase, transitions in traffic over time, and so forth. However, these log analysis tools do not have means for extracting the type of events conducted in the past from a log.
Non-patent document 2: “Access Log Analysis Tools” (in Japanese), iNTERNET magazine, Impress Corporation, December 05 issue, p. 106, published on Dec. 1, 2005.