1. Field of the Invention
The present invention relates to web mining technology, and more particularly, to a method and apparatus for traversal pattern mining with reference to predefined minimum support values corresponding to web object positions.
2. Description of the Related Art
With the rapid expansion of the World Wide Web (WWW), web data mining has become increasingly important. An important issue in web data mining is traversal pattern mining, which is used to predict likely upcoming web page requests based on statistically significant correlations. Web log data is collected by web servers and contains information about user behavior on a site (e.g., sequences of URLs requested by different clients with different IP addresses). The analysis of these large volumes of log data requires the use of data mining methods. Under the definition used in association rule mining, the mined patterns are those access sequences that occur frequently. If a sequence appears frequently enough, it is regarded as a frequent traversal pattern. Understanding user traversal patterns not only helps improve Web site design, such as by providing efficient access between highly correlated objects, better authoring design for pages, and the like, but also leads to better marketing decisions, such as advertisement placement, more accurate customer classification and behavior analysis, and the like.
Although the conventional methods described above are feasible for mining frequent traversal patterns from a log file, several problems remain. Specifically, conventional methods of traversal pattern mining apply a single, uniform support threshold to determine frequent traversal patterns, without considering such important factors as the length of the pattern and the positions of the web pages. As a result, a low support threshold leads to the generation of unimportant patterns, while a high support threshold may cause important patterns with lower support to be ignored.
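By way of illustration only, the trade-off of a uniform support threshold may be sketched as follows. The sketch is not taken from any particular conventional method. It assumes that support is counted as the number of sessions containing a pattern, that patterns are contiguous page sequences of at most three pages, and that the session data, page names, and function names (count_support, frequent_patterns) are hypothetical.

```python
from collections import Counter


def count_support(sessions, max_len=3):
    """Count, for each contiguous page sequence, how many sessions contain it.

    Assumption: support is a raw session count, and only contiguous
    sequences of up to max_len pages are considered.
    """
    support = Counter()
    for session in sessions:
        seen = set()  # each pattern is counted at most once per session
        for length in range(1, max_len + 1):
            for start in range(len(session) - length + 1):
                seen.add(tuple(session[start:start + length]))
        support.update(seen)
    return support


def frequent_patterns(sessions, min_support, max_len=3):
    """Keep the patterns whose support meets one uniform threshold."""
    support = count_support(sessions, max_len)
    return {p: c for p, c in support.items() if c >= min_support}


# Hypothetical sessions reconstructed from a web log.
sessions = [
    ["home", "catalog", "item"],
    ["home", "catalog", "item", "checkout"],
    ["home", "news"],
    ["home", "catalog"],
]

low = frequent_patterns(sessions, min_support=1)   # every observed pattern
high = frequent_patterns(sessions, min_support=3)  # only short, common ones

print(len(low), "patterns at threshold 1")
print(sorted(high), "at threshold 3")
```

In this hypothetical data, the low threshold retains every observed sequence, including single visits of little interest, whereas the high threshold retains only short patterns near the entry page and discards the longer sequence home, catalog, item even though it occurs in half of the sessions.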
In view of these limitations, a need exists for an apparatus and method for traversal pattern mining with reduced processing time and improved usability of results.