1. Field of the Invention
The present invention relates to computer-implemented methods of mining path traversal patterns in a communications network.
2. Related Art
Due to the increasing use of computing for various applications, the importance of database mining is growing rapidly. For example, progress in barcode technology has enabled retail organizations to collect and store massive amounts of sales data. Catalog companies can also collect sales data on the orders they receive. Analysis or mining of past transaction data can provide very valuable information on customer buying behavior, and thus improve the quality of business decisions such as: what to put on sale; which merchandise should be placed on shelves together; and how to customize marketing programs, to name a few. It is essential, however, to collect a sufficient amount of sales data before any meaningful conclusions can be drawn therefrom. It is hence important to devise efficient methods to conduct mining on these large databases.
Various data mining capabilities have been explored in the literature. Data mining is a broad field with many application-dependent problems requiring different mining techniques to solve. One of the most important data mining problems is mining association rules. Mining association rules means that, given a database of sales transactions, it is desirable to discover all associations among items in a transaction such that the presence of some items will imply the presence of other items in the same transaction. Another application, called mining classification rules, refers to developing rules to group data tuples together based on certain common features. Yet another source of data for mining is ordered data, such as stock market and point-of-sale data. Examples of data mining applications on ordered data include searching for similar sequences, e.g., stocks with similar movement in stock prices, and sequential patterns, e.g., grocery items bought over a series of visits in sequence. From the above examples it can be appreciated that the application-dependent nature of data mining requires proper problem identification and formulation as a prerequisite to the knowledge discovery process.
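By way of illustration only, the notion of an association rule described above can be sketched in a few lines of Python. The sketch considers only rules between pairs of items and filters them with "support" and "confidence" thresholds. These two measures, the function name pair_rules and the sample transactions are assumptions introduced here for illustration; they are not taken from the present description and do not represent any particular prior-art method.

```python
from itertools import combinations


def pair_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) tuples meaning 'lhs implies rhs'.

    Illustrative assumption: support is the fraction of transactions that
    contain both items; confidence is the fraction of transactions
    containing lhs that also contain rhs.
    """
    n = len(transactions)
    item_count = {}
    pair_count = {}
    for t in transactions:
        items = sorted(set(t))
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        for pair in combinations(items, 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # Test the rule in both directions: a implies b, and b implies a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical sales transactions, invented for this example.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk", "eggs"},
]
rules = pair_rules(transactions, min_support=0.5, min_confidence=0.6)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data every transaction containing butter also contains bread, so the rule "butter implies bread" is reported with a confidence of 1.0, whereas no rule involving eggs meets the support threshold.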
Another data mining application, involving mining access patterns in communications networks, is the focus of this invention. In one type of network considered, documents or objects are linked together to facilitate interactive information access. Examples of such information networks include the World Wide Web (WWW) and on-line services, such as those using the trademarks PRODIGY, COMPUSERVE and AMERICA ONLINE, where users, seeking information of interest, travel from one object to another via the facilities (e.g., links and/or icons in a graphical user interface) provided. Understanding user access patterns in such environments will not only help improve system design and usability (e.g., providing efficient access between highly correlated objects, better authoring design for pages, etc.) but also lead to better marketing decisions (e.g., putting advertisements in "high traffic" areas, better customer/user classification and behavior analysis, etc.). Capturing user access patterns in such environments is referred to as mining traversal patterns.
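As a rough illustration of what traversal data might look like, the following Python sketch tallies how often each object is visited and how often each link between two objects is followed across a set of user sessions. It is only a naive frequency count meant to make the idea of "high traffic" areas concrete, not the method of the present invention; the function name traffic_counts, the representation of a session as a list of object identifiers, and the sample data are all assumptions made for this example.

```python
from collections import Counter


def traffic_counts(sessions):
    """Count object visits and object-to-object hops over all sessions.

    Illustrative assumption: each session is a list of object identifiers
    in the order in which the user visited them.
    """
    visits = Counter()
    hops = Counter()
    for path in sessions:
        visits.update(path)
        # Pair each object with the object visited immediately after it.
        hops.update(zip(path, path[1:]))
    return visits, hops


# Hypothetical traversal logs, invented for this example.
sessions = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["B", "C"],
]
visits, hops = traffic_counts(sessions)
print(visits.most_common(1))  # the most frequently visited object
print(hops[("A", "B")])       # how often the link from A to B was followed
```

A count of this kind treats every visit alike and therefore cannot tell whether an object was reached for its content or merely passed through on the way to another object.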
It is important to note that, since users are searching the information network for information, or "surfing the net", some objects are visited because of their location rather than their content. This highlights a difference between the traversal patterns problem and other data mining problems, which are mainly based on customer transactions. This unique feature of the traversal pattern application necessarily increases the difficulty of extracting meaningful information from a sequence of traversal data. However, as these information services become increasingly popular, there is a growing demand for analysis of user behavior to improve the quality and cost-effectiveness of such services.