The present invention relates to a system and method for monitoring and analyzing computer network transaction data to create behavior profiles of network users. More particularly, the present invention relates to a method and system of manually or automatically classifying information available on a computer network. Specifically, the present invention helps classify Internet Web sites to facilitate the construction of more accurate behavior profiles of Internet users for marketing purposes.
Service providers and merchants on the Internet increasingly seek specific information about Internet users in order to improve the marketing of products and services, measure the effectiveness of that marketing, and tailor products and services to the requirements of specific customer types.
Behavior profiles are created using network usage data collected through various methods. Once the data is collected, it is analyzed to determine the behavior of a particular user. In order to create an accurate behavior profile, it is useful to generalize Internet usage by identifying the types of Web sites a particular type of user accesses and the way that type of user accesses a particular type of Web site.
For example, it would be valuable to a merchant to know that users from a particular geographical area regularly purchase books from Amazon.com™; however, there is a need for more generalized data. It is desirable to have a system that can create generalized behavior profiles. For instance, it is valuable to know that users in a particular geographical area regularly conduct electronic commerce by following links on a Web portal site to online catalog and shopping sites.
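The generalization described above can be illustrated with a minimal sketch. It assumes that network transactions are available as (region, site) pairs and that a site-to-category table already exists; the table contents, the category names, the region labels, and the `generalize` function are all hypothetical and serve only to show how per-site observations might be rolled up into per-category observations.

```python
from collections import Counter

# Hypothetical site-to-category table; the categories are illustrative only.
SITE_CATEGORIES = {
    "amazon.com": "online shopping",
    "books.example": "online shopping",
    "portal.example": "web portal",
}


def generalize(transactions, site_categories):
    """Count visits per (region, category) rather than per individual site."""
    profile = Counter()
    for region, site in transactions:
        # Sites missing from the table are kept, but flagged as unclassified.
        category = site_categories.get(site, "unclassified")
        profile[(region, category)] += 1
    return profile


# Assumed sample data: (region, site) pairs taken from network usage logs.
transactions = [
    ("region-a", "portal.example"),
    ("region-a", "amazon.com"),
    ("region-a", "books.example"),
    ("region-b", "unknown.example"),
]
profile = generalize(transactions, SITE_CATEGORIES)
```

In this sketch, two purchases at different shopping sites in the same region collapse into a single "online shopping" count, which is the kind of generalized data a merchant could act on. The usefulness of the result depends entirely on how accurately and at what granularity the sites have been classified.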
To provide a system for creating generalized behavior profiles, it is desirable to have a method and system for classifying Web sites using a classification of sufficient granularity to allow meaningful analysis of network transaction data.
Manual classification by users can lead to inconsistent results because classifiers may understand the categories within a classification system differently or hold differing opinions about the purpose and use of a site. It is desirable to have a method and system that provides a more consistent categorization of information. Additionally, it is desirable to provide a system and method such that inexperienced classifiers can perform the bulk of classification without sacrificing accuracy.
Also, there is a need for an automatic classification system that can quickly and accurately categorize information repositories accessible on a computer network. An automatic classification system can operate more quickly and at less expense than a manual classification system; however, the automatic classification system may not be as accurate as a manual classification system.
Finally, there is a need for a hybrid classification system that uses both manual and automatic classification components to provide increased performance and accuracy.