1. Technical Field
The present invention is directed to apparatus and methods for classifying web sites. More specifically, the present invention is directed to apparatus and methods for profiling web sites, clustering web sites, and classifying web sites based on the profiling and clustering.
2. Description of Related Art
The problems of workload characterization, performance modeling, workload and performance forecasting, and capacity planning are fundamental to the growth of web services and applications. That is, the ability to characterize the workload that is experienced by a web server is crucial to devising ways to handle the workload. Typically, such workload characterization has been “after the fact” in that it is performed as a mechanism for determining how to compensate for workload already experienced. Thus, the known mechanisms for workload characterization are limited to the workload previously experienced by a particular web server.
Such workload characterization focuses on the complexity of web traffic at the level of object-hits or page-views. It does not take into account higher-level characteristics, such as the traffic that a web site experiences over a period of time. Moreover, it does not take into account the similarity of traffic patterns experienced by a plurality of web sites. As a result, the characterization performed by the known systems does not provide insight into the traffic that a web site is likely to experience in the future, or into mechanisms for handling such traffic, as determined by the web site's similarities to other web sites.
Thus, it would be beneficial to have an improved apparatus and method for classifying web sites based on their traffic patterns and on the similarities of those patterns to the traffic patterns of other web sites.