1. Field of the Invention
The present invention relates to computers and computer networks. More particularly, the invention relates to profiling network activities.
2. Background of the Related Art
Clustering refers to partitioning a set of data points such that data points in the same cluster are similar to each other and data points in different clusters are dissimilar. Simultaneously clustering the rows and columns of a large data matrix is referred to as co-clustering. Throughout this document, the terms “cluster” and “co-cluster” (or “clustering” and “co-clustering”) may be used interchangeably.
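The following minimal Python sketch makes the notion of a hard co-clustering concrete. It is an illustration under stated assumptions, not the method of the invention. Each row and each column is assigned to exactly one cluster, and every (row cluster, column cluster) pair defines a block of the matrix. The function names and the squared-error score are hypothetical choices made for this example. A lower score means the entries within each co-cluster are more alike. Each evaluation visits every one of the m*n cells.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Block = Tuple[int, int]  # (row cluster id, column cluster id)


def co_cluster_block_means(
    data: Matrix, row_labels: List[int], col_labels: List[int]
) -> Dict[Block, float]:
    """Mean of each block under a hard co-clustering (one label per row/column)."""
    sums: Dict[Block, float] = defaultdict(float)
    counts: Dict[Block, int] = defaultdict(int)
    # One pass over all m*n cells; each cell falls into exactly one block.
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            key = (row_labels[i], col_labels[j])
            sums[key] += value
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def within_block_sse(
    data: Matrix, row_labels: List[int], col_labels: List[int]
) -> float:
    """Hypothetical quality score: squared deviation of each cell from its block mean."""
    means = co_cluster_block_means(data, row_labels, col_labels)
    total = 0.0
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            total += (value - means[(row_labels[i], col_labels[j])]) ** 2
    return total


# Toy 4x4 matrix with two obvious row groups and two obvious column groups.
data: Matrix = [
    [5, 5, 0, 0],
    [5, 5, 0, 0],
    [0, 0, 3, 3],
    [0, 0, 3, 3],
]

if __name__ == "__main__":
    print(within_block_sse(data, [0, 0, 1, 1], [0, 0, 1, 1]))  # 0.0 (homogeneous blocks)
    print(within_block_sse(data, [0, 1, 0, 1], [0, 0, 1, 1]))  # 68.0 (mixed blocks)
```

Rows and columns are grouped at the same time here, which is what distinguishes co-clustering from clustering rows alone. Because each row and column carries a single label, no row or column can be shared across clusters.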
Co-clustering techniques may be applied in a wide range of applications, such as document mining, micro-array analysis, and recommendation systems. For a data matrix of m rows and n columns, the time complexity of existing co-clustering methods (e.g., the information-theoretic co-clustering algorithm, matrix-decomposition-based spectral clustering methods, etc.) is usually on the order of m*n or higher. This limits their applicability to data matrices with a large number of rows and columns. Moreover, existing co-clustering methods require the entire data matrix to be held in main memory throughout the co-clustering process. Other significant limitations of existing co-clustering methods include requiring the number of clusters into which the data set is partitioned to be specified in advance as a parameter, and requiring that columns or rows of the data matrix not be shared across different clusters (referred to as hard co-clustering). In real-time applications, it is often unrealistic to predetermine the number of clusters because the hidden relationships within the data set may not be known ahead of time.