Techniques for processing data streams have gained importance in recent years because of the great ease with which stream data can be collected. That is, advances in hardware technology have made it easy to automatically record data associated with transactions and activities in everyday life. By way of example only, such data may be collected in the context of retail sales applications, multimedia applications, telecommunication applications, etc. It is also known that such data often has very high dimensionality. Data sets with inherently high dimensionality may include, by way of example only, demographic data sets in which the dimensions comprise information such as name, age, salary, and numerous other features which characterize a person.
The ubiquitous presence of data streams in a number of practical domains (e.g., retail sales, multimedia, telecommunications, as mentioned by way of example above) has generated much research, particularly in the areas of clustering and classification of stream data. The clustering problem is especially interesting for the data stream domain because of its application to data summarization and outlier detection. Examples of such research are disclosed in R. Agrawal et al., “Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications,” ACM SIGMOD Conference, 1998; C. C. Aggarwal et al., “Fast Algorithms for Projected Clustering,” ACM SIGMOD Conference, 1999; C. C. Aggarwal et al., “A Framework for Clustering Evolving Data Streams,” VLDB Conference, 2003; and C. C. Aggarwal et al., “A Framework for High Dimensional Projected Clustering of Data Streams,” VLDB Conference, 2004, the disclosures of which are incorporated by reference herein.