Aggregation along hierarchies is a critical data summarization technique in a large variety of online applications, including decision support (e.g., online analytical processing (OLAP)), network management (e.g., internet protocol (IP) clustering, denial-of-service (DoS) attack monitoring), text summarization (e.g., on prefixes of strings occurring in the text), and extensible markup language (XML) summarization (i.e., on prefixes of root-to-leaf paths in an XML data tree). In such applications, data is inherently hierarchical, and it is desirable to monitor and maintain aggregates of the data at different levels of the hierarchy over time in a dynamic fashion.
A heavy hitter (HH) is an element of a data set having a frequency which is greater than or equal to a user-defined threshold. A conventional algorithm for identifying the HHs in the data set maintains a summary structure which allows the frequencies of the elements to be estimated within a pre-defined error bound. The conventional HH algorithm, however, does not account for any hierarchy in the data set. It is also possible to store information for each node in a hierarchy and calculate HHs based on this information. However, the storage required for all nodes and the amount of calculation are prohibitive. In addition, this method provides superfluous results, because a node whose count aggregates the counts of its descendants meets the threshold whenever any one of its descendants does. A need exists for identifying hierarchical heavy hitters (“HHHs”) in data sets having multiple dimensions.
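By way of illustration only, the HH definition and a bounded-error summary structure may be sketched as follows. The text above does not name the conventional algorithm it has in mind; the Misra-Gries summary is substituted here as one well-known example of such a structure. The function names `exact_heavy_hitters` and `misra_gries` and the parameter `k` are assumptions introduced for this sketch.

```python
from collections import Counter


def exact_heavy_hitters(stream, threshold):
    """Return {element: frequency} for every element whose frequency
    is greater than or equal to the user-defined threshold.

    This stores one counter per distinct element, which is the cost
    that a summary structure is meant to avoid.
    """
    counts = Counter(stream)
    return {item: c for item, c in counts.items() if c >= threshold}


def misra_gries(stream, k):
    """Illustrative Misra-Gries summary (an assumed stand-in for the
    'conventional algorithm'), keeping at most k - 1 counters.

    Each returned estimate undercounts the true frequency by at most
    n / k, where n is the number of elements in the stream.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop
            # those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

With `k` counters' worth of budget, any element whose true frequency exceeds `n / k` is guaranteed to remain in the summary, so the summary yields a candidate set of HHs with a known error bound. Note that the sketch treats elements as flat values and, like the conventional approach described above, takes no account of any hierarchy among them.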