The present invention relates generally to systems and methods for signal processing. More specifically, the present invention relates to a system and method for estimating frequency moments in data stream processing algorithms.
In many signal processing applications, the signal must be processed in a few passes or, often, in just one pass. For example, the term “data stream” often refers to a sequence of data that is too large to be stored, in its entirety, in memory. Such data streams are common in communications network traffic, database transactions, satellite data feeds, and the like. In such instances, “streaming algorithms” are used to process these signals as data streams, where the input is presented as a sequence of items that can be examined in only a few passes. A common example of a streaming algorithm is one developed to count the number of distinct elements in a data stream. The continuous nature of the underlying signal, together with the resource constraints that limit the amount of repetitive processing that can be performed, generally results in the algorithm producing an approximate answer based on a stored summary of the data stream.
Alon, Matias, and Szegedy, in “The Space Complexity of Approximating the Frequency Moments,” Journal of Computer and System Sciences, 58:137-147, 1999, which is incorporated herein by reference in its entirety, approached such a signal processing problem within the context of database processing and introduced the concept of “frequency moments.” Namely, for a sequence D={p1, p2, . . . , pm} of m elements drawn from {1, . . . , n}, the frequency of an element, i, is defined as:
$$f_i = \left|\{\, j : p_j = i \,\}\right|. \qquad \text{Eqn. 1}$$
The k-th frequency moment of D is defined as:
$$F_k = \sum_{i=1}^{n} f_i^{k}. \qquad \text{Eqn. 2}$$
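As an illustration of Eqn. 1 and Eqn. 2 only, the following minimal Python sketch computes a frequency moment exactly by storing every frequency. The function name `frequency_moment` and the sample sequence are assumptions made for this example and do not appear in the cited work. Because it keeps one counter per distinct element, the sketch uses space linear in the number of distinct elements, which is precisely the cost that streaming approximations seek to avoid.

```python
from collections import Counter


def frequency_moment(sequence, k):
    """Exact k-th frequency moment of a sequence (hypothetical helper).

    Counts the frequency of each element that occurs (Eqn. 1) and sums the
    k-th powers of those frequencies (Eqn. 2). Elements that never occur
    are skipped, so k = 0 yields the number of distinct elements.
    """
    counts = Counter(sequence)  # element -> frequency
    return sum(f ** k for f in counts.values())


# Sample sequence: element 1 occurs once, 2 occurs twice, 3 occurs three times.
D = [1, 2, 2, 3, 3, 3]
moments = {k: frequency_moment(D, k) for k in range(4)}
```

For this sample sequence, the zeroth moment is the number of distinct elements (3), the first moment is the sequence length (6), the second moment is 1 + 4 + 9 = 14, and the third moment is 1 + 8 + 27 = 36.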
Alon, Matias, and Szegedy, when approaching the problem of approximating frequency moments in one pass over D and using sublinear space, observed a striking difference between “small” and “large” values of k. Specifically, it is possible to approximate Fk for k≦2 in polylogarithmic space. However, polynomial space is required when k>2. Since the work of Alon, Matias, and Szegedy in the late 1990s, approximating Fk has become one of the most inspiring problems in the theory of data streams.
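To make the small-k case concrete, the following Python sketch estimates F2 in one pass using a small, fixed number of counters, in the style of the randomized sign-sum estimator described in the cited Alon, Matias, and Szegedy paper. It is an illustrative sketch under stated assumptions, not the method of the present invention. The names `make_sign_hash` and `estimate_f2`, the parameters `groups`, `per_group`, and `seed`, and the choice of modulus are all assumptions of this example. The sign function takes the low bit of a random cubic polynomial evaluated modulo a prime, which only approximates a 4-wise independent, unbiased sign, and items are assumed to be non-negative integers.

```python
import random
from statistics import mean, median

PRIME = (1 << 61) - 1  # Mersenne prime used as the hash modulus (assumption)


def make_sign_hash(rng):
    """Return a function mapping an integer item to +1 or -1 (hypothetical)."""
    coeffs = [rng.randrange(PRIME) for _ in range(4)]

    def sign(item):
        acc = 0
        for c in coeffs:  # Horner evaluation of a random cubic mod PRIME
            acc = (acc * item + c) % PRIME
        return 1 if acc & 1 else -1

    return sign


def estimate_f2(stream, groups=5, per_group=20, seed=0):
    """One-pass estimate of the second frequency moment of `stream`."""
    rng = random.Random(seed)
    signs = [make_sign_hash(rng) for _ in range(groups * per_group)]
    counters = [0] * len(signs)
    for item in stream:  # each item is examined exactly once
        for idx, sign in enumerate(signs):
            counters[idx] += sign(item)
    # Each squared counter is a noisy estimate of F2; average within groups,
    # then take the median across groups to damp outliers.
    squares = [z * z for z in counters]
    group_means = [
        mean(squares[g * per_group:(g + 1) * per_group]) for g in range(groups)
    ]
    return median(group_means)


# A stream holding one distinct item repeated 10 times has F2 = 10**2 = 100,
# and every counter equals +10 or -10, so the estimate is exact in this case.
single_item_estimate = estimate_f2([7] * 10)
```

The memory used depends only on the number of counters and hash coefficients, not on the length of the stream. For streams with many distinct items the result is an approximation whose accuracy depends on `groups` and `per_group`.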
For example, many have focused on efficient algorithms for estimating particular moments, such as F2, which is useful for computing statistical properties of the data. Others have focused on bounding the memory required by Fk approximation algorithms. For example, many have proposed solutions or bounds that are accurate up to a polylogarithmic factor. However, as noted above, since polynomial space is required when k>2, suitably efficient approximations of frequency moments for k≧3 have been lacking.
It would therefore be desirable to provide a system and method for approximating frequency moments for k≧3 with a lower space complexity than has traditionally been required.