The present disclosure relates generally to linear sketches, and more specifically, to identifying a sketching matrix used by a linear sketch.
Recent years have witnessed an explosion in the amount of available data, such as that in data warehouses, the Internet, sensor networks, and transaction logs. The need to process this data efficiently has led to the emergence of new fields, including compressed sensing, data stream algorithms, and distributed functional monitoring. A common technique across these fields is the linear sketch. Linear sketching involves specifying a distribution π over linear maps A: ℝ^n → ℝ^r for a value r ≪ n. A matrix A is sampled from π. A vector x ∈ ℝ^n is then presented to the algorithm, which maintains the “sketch” Ax, a concise summary of x from which various queries about x can be approximately answered. The storage and the number of linear measurements (rows of A) required are proportional to r. The goal of a linear sketch is to minimize r while still approximating a large class of queries well with high probability.
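By way of illustration only, the following minimal sketch shows one way the sampling and sketching steps described above might look. The choice of distribution (independent Gaussian entries with variance 1/r), the squared Euclidean norm query, and all function and variable names are assumptions made for this example and are not taken from the disclosure.

```python
import math
import random


def sample_sketching_matrix(r, n, rng):
    """Draw an r-by-n matrix A with i.i.d. N(0, 1/r) entries.

    This Gaussian choice stands in for the distribution pi; it is an
    assumption of this example, not a requirement of linear sketching.
    """
    scale = 1.0 / math.sqrt(r)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(r)]


def sketch(A, x):
    """Return the sketch Ax as a list of r numbers."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def estimate_squared_norm(s):
    """Estimate ||x||_2^2 from the sketch alone, using ||Ax||_2^2."""
    return sum(v * v for v in s)


rng = random.Random(0)            # fixed seed so the example is repeatable
n, r = 1000, 200                  # r is much smaller than n
A = sample_sketching_matrix(r, n, rng)
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]

s = sketch(A, x)                  # only these r numbers need to be stored
true_sq_norm = sum(v * v for v in x)
est_sq_norm = estimate_squared_norm(s)
print(len(s), true_sq_norm, est_sq_norm)
```

Under these assumptions, the expected value of the estimate equals the true squared norm, and its relative error shrinks roughly as the square root of 2/r, which reflects the trade-off between the sketch size r and the accuracy of the answer.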
Linear sketches are powerful algorithmic tools that can be used for a wide variety of applications, including norm estimation over data streams, compressed sensing, and distributed computing. Linear sketches turn an n-dimensional input into a concise lower-dimensional representation via a linear transformation. Linearity is required for performing updates and for estimating statistics on the difference of two datasets. Currently, linear sketches are widely used to answer queries such as the number of distinct elements, top-k queries, and histograms. In almost any realistic setting, however, a linear sketch faces the possibility that its inputs are correlated with previous evaluations of the sketch.
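As a hedged illustration of why linearity matters for updates and for differences of datasets, the following example checks both properties numerically. The random sign matrix, the small integer-valued inputs, and all names are assumptions chosen for this example so that the arithmetic is exact.

```python
import random


def sketch(A, x):
    """Return the sketch Ax as a list with one entry per row of A."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


rng = random.Random(1)            # fixed seed so the example is repeatable
n, r = 50, 8

# Assumed sketching matrix: independent random signs (illustrative only).
A = [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(r)]

# Two datasets represented as integer-valued frequency vectors.
x = [float(rng.randint(0, 5)) for _ in range(n)]
y = [float(rng.randint(0, 5)) for _ in range(n)]

# Difference of two datasets: A(x - y) equals Ax - Ay, so the sketch of
# the difference can be formed from the two sketches without x or y.
sx, sy = sketch(A, x), sketch(A, y)
diff_from_sketches = [a - b for a, b in zip(sx, sy)]
diff_direct = sketch(A, [a - b for a, b in zip(x, y)])

# Streaming update: adding delta to coordinate i of x changes the sketch
# by delta times column i of A, so x itself never has to be stored.
i, delta = 7, 3.0
updated = [s + delta * row[i] for s, row in zip(sx, A)]
x_after = list(x)
x_after[i] += delta
recomputed = sketch(A, x_after)

print(diff_from_sketches == diff_direct, updated == recomputed)
```

Because the entries of A and the inputs are small integers stored as floats, both comparisons hold exactly here; with general real-valued entries the two sides would agree only up to floating-point rounding.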