In various industries, it is increasingly important to obtain representative data, to summarize data, and/or to determine data trends. This often must be done within the constraints of the existing physical structure of the particular system that generates or receives the data of interest. In addition, this often must be done relatively quickly and without burdening the system with respect to memory, processing power, or the like. For example, in the telecommunications industry, it might be of interest to obtain data on call traffic through an area of the network to observe load. However, in doing so, it is important not to deprive the network of the memory and processing capacity it needs for routing call traffic.
There are several tools that can be used to obtain the desired data output. For example, histograms are succinct and space-efficient approximations of distributions of numerical values. Histograms are among the simplest classes of data representations. They are easy to visualize and lend themselves to statistical analysis. Histograms find many applications in computer systems. For example, most commercial database engines keep histograms of the various value distributions in a database for optimizing query execution and for approximately processing queries, and image processing systems handle color histograms.
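As an illustration of how a histogram can stand in for the underlying values when approximately processing a query, the following is a minimal sketch of an equal-width histogram. The function names, the bucket layout, and the assumption that values are spread uniformly within each bucket are choices made for this example only and are not taken from any particular database engine.

```python
def build_histogram(values, lo, hi, num_buckets):
    """Count the values falling into each of num_buckets equal-width
    buckets covering the half-open interval [lo, hi)."""
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        if lo <= v < hi:
            # Clamp the index to guard against floating-point rounding at the top edge.
            index = min(int((v - lo) / width), num_buckets - 1)
            counts[index] += 1
    return counts


def estimate_range_count(counts, lo, hi, a, b):
    """Estimate how many values lie in [a, b) from the bucket counts alone,
    assuming values are uniformly spread inside each bucket."""
    width = (hi - lo) / len(counts)
    total = 0.0
    for i, count in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width
        overlap = max(0.0, min(b, bucket_hi) - max(a, bucket_lo))
        total += count * overlap / width
    return total


values = [1, 2, 3, 12, 15, 27, 28, 29, 35, 38]
counts = build_histogram(values, lo=0, hi=40, num_buckets=4)
estimate = estimate_range_count(counts, 0, 40, 5, 20)
```

Here ten values are summarized by four counts. A range whose endpoints coincide with bucket boundaries is answered exactly, while a range that cuts through a bucket (such as 5 to 20 above) is only estimated, which is the trade-off between space and accuracy that a histogram makes.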
In addition to histograms, wavelets can also be used to obtain a desired data synopsis. Wavelets are mathematical functions that divide data into different frequency components and enable the study or manipulation of each component at a resolution matched to its scale. Wavelets are used in a variety of applications including image compression, turbulence, human vision, radar, and earthquake prediction.
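To make the idea of dividing data into components at different resolutions concrete, the following is a minimal sketch using the Haar wavelet, which is assumed here only because it is the simplest wavelet to compute; the text above does not name a specific wavelet. The function names and the unnormalized averaging and differencing scheme are illustrative choices.

```python
def haar_decompose(data):
    """Return the unnormalized Haar coefficients of data as
    [overall average, coarsest detail, ..., finest details].
    The length of data must be a power of two."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    averages = list(data)
    details = []
    while len(averages) > 1:
        pairs = [(averages[i], averages[i + 1]) for i in range(0, len(averages), 2)]
        level_details = [(a - b) / 2 for a, b in pairs]
        averages = [(a + b) / 2 for a, b in pairs]
        # Coarser levels are placed in front of finer ones.
        details = level_details + details
    return averages + details


def haar_reconstruct(coeffs):
    """Invert haar_decompose, rebuilding the data one resolution level at a time."""
    averages = list(coeffs[:1])
    pos = 1
    while pos < len(coeffs):
        level = coeffs[pos:pos + len(averages)]
        pos += len(averages)
        finer = []
        for avg, detail in zip(averages, level):
            finer.extend([avg + detail, avg - detail])
        averages = finer
    return averages


data = [2, 2, 0, 2, 3, 5, 4, 4]
coeffs = haar_decompose(data)
restored = haar_reconstruct(coeffs)
```

The first coefficient is the overall average and each later group captures variation at a progressively finer scale. Several coefficients in this example are zero; storing only the non-zero or largest coefficients is one way such a transform can serve as a compact, approximate synopsis of the data.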
Histogram and wavelet approximations are compact, i.e., they do not consume a significant amount of memory or processing power. Although the data representations provided by histograms and wavelets are not exact, they are sufficient for most trend analysis.
The present application may be implemented in connection with distributed and dynamic data sources associated with large-scale networks. For example, network routers generate a data stream of logs of the traffic that flows through the network. In order to conduct real-time traffic control, network operators must know traffic patterns at various routers at any given moment. However, it is prohibitively bandwidth-expensive to transfer data streams of traffic logs from routers to central monitoring stations on a continuous basis. Compact data representations are less bandwidth-expensive.
Space-efficient data representations are also needed in other areas such as the financial industry. Stock transactions continually occur throughout the day and each transaction changes the underlying data distribution. In other words, the volume of shares sold per stock can fluctuate every minute. These transactions are stored in databases in a variety of locations. There is a need to maintain data representations in real time in transactional databases given these rapid data changes.
Prior histogram work has not been able to handle both positive and negative data updates when performing certain types of distributed data calculations.
Given the foregoing, there is a need in the industry to provide real-time data from distributed databases accurately and promptly while consuming only a feasible amount of bandwidth, memory, and processing power. This need is especially great for dynamic data distributions, i.e., where the data changes rapidly.