The present invention relates to analysis of large data volumes, and in particular, to systems and methods for visualizing large data volumes utilizing an initial sampling and a multi-stage calculation.
Unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
With the evolution in sophistication and complexity of databases, stored data is available for visualization and analysis in increasingly large volumes. Such “big data” may comprise millions or even billions of different records.
In order to assimilate such large amounts of data, big data platforms generally sacrifice the ability to perform complex analytical functions. Instead, their query expressivity may be limited to relatively simple operations. Such simple operations may not afford a user valuable insight into trends and other relationships that are masked beneath the sheer volume of available data.
Apart from exhibiting limited querying capability, conventional big data platforms may also suffer from slow querying. Specifically, many potential applications call for a user to engage in interactive querying in order to produce a desired visualization of the data. This typically involves the user creating and changing visualizations of the data multiple times, in an iterative manner.
Effectively performing interactive visualization generally requires a response time on the order of seconds (e.g., 1-5 seconds). Conventional big data platforms, however, generally operate too slowly to support this type of interactive visualization activity.