Computing systems store vast amounts of time-series data such as measurements from sensor networks or utility systems (e.g., smart grids), price and volume information from financial markets, statistics about sports teams, players, or employees, and so on. Often, time-series data is stored in a relational database or another type of database. In general, a database is an organized collection of data. A relational database, conceptually, can be organized as one or more tables, where a table is a two-dimensional structure with data values organized in rows and columns. A database management system (“DBMS”) mediates interactions between a database, users and applications in order to organize, create, update, capture, analyze and otherwise manage the data in the database.
In some scenarios, millions of readings from high-frequency sensors or outputs are stored in a database, to be accessed later using a visual data-analysis tool. A visual data-analysis tool (or other “visualization client”) presents time-series data stored in a database. Typically, a data analyst views and interacts with visualizations of the time-series data presented by the visualization client. The visualization client transforms user input from the analyst into queries issued to the DBMS managing the database that holds the time-series data. Such visual analysis of high-volume time-series data is common in many areas, including finance, banking, manufacturing, sports analytics, and utility monitoring. Depending on the volume of time-series data and number of users accessing the database, the volume of time-series data transferred from the DBMS to visualization clients can potentially be very large.
For example, suppose a database stores time-series data for measurements from sensors embedded in manufacturing machinery. The reporting frequency for a given embedded sensor is 100 Hz. Typically, an engineer accesses time-series data that spans the last 12 hours for a given sensor. A visualization client can use a non-aggregating structured query language (“SQL”) query to retrieve the requested data from the DBMS (e.g., SELECT time, value FROM sensor WHERE time > NOW() - 12*3600), where 12*3600 is the number of seconds in 12 hours. In common usage scenarios, 100 engineers from around the world may simultaneously access the database. In this case, the total amount of data to transfer is 100 × (12 × 3600 seconds) × 100 Hz = 432 million sensor measurement values, or 4.32 million values per visualization client or engineer. Assuming a wire size of 60 bytes per value, the total amount of data that is transferred is over 24 GB, or over 247 MB per visualization client or engineer. An engineer will have to wait for this data to be loaded and processed by the visualization client before that engineer can examine a visualization of the time-series data for the sensor.
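The arithmetic in this example can be checked with a short Python sketch. The function and variable names are hypothetical and introduced only for illustration. The sketch also assumes that "MB" and "GB" are read as binary units (2^20 and 2^30 bytes), which is the reading under which the figures of over 247 MB and over 24 GB hold.

```python
# Illustrative check of the data-volume estimate; all names are hypothetical.
SAMPLE_RATE_HZ = 100          # reporting frequency of one sensor
WINDOW_SECONDS = 12 * 3600    # the last 12 hours, in seconds
CLIENTS = 100                 # engineers querying at the same time
BYTES_PER_VALUE = 60          # assumed wire size of one measurement value


def transfer_estimate(sample_rate_hz, window_seconds, clients, bytes_per_value):
    """Return (values per client, total values, bytes per client, total bytes)."""
    values_per_client = sample_rate_hz * window_seconds
    total_values = values_per_client * clients
    bytes_per_client = values_per_client * bytes_per_value
    total_bytes = total_values * bytes_per_value
    return values_per_client, total_values, bytes_per_client, total_bytes


values_per_client, total_values, bytes_per_client, total_bytes = transfer_estimate(
    SAMPLE_RATE_HZ, WINDOW_SECONDS, CLIENTS, BYTES_PER_VALUE
)

# Assumption: binary units (1 MB = 2**20 bytes, 1 GB = 2**30 bytes).
mb_per_client = bytes_per_client / 2**20
gb_total = total_bytes / 2**30

print(f"values per client: {values_per_client:,}")
print(f"total values:      {total_values:,}")
print(f"per client:        {mb_per_client:.1f} MB")
print(f"total:             {gb_total:.1f} GB")
```

Changing any one of the four constants shows how quickly the transfer volume grows, since it scales linearly with the sampling rate, the time window, and the number of concurrent clients.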
Many conventional visual data-analysis tools provide flexible and direct access to time-series data, but issue queries to a DBMS without considering the size, or cardinality, of the data set that is requested. This can lead to very high bandwidth consumption for query results between a data-analysis tool and DBMS, causing high latency between the issuance of queries and the return of the query results. Such delay hinders interactive visualization, especially for high-volume time-series data.
Even if a user has a network connection with sufficient bandwidth, the processing or memory requirements for high-volume time-series data can overwhelm a visual data-analysis tool. For example, a data-analysis tool may redundantly store copies of data as tool-internal objects, using a significant amount of system memory per record. This can cause long wait times for a user, leaving them with an unresponsive visualization client or even impairing an operating system or other software if system memory is exhausted.
Other systems lower the volume of data transferred from a DBMS to a visualization client by creating, at the DBMS side, an image or other graphical summary of the requested time-series data, and transferring the image or graphic to the visualization client. This approach disregards the semantics of the visualization and typically results in degraded interactivity and/or visual errors at the visualization client.