Multi-dimensional and temporal data are common in many business and scientific domains, and large-scale data is growing at an explosive rate. While the capacity to collect and store new data grows rapidly, the ability to analyze these data volumes has been increasing at a much lower rate. Although large-scale complex data is not easy to analyze, it is useful for gaining insight into patterns, trends and correlations, and it therefore holds considerable potential value.
In the past, researchers have proposed many automatic analysis methods to process and simplify these data, such as Principal Component Analysis, Self-Organizing Maps and clustering algorithms. Another typical way to handle large-scale data is on-line analytical processing (OLAP), which partially pre-aggregates data into “cubes” and stores the data in a “data warehouse.” However, traditional analysis systems do not offer an effective and flexible way to present the data or to provide intuitive analysis results.
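As an illustration only, the partial pre-aggregation performed by OLAP can be sketched as follows. The fact rows, the dimension names, and the helper functions `build_cube` and `query` are hypothetical and are not taken from any particular OLAP product; the sketch assumes a single summed measure and pre-aggregates only groups of at most two dimensions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (year, fund_type, region, amount)
FACTS = [
    (2010, "equity", "east", 120.0),
    (2010, "bond", "east", 80.0),
    (2011, "equity", "west", 150.0),
    (2011, "equity", "east", 50.0),
]
DIMENSIONS = ("year", "fund_type", "region")


def build_cube(facts, dimensions, max_group_size=2):
    """Pre-aggregate summed amounts for every group of at most
    max_group_size dimensions (a partial cube, not the full one)."""
    cube = {}
    for size in range(max_group_size + 1):
        for dims in combinations(range(len(dimensions)), size):
            totals = defaultdict(float)
            for row in facts:
                key = tuple(row[i] for i in dims)
                totals[key] += row[-1]  # last column is the measure
            names = tuple(dimensions[i] for i in dims)
            cube[names] = dict(totals)
    return cube


def query(cube, **filters):
    """Look up a pre-aggregated total; no scan of the fact rows is needed."""
    names = tuple(d for d in DIMENSIONS if d in filters)
    key = tuple(filters[d] for d in names)
    return cube[names].get(key, 0.0)


cube = build_cube(FACTS, DIMENSIONS)
print(query(cube, year=2010))
print(query(cube, year=2011, region="east"))
```

In this sketch, a query that filters on all three dimensions is not pre-aggregated and would have to fall back to the stored fact rows, which is the trade-off implied by partial pre-aggregation.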
Traditional visualization methods, such as two-dimensional (2D) lines, bars and tables, are simple and intuitive and are therefore widely used in daily life. However, they have limitations when presenting large-scale complicated data. For example, FIG. 1 at 100 and 101 shows visualizations of ten years of data for China's mutual fund market. It is difficult for analysts to identify patterns and trends in the 2D table at 100, which contains only numbers in each cell. The 2D bars at 101 are better suited to showing past and future trends. However, the 2D bars at 101 are not suitable for displaying additional detailed information, such as information on certain types of mutual funds.
In the last few decades, other visualization methods have been proposed, such as scatter plots, heat maps, parallel coordinates, tree-maps, Internet maps, spiral graphs, density-based distribution maps, sphere-based maps, Theme-River, and TimeWheel. These methods are useful when visualizing certain types of data. For example, a heat map can be applied to depict financial information. A spiral graph has been used to discover periodic patterns in influenza cases. Parallel coordinates are a common way of visualizing high-dimensional geometry and analyzing multivariate data. The “Cross-Filtered Views” method can visualize and analyze multi-dimensional data by interactively expressing sequences of multi-dimensional set queries, cross-filtering data values across pairs of views. Other proposed methods for visually analyzing temporal data are based on three main criteria: time, data, and representation.
In recent years, Visual Analytics (VA) has been introduced to represent massive, multi-dimensional and temporal data with various visual encodings. Visual analytics can illustrate patterns, trends and correlations of data in a short amount of time and in a small amount of space. Advantages of visual analytics include simplicity of design and the ability to analyze data of high complexity. Visual analytics combines automated analysis techniques with interactive visualizations to support effective understanding, reasoning and decision making on the basis of very large and complex datasets. Visual analytics can be applied to many domains, such as finance, health, geography, physics, and security.
Visual analytics systems and tools can support domain analysts' decision-making and discovery of insights through advanced visualization methods and user-centric interactive operations. One popular commercial tool for visualizing business data is Microsoft Excel, which provides the standard visualization methods for spreadsheet data (bar, column, line, pie, etc.). However, these visualization methods become limiting when the underlying data model consists of complex ideas that need to be communicated with clarity, precision, and efficiency.
A wide variety of companies, ranging from specialized data discovery vendors such as Tableau, QlikTech, and Spotfire, to multinational corporations such as IBM, Microsoft, Oracle and SAP, have engaged in efforts to develop their own commercial visual analytics systems for analyzing voluminous data of increasing variety. Large software vendors tend to focus on only a small number of “standard” visualization techniques, such as line charts and tables, which have limited capability in handling large complex data. Existing toolkits for analyzing data include, for example, InfoVis Toolkit by SenchaLabs, Redwood City, Calif., Prefuse by An Open Source Foundation, SourceForge.net, under BSD license, and Protovis by Stanford Visualization Group, under BSD license.
On the academic side, a number of VA systems have been developed to support domain analysts' work. MobiVis, by Visualization & Interface Design Innovation (VIDi), University of California, Davis, is a system for visually analyzing mobile data by presenting social and spatial information in one heterogeneous network. VIS-STAMP, by the Spatial Data Mining and Visual Analytics Lab, Department of Geography, University of South Carolina, is a visual inquiry system for space-time and multivariate patterns. VIS-STAMP supports different complex patterns and, through a variety of interactions, enables system users to focus on specific patterns and examine detailed views of data. WireVis, by Bank of America and UNC Charlotte, is a system that combines multiple visualization methods to analyze categorical, time-varying data from financial transactions. Weijia Xu et al. designed a system based on a tree-map to analyze large digital collections with interactive visualization. Hotmap is based on a heat map to visualize geography data, using the structure of the underlying data set to visualize it in its own space.
Many researchers have been involved in efforts to develop high-level models of VA system design. More specifically, Tamara Munzner proposed a nested model for visualization design and validation with four layers: characterize the task and data in the vocabulary of the problem domain, abstract into operations and data types, design visual encoding and interaction techniques, and create algorithms to execute the techniques efficiently. Based on that model, Xiaoyu Wang et al. proposed a two-stage framework for designing visual analytics systems in organizational environments.
However, the above-described visual analytics tools provide a flat representation of voluminous data and cannot properly address multi-dimensional data. The existing tools also have difficulty depicting the temporal information of data. Conventional approaches to temporal representation include, for example, heat maps or geometries of rectangles; however, these visualizations are not intuitive and do not properly convey the temporal element of data in a way that lets the system user gain insight into trends in large complex data sets.