The coming of the digital age was akin to the breaching of a dam: a torrent of information was unleashed, and we are now awash in an ever-rising tide of data. Information, results, measurements, and calculations (data, in general) are now abundant and readily accessible, in infinitely reusable digital form, on magnetic or optical media. The relentless increase in computing power fuels the promise of analyzing and displaying vast amounts of data more quickly and in ever more creative ways. Accordingly, the ever-present need to make meaningful sense of data is driving substantial research efforts in statistical analysis, pattern recognition, data mining, and visualization. One current challenge is to provide fast ways of coping with data that exists within a complex parameter space.
Data is more than the numbers, values, or predicates of which it is composed. Data resides in multi-dimensional spaces that harbor rich and variegated landscapes, which are not only strange and convoluted but also not readily comprehensible to the human brain. The most complicated data arises from measurements or calculations that depend on many apparently independent variables. Data sets with hundreds of variables arise today in many walks of life, including: gene expression data for uncovering the link between the genome and the various proteins for which it codes; demographic and consumer profiling data for capturing underlying sociological and economic trends; sales and marketing data for huge numbers of products in vast and ever-changing marketplaces; and environmental measurements for understanding phenomena such as pollution, meteorological changes, and resource impact issues. International research projects such as the Human Genome Project and the Sloan Digital Sky Survey are also generating massive scientific databases. Furthermore, corporations are creating large data warehouses of historical data on key aspects of their operations. Corporations are also using desktop applications to create many small databases for examining specific aspects of their business.
One challenge with any of these databases is the extraction of meaning from the data they contain: to discover structure, find patterns, and derive causal relationships. Often, the sheer size of these data sets complicates this task and means that interactive calculations that require visiting each record are not feasible. It may also be infeasible for an analyst to reason about or view the entire data set at its finest level of detail. Even when the data sets are small, however, their complexity often makes it difficult to glean meaning without aggregating the data or creating simplifying summaries.
For each of the principal operations that may be carried out on data, such as regression, clustering, summarization, dependency modelling, and classification, the ability to see patterns rapidly is of paramount importance. Data comes in many forms, and the most appropriate way to display one form is often not the best for another. In the past, even where it was recognized that many methods of display are possible, selecting the most appropriate one has been a painstaking exercise. However, identifying the most telling methods of display can be intimately connected to identifying the underlying structure of the data itself.
Business intelligence is one rapidly growing area that benefits considerably from tools for interactive visualization of multi-dimensional databases. A number of approaches to visualizing such information are known in the art. However, although software programs that implement such approaches are useful, they are often unsatisfactory, because their interfaces require the user to select the most appropriate way to display the information.
Visualization is a powerful tool for exploring large data sets, both by itself and coupled with data mining algorithms. However, the task of effectively visualizing large databases imposes significant demands on the human-computer interface to the visualization system. The exploratory process is one of hypothesis, experiment, and discovery. The path of exploration is unpredictable, and analysts need to be able to easily change both the data being displayed and its visual representation. Furthermore, the analyst should be able to first reason about the data at a high level of abstraction, and then rapidly drill down to explore data of interest at a greater level of detail. Thus, a good interface both exposes the underlying hierarchical structure of the data and supports rapid refinement of the visualization.
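By way of illustration only, the drill-down from a high level of abstraction to a greater level of detail can be sketched as a roll-up over a hierarchy of keys. The sketch below is not drawn from any system described herein; the record layout (year, quarter, month, sales), the sample values, and the function name `roll_up` are assumptions chosen for the example.

```python
from collections import defaultdict

# Hypothetical records: (year, quarter, month, sales). Illustrative values only.
RECORDS = [
    (2001, "Q1", "Jan", 120.0),
    (2001, "Q1", "Feb", 80.0),
    (2001, "Q2", "Apr", 200.0),
    (2002, "Q1", "Jan", 150.0),
    (2002, "Q1", "Mar", 50.0),
]


def roll_up(records, depth, path=()):
    """Sum sales grouped by the first `depth` hierarchy levels.

    `path` restricts the result to records under a chosen ancestor,
    which is the filtering step of a drill-down.
    """
    totals = defaultdict(float)
    for *keys, sales in records:
        if tuple(keys[:len(path)]) == tuple(path):
            totals[tuple(keys[:depth])] += sales
    return dict(totals)


# High-level view: one total per year.
overview = roll_up(RECORDS, depth=1)

# Drill down into 2001 only, at quarter granularity.
detail = roll_up(RECORDS, depth=2, path=(2001,))
```

In this sketch the analyst first inspects `overview` and then refines the view by supplying a `path` and a larger `depth`, so that each refinement touches only the records under the selected ancestor.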
In addition to various software programs, the known art further provides formal graphical presentations. Bertin's Semiology of Graphics, University of Wisconsin Press, Madison, Wis. (1983), is an early attempt at formalizing graphic techniques. Bertin developed a vocabulary for describing data and techniques for encoding the data into a graphic. Bertin identified retinal variables (position, color, size, etc.) in which data can be encoded. Cleveland (The Elements of Graphing Data, Wadsworth Advanced Books and Software, Pacific Grove, Calif. (1985); and Visualizing Data, Hobart Press (1993)) used theoretical and experimental results to determine how well people can use these different retinal properties to compare quantitative variations.
Mackinlay's APT system (ACM Trans. Graphics, 5, 110-141, (1986)) was one of the first applications of formal graphical specifications to computer-generated displays. APT uses a graphical language and a hierarchy of composition rules, which it searches in order to generate two-dimensional displays of relational data. The Sage system (Roth, et al., (1994), Proc. SIGCHI '94, 112-117) extends the concepts of APT, providing a richer set of data characterizations and generating a wider range of displays.
A drawback of the formal graphical specifications of the art is that they do not provide a user with a means to control or influence the results. APT, for example, assumes a given database structure and generates a graphic without user involvement or any support for it. APT must also search through a number of possibilities before settling on the one it considers most appropriate. Accordingly, such formal graphical specifications do not provide a satisfactory way to analyze databases.
Visual query tools such as VQE (Derthick et al., 1997, “An Interactive Visualization Environment for Data Exploration,” Proc. of Knowledge Discovery in Databases, pp. 2-9), Visage (Roth et al., 1996, “Visage: A User Interface Environment for Exploring Information” in Proceedings of Information Visualization, pp. 3-12), DEVise (Livny et al., 1997, “DEVise: Integrated Querying and Visual Exploration of Large Datasets” in Proc. of ACM SIGMOD), and Tioga-2 (Woodruff et al., 2001, Journal of Visual Languages and Computing, Special Issue on Visual Languages for End-user and Domain-Specific Programming 12, pp. 551-571) have focused on building visualization tools that directly support interactive database exploration through visual queries. Users can construct queries and visualizations directly through their interactions with the interface. These systems have flexible mechanisms for mapping query results to graphs and support mapping database tuples to retinal properties of the marks in the graphs. However, these visual query tools do not save the user significant labor in deciding how best to display data rapidly.
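As a purely illustrative sketch, and not a description of how VQE, Visage, DEVise, or Tioga-2 is implemented, mapping database tuples to retinal properties of marks can be expressed as a function from each tuple to a mark whose position, size, and color are computed from the tuple's fields. The tuple layout (product, region, units, revenue), the `Mark` type, the color palette, and the output ranges below are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Mark:
    """One graphical mark; the fields are the retinal properties used here."""
    x: float
    y: float
    size: float
    color: str


# Hypothetical query result: (product, region, units, revenue).
ROWS = [
    ("widget", "east", 10, 250.0),
    ("widget", "west", 40, 900.0),
    ("gadget", "east", 25, 500.0),
]

# Assumed nominal color scale for the region field.
PALETTE = {"east": "blue", "west": "orange"}


def scale(value, lo, hi, out_lo, out_hi):
    """Linearly map value from [lo, hi] to [out_lo, out_hi]."""
    if hi == lo:
        return out_lo
    return out_lo + (value - lo) * (out_hi - out_lo) / (hi - lo)


def to_marks(rows):
    """Encode units as x, revenue as y and size, and region as color."""
    units = [r[2] for r in rows]
    revenue = [r[3] for r in rows]
    marks = []
    for _product, region, u, rev in rows:
        marks.append(Mark(
            x=scale(u, min(units), max(units), 0.0, 100.0),
            y=scale(rev, min(revenue), max(revenue), 0.0, 100.0),
            size=scale(rev, min(revenue), max(revenue), 4.0, 20.0),
            color=PALETTE[region],
        ))
    return marks


marks = to_marks(ROWS)
```

Note that in this sketch the choice of which field drives which retinal property is fixed by hand in `to_marks`; making that choice well is precisely the labor that remains with the user.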
Given the background state of the art, as described herein, there is a need for improved methods and improved graphical interfaces for visualizing data, including data that has a hierarchical structure.