Scientific, engineering, and business data (or, more generally, information) are often multi-dimensional, i.e., they depend on several variables. Information presented in matrix form, in which the rows typically correspond to the “data points” and the columns correspond to the variables, may contain more than three columns of information. Representing and making sense of such multi-dimensional information is challenging, at least in part because of the three-dimensional space in which we live. For example, reducing a four-dimensional data space to a two- or even three-dimensional representation necessarily introduces some ambiguity in how the data are presented: in a two-dimensional projection of a three-dimensional scatter plot, a plotted point may correspond to any value along the viewing axis. It has been a long-standing goal of the information science community to relieve the dimensionality curse on knowledge discovery through simple information representations derived from familiar and easy-to-understand lower-dimensional representations, and to do so in a way that does not sacrifice understanding of the information.
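The projection ambiguity noted above can be illustrated with a minimal sketch. It assumes an orthographic projection along the z axis as the viewing axis; the function name `project_to_xy` and the sample points are hypothetical and chosen only for illustration.

```python
def project_to_xy(point):
    """Orthographic projection along z (assumed here to be the viewing axis)."""
    x, y, _z = point  # the z value is discarded by the projection
    return (x, y)


# Two distinct 3-D points that differ only along the viewing axis.
a = (1.0, 2.0, 0.5)
b = (1.0, 2.0, 7.0)

# Both land on the same 2-D position, so the plot alone cannot tell them apart.
same_image = project_to_xy(a) == project_to_xy(b)
print(same_image)
```

Any encoding that drops a dimension in this way loses the information needed to distinguish such points, which is the ambiguity the paragraph describes.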
Visualization techniques are powerful information management tools that support knowledge workers in their decision-making activities by stimulating visual thinking. One of the goals of visualization techniques is to support knowledge workers in the early stages of their information-understanding tasks. The techniques generally involve some sort of graphical representation that facilitates qualitative rather than quantitative analysis. That is, the purpose of such techniques is generally to gain insight into the distribution of the information, such as exploring interesting trends and patterns or “structure” in the information. It is assumed that once the user has a better overall understanding of the information, he or she will be able to glean numerical details during later stages of the knowledge discovery process. Thus, it is expected that users of visualization techniques are likely to tolerate loss of exact information in the initial stages of their analysis by trading certainty for insight.
The use of simple visual representations, in which all dimensions are represented, may be crucial in the early stages of information analysis, as users are confronted with the need to explore and compare a number of different options rapidly. However, if the representations are too complex and involve too many visual cues, such as color, size, direction, and position, the user, who does not yet have a good understanding of the information, may be overloaded. In addition, the use of multiple encodings makes it difficult to compare trends and clusters and to understand how the information is distributed in higher-dimensional spaces.
One attempt to reduce multi-dimensional information to a more manageable two-dimensional presentation is Bertin's Permutation Matrices (see J. Bertin, Graphics and Graphic Information Processing, Walter de Gruyter & Co., Berlin, pp. 24-31, 1981), which allow users to rearrange rows and columns to discover patterns and clusters from coarse graphical depictions of information. In Permutation Matrices, the data in each cell are represented using simple visuals such as black or white cells. Using this simple but coarse technique, it is possible to grasp the distribution of data and clusters without the need for exact data values. Users can tolerate loss of information for the sake of gaining insight into the data. The strength of Permutation Matrices also lies in the interactivity that enables users to integrate and separate dimensions in the visual representation.
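The idea of coarse cells plus row and column rearrangement can be sketched as follows. This is only an illustration under stated assumptions, not Bertin's own procedure: the threshold rule for deciding black versus white, the helper names `to_cells`, `permute`, and `render`, and the sample data are all hypothetical, and the permutation is supplied by hand, as a user would do interactively.

```python
def to_cells(matrix, threshold):
    """Coarsen numeric values to filled/empty cells (assumed threshold rule)."""
    return [[value >= threshold for value in row] for row in matrix]


def permute(cells, row_order, col_order):
    """Return the cells with rows and columns rearranged in the given orders."""
    return [[cells[r][c] for c in col_order] for r in row_order]


def render(cells):
    """Draw filled cells as '#' and empty cells as '.'."""
    return "\n".join(
        "".join("#" if filled else "." for filled in row) for row in cells
    )


data = [[9, 1, 8], [2, 7, 1], [8, 0, 9]]
cells = to_cells(data, threshold=5)

# Swapping the last two rows and the last two columns groups the filled cells.
rearranged = permute(cells, row_order=[0, 2, 1], col_order=[0, 2, 1])

print(render(cells))
print()
print(render(rearranged))
```

In the rearranged view the filled cells form two blocks, so the grouping of rows 0 and 2 is visible without reading any exact value.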
Another such technique is the use of so-called Parallel Coordinates (see A. Inselberg and B. Dimsdale, “Parallel coordinates: a tool for visualizing multi-dimensional geometry”, Proceedings of the First IEEE Conference on Visualization, 1990), in which parallel axes are laid out, one per dimension, and each dimension is encoded uniformly through the same visual cue (i.e., position). In Parallel Coordinates, each data element is represented as a line that crosses each coordinate axis at the element's value for that dimension. While Parallel Coordinates is a very powerful technique, especially for modeling relationships, its visualizations generally rely on user expertise and knowledge of mathematical methods. Other techniques include Chernoff's use of faces to represent multi-dimensional data (see H. Chernoff, “The use of faces to represent points in k-Dimensional space graphically”, Journal of the American Statistical Association, 68, pp. 361-368, 1973), and Friedman's real-time motion graphics, which give pictures that appear to be moving three-dimensional objects (see J. H. Friedman and J. W. Tukey, “A projection pursuit algorithm for exploratory data analysis”, IEEE Transactions on Computers, vol. C-23, no. 9, pp. 881-890, 1974). In spite of the advantages offered by these techniques, there is still a need for an information management technique that is more user-friendly and that efficiently facilitates user interaction.
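As an illustration of the Parallel Coordinates encoding described above, the following minimal sketch computes the vertices of the line drawn for each data element. It assumes evenly spaced vertical axes and min-max scaling of each dimension to the range 0 to 1; the function name `polylines`, the `axis_spacing` parameter, and the sample records are hypothetical and are not taken from the cited work.

```python
def polylines(records, axis_spacing=1.0):
    """Map each record to the vertices of its line across the parallel axes.

    Axis i is placed at x = i * axis_spacing. Each value is min-max scaled
    per dimension to a height in [0, 1] (an assumed normalization).
    """
    dims = list(zip(*records))  # one tuple of values per dimension
    lows = [min(d) for d in dims]
    highs = [max(d) for d in dims]
    lines = []
    for record in records:
        vertices = []
        for i, value in enumerate(record):
            span = highs[i] - lows[i]
            # A constant dimension has no spread; place it mid-axis.
            height = (value - lows[i]) / span if span else 0.5
            vertices.append((i * axis_spacing, height))
        lines.append(vertices)
    return lines


records = [
    (1.0, 10.0, 100.0, 5.0),
    (3.0, 20.0, 300.0, 5.0),
    (2.0, 30.0, 200.0, 5.0),
]
lines = polylines(records)
for vertices in lines:
    print(vertices)
```

Every dimension is encoded by position alone, so a four-column record becomes a four-vertex line; the vertices can be passed to any plotting routine to draw the figure.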