Computer-based data visualization involves the generation and presentation of idealized data on a physical output device, such as a cathode ray tube (CRT), liquid crystal display (LCD), printer, and the like. Computer systems visualize data through graphical user interfaces (GUIs), which allow intuitive user interaction and high quality presentation of synthesized information.
The importance of effective data visualization has grown in step with advances in computational resources. Faster processors and larger memory sizes have enabled the application of complex visualization techniques that operate in multi-dimensional concept space. As well, the interconnectivity provided by networks, including intranetworks and internetworks, such as the Internet, enables the communication of large volumes of information to a wide-ranging audience. Effective data visualization techniques are needed to interpret information and model content interpretation.
The use of a visualization language can enhance the effectiveness of data visualization by communicating words, images and shapes as a single, integrated unit. Visualization languages help bridge the gap between the natural perception of a physical environment and the artificial modeling of information within the constraints of a computer system. As raw information cannot always be digested as written words, data visualization attempts to complement and, in some instances, supplant the written word for a more intuitive visual presentation drawing on natural cognitive skills.
Effective data visualization is constrained by the physical limits of computer display systems. Two-dimensional and three-dimensional information can be readily displayed. However, n-dimensional information in excess of three dimensions must be artificially compressed. Careful use of color, shape and temporal attributes can simulate multiple dimensions, but comprehension and usability become difficult as additional layers of modeling are artificially grafted into the finite bounds of display capabilities.
Thus, mapping multi-dimensional information into a two- or three-dimensional space presents a problem. Physical displays are practically limited to three dimensions. Compressing multi-dimensional information into three dimensions can, for instance, mislead the viewer into an erroneous interpretation of the spatial relationships between individual display objects. Other factors further complicate the interpretation and perception of visualized data, based on the Gestalt principles of proximity, similarity, closed region, connectedness, good continuation, and closure, such as described in R. E. Horn, “Visual Language: Global Communication for the 21st Century,” Ch. 3, Macro VU Press (1998), the disclosure of which is incorporated by reference.
In particular, the misperception of visualized data can cause a misinterpretation of, for instance, dependent variables as independent and independent variables as dependent. This type of problem occurs, for example, when visualizing clustered data, which presents discrete groupings of data that can be misperceived as being overlaid or overlapping due to the spatial limitations of a three-dimensional space.
Consider, for example, a group of clusters, each cluster visualized in the form of a circle defining a center and a fixed radius. Each cluster is located some distance from a common origin along a vector measured at a fixed angle from a common axis through the common origin. The radii and distances are independent variables relative to the other clusters and the radius is an independent variable relative to the common origin. In this example, each cluster represents a grouping of points corresponding to objects sharing a common set of traits. The radius of the cluster reflects the relative number of objects contained in the grouping. Clusters located along the same vector are similar in theme, as are those clusters located on vectors having a small cosine rotation from each other. Thus, the angle relative to a common axis and the distance from a common origin are independent variables, with the correlation between the distance and angle reflecting relative similarity of theme. Each radius is an independent variable representative of volume. When displayed, the overlaying or overlapping of clusters could mislead the viewer into perceiving data dependencies where there are none.
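By way of illustration only, the geometry of the foregoing example can be sketched as follows. The sketch assumes a two-dimensional display space and angles expressed in radians; the names Cluster, center, and overlaps are hypothetical and are introduced solely for this example. Each cluster is placed at its distance and angle from the common origin, and two clusters are treated as overlapping when the distance between their centers is less than the sum of their radii.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical cluster record used only for this illustration."""
    distance: float  # distance of the center from the common origin
    angle: float     # angle from the common axis, in radians (assumed unit)
    radius: float    # reflects the relative number of objects in the grouping

    def center(self):
        # Convert the polar placement (distance, angle) to display coordinates.
        return (self.distance * math.cos(self.angle),
                self.distance * math.sin(self.angle))


def overlaps(a, b):
    """Return True when the circles for clusters a and b intersect on the display."""
    (ax, ay), (bx, by) = a.center(), b.center()
    center_gap = math.hypot(ax - bx, ay - by)
    return center_gap < a.radius + b.radius
```

In this sketch, two clusters placed along the same vector, such as Cluster(3.0, 0.0, 2.0) and Cluster(5.0, 0.0, 1.5), overlap even though their radii and distances are independent of one another, which is the kind of display that could suggest a dependency where none exists.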
Therefore, there is a need for an approach to presenting arbitrarily dimensioned data in a finite-dimensioned display space while preserving independent data relationships. Preferably, such an approach would organize the data according to theme and place thematically related clusters into linear spatial arrangements to maximize the number of relationships depicted.
There is a further need for an approach to selecting and orienting data clusters to properly visualize independent and dependent variables while compressing thematic relationships for display.