The exemplary embodiment relates to a method and system for the display of multi-dimensional data. It finds particular application in connection with dynamically determining and presenting appearance and spatial attribute values of entities of the multi-dimensional data over a sequence to assist in the recognition of patterns and trends within the data.
Data visualization and analysis is a difficult task when the dimensionality of data is high and the shape of clusters is complex. Recently, a number of data visualization methods have been proposed. Most of the methods are only applicable to a simple dataset, for example smooth manifolds or clustered data-points. Standard data analysis tools use mathematical operations to reduce the dimensionality of the data to a more manageable dimensionality (e.g., multidimensional scaling (MDS), principal component analysis (PCA), cluster analysis, projection pursuit methods, neural network algorithms, and the like) or for transforming data for visualization. These methods provide a static view of the data, which is efficient for a simple dataset.
In some cases involving large, noisy or non-linear datasets, static visualization methods are not able to give an intuitive understanding of the spatial organization of the data. One approach is to use a dynamic visualization method, that is to say, a system that outputs an animation consisting of a series of smoothly changing projections of a data-point cloud that encompasses all data-points. With such settings, visualization is similar to watching a movie, and thus makes use of time as an additional dimension. However, time differs in kind from the spatial dimensions, and a dedicated method is needed for users to interpret it. A standard method for dynamic visualization is known as the Grand Tour. In the Grand Tour, sequences of 2D or 3D projections are displayed (See, ASIMOV, D. The grand tour: a tool for viewing multidimensional data. SIAM Journal on Scientific and Statistical Computing, 6(1):128-143, January 1985). Instead of choosing an arbitrary projection to visualize the data, every possible projection is approximately visualized using multiple images in a movie-like animation. A space-filling curve is used in traversing the projection space, i.e., a series of projections for which, for every possible projection, the series contains at least one element in a small neighborhood, and the sequence of projections is smooth, so that two contiguous projections in the series give similar images. In the classical implementation, a step size and a space-filling curve are defined, a plane is moved along this curve, and the data are projected onto the plane at each step. The user browses the animation using the time dimension scale by which the projections are indexed.
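By way of illustration only, the idea of a smoothly changing series of 2D projections can be sketched as follows. This is a minimal sketch and not Asimov's construction: the function names (`tour_frame`, `project`, `grand_tour`), the step size, and the rotation rates are hypothetical choices, and no claim is made that the resulting path is space-filling. It rotates an orthonormal pair of axes through every coordinate plane by an angle proportional to time and projects the data onto the rotated pair.

```python
import math
from itertools import combinations

def tour_frame(p, t, rates):
    """Return two orthonormal p-dimensional axes for time t.

    The axes start as the first two coordinate axes and are rotated in each
    coordinate plane (i, j) by the angle rate * t. Rotations preserve
    orthonormality, and small changes in t give small changes in the axes.
    """
    frame = [[1.0 if i == k else 0.0 for i in range(p)] for k in range(2)]
    for (i, j), rate in zip(combinations(range(p), 2), rates):
        c, s = math.cos(rate * t), math.sin(rate * t)
        for axis in frame:
            ai, aj = axis[i], axis[j]
            axis[i] = c * ai - s * aj
            axis[j] = s * ai + c * aj
    return frame

def project(points, frame):
    """Project each p-dimensional point onto the two axes of the frame."""
    return [tuple(sum(x * a for x, a in zip(point, axis)) for axis in frame)
            for point in points]

def grand_tour(points, n_frames, step=0.05):
    """Yield one 2D projection of the data per animation frame."""
    p = len(points[0])
    n_planes = p * (p - 1) // 2
    # Assumption: arbitrary, unequal rates chosen only for illustration.
    rates = [math.sqrt(k + 2) for k in range(n_planes)]
    for k in range(n_frames):
        yield project(points, tour_frame(p, k * step, rates))
```

Each yielded item is one image of the animation. Note that the frames are generated without reference to the data values, and that the number of coordinate planes grows quadratically with the number of dimensions.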
The series of projections in the Grand Tour does not depend on the data being visualized. However, viewing the huge space of 2D projections of a multidimensional dataset as a video can be prohibitively time consuming and not really informative when the number of dimensions is large. As a result, the Grand Tour is generally impractical for more than 10 dimensions.
Other dynamic visualization methods may involve a more advanced framework that includes interaction with the user.
In order to reduce the huge search space for projection visualization, a projection pursuit guided tour has been proposed which combines the Grand Tour and projection pursuit (See, COOK, D., BUJA, A., CABRERA, J., AND HURLEY, H. Grand tour and projection pursuit, J. of Computational and Graphical Statistics 4, pp. 155-172 (1995)). The method of projection pursuit finds the projections that optimize a criterion called the projection pursuit index. This criterion is chosen to reveal the most detail about the structure (clusters, surfaces, etc.) of the dataset (See, FRIEDMAN, J., AND TUKEY, J. A projection pursuit algorithm for exploratory data analysis. In IEEE Transactions on Computers, pp. 881-890 (1974)). This combination is a useful visualization tool for some applications but does not allow a user to participate in the process.
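The optimization of a projection pursuit index can be illustrated by the following minimal sketch. It is not the algorithm of the cited references: the index used here (the absolute deviation of the kurtosis of a 1D projection from the Gaussian value of 3) and the random hill-climbing search are stand-ins chosen for brevity, and all names and parameter values are hypothetical.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]

def project_1d(points, direction):
    """Project each point onto a single direction."""
    return [sum(x * d for x, d in zip(point, direction)) for point in points]

def kurtosis_index(values):
    """Illustrative index: distance of the kurtosis from the Gaussian value 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0.0:
        return 0.0
    return abs(m4 / (m2 * m2) - 3.0)

def pursue(points, n_iter=200, step=0.3, seed=0):
    """Hill-climb over unit directions, keeping only moves that raise the index."""
    rng = random.Random(seed)
    p = len(points[0])
    direction = normalize([1.0] * p)
    best = kurtosis_index(project_1d(points, direction))
    for _ in range(n_iter):
        candidate = normalize([d + step * rng.gauss(0.0, 1.0) for d in direction])
        score = kurtosis_index(project_1d(points, candidate))
        if score > best:
            direction, best = candidate, score
    return direction, best
```

Because a candidate is accepted only when it improves the index, the returned score is never lower than that of the starting direction. The search runs without any input from the user.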
Interaction techniques can empower the user's perception of information. A set of interaction techniques, such as aggregation, rotation, linking and brushing, interactive selection, and the like may improve the visualization process (See, DOS SANTOS, S. R., A framework for the visualization of multidimensional and multivariate data. Ph.D. Dissertation, University of Leeds, United Kingdom (2004)).
One approach uses the VISTA framework (See, CHEN, K., AND LIU, L. ivibrate: Interactive visualization-based framework for clustering large datasets. ACM Trans. Inf. Syst. 24, 2, pp. 245-294 (2006)). VISTA uses a star coordinates representation to manipulate dimensions available in the 2D view (See, KANDOGAN, E. Visualizing multi-dimensional clusters, trends, and outliers using star coordinates. In KDD '01: Proc. 7th ACM SIGKDD Intern. Conference on Knowledge Discovery and Data Mining, pp. 107-116 (ACM Press, New York, N.Y., USA, 2001)). This permits user interaction. However, manipulating the parameters of the projection is not an intuitive way to explore the dataset, and it is difficult when dimensionality is high.
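A star coordinates mapping of the general kind referred to above can be sketched as follows. This is a simplified illustration under stated assumptions, not the VISTA implementation: the function names are hypothetical, attribute values are assumed to be already scaled to comparable ranges, and the only user-adjustable parameter shown is a per-axis scale factor.

```python
import math

def star_axes(n_dims, scales=None):
    """Place one 2D axis vector per dimension at evenly spaced angles.

    scales holds one length factor per axis; adjusting it stands in for
    the user manipulating the parameters of the projection.
    """
    if scales is None:
        scales = [1.0] * n_dims
    return [(s * math.cos(2.0 * math.pi * k / n_dims),
             s * math.sin(2.0 * math.pi * k / n_dims))
            for k, s in enumerate(scales)]

def star_project(point, axes):
    """Map a point to 2D as the sum of its axis vectors weighted by its values."""
    x = sum(v * ax for v, (ax, _) in zip(point, axes))
    y = sum(v * ay for v, (_, ay) in zip(point, axes))
    return (x, y)
```

With one adjustable parameter (or more) per dimension, the number of controls grows with the dimensionality, which is consistent with the difficulty noted above for high-dimensional data.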
Another approach is known as Targeted Projection Pursuit (TPP) (See, FAITH, J. Targeted projection pursuit for interactive exploration of high-dimensional data sets. In IV '07: Proceedings of the 11th International Conference Information Visualization, IEEE Computer Society, pp. 286-292 (Washington D.C., 2007)). Unlike VISTA, the basis of TPP is that the user manipulates their view of the data directly, rather than manipulating the projection that produces that view. TPP is an interactive exploration tool where the user defines a target, and the system finds a projection that best approximates that target.
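The step of finding a projection that best approximates a user-defined target can be illustrated as a least-squares fit. The sketch below is an assumption made for illustration and is not taken from the cited reference: it uses plain gradient descent on the mean squared distance between the projected points and their target 2D positions, and the function name, learning rate, and iteration count are hypothetical.

```python
def fit_projection(points, targets, lr=0.5, n_iter=300):
    """Find a p x 2 matrix whose linear projection of points approximates targets.

    points  : list of p-dimensional tuples (the data)
    targets : list of (x, y) positions chosen by the user, one per point
    Returns the matrix as a list of p rows, each row holding two weights.
    """
    n, p = len(points), len(points[0])
    matrix = [[0.0, 0.0] for _ in range(p)]
    for _ in range(n_iter):
        grad = [[0.0, 0.0] for _ in range(p)]
        for point, target in zip(points, targets):
            for c in range(2):
                # Residual between the current projection and the target.
                proj = sum(point[d] * matrix[d][c] for d in range(p))
                err = proj - target[c]
                for d in range(p):
                    grad[d][c] += point[d] * err / n
        for d in range(p):
            for c in range(2):
                matrix[d][c] -= lr * grad[d][c]
    return matrix
```

As a usage example, if the targets are generated from a known projection, the fit recovers that projection when the data have full column rank and the learning rate is small enough for the descent to converge. In general the target is only approximated, since a linear projection cannot reproduce an arbitrary arrangement of points.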
Both of these alternatives to the Grand Tour approach are relatively complex and require a highly trained user.