Analysis of multivariate data is useful for predicting environmental conditions such as cyclones and, more generally, for identifying and/or quantifying associations between a set of interrelated variables. In climate studies, for example, identifying environmental variables that have the greatest impact on the intensity and frequency of seasonal hurricane activity has thus far been difficult due to the uncertainty and complexity of dynamic environmental data sets. Conventional data analysis tools and techniques do not support the increasing quantity and number of different parameters in the data for climate studies. As a result, the increased availability of additional environmental data has not led to a commensurate improvement in the accuracy of seasonal forecasts or to improved preparedness to reduce the impact of natural disasters.
It is believed that predictors of the main dynamic parameters that affect storm activity are observable well in advance, and thus may be used to provide early predictions. The importance of these predictors can be estimated using historical data by statistical regression techniques similar to those described by Vitart, F. 2004. Dynamical seasonal forecasts of tropical storm statistics. In: R. J. Murnane and K.-B. Liu (eds), Hurricanes and typhoons: Past, present, and future. Columbia University Press, New York, N.Y., December 2004. pp. 354-92, the entirety of which is hereby incorporated by reference. Klotzbach et al. used this technique to determine the most important variables for predicting the frequency of North Atlantic tropical cyclone activity, as shown in P. J. Klotzbach and W. M. Gray, 2006, Summary of 2006 Atlantic tropical cyclone activity and verification of author's seasonal and monthly forecasts, Technical report, November 2006. [http://hurricane.atmos.colostate.edu/Forecasts/2006/nov2006/; accessed Apr. 15, 2009]; and P. J. Klotzbach, W. M. Gray, and W. Thorson, 2006, Extended range forecast of Atlantic seasonal hurricane activity and U.S. landfall strike probability for 2007, Technical report, 2006. [http://tropical.atmos.colostate.edu/Forecasts/2006/dec2006/; accessed Apr. 15, 2009], the entireties of which are hereby incorporated by reference. A multiple regression scheme called the Typhoon Intensity Prediction Scheme (TIPS) combining satellite information with other environmental predictors was developed to understand and forecast tropical cyclone intensity for the western North Pacific Ocean, as described in P. J. Fitzpatrick 1996. Understanding and forecasting tropical cyclone intensity change, PhD dissertation, Department of Atmospheric Sciences, Colorado State University, Fort Collins, Colorado, the entirety of which is hereby incorporated by reference.
Regression analysis techniques are often complicated to establish, but they provide an ordered list of the most important predictors for the dynamic parameters. Scientists gain additional insight and identify the more informative variables in these studies by evaluating descriptive statistics and performing correlation analysis. In the past, researchers have relied on simple scatter plots and histograms, which require several separate plots or layered plots to analyze multiple variables. However, perceptual issues limit the effectiveness of this approach, particularly for large numbers of variables in a given multivariate data set.
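By way of illustration only, the kind of ordered predictor list mentioned above can be approximated with a simple correlation ranking. The following Python sketch is not the regression scheme of any cited reference; it substitutes a plain Pearson correlation for a full regression, and the variable names (sea_surface_temp, wind_shear, sea_level_pressure, storm_count) and all data values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_predictors(predictors, target):
    """Return (name, r) pairs ordered by decreasing |r| against the target."""
    scores = {name: pearson(values, target) for name, values in predictors.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical seasonal observations (illustrative values, not real data).
predictors = {
    "sea_surface_temp": [26.1, 26.8, 27.4, 27.9, 28.3],
    "wind_shear": [12.0, 11.0, 9.5, 9.0, 7.0],
    "sea_level_pressure": [1012.0, 1015.0, 1011.0, 1014.0, 1013.0],
}
storm_count = [8, 10, 12, 13, 16]

ranking = rank_predictors(predictors, storm_count)
for name, r in ranking:
    print(f"{name}: r = {r:+.3f}")
```

Ranking by the absolute value of the coefficient places strongly negative associations alongside strongly positive ones, which is one simple way to surface the more informative variables before any plot is drawn.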
One proposed solution is the scatter plot matrix (SPLOM), which presents multiple adjacent scatter plots for all the variable comparisons in a single display with a matrix configuration, as described by P. C. Wong and R. D. Bergeron. 1997. 30 years of multidimensional multivariate visualization, from G. M. Nielson, H. Hagen, and H. Muller (eds), Scientific visualization: Overviews, methodologies, and techniques. Los Alamitos, California: IEEE Computer Society Press. pp. 3-33, incorporated herein by reference. This approach, however, requires a large amount of screen space, and forming multivariate associations is still mentally challenging. Statistical measures have been used to organize the SPLOM and guide the viewer through exploratory analysis of high-dimensional data sets, as described in L. Wilkinson, A. Anand, and R. Grossman. 2006. High-dimensional visual analytics: Interactive exploration guided by pairwise views of point distributions, IEEE Transactions on Visualization and Computer Graphics 12(6): 1366-72, incorporated by reference. While this approach is somewhat better, perceptual problems remain.
Another alternative is to use layered plots, which condense the information into a single display, but there are significant issues due to layer occlusion and interference, as demonstrated by C. G. Healey, L. Tateosian, J. T. Enns, and M. Remple. 2004. Perceptually-based brush strokes for nonphotorealistic visualization, ACM Transactions on Graphics 23(1): 64-96, incorporated herein by reference. The geographically encoded data used in climate studies are usually displayed in the context of a geographical map. Although certain important patterns (those directly related to geographic position) may be recognized in this context, additional information may be discovered more rapidly using non-geographical information visualization techniques. Thus far, few multivariate visualization techniques provide access to the integrated, automatic statistical analysis techniques that are commonly utilized in climate studies to identify significant associations.
Another multivariate visualization technique known as parallel coordinates is described in A. Inselberg, 1985, The plane with parallel coordinates, The Visual Computer 1(4): 69-91, incorporated by reference, and this technique was applied to the analysis of multivariate relationships in data in E. J. Wegman, 1990. Hyperdimensional data analysis using parallel coordinates, Journal of the American Statistical Association 85(411): 664-75, incorporated by reference. The parallel coordinates approach provides a compact, two-dimensional representation of even large multidimensional data sets. Hauser et al. described a histogram display, dynamic axis reordering, axis inversion, and some details-on-demand capabilities for parallel coordinates in H. Hauser, F. Ledermann, and H. Doleisch. 2002. Angular brushing of extended parallel coordinates, Proceedings of IEEE Symposium on Information Visualization, Boston, Mass., IEEE Computer Society. pp. 127-30, incorporated herein by reference. A rich set of dynamic interaction techniques (e.g., conjunctive queries) was presented by H. Siirtola, 2000. Direct manipulation of parallel coordinates, Proceedings of the International Conference on Information Visualisation, London, England, IEEE Computer Society. pp. 373-78, and Jankun-Kelly and Waters (2006) and Johansson et al. described new line-shading schemes for parallel coordinates in J. Johansson, P. Ljung, M. Jern, and M. Cooper, 2005, Revealing structure within clustered parallel coordinates displays, IEEE Symposium on Information Visualization, Minneapolis, Minn., October 2005, IEEE Computer Society. pp. 125-32, both of which are incorporated herein by reference.
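As a rough illustration of why parallel coordinates yield a compact two-dimensional representation, the following Python sketch maps each multivariate record to a polyline whose vertices lie on evenly spaced vertical axes, one axis per variable. It is a minimal sketch under stated assumptions and not the implementation of any cited reference: the function name parallel_coordinates, the min-max scaling of each axis to the range 0 to 1, the placement of constant-valued variables at the axis midpoint, and the sample records are all hypothetical.

```python
def parallel_coordinates(records, variables):
    """Map each record to a polyline of (axis_index, scaled_value) vertices.

    Each variable is assigned one vertical axis at x = 0, 1, 2, ...
    Values are min-max scaled per axis so every axis spans 0.0 to 1.0.
    """
    lows = {v: min(r[v] for r in records) for v in variables}
    highs = {v: max(r[v] for r in records) for v in variables}

    polylines = []
    for record in records:
        line = []
        for axis_index, v in enumerate(variables):
            span = highs[v] - lows[v]
            # A constant variable has no spread; place it at the axis midpoint.
            y = 0.5 if span == 0 else (record[v] - lows[v]) / span
            line.append((axis_index, y))
        polylines.append(line)
    return polylines


# Hypothetical records with three variables.
records = [
    {"a": 0, "b": 10, "c": 5},
    {"a": 2, "b": 0, "c": 5},
    {"a": 1, "b": 5, "c": 5},
]
polylines = parallel_coordinates(records, ["a", "b", "c"])
for line in polylines:
    print(line)
```

Because every record occupies the same horizontal extent regardless of how many variables it has, adding a variable adds one axis rather than a new row and column of plots.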
Focus+context implementations for parallel coordinates have been introduced by Fua et al. (Y-H. Fua, M. O. Ward, and E. A. Rundensteiner. 1999, Hierarchical parallel coordinates for exploration of large datasets, Proceedings of IEEE Visualization, San Francisco, Calif., October 1999, IEEE Computer Society. pp. 43-50.); Artero et al. (A. O. Artero, M. C. F. de Oliveira, and H. Levkowitz, Uncovering clusters in crowded parallel coordinates visualization, IEEE Symposium on Information Visualization, Austin, Tex., October 2004, IEEE Computer Society. pp. 81-8.); Johansson et al. supra; and Novotny and Hauser (M. Novotny and H. Hauser, Outlier-preserving focus+context visualization in parallel coordinates, IEEE Transactions on Visualization and Computer Graphics 12(5): 893-900.), the entireties of which are incorporated herein by reference.
Qu et al. (2007) introduced a method for integrating correlation computations into a parallel coordinates display in H. Qu, W. Chan, A. Xu, K. Chung, K. Lau, and P. Guo, 2007, Visual analysis of the air pollution problem in Hong Kong, IEEE Transactions on Visualization and Computer Graphics 13(6): 1408-15, incorporated by reference. Seo and Shneiderman used a framework to explore and comprehend multidimensional data using a powerful rank-by-feature system that guides the user and supports confirmation of discoveries, in J. Seo and B. Shneiderman, 2005, A rank-by-feature framework for interactive exploration of multidimensional data, Information Visualization 4(2): 96-113, incorporated herein by reference, and Piringer et al. (2008) expanded this rank-by-feature approach by focusing on the comparison of subsets in high-dimensional data sets, as described in H. Piringer, W. Berger, and H. Hauser, 2008, Quantifying and comparing features in high dimensional datasets, International Conference on Information Visualization, London, UK, July 2008, IEEE Computer Society. pp. 240-45, incorporated herein by reference. Parallel coordinates tools have also been developed for analyzing social and economic data in order to compare different geographical regions.