Data visualization is an essential technique for carrying out data analysis efficiently. As advances in sensing technology and information management technology have made data enormous and complicated in recent years, the need for data visualization has increased.
The following are considered typical scenes in which the data visualization technique is applied in the area of data analysis: (1) a scene where a data structure is overviewed as a stage prior to analysis, (2) a scene where learning results are interpreted when the data structure has been learned as a model by a machine learning technique or the like, and (3) a scene where prediction results are examined when a prediction is made using the learned model.
In the scene (1), a typical visualization means used to overview the data structure is a method in which high-dimensional data are compressed into lower dimensions using a multivariate analysis technique such as principal component analysis or multidimensional scaling, and the compressed data are then displayed on a two-dimensional scatter plot. There is also a visualization technique, called a Scatter Plot Matrix (hereinafter referred to as SPM), in which two-dimensional scatter plots for all pairs of dimensions taken from the original high-dimensional data are arranged in a matrix form. Further, there is a visualization technique, called a Parallel Coordinate Plot (hereinafter referred to as PCP or parallel coordinate plot), in which axes corresponding to the respective dimensions are arranged vertically in parallel with one another, observed values are plotted so that, in every dimension, the minimum value appears at the lower end of the axis and the maximum value appears at the upper end, and the observed values on adjacent axes are connected with line segments.
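As an illustration of the SPM and the PCP described above, the following is a minimal sketch, in Python, of the data preparation behind both displays. It is not taken from any cited literature; the function names `spm_panels` and `pcp_polylines` and the sample data are hypothetical, only unordered pairs of dimensions are enumerated (a full matrix would also show the mirrored panels), and the choice to place a constant dimension at mid-axis is an assumption.

```python
from itertools import combinations


def spm_panels(rows):
    """Return {(i, j): [(x, y), ...]} for every pair of dimensions i < j.

    Each entry holds the points of one two-dimensional scatter plot
    (one panel of the matrix).
    """
    n_dims = len(rows[0])
    return {
        (i, j): [(row[i], row[j]) for row in rows]
        for i, j in combinations(range(n_dims), 2)
    }


def pcp_polylines(rows):
    """Scale every dimension to [0, 1] and return one polyline per observation.

    A polyline is a list of (axis_index, height) vertices, where height 0.0
    is the lower end of an axis (minimum value) and 1.0 is the upper end
    (maximum value). Consecutive vertices are joined by line segments.
    """
    n_dims = len(rows[0])
    lows = [min(row[d] for row in rows) for d in range(n_dims)]
    highs = [max(row[d] for row in rows) for d in range(n_dims)]
    polylines = []
    for row in rows:
        vertices = []
        for d in range(n_dims):
            span = highs[d] - lows[d]
            # Assumption: a constant dimension has no range, so draw it mid-axis.
            height = (row[d] - lows[d]) / span if span else 0.5
            vertices.append((d, height))
        polylines.append(vertices)
    return polylines


# Hypothetical sample: three observations, three dimensions.
data = [
    (1.0, 10.0, 100.0),
    (2.0, 30.0, 300.0),
    (3.0, 20.0, 200.0),
]
panels = spm_panels(data)
lines = pcp_polylines(data)
```

Because every axis is rescaled independently, dimensions with very different units can share one PCP; the number of SPM panels, by contrast, grows quadratically with the number of dimensions.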
Principal component analysis and multidimensional scaling are useful for visualizing information on how data points are scattered in a multidimensional space while preserving as much of that information as possible. Further, the SPM and the PCP are useful for visualizing, in a single diagram, the relationships among a small number of specific dimensions within a high-dimensional space.
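To make concrete how principal component analysis preserves the scattering of data points, the following is a minimal sketch that compresses data to two dimensions for display on a scatter plot. It is written under stated assumptions: the function names `principal_components` and `project`, the sample data, and the use of power iteration with deflation (chosen only to keep the sketch free of external libraries) are all illustrative, and a production analysis would normally rely on an established numerical library.

```python
import math


def principal_components(rows, n_components=2, iterations=500):
    """Estimate the leading principal components by power iteration.

    Returns (means, components, variances), where each component is a unit
    vector and each variance is the data variance along that component.
    """
    n, dims = len(rows), len(rows[0])
    means = [sum(row[d] for row in rows) / n for d in range(dims)]
    centered = [[row[d] - means[d] for d in range(dims)] for row in rows]
    # Sample covariance matrix (dims x dims).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(dims)]
           for a in range(dims)]
    components, variances = [], []
    for k in range(n_components):
        # Arbitrary non-zero starting vector (an assumption of this sketch).
        vec = [1.0 / (d + k + 1) for d in range(dims)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
        for _ in range(iterations):
            nxt = [sum(cov[a][b] * vec[b] for b in range(dims))
                   for a in range(dims)]
            norm = math.sqrt(sum(v * v for v in nxt))
            if norm == 0.0:  # no variance left to explain
                break
            vec = [v / norm for v in nxt]
        # Rayleigh quotient: variance of the data along vec.
        variance = sum(vec[a] * sum(cov[a][b] * vec[b] for b in range(dims))
                       for a in range(dims))
        components.append(vec)
        variances.append(variance)
        # Deflation: remove the found direction before the next search.
        cov = [[cov[a][b] - variance * vec[a] * vec[b] for b in range(dims)]
               for a in range(dims)]
    return means, components, variances


def project(rows, means, components):
    """Map each observation to its coordinates on the given components."""
    return [
        [sum((row[d] - means[d]) * comp[d] for d in range(len(row)))
         for comp in components]
        for row in rows
    ]


# Hypothetical sample: five observations in three dimensions.
data = [
    (2.0, 1.9, 0.5),
    (0.0, 0.1, 0.4),
    (-2.0, -2.1, 0.6),
    (1.0, 1.2, 0.5),
    (-1.0, -1.1, 0.5),
]
means, components, variances = principal_components(data)
projected = project(data, means, components)  # points for a 2-D scatter plot
```

The variances returned for the retained components, compared with the total variance of the original data, indicate how much of the scattering survives the compression.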
Patent Literature (PTL) 1 describes a method of classifying learning data having continuous-valued attributes using a decision tree based on the features of the data distribution, so that a model structure can be easily decided on or changed. In the method described in PTL 1, each node in a generated decision tree is displayed as a scatter plot of an objective function, which is a data group concerning one particular attribute, against an explanatory function, which is a data group concerning the multiple remaining attributes, so that the features of the data distribution can be easily understood.