1. Field of the Invention
This invention relates to analysis of multi-dimensional databases. Specifically, the invention uses interactive graphic displays to explore relationships among variables in a multi-dimensional database.
2. Description of the Prior Art
Data in its raw form, i.e., as a list or table of numbers, can be uninteresting and difficult to interpret. To make data more understandable, and in particular, to show relationships between data, many alternative methods of presenting data are used. These methods typically include graphs, charts, and other presentation methods common in the art.
As the amount of data to be presented becomes larger, it often becomes increasingly difficult to present the data in a meaningful way. These difficulties are compounded if the data is many-dimensional, i.e., has a large number of variables.
A variable can be thought of as a vector of observed points. A set of variables can be thought of as a table of numbers or other tokens (a token being a blank-delimited sequence of characters, i.e., a number and/or a word) where each column is a vector variable. Each row or record of the table is a set of related observations.
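The table model described above can be illustrated with a minimal sketch. The function names `parse_table` and `column`, the header row, and the sample values are assumptions made for illustration only and do not come from the prior art or the invention.

```python
def parse_table(text):
    """Split blank-delimited text into a header row and a list of records.

    Each record is a list of tokens (kept as strings, since a token may be
    a number or a word).
    """
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, records = rows[0], rows[1:]
    return header, records


def column(header, records, name):
    """Return one variable, i.e. one column of the table, as a vector."""
    idx = header.index(name)
    return [record[idx] for record in records]


# Hypothetical sample data: three variables, three related observations.
SAMPLE = """
length width height
2.0 1.0 0.5
3.5 1.5 0.7
4.0 2.0 1.1
"""

header, records = parse_table(SAMPLE)
lengths = column(header, records, "length")
```

Here each element of `records` is one row of related observations, and `lengths` is the vector for a single variable.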
Data with a low number of variables can be visually presented and analyzed easily. As an example, data from a table representing two variables, e.g., length and width, can be easily graphed on a two dimensional plot just as data with three variables, e.g., length, width, and height, can be easily graphed on a three dimensional plot. The prior art can even present data with some limited number of additional variables/dimensions, e.g., temperature and motion, by adding color and/or animation to a three dimensional presentation.
The prior art has attempted to display presentations of large amounts of data with a large number of variables. However, many of these presentations showing larger numbers of data variables, if they are possible to compose at all, become difficult or impossible to interpret.
Cleveland and McGill in Dynamic Graphics for Statistics use an array of scatter plots to show the relationship among N variables in a data set. (A scatter plot is a graph of the values of one variable plotted against the values of another.) The elements of the array are scatter plots which show the relation between two of the variables. The two variables in each scatter plot are determined by the location of the plot in the array. The array has a scatter plot for every permutation of two variables that is represented in the array.
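As a rough illustration of how such an array can be indexed, the sketch below builds one cell for every ordered pair of distinct variables. It is not taken from Cleveland and McGill. The function name `scatter_plot_array`, the convention that the column variable supplies the x values and the row variable the y values, and the sample data are all assumptions.

```python
from itertools import permutations


def scatter_plot_array(table):
    """Build the cells of a scatter plot array.

    table maps each variable name to an equal-length list of values.
    The result maps (row_variable, column_variable) to the list of (x, y)
    points for that cell, with x taken from the column variable and y from
    the row variable (an assumed convention).
    """
    cells = {}
    for row_var, col_var in permutations(list(table), 2):
        cells[(row_var, col_var)] = list(zip(table[col_var], table[row_var]))
    return cells


# Hypothetical data set with N = 3 variables and two observations each.
table = {
    "length": [2.0, 3.5],
    "width": [1.0, 1.5],
    "height": [0.5, 0.7],
}
cells = scatter_plot_array(table)
```

With N variables this yields N × (N − 1) cells, so the position of a cell in the array alone identifies which two variables it plots.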
Cleveland and McGill use one or more colors to select certain points, called subsets, from the total number of points in the database. Using a technique called brushing, certain points are "painted" a specific color because these points satisfy a certain condition. In addition, every point in the array representing any of the painted points is also painted the same color. By using a single color, the prior art specifies a subset of observations which may show relationships among variables. The prior art also uses multiple colors on one presentation to show multiple subsets of observations which may show additional relationships among variables. Cleveland and McGill further show subsidiary displays which are coupled only in one direction, i.e., from a first display to a second, but not vice versa.
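One way to picture brushing is to keep a single color per record, so that any plot drawing that record shows it in the same color. The sketch below follows that idea. It is an illustrative assumption, not the implementation described by Cleveland and McGill, and the names `brush` and `painted_points` as well as the sample data are hypothetical.

```python
def brush(table, variable, predicate, color, colors=None):
    """Paint every record whose value for `variable` satisfies `predicate`.

    Returns a new list holding one color (or None) per record. Passing the
    result of an earlier call as `colors` adds a second subset in another
    color.
    """
    n = len(table[variable])
    colors = list(colors) if colors is not None else [None] * n
    for i, value in enumerate(table[variable]):
        if predicate(value):
            colors[i] = color
    return colors


def painted_points(table, x_var, y_var, colors):
    """Return (x, y, color) for each record in the plot of y_var against x_var."""
    return list(zip(table[x_var], table[y_var], colors))


# Hypothetical data: three records, three variables.
table = {
    "length": [2.0, 3.5, 4.0],
    "width": [1.0, 1.5, 2.0],
    "height": [0.5, 0.7, 1.1],
}

# First subset: records with length above 3.0 are painted red.
colors = brush(table, "length", lambda v: v > 3.0, "red")
# Second subset: records with width below 1.2 are painted blue.
colors = brush(table, "width", lambda v: v < 1.2, "blue", colors)

plot_a = painted_points(table, "length", "width", colors)
plot_b = painted_points(table, "width", "height", colors)
```

Because the color is stored per record rather than per plot, the third record appears red in both `plot_a` and `plot_b`, which mirrors the statement that every point representing a painted point is painted the same color.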
3. Problems with the Prior Art
Even with all its attempts to present data in a meaningful way, the prior art has a number of failings in dealing with data with a large number of variables.
The prior art is not versatile enough to allow efficient exploration of selected subsets of data. Brushing may show that there is some relationship among a certain selected subset of points, but analysis beyond this is not shown in the prior art. The prior art does not disclose efficient methods for finding out what different relationships the subset of points has with other data or what relationship exists between variables. The prior art data presentations do not allow the user to easily generate many alternative presentations by selecting a variety of presentation attributes from among a variety of presentations. The prior art also does not allow a user to query the data from the many different perspectives that can be shown in alternate presentations.
The prior art does not offer a rich variety of presentations or presentation types which are mutually coupled together and which are mutually accessible from one another. Without this multi-directional coupling of many diverse presentations and presentation types, it is difficult to organize the data and to identify relationships among variables.
The prior art also does not permit multiple independent uses of color on coupled presentations to visually show combinations of variable conditions.
It is an objective of this invention to provide an improved method and apparatus for presenting multi-dimensional data and exploring relationships among the many variables of the data.
It is an objective of this invention to provide an improved method and apparatus for presenting and exploring data by using the array of visual presentations of the data variables as a directory to access a plurality of subsidiary presentations of data used to present, organize, select, and condition data relationships.
It is another objective of this invention to provide an improved method and apparatus for presenting and exploring data by using multiple independent, and coupled data presentations.
It is another objective of this invention to provide an improved method and apparatus for using color to illustrate the effect of logical operations and transformation performed on the variables of a database.