This disclosure relates generally to computer system visualization tools, and more particularly to an apparatus and method for identifying the visualization point (which best describes a given visual analytic) in an arbitrary two-dimensional dataset and abstracting it into unified metadata for further consumption.
Today, algorithms exist that allow a data consumer to take a dataset as input and then determine the best visual analytic to describe it. A visual analytic can be any visual component for display in a graphical user interface, such as a line chart. In the case where both the dataset and the visual analytic(s) are defined, one method is to determine a visualization point from the given dataset that best describes the given visual analytic(s). The term “visualization point” refers to the data, including data points, category labels, series names, etc., that will be displayed in the visual analytic. However, this method can handle only a trivial dataset, and requires human interaction to handle more advanced datasets.
Take Xcelsius as an example. Xcelsius is a data visualization software product that consumes static data, such as an Excel data model, and transforms it into visual analytics (i.e. interactive visual interfaces) to provide improved business insight, analytical reasoning, and management. Because an Excel data model in an organization is often created long before the organization adopts a product like Xcelsius, it is costly and difficult to redo or adjust all data models for consumption by Xcelsius. In other words, Xcelsius needs to offer a smarter way to take any existing data model (or dataset), not just the trivial dataset, for consumption by the visual analytic.
Another example is the “Whohar Community” project in SAP's Business Objects On-Demand offering. The goal of the Whohar Community is to provide a marketplace between the data provider (who contributes a mass volume of data to a Whohar server repository) and the data consumer (who consumes the data to produce the visualization). Given the nature of the Internet, the data provider and the data consumer are often disconnected. Therefore, the data consumer often needs to make a best guess about the data (unless the data schema is present at the time of consumption), or the data provider is left to format its data in only the most obvious way.
Data visualization can help to quickly provide business insight on a mass volume of data. There are two ends in data visualization: the Data Provider and the Data Consumer (e.g. a Chart Engine, or Visual Analytic engine, in this case). The Data Provider is the system (or process) that provides the dataset (e.g. database, spreadsheet, etc.), while the Chart Engine is the system (or process) that takes the dataset as input and creates the visual representation (e.g. bar chart). In the common approach, the Data Provider produces the dataset and formats it in the most relevant way so that the Chart Engine can most efficiently identify the visualization point for a given visual analytic.
There are at least two problems in this common approach: 1) complexity: the Data Provider can only format the dataset in the most trivial way, so neither the Data Provider nor the Data Consumer can deal with a complex scenario, at least not without human interaction; and 2) flexibility: the Data Provider and the Data Consumer (e.g. the Chart Engine in this case) are tightly coupled, such that the Data Provider often provides the dataset to only one particular Data Consumer for consumption. Compatibility issues arise if other Data Consumers want to consume the same piece of data.
For example, Business Objects' Xcelsius can create a visual analytic for a given dataset stored in Microsoft Excel (i.e. the data provider) only if the dataset is formatted so that, for a line chart, the series names and category labels appear in the top or left region. FIGS. 1A and 1B illustrate two scenarios that contain two datasets for the same visual analytic (e.g. line chart). FIG. 1A is an example of a trivial case, which is supported by existing methods, while FIG. 1B is an example of a complex case, which is presently unsupported: no existing method is available and human interaction is required.
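The trivial case can be illustrated with a minimal Python sketch. It is not taken from Xcelsius or from the figures; it assumes, purely for illustration, that category labels occupy the first row and series names occupy the first column of a two-dimensional grid. The names `VisualizationPoint` and `extract_trivial` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union

Cell = Union[str, int, float, None]


@dataclass
class VisualizationPoint:
    """Hypothetical container for the data a visual analytic displays."""
    series_names: List[str]
    category_labels: List[str]
    data_points: List[List[float]]  # one inner list per series


def extract_trivial(grid: Sequence[Sequence[Cell]]) -> VisualizationPoint:
    """Assumes (for illustration) labels in the top row, names in the left column."""
    header, *rows = grid
    category_labels = [str(cell) for cell in header[1:]]  # skip the corner cell
    series_names = [str(row[0]) for row in rows]
    data_points = [[float(value) for value in row[1:]] for row in rows]
    return VisualizationPoint(series_names, category_labels, data_points)


# Illustrative data only; not the dataset shown in FIG. 1A.
grid = [
    ["", "Q1", "Q2", "Q3"],
    ["North", 10, 12, 9],
    ["South", 7, 8, 11],
]
vp = extract_trivial(grid)
```

A fixed-position rule of this kind works only while the layout assumption holds. A grid with a title row above the labels, or with series names placed elsewhere, would either fail during numeric conversion or be silently mislabeled, which is the limitation the complex case exposes.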
For the unsupported cases, the visual engine in currently available visualization tools requires human interaction to explicitly specify a mapping between the dataset and the visual analytic. When the user selects the dataset for the given analytic, the data in the dataset will then be extracted to match the internal data structure of the visual analytic.
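The explicit mapping described above might look like the following Python sketch, in which a user supplies cell ranges for each part of the visual analytic. The range format, the `apply_mapping` function, and the sample grid are all assumptions made for illustration; they do not describe the internal data structure of any particular product.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical range format: (row_start, row_end, col_start, col_end),
# zero-based, with exclusive end indices.
Range = Tuple[int, int, int, int]


def cells(grid: Sequence[Sequence[object]], rng: Range) -> List[List[object]]:
    """Return the rectangular block of cells covered by the range."""
    r0, r1, c0, c1 = rng
    return [list(row[c0:c1]) for row in grid[r0:r1]]


def flatten(block: List[List[object]]) -> List[object]:
    return [cell for row in block for cell in row]


def apply_mapping(grid: Sequence[Sequence[object]],
                  mapping: Dict[str, Range]) -> Dict[str, list]:
    """Extract data according to ranges chosen by the user."""
    series_names = [str(c) for c in flatten(cells(grid, mapping["series_names"]))]
    category_labels = [str(c) for c in flatten(cells(grid, mapping["category_labels"]))]
    data = [[float(v) for v in row] for row in cells(grid, mapping["data"])]
    # The data block must have one row per series and one column per category.
    if len(data) != len(series_names) or any(
        len(row) != len(category_labels) for row in data
    ):
        raise ValueError("data range does not match the label ranges")
    return {
        "series_names": series_names,
        "category_labels": category_labels,
        "data_points": data,
    }


# Illustrative layout only: a title row on top, series names on the right.
grid = [
    ["Regional sales", None, None, None],
    ["Q1", "Q2", "Q3", None],
    [10, 12, 9, "North"],
    [7, 8, 11, "South"],
]
mapping = {
    "category_labels": (1, 2, 0, 3),
    "series_names": (2, 4, 3, 4),
    "data": (2, 4, 0, 3),
}
result = apply_mapping(grid, mapping)
```

The sketch makes the cost of this approach visible: every range in `mapping` has to be chosen by a person who already understands the layout of the dataset.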