1. Field of the Invention
The present invention relates to the field of visual and graphical displays for the representation of data. Specifically, the present invention relates to a method for visualizing high density graphical data for display.
2. Related Art
Modern computer systems and other media for accessing and interacting with information typically display data visually in various graphical formats. Visually displayed graphically formatted data is an important medium in many fields. These fields include electronically based commerce and business, also known as e-commerce and e-business, operations research, epidemiology, information technology (IT) network administration and engineering, and a wide-ranging host of others.
There are numerous examples of applications of visually displayed graphically formatted data in these fields. E-business relies heavily on market basket analysis and customer profiling. IT depends on management of its various resources. Further, e-commerce often tasks IT extensively, including management of its resources. Epidemiologists look for correlations between different diseases, and/or between diseases and a host of environmental factors. Such practical fields rely on real world data, i.e., data reflecting information that is based in reality. The field of e-commerce provides the following example of the application of crucial real world data in market basket analysis.
Market basket analysis has become a key success factor in e-commerce. Effective market basket analysis methods employ association and clustering as methods of analyzing such data. E-commerce transactions often comprise several products (e.g., items) that are purchased together. An example of such an association is the fact that 85% of the people who buy a printer also buy paper. Understanding these relationships across hundreds, even thousands, of product lines and among millions of transactions provides visibility and predictability into product affinity purchasing behavior.
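By way of illustration only, the strength of an association such as the printer and paper example above is commonly expressed as a support value and a confidence value. The following is a minimal sketch under stated assumptions: the function names `support` and `confidence` and the toy transaction data are hypothetical and are not taken from any particular conventional system.

```python
def support(transactions, itemset):
    """Fraction of all transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent (e.g., printer buyers who also buy paper)."""
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    if n_antecedent == 0:
        return 0.0  # rule is undefined when the antecedent never occurs
    n_both = sum(1 for t in transactions if both <= t)
    return n_both / n_antecedent


# Hypothetical toy data: each transaction is the set of items bought together.
transactions = [
    frozenset({"printer", "paper"}),
    frozenset({"printer", "paper", "ink"}),
    frozenset({"printer"}),
    frozenset({"paper"}),
    frozenset({"printer", "paper"}),
]

rule_confidence = confidence(transactions, {"printer"}, {"paper"})  # 3 of 4 -> 0.75
rule_support = support(transactions, {"printer", "paper"})          # 3 of 5 -> 0.6
```

In this toy data, three of the four transactions that contain a printer also contain paper, so the rule has a confidence of 75%. A figure such as the 85% cited above would be obtained in the same manner from real transaction data.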
Currently, some technologies allow the visualization of associations for commercial entities such as retail stores and others, to make business decisions, such as product recommendations, cross selling, store shelf arrangements, and a host of others. As illustrated in Prior Art FIG. 1A, one conventional technique for visualizing associations is a matrix display. The matrix display technique positions pairs of items on separate axes to visualize the strength of their relationships. One such association visualizer lays out the rules on a 3D grid landscape. Visual filtering and querying allow users to focus on selected rules. However, to visualize many, sometimes millions of, association rules, association matrices are too restrictive. The number of rules shown at the same time needs to be pre-decided. Further, the number of rules is limited to a small range, e.g., on the order of 10–20.
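The data underlying a matrix display of the kind described above may be sketched as a pairwise co-occurrence matrix, in which each cell counts how often the row item and the column item are purchased together. This is an illustrative sketch only: the function name `cooccurrence_matrix` and the sample baskets are hypothetical, and raw co-occurrence counts are assumed as the measure of relationship strength.

```python
from itertools import combinations


def cooccurrence_matrix(transactions):
    """Return (items, matrix), where matrix[i][j] counts the transactions
    containing both items[i] and items[j]. The diagonal is left at zero."""
    items = sorted({item for t in transactions for item in t})
    index = {item: k for k, item in enumerate(items)}
    n = len(items)
    matrix = [[0] * n for _ in range(n)]
    for t in transactions:
        for a, b in combinations(sorted(t), 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1  # keep the matrix symmetric
    return items, matrix


# Hypothetical toy baskets.
baskets = [
    {"printer", "paper"},
    {"printer", "paper", "ink"},
    {"ink", "paper"},
]

items, matrix = cooccurrence_matrix(baskets)
# items  -> ['ink', 'paper', 'printer']
# matrix -> [[0, 2, 1], [2, 0, 2], [1, 2, 0]]
```

Because the matrix holds one cell per pair of items, its size grows with the square of the number of items, which is consistent with the restriction noted above that only a small number of rules can be shown at one time.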
An alternative conventional technique involves laying out associations on a graph, as depicted in Prior Art FIG. 1B. Such associations between these data are known in the art as edges. One such contemporary technology uses an individual purchase history to make suggestions to shoppers based on a graph. However, when the number of items grows large, the graph can quickly become cluttered with many interactions, i.e., too many edges appear for the graph to be useful to most analysts attempting to glean relevant information from it. Also, associated items may not be placed close together, such that their edges graphically become tenuous. The market analysis graph of one contemporary technology has achieved some improvement by utilizing dynamic queries and presentations.
Besides their application in data associations, graph visualization methods have become useful in information visualization. One such technique uses cone trees and their hyperbolic projections. These are utilized for web and file system visualization. Another such technique uses fast graph layout to display various types of statistical data. A central approach to such graph visualization techniques exploits physics-based paradigms. Recent conventional techniques apply clustering algorithms to improve performance and scalability of such physics-based methods.
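The physics-based paradigm mentioned above may be illustrated by a simple spring model, in which every pair of nodes repels and every edge pulls its two end nodes together. The following is a minimal sketch assuming a Fruchterman-Reingold style force model with a linearly cooling step limit. The function name `spring_layout`, its parameters, and the force formulas are assumptions chosen for illustration and do not represent the method of any particular conventional technique.

```python
import math
import random


def spring_layout(nodes, edges, iterations=200, k=1.0, seed=0):
    """Return {node: [x, y]} after a fixed number of force iterations.

    k is the assumed ideal edge length. All node pairs repel with force
    k*k/d, and each edge attracts its end nodes with force d*d/k.
    """
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for step in range(iterations):
        temp = 0.1 * (1 - step / iterations)  # cooling cap on movement
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[a][0] += dx / d * f
                disp[a][1] += dy / d * f
                disp[b][0] -= dx / d * f
                disp[b][1] -= dy / d * f
        # Attraction along each edge.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[a][0] -= dx / d * f
            disp[a][1] -= dy / d * f
            disp[b][0] += dx / d * f
            disp[b][1] += dy / d * f
        # Move each node along its net force, limited by the temperature.
        for n in nodes:
            mag = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            scale = min(mag, temp) / mag
            pos[n][0] += disp[n][0] * scale
            pos[n][1] += disp[n][1] * scale
    return pos


# Hypothetical usage: two associated items joined by a single edge.
layout = spring_layout(["printer", "paper"], [("printer", "paper")])
```

The pairwise repulsion loop examines every pair of nodes in each iteration, so the cost per iteration grows quadratically with the number of nodes. This cost is one reason clustering is applied to improve the performance and scalability of such methods, as noted above.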
In spite of the advances in the field, it is still difficult to mine and visualize customers' purchasing behavior from millions of Internet transactions. As the volume of e-commerce data grows and as the transaction data is integrated into off-line data, new techniques for visualizing data associations are required to extract useful and relevant information.
With such real world information, seldom is the data distribution uniform. In the real world, the distribution of data may often be stochastic, variable, and/or dynamic. The graphical density of such data is accordingly also not uniform. The graphical display of such data has consequences arising from its often non-uniform nature. These consequences are problematic for visualization, e.g., for conveying graphical meaning from such data to a user viewing it.
One such problematic consequence is graphical clutter, in which the data are densely concentrated visually in a confusingly tight, seemingly disordered array. Another such problematic consequence is graphical sparsity, in which the data are exiguously dispersed visually in a loose, seemingly unconnected array.
Often, graphical clutter results in overplotting, i.e., certain data items are not visible and the overall structure gets lost. Clutter results in extraordinarily dense clusters of graphical data. Such clusters are often too close to each other, or even penetrate each other's boundaries. These conditions render a user's visual navigation among these data extraordinarily difficult, undermining or even precluding significant mining of information via visualization.
Graphical sparsity, on the other hand, results in inefficient use of the available display space. Significant data may graphically be so far separated on a display screen that relationships between them are difficult for a user to spot, and concomitantly easy to miss. This renders mining errors and information losses probable. Further, data may graphically be so far separated from each other that a given display screen may not present them simultaneously, exacerbating the probability of mining errors and data loss.
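The overplotting and inefficient use of display space described above can be quantified in a simple way by mapping each data item to a display pixel and counting how many items share a pixel. The sketch below is illustrative only: the function name `overplotting_stats`, the normalized coordinate range, and the sample points are assumptions.

```python
from collections import Counter


def overplotting_stats(points, width, height):
    """Map points with coordinates in [0, 1) onto a width x height pixel
    grid and return (occupied_pixels, hidden_items).

    hidden_items counts the items drawn on a pixel already occupied by
    another item, i.e., items that are not individually visible.
    """
    counts = Counter(
        (min(int(x * width), width - 1), min(int(y * height), height - 1))
        for x, y in points
    )
    occupied = len(counts)
    hidden = sum(c - 1 for c in counts.values())
    return occupied, hidden


# Hypothetical non-uniform data: a tight cluster plus one distant item.
points = [(0.10, 0.10), (0.11, 0.12), (0.12, 0.11), (0.90, 0.90)]
occupied, hidden = overplotting_stats(points, width=10, height=10)
# occupied -> 2 of 100 pixels are used (sparsity)
# hidden   -> 2 of 4 items are obscured (clutter)
```

In this example the same display exhibits both conditions at once: half of the items are obscured within a single pixel, while 98 of the 100 available pixels remain unused.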
Thus, for users of graphically displayed data, both graphical clutter and graphical sparsity may be confusing and/or make it difficult for a user to glean meaning from graphical information presentations so cluttered and/or sparse. However, a further related problematic consequence appears from both. This is that often, both graphical clutter and graphical sparsity appear together in different portions of the same visual display of data. The individual drawbacks of both graphical clutter and graphical sparsity are thus compounded, exacerbating the difficulty users such as analysts have in extracting useful information from displayed data.
Conventional approaches to alleviating these difficulties include partial capitulation. This is achieved in one technique by reducing the amount of information displayed. In another technique, data is distilled by the use of filters. However, these approaches are problematic, in and of themselves. Capitulatory techniques such as these are problematic because they inherently decrease the amount of information available for analysis, e.g., they are content limiting.
Another conventional approach to alleviating graphical cluttering and/or graphical sparsity includes constraining the display of information by segregating data according to different levels of analytical interest. However, this approach is also fraught with the problem of content limitation.
Yet another such conventional approach applies multiple visual representations, each with differing amounts of detail, or varying the amounts of detail to effectuate subsequent visual representations. A further conventional approach attempts to ameliorate the problematic cluttering effects by reducing a high-density graph. In this approach, the high density graph is reduced by supporting multiple graphical representations of data items.
One further conventional technique is the use of a zoom function for the graphical display. However, as a zoomed object gets larger, it occupies a larger percentage of the display, and zooming does not reduce the complexity of the graph. Further, the zoom-enhanced portion either covers non-enhanced portions of the graphical representation or pushes non-enhanced portions of the graphical representation out of the displayable viewing area of the display monitor. Either action results in rendering the non-enhanced portions of the graphical representation unavailable for simultaneous viewing with the zoom-enhanced portion.
Inasmuch as the immediately foregoing conventional approaches may yield graphical data displays with less detail displayed of the data therein, or hidden and/or displaced graphical data, these conventional approaches also limit content, and are thus problematic from this perspective. However, these conventional techniques are further problematic for two reasons.
First, a user attempting to mine data using these conventional techniques must separate the representations, thus having to analyze them separately. This is confusing, time consuming, and labor intensive for analysts applying the techniques. Second, in forcing the user analyst to separate his informational representations based on their individually varying degree of detail, contextual relationships between the data may be lost.
Thus, the user suffers not simply loss of informational content available in his graphical display of data, but further suffers a loss of context, which in many modern analytical applications may cause loss of crucial information. Individually, each of the consequences of loss of content and loss of context is serious enough. However, combining the effects of each loss may amplify the significance of both losses in a manner worse than simple arithmetic addition, such that the total loss of analytical value is much greater in certain situations.
To visually represent non-uniform data with a wide range of values to analysts who must make use of them, optimization of scaling is desirable. Yet another conventional approach, which attempts to so optimize data scaling, is the application of a pseudo coloring scheme to the display of edges between data. However, such pseudo coloring schemes fail as, for instance, all the similarity edges are colored identically or nearly so, due to the non-uniform dataset.
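The failure of pseudo coloring on non-uniform data can be illustrated with a simple example. The sketch below assumes a linear mapping from value to color index, which is one common form of pseudo coloring. The source does not specify the mapping used, so the function name `linear_color_bins` and the sample similarity values are hypothetical.

```python
def linear_color_bins(values, n_colors):
    """Assign each value a color index in 0..n_colors-1 by dividing the
    range between the minimum and maximum values into equal intervals."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0] * len(values)  # all values equal: a single color
    return [min(int((v - lo) / span * n_colors), n_colors - 1) for v in values]


# Hypothetical skewed similarity values: many weak edges, one strong edge.
similarities = [0.01, 0.02, 0.02, 0.03, 0.04, 0.05, 0.05, 0.06, 0.95]
bins = linear_color_bins(similarities, n_colors=8)
# bins -> [0, 0, 0, 0, 0, 0, 0, 0, 7]
```

Under this assumed mapping, eight of the nine edges receive the same color index and six of the eight available colors are never used, so the differences among the weak edges cannot be distinguished visually.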