This invention relates to systems and methods for graphically representing multiple data items where the magnitude of one or more data items is substantially larger than the magnitude of other data items.
Humans generally tend to comprehend and assimilate information more quickly when that information is graphically presented. This is particularly true with regard to numerical values. When a series of numerical values must be considered, it is often advantageous to present those values in graphical form instead of as raw numbers or as a table of numbers. A common way of graphically depicting several data items is a bar graph. Each data item is represented as a separate bar, with the length of the bars sized in the same relative proportion as the corresponding data items. In order to maximize the space available for displaying the graph, it is known to scale the graph size based on the data item having the largest magnitude. This is best illustrated by example. Assume a person wished to prepare a chart showing the following monthly expenses for a particular year: January, $125; February, $38; March, $75; and April, $52. Further assume that the graph must be placed in a space where the largest bar can be 1.25 inches long. Dividing the largest value ($125) by the available space for that bar (1.25 inches) gives a vertical scale for the graph of $100 per inch (or $25 per ¼ inch). The largest bar (January, $125) is 1.25 inches high, and the smallest bar (February, $38) is 0.38 (⅜) inches high. FIG. 1, drawn approximately to scale, illustrates this.
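The scaling arithmetic in the preceding example can be expressed as a short sketch. The function name `bar_lengths` and its parameters are illustrative only and do not appear in this specification; the sketch assumes all values are positive.

```python
def bar_lengths(values, max_length):
    """Scale values linearly so the largest value occupies max_length."""
    largest = max(values)
    if largest <= 0:
        raise ValueError("this sketch assumes at least one positive value")
    # Units represented by one unit of bar length, e.g. $100 per inch.
    scale = largest / max_length
    return scale, [v / scale for v in values]


# Monthly expenses from the example: January through April.
expenses = [125, 38, 75, 52]
scale, lengths = bar_lengths(expenses, max_length=1.25)
for month, length in zip(["January", "February", "March", "April"], lengths):
    print(f"{month}: {length:.2f} inches")
```

With these inputs the scale works out to $100 per inch, matching the figures given in the text.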
There are limitations to the usefulness of bar charts, however. If one of the data items in a data set has a magnitude that is out of proportion relative to the other data items in the set (i.e., the data item is an “outlier”), displaying the data set in a scaled graph or chart becomes awkward. For example, if January expenses from the prior example were $1,250, the largest bar would still occupy 1.25 inches, but the remaining bars would only be 0.038, 0.075 and 0.052 inches high. FIG. 2, also drawn approximately to scale, illustrates the impact of the increase in January expenses. The smaller bars become almost imperceptible, and the chart is thus less useful. Such a chart could also tend to obscure any trends in the data items, particularly if the January data item is aberrational. In other words, one abnormal data point could make the chart virtually useless with respect to the other data points. These problems can be more acute in a computer context. Many display devices have a smaller size and/or coarser resolution than might be found in a newspaper or other printed media. Because the display is typically limited to a discrete number of pixels, very small differences between data item sizes may be even less perceptible.
Previous methods of displaying data sets with outlying data items have included use of logarithmic or other non-linear scales. However, such scales can be misleading if the observer is not aware of the logarithmic scaling. Because logarithmic scales are not as commonly used in certain non-technical areas as they may be in more scientific disciplines, a logarithmic scale could easily be overlooked. Even if the observer is aware of the logarithmic scale, however, such scales may not be as intuitive as a linear scale, and thus require more study to fully comprehend.
The present invention allows visual representation of a set of data items containing outlying data such that images representing non-outlying data items are not reduced to the point of obscurity. Graphical representations of outlying data items are truncated or otherwise modified, and the remaining non-outlying data may be represented in a linear, easy-to-read fashion. In one embodiment of the invention, a threshold value is determined for the set of data items, and each data item is compared against that threshold to determine if it is an outlier. The threshold may be a mean or a median of the data items, a multiple of the mean or median of the data items, or any other appropriate value. Data items having a magnitude exceeding the threshold are represented as images having a “break” or other indication that the image is not scaled relative to other data items. The remaining data items may then be displayed in a graph that is scaled based on the largest data item magnitude that does not exceed the threshold. In this manner, a more effective and usable graphic presentation of the data is possible.
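The thresholding described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the threshold is taken as a multiple of the median, the multiplier of 3 is a hypothetical choice, and drawing each outlier at the full available length with a break flag is likewise an assumption made for the sketch. The name `classify_and_scale` is illustrative.

```python
from statistics import median


def classify_and_scale(values, max_length, factor=3.0):
    """Flag outliers and scale the rest to the largest non-outlier.

    Returns the threshold and a list of (length, has_break) pairs.
    `factor` is a hypothetical multiplier applied to the median.
    """
    magnitudes = [abs(v) for v in values]
    threshold = factor * median(magnitudes)
    # Non-outliers are the items whose magnitude does not exceed the threshold.
    regular = [m for m in magnitudes if m <= threshold]
    largest_regular = max(regular) if regular else 0
    bars = []
    for m in magnitudes:
        if m > threshold:
            # Assumption: an outlier fills the available length and is
            # marked so the renderer can draw a break in the bar.
            bars.append((max_length, True))
        elif largest_regular > 0:
            bars.append((m / largest_regular * max_length, False))
        else:
            bars.append((0.0, False))
    return threshold, bars


# January expenses raised to $1,250, as in the outlier example.
threshold, bars = classify_and_scale([1250, 38, 75, 52], max_length=1.25)
for length, has_break in bars:
    print(f"{length:.3f} inches{' (break)' if has_break else ''}")
```

In this example the median of the four values is 63.5, so the threshold is 190.5 and only the January value is flagged. The remaining bars are scaled against $75 and no longer shrink to a few hundredths of an inch.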
The invention may be implemented with regard to horizontally and vertically oriented graphs, with regard to 2-D and 3-D graphs, and with regard to other types of graphical data representations. Multiple outliers can be represented in a single graph as identically-sized images, or as scaled images. In one embodiment of the invention, an outlier is represented as an image having a maximum image size, with the largest non-outlier represented as an image having a size equal to a percentage of the maximum. The invention can be implemented in a general purpose digital computer or in any other device which can be configured to generate a graphical display of data items.
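The embodiment in which an outlier has a maximum image size and the largest non-outlier is sized as a percentage of that maximum can be sketched as below. The 80% figure, the pixel units, and the name `image_sizes` are hypothetical choices for illustration. The threshold is passed in rather than computed, and multiple outliers are drawn at identical size, which is one of the options mentioned above.

```python
def image_sizes(values, threshold, max_size, regular_fraction=0.8):
    """Size outliers at max_size and scale the rest below it.

    The largest non-outlier is drawn at regular_fraction * max_size
    (a hypothetical 80% by default); other non-outliers are scaled
    linearly relative to it.
    """
    magnitudes = [abs(v) for v in values]
    regular = [m for m in magnitudes if m <= threshold]
    largest_regular = max(regular) if regular else 0
    regular_size = regular_fraction * max_size
    sizes = []
    for m in magnitudes:
        if m > threshold:
            # Every outlier receives the same maximum image size.
            sizes.append(float(max_size))
        elif largest_regular > 0:
            sizes.append(m / largest_regular * regular_size)
        else:
            sizes.append(0.0)
    return sizes


# Two outliers and three ordinary values, drawn in a 100-pixel space.
sizes = image_sizes([1250, 38, 75, 52, 900], threshold=190.5, max_size=100)
print([round(s, 1) for s in sizes])
```

Reserving part of the available space for outliers keeps them visibly taller than every non-outlier while leaving the non-outliers on a single linear scale.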