Many factors may be taken into account when designing and creating a useful system for generating graphical visualizations of data retrieved from one or more sources. One such factor is ensuring that the visualizations represent the data in a manner that enables a user to easily discern the relevant information from the data, or to interpret the data in a way that is useful to the user. This becomes particularly relevant when analyzing extremely large sets of data, and more so when the large sets of data are skewed.
One particular example of a graphical visualization method that is used to help users understand the relevance of the data values within data sets is the heatmap. A heatmap is generally a two-dimensional display in which a color is allocated to each data point based on one of a plurality of dimensional data values that have been retrieved from one or more data sources. For a particular data point, first and second dimensional data values are used to place the data point at an x, y position on the map. A third dimensional data value may be indicated by displaying the data point in one of various colors available to the graphics display system. The color selected for the particular data point is based on the dimensional data value associated with the third dimension of the data for that data point.
Heatmaps may use means other than color to identify the dimensional data values. For example, grades of shading, symbols, shapes, etc. may all be used in a heatmap to show the relative placement of data values with respect to each other.
By using heatmaps as opposed to standard chart methods, it becomes possible to plot three dimensions of information associated with the data on a two-dimensional map. By viewing the heatmap, it then becomes easier for a user to understand how the different dimensions of the data vary in relation to each other. Further, if data is plotted using three-dimensional co-ordinates x, y and z, a fourth dimension may be added using color, for example, to produce a heatmap.
For example, data values (such as demographic data values) associated with a population of a particular area covered by co-ordinates x, y may be plotted to easily determine how the population varies over the area. That is, the x, y position on the map indicates the actual geographical x, y co-ordinates of where individuals are living. A third dimension may be added to indicate the number of individuals living at the various x, y co-ordinates, where different colors indicate group sizes per area. For example, at co-ordinate x1, y1 there are 100 people living in a radius of 200 meters, at point x2, y1 there are 76 people living in a radius of 200 meters, etc. The different values 100, 76, etc. may be plotted using different colors.
In a further example, tolerance data values associated with a manufacturing process may be plotted against a factory layout to determine where errors are occurring in the process. For example, at manufacturing position 1, indicated by a rectangle between points (x1, y1), (x1, y2), (x2, y1) and (x2, y2), a tolerance value of 10% is measured, where the 10% indicates that the product being manufactured has been measured as being 10% over or under a pre-stored desired value. Other manufacturing areas as defined by x, y co-ordinates may be monitored to detect tolerance values that range from 1% up to 100%, or greater where the measured value is more than twice the pre-stored desired value. The manufacturing areas may be plotted on an x, y plot where various different colors, for example, are used to indicate the measured tolerance values.
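The tolerance value described above may be illustrated by the following minimal sketch. It assumes the tolerance is the absolute deviation of the measured value from the pre-stored desired value, expressed as a percentage of the desired value; the function name `tolerance_percent` and its parameters are hypothetical and are not taken from the described system.

```python
def tolerance_percent(measured, desired):
    """Return the deviation of `measured` from `desired` as a percentage.

    Hypothetical helper: the result is the same whether the measurement
    is over or under the desired value.
    """
    if desired == 0:
        raise ValueError("desired value must be non-zero")
    return 100.0 * abs(measured - desired) / abs(desired)


# A part measured at 110 or at 90 against a desired value of 100
# deviates by 10% in both cases.
over = tolerance_percent(110, 100)
under = tolerance_percent(90, 100)

# A measurement of more than twice the desired value exceeds 100%.
large = tolerance_percent(250, 100)
```

Under this assumption, a tolerance value above 100% can only arise when the measured value is more than twice the desired value.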
It will be understood that the x, y values in a heatmap are not required to be associated with geographical or physical locations and may be associated with any other variable that may be measured or determined.
However, an important factor that should be borne in mind when developing a graphical visualization system is that the heatmap should have a limited number of colors or shadings to represent the different value ranges of the data in order to represent the information in a clear manner. Therefore, thought is required on how to represent the different value levels of the data in the heatmap over the whole range of the data in a manner that clearly and accurately conveys the required information associated with the data, without swamping the user with too many colors or shadings and without losing or masking relevant data values.
Data values for a number of data points in a set of data can be distributed over a large range, producing a skewed data set. For example, data values may exist within the data set that have values in single units (or percentages or portions thereof) up to values in units of a thousand, million or higher. When generating a heatmap to visualize these data values, a number of different colors or shades of a particular color (or a mixture thereof) are allocated for each of a number of preset ranges of the data values being represented. A different shade or color may be allocated to represent the data values over a specific range of data values. For example, a first color such as blue may be allocated to represent data values in the range from 0 to 1000, a second color, green, may be allocated to represent data values in the range from 1001 to 2000, a third color, yellow, may be allocated to represent data values in the range from 2001 to 3000, etc.
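The allocation of colors to preset ranges may be sketched as a lookup against the upper bound of each range. The following is an illustrative sketch only: it assumes integer data values, encodes only the three ranges stated above (the further ranges are not specified), and the names `UPPER_BOUNDS`, `RANGE_COLORS` and `color_for` are hypothetical.

```python
from bisect import bisect_left

# Inclusive upper bound of each preset range, with its allocated color.
# Only the three ranges given in the example are encoded here.
UPPER_BOUNDS = [1000, 2000, 3000]
RANGE_COLORS = ["blue", "green", "yellow"]


def color_for(value):
    """Return the color whose range contains `value`, or None if no range does.

    Hypothetical helper: bisect_left finds the first upper bound that is
    greater than or equal to `value`, which identifies the range.
    """
    if value < 0:
        return None
    index = bisect_left(UPPER_BOUNDS, value)
    if index >= len(RANGE_COLORS):
        return None  # value lies above the highest encoded range
    return RANGE_COLORS[index]
```

A value that lies above the highest encoded range has no color allocated to it in this sketch, so the corresponding data point would not be distinguishable on the heatmap.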
When developing a heatmap generating system, it is usual to make a rough estimate of the maximum data value that is likely to be retrieved by the system and visualized, determine the number of suitable color or shade transitions that should be made available to construct an easily discernible heatmap of the data values, and then to uniformly allocate the number of available transitions to the data values through a simple division process.
For example, if the user estimates that the maximum data value is unlikely to exceed 10,000 and the number of color transitions chosen for the heatmap is 10, then each color is uniformly allocated to a range of 10,000/10 = 1000 data values. The colors may therefore be assigned in a linear manner as shown in the following example:
Black=1 to 1000
Dark Blue=1001 to 2000
Light Blue=2001 to 3000
Turquoise=3001 to 4000
Dark Green=4001 to 5000
Light Green=5001 to 6000
Dark Yellow=6001 to 7000
Light Yellow=7001 to 8000
Dark Red=8001 to 9000
Light Red=9001 to 10000
Therefore, according to this example, a data point representing a data value of 8560 will be represented by the color Dark Red, and a data point representing a data value of 2340 will be represented by the color Light Blue.
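The uniform allocation in the example above may be sketched as follows. This is a minimal illustration assuming integer data values from 1 up to the estimated maximum; the function name `uniform_color` and its parameters are hypothetical, while the color names and ranges are those listed above.

```python
# Color names in the order listed in the example above.
COLORS = [
    "Black", "Dark Blue", "Light Blue", "Turquoise", "Dark Green",
    "Light Green", "Dark Yellow", "Light Yellow", "Dark Red", "Light Red",
]


def uniform_color(value, max_value=10_000, colors=COLORS):
    """Return the color for an integer `value` under uniform allocation.

    Hypothetical helper: the ranges are 1..width, width+1..2*width, etc.,
    where width = max_value / number of colors.
    """
    if not 1 <= value <= max_value:
        raise ValueError("value lies outside the estimated range")
    width = max_value // len(colors)   # 10,000 / 10 = 1000
    index = (value - 1) // width       # zero-based range index
    return colors[index]


dark_red = uniform_color(8560)    # falls in 8001 to 9000
light_blue = uniform_color(2340)  # falls in 2001 to 3000
```

In this sketch, a value above the estimated maximum raises an error rather than receiving a color, which reflects the dependence of the uniform method on the accuracy of the initial estimate.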
Using this type of method may be suitable for straightforward linear data sets with predictable and limited maximum data values. However, in real-world data sets taken from real-life measurement scenarios, such as manufacturing processes, quality control measurements, scientific experimental data, business records, etc., the data values can be far from linearly organized and can result in very skewed data sets, with the majority of data values in the data set being located over a very small range.
Further, some potentially more important data values in the data set may be positioned in a different range located statistically distant from the majority of the data set, which may result in these important data values not being clearly recorded on the heatmap. Therefore, a large number of different transitions must be allocated to cover all the available data values in the data sets that are being mapped to the heatmap. This can result in a heatmap that is unable to graphically show the relevance of the more statistically distant data values, as the data points representing these distant data values are swamped by the data points representing the more common data values. This may result in the user not being able to detect the more distant data points in the map.
Further, unnecessary calculations may be made to determine a large number of transition points which may not have any data points located therebetween. This wastes computing power, time and energy.
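The effect of uniform ranges on a skewed data set may be illustrated by the following sketch. The data values are made up purely for illustration and do not come from any measurement; the helper `bin_index` is hypothetical and assumes ten uniform ranges over an estimated maximum of 10,000, as in the earlier example.

```python
from collections import Counter


def bin_index(value, max_value=10_000, n_bins=10):
    """Hypothetical helper: zero-based index of the uniform range holding `value`."""
    width = max_value // n_bins
    return (value - 1) // width


# Made-up skewed data: ten small values and one statistically distant value.
values = [3, 7, 12, 25, 40, 58, 90, 150, 320, 610, 9800]

# Count how many data points fall into each uniform range.
counts = Counter(bin_index(v) for v in values)

# Ranges for which transition points were computed but which hold no data.
empty_bins = 10 - len(counts)
```

With these illustrative values, ten of the eleven data points share the first range and therefore the same color, the single distant value occupies the last range, and the remaining eight ranges contain no data points at all.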
In addition, it may be required to first estimate the likely maximum data value in the data set. Inaccurate estimations can occur, resulting in the larger, more important data values being overlooked, so that data points associated with the larger values may not be represented in the heatmap.
An object of the present invention is to provide a system or method that generates breakpoint values for heatmap data to create a heatmap that displays information in an informative manner over the full range of the data, or to at least provide the public with a useful choice.
The present invention aims to overcome, or at least alleviate, some or all of the afore-mentioned problems.