1. Technical Field
The present disclosure relates generally to histogram generation, and more particularly to histogram generation on multi-dimensional datasets.
2. Related Art
Data may be stored in columnar format in various types of data store systems, such as databases and file systems. In many instances, each column represents a single attribute of interest. However, columns may be related such that together they form a multi-dimensional dataset, with each column representing one dimension. Such multi-dimensional datasets may be used to represent geospatial coordinates or other multi-dimensional information.
Various data store systems generate statistics on stored data, such as histograms, for use in query response planning. These statistics may be gathered on multi-dimensional datasets. However, this type of statistics gathering may consume substantial system resources, including both processing resources and storage resources. Histograms generated on multi-dimensional datasets may also use equal-sized buckets, which may degrade cardinality estimation accuracy and histogram precision and, in turn, lead to less optimal query plans.