Many enterprises use relational databases, which are managed by a variety of database management systems. Relational database management systems are often used with data warehouses, where vast amounts of data can be stored and processed. More recently, data mining applications have been developed to identify and interpret patterns in databases.
Data mining applications can use database query optimizers. More specifically, histogram statistics can be used to estimate row counts and unique entry counts (UECs) more accurately, where a UEC is the number of unique values represented within a particular interval of a histogram.
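As a minimal sketch of these terms, the following Python computes a row count and a UEC for each interval of a histogram. It assumes a single numeric column held in memory and equal-width intervals; the text does not prescribe an interval scheme, and the names `Interval` and `build_histogram` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    low: float      # inclusive lower boundary
    high: float     # upper boundary (inclusive only for the last interval)
    row_count: int  # number of rows whose value falls in the interval
    uec: int        # unique entry count: distinct values in the interval


def build_histogram(values, num_intervals):
    """Build an equal-width histogram with a row count and UEC per interval.

    Equal-width intervals are an assumption made for illustration only.
    """
    if not values or num_intervals < 1:
        return []
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when every value is identical.
    width = (hi - lo) / num_intervals or 1.0
    buckets = [[] for _ in range(num_intervals)]
    for v in values:
        # Clamp so the maximum value lands in the last interval.
        idx = min(int((v - lo) / width), num_intervals - 1)
        buckets[idx].append(v)
    return [
        Interval(lo + i * width, lo + (i + 1) * width, len(b), len(set(b)))
        for i, b in enumerate(buckets)
    ]


# Ten rows, seven distinct values, three intervals.
column = [1, 1, 2, 3, 3, 3, 7, 8, 8, 10]
for interval in build_histogram(column, 3):
    print(interval)
# Row counts per interval: 6, 0, 4. UECs per interval: 3, 0, 3.
```

The first interval holds six rows but only three distinct values, which is the difference between a row count and a UEC.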
Histograms are also used to develop statistics that describe the distribution of data in the database tables. For example, gathering accurate statistics about the data in the tables can be useful in estimating predicate selectivity for forming optimal SQL queries. A histogram can be used to group data attribute values from the table(s) into subsets and to approximate the true attribute values as well as their frequency distributions. Because histograms are generally summaries of much larger distributions, estimates based on histograms may still include errors. However, for most real-world databases and applications, histograms can be produced with acceptably low estimation error while occupying a reasonably small storage space.
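To illustrate how a histogram can feed a selectivity estimate, the sketch below uses the common textbook assumption that values are uniformly distributed within each interval. This is not necessarily the method any particular optimizer uses. The function names and the `(low, high, row_count, uec)` tuple layout are hypothetical.

```python
def estimate_equality_selectivity(histogram, value):
    """Estimate the fraction of rows where column = value.

    histogram: list of (low, high, row_count, uec) tuples; intervals are
    treated as [low, high), with the last interval also including high.
    Assumes rows are spread evenly over the distinct values in an interval.
    """
    total_rows = sum(rows for _, _, rows, _ in histogram)
    if total_rows == 0:
        return 0.0
    last = len(histogram) - 1
    for i, (low, high, rows, uec) in enumerate(histogram):
        in_interval = low <= value < high or (i == last and value == high)
        if in_interval:
            if uec == 0:
                return 0.0  # empty interval: no rows can match
            return (rows / uec) / total_rows
    return 0.0  # value lies outside every interval


def estimate_range_selectivity(histogram, lower, upper):
    """Estimate the fraction of rows where lower <= column < upper.

    Assumes values are spread evenly across each interval's width.
    """
    total_rows = sum(rows for _, _, rows, _ in histogram)
    if total_rows == 0:
        return 0.0
    estimated_rows = 0.0
    for low, high, rows, _ in histogram:
        overlap = min(upper, high) - max(lower, low)
        if overlap > 0 and high > low:
            # Count the share of the interval covered by the predicate.
            estimated_rows += rows * overlap / (high - low)
    return estimated_rows / total_rows


# Hypothetical three-interval histogram summarizing 100 rows.
hist = [(0.0, 10.0, 60, 3), (10.0, 20.0, 0, 0), (20.0, 30.0, 40, 4)]
print(estimate_equality_selectivity(hist, 5.0))      # (60 / 3) / 100
print(estimate_range_selectivity(hist, 5.0, 25.0))   # (30 + 0 + 20) / 100
```

Because each interval stores only a row count and a UEC, the estimate is exact only when the data inside the interval really is uniform. Skew within an interval is one source of the estimation errors mentioned above.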