1. Field
The following description relates to a database and a query processing technique thereof.
2. Description of the Related Art
Multidimensional data is data that includes two or more attributes. For example, in the field of medicine, patient information used in making a medical decision may include age information, various lab test results, prior medical history, and the like, and may therefore be classified as multidimensional data. Multidimensional data is currently used in various fields, including medicine and finance.
Multidimensional data is generally large in size and, as a result, is commonly stored on disks. A multidimensional histogram, on the other hand, is a summary of multidimensional data. A multidimensional histogram is typically relatively small in size and, as a result, is commonly stored in memory, which is more readily accessible than a hard disk. Because of this, when processing a range query for multidimensional data or estimating the selectivity of the range query, it may be more efficient to generate a multidimensional histogram based on the multidimensional data and manage the multidimensional data at the memory level using the multidimensional histogram instead of using the entire multidimensional data.
A multidimensional histogram typically includes a plurality of buckets. For example, a multidimensional histogram may include several hundred buckets. Each bucket includes a data space S and data quantity information F that indicates the amount of data in the data space S. The F data items in the data space S may be assumed to be uniformly distributed. However, data is not necessarily uniformly distributed in each bucket.
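As an illustration only, a bucket of this kind may be sketched as follows. The class name Bucket, the field names space and count, and the representation of the data space S as one (low, high) interval per dimension are assumptions made for this sketch and are not taken from the description above.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Tuple


@dataclass
class Bucket:
    # Data space S, assumed here to be an axis-aligned box:
    # one (low, high) interval per dimension.
    space: List[Tuple[float, float]]
    # Data quantity information F: amount of data located in S.
    count: int

    def volume(self) -> float:
        # Size of the data space S (area in 2-D, volume in higher dimensions).
        return prod(high - low for low, high in self.space)

    def density(self) -> float:
        # Under the uniformity assumption, the F data items are spread
        # evenly, so every unit of volume holds the same amount of data.
        return self.count / self.volume()


# Hypothetical two-dimensional bucket holding 200 data items.
bucket = Bucket(space=[(0.0, 10.0), (0.0, 5.0)], count=200)
print(bucket.volume())   # 50.0
print(bucket.density())  # 4.0
```

The density value is only as reliable as the uniformity assumption. If the 200 data items were in fact clustered in one corner of the data space, the same bucket would report the same density while describing the data poorly.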
The distribution of data in a particular data space may be arbitrary. Therefore, the estimation of the selectivity of a range query may be affected by how each bucket is determined. Under the uniformity assumption, each bucket's contribution to the estimated selectivity of a range query is proportional to the area in which the range query and the bucket overlap. Accordingly, it is helpful to determine the buckets such that data in each bucket is distributed as uniformly as possible, in order to improve the precision of the estimation of the selectivity of a range query.
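The overlap-based estimation described above may be sketched, under stated assumptions, as follows. The function names overlap_volume and estimate_selectivity, the representation of a bucket as a (space, count) pair, and the use of axis-aligned boxes for both buckets and range queries are hypothetical choices made for this sketch. Selectivity is taken here to mean the estimated fraction of all data that satisfies the range query.

```python
from math import prod


def overlap_volume(box_a, box_b):
    """Volume of the intersection of two axis-aligned boxes (0.0 if disjoint)."""
    sides = [
        min(a_high, b_high) - max(a_low, b_low)
        for (a_low, a_high), (b_low, b_high) in zip(box_a, box_b)
    ]
    return prod(sides) if all(side > 0 for side in sides) else 0.0


def estimate_selectivity(buckets, query):
    """Estimate the fraction of data inside `query`.

    `buckets` is a list of (space, count) pairs. Each bucket contributes
    its count scaled by the fraction of its volume that the query covers,
    which is exact only if the data in the bucket is uniformly distributed.
    """
    total = sum(count for _, count in buckets)
    if total == 0:
        return 0.0
    estimated = 0.0
    for space, count in buckets:
        volume = prod(high - low for low, high in space)
        if volume > 0:
            estimated += count * overlap_volume(space, query) / volume
    return estimated / total


# Hypothetical histogram with two adjacent two-dimensional buckets.
histogram = [
    ([(0.0, 10.0), (0.0, 10.0)], 100),
    ([(10.0, 20.0), (0.0, 10.0)], 300),
]
# The range query covers the right half of the first bucket
# and the left half of the second bucket.
query = [(5.0, 15.0), (0.0, 10.0)]
print(estimate_selectivity(histogram, query))  # 0.5
```

In this example the query covers half of each bucket, so the estimate is (50 + 150) / 400 = 0.5. If the data in either bucket were concentrated outside the overlapping area, the true selectivity would be lower than this estimate, which is why the way each bucket is determined affects precision.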