Field
Embodiments relate to query optimization using memory caching techniques that include hash-based statistics.
Background
Commercial database systems rely on caching techniques to improve performance. Caching techniques are often implemented in a memory cache that can be accessed quickly, such as random access memory (RAM), as opposed to storage that takes longer to access, such as disk-based storage. Cache memory typically stores frequently used data and reduces the time needed by a database system to access a data page in the disk-based storage. Cache memory, however, is much more costly than disk-based storage, so its capacity on a computing device is limited.
One way to select an optimal query plan for retrieving and manipulating data in a database is to use data statistics, and one way to gather such statistics is with a histogram. For example, a histogram may be generated for the data in columns that affect query plan generation. Conventionally, to generate a histogram, the data in a column is sorted to obtain cell boundaries having respective weights. The sort is conventionally an iterative process: runs are sorted in cache memory and spilled to disk, and the sorted runs are subsequently read back during a merge phase to create new sorted runs. Because sorting a column's data is a disk- and memory-intensive procedure that can adversely affect database usage, sorting is typically performed during off-hours. However, due to globalization and a general 24-hour/7-day need for database resources, off-hour windows are small or non-existent. This creates a need for generating a histogram that does not depend on expensive database procedures, such as a sort.
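By way of illustration only, the conventional sort-based approach described above may be sketched as follows. The sketch assumes an equi-depth histogram, in which each cell holds roughly the same number of rows; the text above does not specify how cell boundaries are chosen, so that choice is an assumption. The function name `sort_based_histogram` and the parameter `num_cells` are hypothetical, and an in-memory sort stands in for the run-and-merge sort that a database system would spill to disk.

```python
from typing import List, Sequence, Tuple


def sort_based_histogram(
    values: Sequence[float], num_cells: int
) -> List[Tuple[float, float]]:
    """Return (upper_boundary, weight) pairs for an equi-depth histogram.

    Hypothetical helper: the equi-depth policy is an assumption made
    for illustration, not a method stated in the surrounding text.
    """
    if num_cells <= 0:
        raise ValueError("num_cells must be positive")

    # The expensive step: the entire column is sorted before any
    # boundary can be chosen. A real system would sort runs in cache
    # memory, spill them to disk, and merge them.
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        return []

    cells: List[Tuple[float, float]] = []
    start = 0
    for i in range(1, num_cells + 1):
        end = (i * n) // num_cells
        if end <= start:
            continue  # an earlier cell already absorbed these rows
        # Keep equal values in the same cell so boundaries are unambiguous.
        while end < n and ordered[end] == ordered[end - 1]:
            end += 1
        # Weight is the fraction of all rows that fall in this cell.
        cells.append((ordered[end - 1], (end - start) / n))
        start = end
        if start >= n:
            break
    return cells
```

For example, calling `sort_based_histogram([5, 1, 3, 2, 4, 6, 8, 7], 4)` sorts all eight values before producing four cells of equal weight. The cost of the sketch is dominated by the sort, which is the step the passage above identifies as disk and memory intensive.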
In the drawings, like reference numbers generally indicate identical or similar elements. Generally, the left-most digit(s) of a reference number identifies the drawing in which the reference number first appears.