Present invention embodiments relate to reducing data processing overhead in database applications, and more specifically, to reducing the overhead of data transfer by utilizing information contained in separately-maintained data distribution statistics.
Recent developments in information technology allow organizations to collect, store, integrate and search unprecedented amounts of data. The persistent data storage capacity of modern information systems, such as those that include data warehouses, can be easily increased by adding one or more relatively inexpensive storage/processing nodes. However, working memory, i.e., the memory that can be directly read from and written to by a data processing unit, e.g., a microprocessor, is typically fixed or otherwise limited to the address space of the data processing unit. Consequently, certain data structures stored in persistent data storage, e.g., database tables, that exceed the working memory capacity must typically be processed in “chunks” that are sized for computational efficiency. Certain database management systems (DBMSs) allow a user to select the size of such chunks for a given computing environment, a choice that is largely constrained by the working memory capacity.
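By way of illustration only, chunked processing of this kind might be sketched as follows. The sketch assumes a table that can be read as a stream of numeric rows; the names `read_chunks`, `chunked_sum`, and `chunk_size` are hypothetical and do not correspond to any particular DBMS interface.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def read_chunks(rows: Iterable[int], chunk_size: int) -> Iterator[List[int]]:
    """Yield successive lists of at most chunk_size rows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    it = iter(rows)
    while True:
        # Only one chunk is held in working memory at a time.
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def chunked_sum(rows: Iterable[int], chunk_size: int) -> int:
    """Aggregate a column that may be too large to load at once."""
    total = 0
    for chunk in read_chunks(rows, chunk_size):
        total += sum(chunk)
    return total
```

In such a sketch, `chunk_size` stands in for the user-selectable chunk size; a larger value means fewer transfers from persistent storage but a larger working-memory footprint per transfer.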
Moving data from persistent data storage into working memory for processing constitutes overhead in any data processing operation, and this overhead is particularly problematic where massive amounts of data are concerned. Minimizing it is thus an ongoing research and product development concern.