A database is a collection of stored data that is logically related and that is accessible by one or more users or applications. A popular type of database system is the relational database management system (RDBMS), which includes relational tables, also referred to as relations, made up of rows and columns (also referred to as tuples and attributes). Each row represents an occurrence of an entity defined by a table, with an entity being a person, place, thing, or other object about which the table contains information. Within large corporations or organizations, a database system known as an enterprise data warehouse may contain close to a petabyte of critical data, organized into hundreds of tables and used by many thousands of persons performing tasks across all business or organization functions. To perform essential functions, such a system must operate efficiently and reliably every second of every day.
In-memory processing capabilities have recently been implemented within database systems, where data is stored and processed in CPU memory, offering much faster processing times than systems and applications limited to processing data in non-volatile or persistent storage, e.g., Hard Disk Drives (HDDs), Solid State Drives (SSDs), and Flash memory.
Within relational database systems, a join operation is executed to combine records from two or more tables. A hash join is one form of join well suited to in-memory processing. In one form of hash join, the smaller table being joined is built as a hash table in CPU memory, and qualified rows from a second, larger table are written to a spool file, a temporary file typically used to hold intermediate result data. Potentially matching rows from the large-table spool file are then probed against the hash table.
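By way of illustration only, the build, spool, and probe phases of such a hash join can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular database system: rows are modeled as dictionaries, the spool file is modeled as an in-memory list, and the names hash_join, qualifies, small_key, and large_key are hypothetical.

```python
from collections import defaultdict


def hash_join(small_rows, large_rows, small_key, large_key,
              qualifies=lambda row: True):
    """Illustrative hash join: build, spool, then probe.

    All names here are hypothetical; the spool "file" is a plain list.
    """
    # Build phase: hash the smaller table on its join column.
    hash_table = defaultdict(list)
    for row in small_rows:
        hash_table[row[small_key]].append(row)

    # Spool phase: keep only the qualified rows of the larger table.
    spool = [row for row in large_rows if qualifies(row)]

    # Probe phase: look up each spooled row in the hash table.
    result = []
    for row in spool:
        for match in hash_table.get(row[large_key], ()):
            result.append({**match, **row})
    return result


# Hypothetical sample data.
departments = [
    {"dept_id": 1, "dept_name": "Sales"},
    {"dept_id": 2, "dept_name": "Support"},
]
employees = [
    {"emp": "Ann", "dept_id": 1, "active": True},
    {"emp": "Bob", "dept_id": 2, "active": False},
    {"emp": "Cy", "dept_id": 3, "active": True},
    {"emp": "Dee", "dept_id": 1, "active": True},
]

joined = hash_join(departments, employees, "dept_id", "dept_id",
                   qualifies=lambda r: r["active"])
for row in joined:
    print(row["emp"], row["dept_name"])
```

In this sketch the row for Bob is dropped before probing because it does not qualify, and the row for Cy is spooled but finds no match in the hash table. A production system would lay out the hash table and spool rows in fixed memory structures rather than Python dictionaries and lists.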
Described below is an improved spool table structure compatible with in-memory processing for increasing cache efficiency during hash join processing.