A database is a collection of stored data that is logically related and that is accessible by one or more users or applications. A popular type of database is the relational database management system (RDBMS), which includes relational tables, also referred to as relations, made up of rows and columns (also referred to as tuples and attributes). Each row represents an occurrence of an entity defined by a table, with an entity being a person, place, thing, or other object about which the table contains information. Within large corporations or organizations, a database system known as an enterprise data warehouse may contain close to a petabyte of critical data, organized into hundreds of tables and used by many thousands of persons performing tasks across all business or organizational functions. To perform these essential functions, such a system must operate efficiently and reliably every second of every day.
In-memory processing capabilities have recently been implemented within database systems. In such systems, data is stored and processed in CPU memory, offering much faster processing times than systems and applications limited to processing data held in non-volatile or persistent storage, e.g., Hard Disk Drives (HDDs), Solid State Drives (SSDs), and flash memory.
Within relational database systems, a join operation is executed to combine records from two or more tables. A hash join is one form of join well suited to in-memory processing. In a hash join, at least one of the tables to be joined, typically the smaller, fits completely inside CPU memory. The smaller table is built into a hash table in memory (the build phase), and rows from the second table are then probed against the hash table to find potential matches (the probe phase).
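The build and probe phases of a conventional in-memory hash join can be sketched as follows. This is a minimal illustration of the textbook equi-join, not the improved method described below; the function name `hash_join`, the tuple-based row representation, and the column-index key parameters are hypothetical choices made for clarity. The caller is assumed to pass the smaller table as the build side.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Row = Tuple[Any, ...]


def hash_join(build_rows: Iterable[Row], probe_rows: Iterable[Row],
              build_key: int, probe_key: int) -> List[Row]:
    """Hypothetical equi-join of two tables given as sequences of tuples.

    build_rows should be the smaller table; it is held entirely in memory.
    build_key / probe_key are column indexes of the join attribute.
    """
    # Build phase: hash every row of the smaller table on its join key.
    # A list per key keeps duplicate keys (one-to-many joins) intact.
    table: Dict[Any, List[Row]] = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    # Probe phase: look up each row of the larger table in the hash table
    # and emit one concatenated output row per matching build row.
    result: List[Row] = []
    for row in probe_rows:
        for match in table.get(row[probe_key], ()):
            result.append(match + row)
    return result


if __name__ == "__main__":
    customers = [(1, "Alice"), (2, "Bob")]               # smaller: build side
    orders = [(100, 1, 9.99), (101, 2, 5.00), (102, 1, 3.50), (103, 3, 1.00)]
    for joined in hash_join(customers, orders, build_key=0, probe_key=1):
        print(joined)
```

Building on the smaller input keeps the in-memory hash table small, while the larger input is only streamed through once during probing; order 103 produces no output because no customer has key 3.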
Described below is an improved system and method for performing in-memory hash join processing that makes more efficient use of memory bandwidth and CPU throughput.