A query may include one or more predicates that provide conditions for returning data items from a table to the requesting user or application. Traditionally, evaluating these predicates involves reading each row from the table and evaluating each condition row by row. This approach has been used because on-disk row-major tables have been the dominant storage mechanism in relational databases for decades. Over the last decade, however, demand for faster analytics has grown explosively, and the industry has responded by adopting column-oriented databases.
U.S. patent application Ser. No. 14/337,170, filed Jul. 21, 2014, entitled “Mirroring, In Memory, Data From Disk To Improve Query Performance”, (referred to hereafter as the “Mirroring Application”) is incorporated herein in its entirety. The Mirroring Application describes a dual-format database that allows existing row-major on-disk tables to have complementary in-memory columnar representations. On-disk tables are organized into row-major “blocks”, while the in-memory copies of the tables, or portions thereof, are organized into “in-memory compression units” (IMCUs).
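To make the difference between the two layouts concrete, the following minimal Python sketch holds the same small table in row-major form and in a column-major copy. The table contents and the `to_columnar` helper are hypothetical, and a plain Python list per column stands in for an IMCU; the compression and block structure described in the Mirroring Application are not modeled here.

```python
from typing import Dict, List, Tuple

# Hypothetical three-column table; the values are illustrative only.
COLUMNS = ("id", "region", "amount")

# Row-major layout: each tuple holds every column of one row,
# analogous to rows stored together in an on-disk block.
row_major: List[Tuple[int, str, int]] = [
    (1, "EAST", 100),
    (2, "WEST", 250),
    (3, "EAST", 75),
]


def to_columnar(rows: List[tuple], columns: Tuple[str, ...]) -> Dict[str, list]:
    """Build a column-major copy: one list of values per column (uncompressed)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}


columnar = to_columnar(row_major, COLUMNS)
print(columnar["region"])  # ['EAST', 'WEST', 'EAST']
```

In the columnar copy, a predicate on a single column such as `region` touches only that column's values, whereas the row-major layout stores each `region` value interleaved with the other columns of its row.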
Unfortunately, current techniques for evaluating predicates during an in-memory scan of columnar data require the columnar data to be stitched back into rows before the execution engine can evaluate the predicates against the row-major representation. The in-memory scan involves scanning blocks of columnar units, decompressing the columnar units, and then stitching the columnar units back into rows so that the predicates may be evaluated against row-major data. This sequence of scanning, stitching, and then evaluating incurs unnecessary overhead for rows that do not satisfy the predicate, because those rows are decompressed and stitched only to be discarded.
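The scan-stitch-evaluate sequence can be sketched as follows. This is a minimal illustration, not the execution engine's actual implementation: run-length encoding is assumed here as a stand-in for whatever compression the columnar units use, and the `decompress` and `scan_stitch_evaluate` helpers and the sample data are hypothetical. The `wasted` counter tracks rows that were decompressed and stitched but then rejected by the predicate.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical compressed column format: run-length encoded (value, run_length) pairs.
RleColumn = List[Tuple[object, int]]


def decompress(column: RleColumn) -> list:
    """Expand run-length encoded pairs back into one value per row."""
    values = []
    for value, run_length in column:
        values.extend([value] * run_length)
    return values


def scan_stitch_evaluate(
    columns: Dict[str, RleColumn],
    predicate: Callable[[dict], bool],
) -> Tuple[List[dict], int]:
    """Decompress every column, stitch rows, then evaluate the predicate row by row.

    Returns the qualifying rows and the number of rows that were stitched
    only to be discarded.
    """
    names = list(columns)
    decoded = [decompress(columns[name]) for name in names]
    results, wasted = [], 0
    for values in zip(*decoded):  # stitch: one tuple of column values per row
        row = dict(zip(names, values))
        if predicate(row):
            results.append(row)
        else:
            wasted += 1  # decompression and stitching work spent on a discarded row
    return results, wasted


columns = {
    "region": [("EAST", 2), ("WEST", 3)],
    "amount": [(100, 1), (50, 4)],
}
rows, wasted = scan_stitch_evaluate(columns, lambda r: r["region"] == "EAST")
print(len(rows), wasted)  # 2 3
```

Every column is fully decompressed and every row is stitched before the predicate runs, so three of the five stitched rows in this example represent pure overhead.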
The unnecessary overhead of evaluating predicates against columnar data may be exacerbated when evaluating a join predicate. A join predicate joins the rows of two or more tables on a particular column called the “join-key”. To evaluate the join predicate, the rows from each table in the join need to be decompressed and stitched together. Then, each row from one table is compared to a row of another table to determine whether the pair satisfies the join condition. If a pair of rows does not satisfy the join condition, that combination does not appear in the join result. In the worst case, a query execution engine must evaluate the join predicate on every combination of rows from both tables, which incurs substantial overhead. Additionally, in a database cluster, rows residing on one node may be broadcast to different target nodes, each of which is assigned to perform the join operation on a discrete set of data. Broadcasting data from one server to another consumes additional computing and network resources, and those resources are wasted on rows that ultimately do not appear in the join result.
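The cost of evaluating a join predicate over all combinations of rows can be illustrated with a naive nested-loop join over already-stitched rows. In the minimal Python sketch below, the `nested_loop_join` helper and the sample `orders` and `customers` tables are hypothetical. The point is that the number of predicate evaluations equals the product of the two table sizes even when only a few pairs match. The sketch does not model decompression, stitching, or broadcasting between nodes.

```python
from typing import List, Tuple


def nested_loop_join(
    left: List[dict], right: List[dict], key: str
) -> Tuple[List[Tuple[dict, dict]], int]:
    """Compare every left row with every right row on the join key.

    Returns the matching pairs and the number of join-predicate evaluations.
    """
    matches, comparisons = [], 0
    for l_row in left:
        for r_row in right:
            comparisons += 1  # one join-predicate evaluation per combination
            if l_row[key] == r_row[key]:
                matches.append((l_row, r_row))
    return matches, comparisons


orders = [{"cust_id": c, "order": i} for i, c in enumerate([1, 2, 2, 3])]
customers = [{"cust_id": c, "name": n} for c, n in [(2, "B"), (4, "D"), (5, "E")]]
pairs, comparisons = nested_loop_join(orders, customers, "cust_id")
print(len(pairs), comparisons)  # 2 12
```

Only 2 of the 12 evaluated combinations survive into the join result; the other 10 evaluations, and any work done to produce and ship those rows, contribute nothing to the output.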
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.