Present invention embodiments relate to database systems, and more specifically, to using multilevel join-filters to identify non-matching rows of database tables for join or other operations in a parallel processing system.
In large database systems, data is commonly distributed across several nodes in a shared-nothing manner. Some database operations may require data to be exchanged between nodes. For example, in a join operation, nodes having local outer-table records (outer-nodes) may send those records to nodes having local inner-table records (inner-nodes). The inner-nodes join the received outer-table records with matching local records of the inner-table.
One technique to reduce network traffic is for each inner-node to send a Bloom filter to the outer-nodes. The Bloom filter is typically a bitmap in which bits are set at positions given by hash values of the join keys on the inner-node. The outer-nodes use the received bitmaps to filter out local outer-table records that will not have matches before sending those records over the network.
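The filtering step can be illustrated with a minimal sketch. It assumes string join keys, a filter of `num_bits` bits, and `num_hashes` bit positions per key derived from a SHA-256 digest by double hashing. These parameters and the helper names `build_filter` and `rows_to_send` are hypothetical and chosen for illustration only. An inner-node sets bits for its local join keys, and an outer-node keeps only rows whose key might be present.

```python
import hashlib
from typing import Iterable, List, Tuple


class BloomFilter:
    """Bit array of num_bits bits; each key sets num_hashes positions (hypothetical sizing)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> List[int]:
        # Double hashing: two 64-bit values from one SHA-256 digest give k positions.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True may be a false positive (collision).
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def build_filter(inner_join_keys: Iterable[str], num_bits: int = 1024,
                 num_hashes: int = 3) -> BloomFilter:
    """Inner-node side: set bits for every local inner-table join key."""
    bf = BloomFilter(num_bits, num_hashes)
    for key in inner_join_keys:
        bf.add(key)
    return bf


def rows_to_send(outer_rows: Iterable[Tuple[str, str]],
                 bf: BloomFilter) -> List[Tuple[str, str]]:
    """Outer-node side: keep only (join_key, payload) rows that may have a match."""
    return [row for row in outer_rows if bf.might_contain(row[0])]
```

Because a Bloom filter never yields false negatives, every outer row with a matching inner key survives the filter. A row whose key is absent is usually dropped, but it can still be sent when all of its bit positions happen to have been set by other keys.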
However, hash collisions (false positives) may cause an outer-node to send records that will not have matches on an inner-node. The larger the inner table, the larger the Bloom filters must be to avoid too many collisions. If the inner table is too large, the cost of shipping the Bloom filters over the network and storing them (one from each inner-node) on each outer-node may become prohibitive. Conventional database systems forgo the use of Bloom filters where the network or memory overhead is too high.
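The size growth can be estimated with the standard textbook sizing formula for an idealized Bloom filter, m = -n ln(p) / (ln 2)^2 bits for n keys at false-positive rate p. This formula is an assumption brought in for illustration and does not come from the text above. The sketch further assumes that the inner table is spread evenly across inner-nodes and that each outer-node stores one filter per inner-node. The function names are hypothetical.

```python
import math


def bloom_bits(num_keys: int, fp_rate: float) -> int:
    """Bits for num_keys keys at target fp_rate, via m = -n ln(p) / (ln 2)^2 (assumes optimal k)."""
    return math.ceil(-num_keys * math.log(fp_rate) / (math.log(2) ** 2))


def optimal_hashes(num_bits: int, num_keys: int) -> int:
    """Number of hash functions minimizing the false-positive rate: k = (m / n) ln 2."""
    return max(1, round(num_bits / num_keys * math.log(2)))


def per_outer_node_bytes(inner_rows: int, inner_nodes: int, fp_rate: float) -> int:
    """Memory on one outer-node holding one filter from each inner-node (even distribution assumed)."""
    keys_per_node = math.ceil(inner_rows / inner_nodes)
    filter_bytes = math.ceil(bloom_bits(keys_per_node, fp_rate) / 8)
    return filter_bytes * inner_nodes
```

For example, with a hypothetical inner table of 10^9 rows on 100 inner-nodes and a 1% target false-positive rate, each filter is about 12 MB, so each outer-node holds roughly 1.2 GB of filters. Each inner-node's share shrinks as nodes are added, but the number of filters grows at the same rate. The per-outer-node footprint therefore tracks the size of the whole inner table rather than the node count.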