In relational database systems, an implementation of the join operator processes two tables (e.g., a left-side table and a right-side table). The tables are related in some way, usually based on an association of objects in the left-side table with objects that share a common dimension in the right-side table. In some cases one or both of the tables can be very large, and the join operation can be parallelized. For example, in a system having multiple computational units, a portion of the tables can be distributed to each of the units, and each unit can then perform the portion of the join corresponding to the portion distributed to it.
Depending on the relative sizes of the tables involved in the join, the portions of the constituent tables can be apportioned to the computational units according to an execution plan. For example, given a scenario to execute a parallel join with 10 computational units, and given a join of a left-side table T1 (say, of 1 thousand rows) to a right-side table T2 (say, of 1 million rows), one possible plan is to distribute table T1 to all computational units, and then apportion successive tenths of table T2 to each of the 10 computational units. Each unit can perform its respective portion of the overall join, and the results from the individual computational units can be combined to form the overall results of the join.
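The plan described above can be sketched in a few lines of Python. This is a minimal, single-process illustration only: the function names (`partition`, `local_hash_join`, `broadcast_join`), the row layout (key in the first field), and the use of a hash join inside each unit are assumptions made for the example, not details taken from this disclosure.

```python
from collections import defaultdict


def partition(rows, n_units):
    """Split rows into n_units contiguous, near-equal slices."""
    size, extra = divmod(len(rows), n_units)
    parts, start = [], 0
    for i in range(n_units):
        end = start + size + (1 if i < extra else 0)
        parts.append(rows[start:end])
        start = end
    return parts


def local_hash_join(left_rows, right_rows):
    """Join (key, value) rows on key, using a hash table built on the left side."""
    index = defaultdict(list)
    for key, left_val in left_rows:
        index[key].append(left_val)
    return [
        (key, left_val, right_val)
        for key, right_val in right_rows
        for left_val in index.get(key, [])
    ]


def broadcast_join(t1, t2, n_units=10):
    """Give every unit a full copy of t1 and one slice of t2, then combine results."""
    results = []
    for t2_slice in partition(t2, n_units):
        t1_copy = list(t1)  # stands in for broadcasting T1 to this unit
        results.extend(local_hash_join(t1_copy, t2_slice))
    return results


# Tiny stand-ins for the small left-side table and the large right-side table.
t1 = [(k, f"dim{k}") for k in range(5)]
t2 = [(i % 7, f"fact{i}") for i in range(100)]
joined = broadcast_join(t1, t2)
```

Because each unit sees all of T1, the combined output matches what a single join over the whole of T2 would produce; only the work is divided.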
When the left input of a join is relatively small compared to the right side of the join, the optimizer might decide to broadcast all of the left input to all the computational units performing the join. In some cases this broadcast plan can prove to be a very good plan because the right side of the join is not redistributed: the same computational units that perform the join also produce the right side of the join. Also, broadcast plans naturally handle skewed join keys and small left inputs, leading to better utilization of the computational units than a hash-hash distribution plan achieves in these cases. However, this sort of plan can easily become a scalability bottleneck if the small table needs to be broadcast to a very large number of computational units (e.g., when the right side of the join is very large). Broadcasting the small table can consume substantial resources in that case, since the broadcast must be performed for every one of a very large number of computational units. Moreover, broadcasting to such a large number of computational units incurs a potentially large penalty in the form of interconnect protocol overhead.
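The scaling concern can be made concrete with a deliberately simplified row-count model. It assumes that a broadcast sends one full copy of the left input to each unit, and that a hash-hash distribution sends each row of both inputs exactly once; it ignores per-message protocol overhead entirely. The function names are hypothetical, and the table sizes are the illustrative ones used earlier (1 thousand and 1 million rows).

```python
def broadcast_rows_sent(left_rows: int, dop: int) -> int:
    """Rows moved when the whole left input is copied to every unit."""
    return left_rows * dop


def hash_hash_rows_sent(left_rows: int, right_rows: int) -> int:
    """Rows moved when both inputs are redistributed by join-key hash."""
    return left_rows + right_rows


T1_ROWS, T2_ROWS = 1_000, 1_000_000

for dop in (10, 100, 1_000, 10_000):
    b = broadcast_rows_sent(T1_ROWS, dop)
    h = hash_hash_rows_sent(T1_ROWS, T2_ROWS)
    print(f"DOP={dop:>6}: broadcast={b:>10,} rows, hash-hash={h:>10,} rows")
```

Under these assumptions the broadcast plan moves far fewer rows at a degree of parallelism of 10 (10,000 versus 1,001,000), but its cost grows linearly with the number of units and overtakes the hash-hash figure well before 10,000 units.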
Although the aforementioned broadcast distribution is an applicable distribution method in the above-mentioned cases, the motivation of this disclosure is to introduce small table replication to improve on the performance of broadcast distribution, especially in the presence of small tables and under the demands of a very large degree of parallelism (DOP). The herein-below disclosure handles small tables by replicating them using a memory component such as a buffer cache.
As another example, in a system having a single storage unit and multiple computational elements interconnected by a shared common bus, some of the bandwidth of the bus would be used by the access protocol to (1) gain access to the bus, (2) send the request for data to the storage unit, (3) receive packets of the requested data, (4) acknowledge receipt of the packets of the requested data, (5) relinquish access to the bus, and (6) perform other protocol-related operations.
In legacy systems, a given computational unit might sequence data access as follows: (1) communicate with the storage unit to obtain all or a portion of the left-side table, (2) communicate with the storage unit to obtain all or a portion of the right-side table or relation, and (3) perform the join operation on the obtained portions.
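The legacy sequence above can be sketched as follows, with a counter that records how many times the storage unit is contacted. The `StorageUnit` class, its `fetch` method, and the assumption of unique join keys in the left-side table are hypothetical and exist only for this illustration; each `fetch` call stands in for one round of the bus access protocol.

```python
class StorageUnit:
    """Hypothetical shared storage that counts the requests it serves."""

    def __init__(self, tables):
        self.tables = tables
        self.requests = 0

    def fetch(self, name, start=0, stop=None):
        self.requests += 1  # one round of protocol overhead per request
        return self.tables[name][start:stop]


def legacy_unit_join(storage, unit_id, n_units):
    """One unit: fetch the left table, fetch its slice of the right table, join."""
    left = storage.fetch("T1")
    # Slice bounds are computed from table metadata (not counted as a request).
    chunk = -(-len(storage.tables["T2"]) // n_units)  # ceiling division
    right = storage.fetch("T2", unit_id * chunk, (unit_id + 1) * chunk)
    lookup = dict(left)  # assumes unique join keys in T1
    return [(k, lookup[k], v) for k, v in right if k in lookup]


storage = StorageUnit({
    "T1": [(k, f"dim{k}") for k in range(3)],
    "T2": [(i % 4, f"fact{i}") for i in range(40)],
})
n_units = 10
results = [
    row
    for unit in range(n_units)
    for row in legacy_unit_join(storage, unit, n_units)
]
print(storage.requests)
```

In this sketch every unit contacts the storage unit twice, so half of all requests are spent re-reading the same small left-side table, regardless of how few rows that table holds.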
One can observe that for a small table, the cost (e.g., bandwidth, latency) of a unit of overhead to communicate with the storage unit to obtain all or a portion of a table can exceed the cost of moving the table data itself from the storage unit to the computational unit. Moreover, in practical situations, it frequently happens that a small table is involved in a join operation (e.g., as the left-side table); thus the aggregate cost of the overhead grows in proportion to how often such joins occur. What is needed is a system for reducing overhead in a parallel join distribution plan. Moreover, none of the aforementioned technologies perform the herein-disclosed techniques for replicating a smaller left-side table for performing a join operation with a portion of a larger right-side table or relation in order to reduce data communication protocol overhead. Therefore, there is a need for an improved approach.