This disclosure relates generally to parallel computing environments and, more particularly, to distinct operations performed on multiple tables in the shared memory of parallel computing environments.
Computer processor design has become increasingly constrained by physical limits such as heat production, signal propagation delay, transistor size, and the bandwidth of communication channels. Since roughly 2006, processor frequency (one measure of the computation power of a processor) has not increased significantly. As an alternative way to increase computation power, chip vendors began to put multiple computation units (so-called “cores”) on a single chip, in what are known as “multi-core processors”. Further, multiple chips can be combined in a single computer. On such a computer, all of the cores on these processors can access the same main memory (known as “shared memory” or a “shared memory architecture”).
As a result of these hardware developments, software vendors can no longer rely on frequency-based performance improvements. Instead, they have to parallelize their software so that it scales with the number of processor cores available on a computer. Parallelization is difficult, however, especially for operations that were originally designed for single-core or single-chip computing systems.
One such operation is called “relational distinct.” The relational distinct operation eliminates duplicates from a table. Duplicates are defined based on the values of a set of columns: two rows with the same values in these columns are duplicates. For example, as illustrated in FIG. 1, the distinct operation performed on the “Product” column of Table 1 would return a table containing three rows: Car, Boat, and Bike. The term “distinct columns” refers to the columns on which the distinct operation is defined.
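The semantics of the relational distinct operation can be illustrated with a minimal single-threaded sketch in Python. This sketch is for illustration only and is not the parallel method of this disclosure. The function name `relational_distinct`, the sample rows, and the “Region” column are hypothetical; the actual contents of Table 1 are shown only in FIG. 1 and are not reproduced here. The sketch also assumes that the result contains only the distinct columns and keeps the first occurrence of each value combination.

```python
def relational_distinct(rows, distinct_columns):
    """Return one row per unique combination of values in distinct_columns.

    rows: list of dicts mapping column name -> value (hypothetical layout).
    distinct_columns: list of column names that define what a duplicate is.
    """
    seen = set()      # value combinations already emitted
    result = []
    for row in rows:
        # Two rows are duplicates if they agree on every distinct column.
        key = tuple(row[col] for col in distinct_columns)
        if key not in seen:
            seen.add(key)
            # Assumption: the output is projected onto the distinct columns.
            result.append({col: row[col] for col in distinct_columns})
    return result


# Hypothetical sample data; not the actual contents of Table 1.
table = [
    {"Product": "Car", "Region": "North"},
    {"Product": "Boat", "Region": "South"},
    {"Product": "Car", "Region": "East"},
    {"Product": "Bike", "Region": "North"},
    {"Product": "Boat", "Region": "West"},
]

distinct_products = relational_distinct(table, ["Product"])
```

With “Product” as the only distinct column, the five sample rows reduce to three rows, one each for Car, Boat, and Bike. The single shared `seen` set is the part that makes this formulation hard to parallelize directly, since every core would need to read and update it.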