According to a further embodiment of the present invention, a system comprises: a coordinator unit; a plurality of worker units; a set of tables at a first join depth, each table having columns and rows, wherein the coordinator unit reads data in the tables row-by-row and distributes the rows across separate worker units; the worker units operating in parallel to compute a partial frequency histogram for each column of the rows received from the coordinator unit; and the coordinator unit merging the partial histograms from the worker units and sending the merged frequency histograms to the worker units.
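The coordinator and worker arrangement described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the round-robin distribution policy, the use of threads in place of separate computer systems, and the names `partial_histograms`, `merge_histograms`, and `compute_histograms` are all hypothetical. The final step of sending the merged histograms back to the worker units is omitted.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_histograms(rows, num_columns):
    """Worker step: count value frequencies per column for the rows received."""
    hists = [Counter() for _ in range(num_columns)]
    for row in rows:
        for col, value in enumerate(row):
            hists[col][value] += 1
    return hists


def merge_histograms(partials, num_columns):
    """Coordinator step: sum the per-column counters from all workers."""
    merged = [Counter() for _ in range(num_columns)]
    for hists in partials:
        for col in range(num_columns):
            merged[col].update(hists[col])
    return merged


def compute_histograms(table, num_workers=3):
    num_columns = len(table[0]) if table else 0
    # Coordinator: deal rows out round-robin (an assumed policy; the text
    # only says rows are distributed to separate worker units).
    batches = [table[i::num_workers] for i in range(num_workers)]
    # Threads stand in for worker units here (assumption for the sketch).
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(
            pool.map(lambda batch: partial_histograms(batch, num_columns), batches)
        )
    return merge_histograms(partials, num_columns)


# Hypothetical two-column table used only for illustration.
table = [
    ("DE", "red"),
    ("DE", "blue"),
    ("US", "red"),
    ("DE", "red"),
    ("FR", "red"),
]
merged = compute_histograms(table)
```

Because counting is additive, the merged histogram is the same regardless of how rows are assigned to workers, which is what allows the worker units to operate independently and in parallel.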
Typically, the denormalized data is compressed to reduce storage requirements. One approach is to use frequency partitioning combined with dictionary-based encoding, which is described in co-pending U.S. patent application Ser. No. 12/198,079, entitled “Frequency Partitioning: Entropy Compression with Fixed Size Fields”, now U.S. Pat. No. 7,827,187, which is incorporated herein by reference. The most frequent values that occur in a particular column are encoded with short codes, while less frequent values are assigned longer codes. The code length determines the partition to which a value belongs.
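The relationship between value frequency, code length, and partition can be illustrated with a simplified Python sketch. This is not the method of the referenced patent; it is a reduced illustration that assumes a fixed, hypothetical list of code widths (`widths`) and assigns the most frequent values to the partition with the shortest fixed-width codes. The function name `assign_codes` and the sample histogram are likewise hypothetical.

```python
from collections import Counter


def assign_codes(histogram, widths=(1, 2, 8)):
    """Map each value to (partition, code, width_in_bits).

    Values are ranked by descending frequency. Partition p holds up to
    2**widths[p] values, each given a fixed-width dictionary code, so
    frequent values land in partitions with shorter codes.
    """
    # Ties are broken by the value's string form to keep the result deterministic.
    ranked = [
        value
        for value, _ in sorted(histogram.items(), key=lambda kv: (-kv[1], str(kv[0])))
    ]
    codes = {}
    start = 0
    for partition, width in enumerate(widths):
        chunk = ranked[start:start + 2 ** width]
        for code, value in enumerate(chunk):
            codes[value] = (partition, code, width)
        start += len(chunk)
    if start < len(ranked):
        raise ValueError("code widths too small for the number of distinct values")
    return codes


# Hypothetical column histogram used only for illustration.
hist = Counter(
    {"red": 50, "blue": 30, "green": 10, "teal": 5, "pink": 3, "gold": 2, "grey": 1}
)
codes = assign_codes(hist)
```

With these assumed widths, the two most frequent values receive 1-bit codes in partition 0, the next four receive 2-bit codes in partition 1, and the remaining value falls into the 8-bit partition. Note that the assignment requires the frequency histogram of the column as input.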
Embodiments of the invention teach a way to exploit multiple computer systems to jointly compute the frequency histograms for the joined, denormalized table without actually performing the join for all rows.