Techniques for analyzing data with high-speed parallel processing have become established. For example, one technique distributes data across multiple machines and performs parallel processing on the machines that hold the distributed data (refer to NPL 1).
On the other hand, to move (transfer) data from an accumulation device, in which the data is accumulated, to a data processing device, the data needs to be converted into a format or structure compatible with the data processing device.
This conversion requires the processes of extracting the data from the accumulation device (Extract), transforming the data format (Transform), and loading the data into the data processing device (Load). These processes are collectively called ETL processing, after their initials.
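The three ETL steps can be illustrated with a minimal sketch. It assumes, purely for illustration, that the accumulation device exports CSV text and that the data processing device accepts JSON records; the function names `extract`, `transform`, and `load`, the column names, and the in-memory `sink` are hypothetical and do not come from the cited literature.

```python
import csv
import io
import json


def extract(raw_csv: str) -> list[dict]:
    """Extract: read rows from a (hypothetical) CSV export of the accumulation device."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows: list[dict]) -> list[str]:
    """Transform: convert each row into the format assumed for the processing device (JSON)."""
    return [
        json.dumps({"id": int(r["id"]), "value": float(r["value"])}, sort_keys=True)
        for r in rows
    ]


def load(records: list[str], sink: list) -> int:
    """Load: hand the converted records to the processing device (a list stands in for it)."""
    sink.extend(records)
    return len(records)


raw = "id,value\n1,0.5\n2,1.25\n"
sink: list[str] = []
loaded = load(transform(extract(raw)), sink)
```

In this sketch each step runs to completion before the next begins, so the slowest step limits the throughput of the whole pipeline.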
In ETL processing, the transforming process often forms a bottleneck. This is attributable to the recent wider bandwidths in the storage layer and the wider processing bandwidths provided by multi-core computing devices.
One existing solution to this problem is to compress data before transferring it (for example, refer to PTL 1 and PTL 2). PTL 3 also discloses a technique that compresses and transfers data in parallel.
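As a rough illustration of compressing data in parallel before transfer, the following sketch splits a buffer into fixed-size chunks and compresses the chunks concurrently. It is not the method of PTL 1 to PTL 3; the chunk size, the worker count, the use of zlib, and the helper names `split_chunks`, `compress_parallel`, and `decompress_all` are all assumptions made for this example.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 64 * 1024  # assumed chunk size; not taken from the cited literature


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split the buffer into independent chunks so each can be compressed separately."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def compress_parallel(data: bytes, workers: int = 4) -> list[bytes]:
    """Compress the chunks concurrently; pool.map preserves the chunk order."""
    chunks = split_chunks(data)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, chunks))


def decompress_all(blocks: list[bytes]) -> bytes:
    """Receiver side: decompress each block and concatenate the results in order."""
    return b"".join(zlib.decompress(b) for b in blocks)


payload = b"sensor-reading;" * 20000  # 300,000 bytes of highly repetitive data
blocks = compress_parallel(payload)
restored = decompress_all(blocks)
```

Threads are used here because CPython's zlib releases the global interpreter lock while compressing large buffers; a process pool would be an alternative design under the same assumptions.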
PTL 4 and PTL 5 each disclose a technique that, before data transfer, sorts data into data to be compressed and data not to be compressed.
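One conceivable way to sort data into data to be compressed and data not to be compressed is to trial-compress a small sample of each block and compress the whole block only when the sample shrinks sufficiently. The sketch below illustrates that idea only; it is not the criterion used in PTL 4 or PTL 5, and the threshold, the sample size, and the names `should_compress`, `encode_for_transfer`, and `decode_after_transfer` are hypothetical.

```python
import os
import zlib

RATIO_THRESHOLD = 0.9  # assumed: compress only if the sample shrinks below 90 %
SAMPLE_SIZE = 4096     # assumed sample length in bytes


def should_compress(block: bytes) -> bool:
    """Trial-compress a leading sample and check whether it shrinks enough."""
    sample = block[:SAMPLE_SIZE]
    if not sample:
        return False
    return len(zlib.compress(sample)) / len(sample) < RATIO_THRESHOLD


def encode_for_transfer(block: bytes) -> tuple[bool, bytes]:
    """Return (compressed flag, payload); incompressible blocks are sent as-is."""
    if should_compress(block):
        return True, zlib.compress(block)
    return False, block


def decode_after_transfer(compressed: bool, payload: bytes) -> bytes:
    """Receiver side: undo the compression only when the flag says it was applied."""
    return zlib.decompress(payload) if compressed else payload


text_block = b"timestamp=0,value=0\n" * 1000  # repetitive, compresses well
random_block = os.urandom(8192)               # random bytes, effectively incompressible

text_flag, text_payload = encode_for_transfer(text_block)
random_flag, random_payload = encode_for_transfer(random_block)
```

Skipping compression for blocks that do not shrink avoids spending processor time on data whose transfer size would not decrease.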