Map/Reduce is a programming model for parallel computation over massive datasets, for example, datasets larger than one terabyte (TB).
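As a minimal illustration of the programming model, the following single-process sketch counts words. The function names (`map_task`, `reduce_task`, `run_job`) and the word-count workload are assumptions chosen for illustration only; a real deployment distributes the data slices across worker nodes rather than looping over them in one process.

```python
from collections import defaultdict


def map_task(data_slice):
    """Emit one (word, 1) pair for every word in a data slice."""
    return [(word, 1) for word in data_slice.split()]


def reduce_task(key, values):
    """Combine all values emitted for one key into a single result."""
    return key, sum(values)


def run_job(data_slices):
    """Run every map task, group the output by key, then reduce each group."""
    grouped = defaultdict(list)
    for data_slice in data_slices:
        for key, value in map_task(data_slice):
            grouped[key].append(value)
    return dict(reduce_task(key, values) for key, values in grouped.items())


# Two data slices stand in for a dataset that has been divided up.
counts = run_job(["a b a", "b c"])
```

Here the grouping step inside `run_job` stands in for the transfer of map output to the reducer nodes.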
During dataset processing, a dataset is divided into multiple data slices, and a master node schedules worker nodes to process the data slices. The master assigns a map task to an idle worker, and a worker to which a map task has been assigned becomes a mapper node. The master also assigns a reduce task to another idle worker, and a worker to which a reduce task has been assigned becomes a reducer node.
The mapper node temporarily stores the result of executing the map task in a circular memory buffer, and spills the contents of the buffer to a disk by using disk input/output (I/O). Each spill produces one spill file. While spilling, the mapper node partitions and sorts the results in the circular memory buffer according to the keys that each reducer node is to process. After completing execution of the map task, the mapper node reads the spill files from the disk, merges them into one file, and writes the merged file back to the disk. Consequently, partitioning, sorting, and merging may require disk I/O to be used multiple times for disk read/write operations.
The mapper node notifies the master when it completes execution of the map task, and the master then notifies the reducer node of the identity of the mapper node. The reducer node requests data from the mapper node according to that identity, and the mapper node and the reducer node establish a Transmission Control Protocol (TCP) stream. The mapper node reads the data to be processed by the reducer node from the file stored on the disk, and sends the read data to the reducer node over the TCP stream.
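The map-side steps described above (buffer, spill, partition, sort, merge, and the later per-reducer read) can be sketched as follows. This is a simplified illustration under stated assumptions, not the behavior of any particular framework: the circular memory buffer is replaced by a plain list, `NUM_REDUCERS` and `BUFFER_LIMIT` are arbitrary values, partitioning uses a CRC32 hash of the key, spill files are serialized with `pickle`, and all function names are hypothetical. The TCP transfer to the reducer node is omitted.

```python
import heapq
import os
import pickle
import tempfile
import zlib

NUM_REDUCERS = 2   # assumed number of reducer nodes
BUFFER_LIMIT = 3   # assumed number of records held in memory before a spill


def partition_of(key):
    """Pick the reducer for a key (assumption: CRC32 hash partitioning)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_REDUCERS


def spill(buffer, spill_dir, index):
    """Partition and sort the buffered records, then write one spill file."""
    records = sorted((partition_of(key), key, value) for key, value in buffer)
    path = os.path.join(spill_dir, f"spill_{index}.bin")
    with open(path, "wb") as spill_file:       # disk write
        pickle.dump(records, spill_file)
    return path


def run_mapper(pairs, spill_dir):
    """Buffer map output and spill to disk each time the buffer fills."""
    buffer, spill_paths = [], []
    for pair in pairs:
        buffer.append(pair)
        if len(buffer) >= BUFFER_LIMIT:
            spill_paths.append(spill(buffer, spill_dir, len(spill_paths)))
            buffer = []
    if buffer:                                 # final partial spill
        spill_paths.append(spill(buffer, spill_dir, len(spill_paths)))
    return spill_paths


def merge_spills(spill_paths, spill_dir):
    """Read every spill file back and merge the sorted runs into one file."""
    runs = []
    for path in spill_paths:
        with open(path, "rb") as spill_file:   # disk read
            runs.append(pickle.load(spill_file))
    merged = list(heapq.merge(*runs))          # runs are already sorted
    merged_path = os.path.join(spill_dir, "merged.bin")
    with open(merged_path, "wb") as merged_file:   # second disk write
        pickle.dump(merged, merged_file)
    return merged_path


def read_partition(merged_path, partition):
    """Read the records destined for one reducer from the merged file."""
    with open(merged_path, "rb") as merged_file:   # another disk read
        records = pickle.load(merged_file)
    return [(key, value) for part, key, value in records if part == partition]
```

Calling `run_mapper` on five key-value pairs with `BUFFER_LIMIT = 3` produces two spill files; `merge_spills` then reads both and writes a third file, and `read_partition` reads that file again for each reducer. Even in this small sketch, every record passes through the disk several times before it reaches a reducer.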
When the mapper node sends the data to the reducer node over the TCP stream, the mapper node must use disk I/O to read the data from the disk and network I/O to transmit the TCP stream carrying the data to the reducer node. However, both performing disk read/write operations by using disk I/O and transmitting data to the reducer node by using network I/O are very time-consuming, which prolongs the execution time of the Map/Reduce task.