With the advancement of computer technology, increasingly large amounts of data need to be processed to perform particular operations. A single computer may not be able to handle large-scale data processing. Some technologies therefore use a plurality of computers to form a distributed system that stores and processes data in parallel.
For example, HADOOP is a distributed data processing system that can use a large number of inexpensive computers to form a computer cluster. The HADOOP system can split files into blocks and distribute the blocks across the computers in the cluster. As such, a cluster of less-expensive computers can be used to replace an expensive, high-speed computing and storage system or device. HADOOP can include a storage part, known as the HADOOP Distributed File System (HDFS), and a processing part, which is a MAPREDUCE software framework or programming model. The HDFS can be used for data management and storage. The MAPREDUCE software framework includes a mapper and a reducer. The mapper can map input key/value pairs to a set of intermediate key/value pairs. The reducer can reduce a set of intermediate values that share a key to a smaller set of values.
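The mapper and reducer roles described above can be illustrated with a minimal, single-process sketch in Python. It does not use the HADOOP API. The function names (mapper, shuffle, reducer, run_job) and the word-count task are assumptions chosen only for illustration, and the grouping step that a real framework performs between the two phases is simulated here with an in-memory dictionary.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map one input record to intermediate (word, 1) key/value pairs."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key (hypothetical stand-in for the
    grouping a distributed framework would do between map and reduce)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce all values that share a key to a single value."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run the map, group, and reduce steps sequentially in one process."""
    intermediate = [pair for line in lines for pair in mapper(line)]
    grouped = shuffle(intermediate)
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["a b a", "b c"]))
```

In a distributed deployment, the calls to mapper would run on different computers, each over its own block of the input, and the grouped values would be routed to reducers by key. The sketch keeps everything in one process so that the data flow is easy to follow.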
Using distributed data processing, a large-scale data processing task can be split into smaller tasks that are executed in parallel by a large number of computers. However, splitting the task reduces neither the total amount of computing resources needed nor the amount of data to be processed. As a result, distributed processing alone may not satisfy the efficiency requirements of time-sensitive data processing tasks.