The Hadoop technology is a distributed computing platform that comes out in recent years for processing massive data, and has a great advantage in massive data processing and in reliability. Hadoop is a software framework that is capable of performing distributed processing on a large amount of data in a reliable, highly efficient, and scalable manner. Hadoop is reliable because it maintains multiple copies of operational data based on an assumption that element computation and storage may fail, to ensure that distributed processing on a failed node can be performed again. Hadoop is highly efficient because it operates in a parallel manner, thereby accelerating processing by using parallel processing. Hadoop is also scalable and is capable of processing data in petabytes (PB).
Key performance indicator (KPI)/key quality indicator (KQI) computing often requires an association to be created between multiple events, that is, multiple events are associated based on a specific dimension, and a KPI/KQI is generated only when a condition is satisfied. However, Hadoop is not properly supported in solving such a problem, where a complex operation of Map Reduce (MR) is performed for multiple times to generate more intermediate files, so as to achieve a partial result. Moreover, MR is weak in timeliness. All intermediate results need to be written to a Hadoop file system (HDFS) by performing input/output (IO), and then read for participating in a next computation. IO becomes a bottleneck for processing massive data and real-time performance is degraded.