In modern computer systems, data processing applications are executed on computers comprising standard computer architectures. As data processing applications, such as big data applications, continually analyze larger and larger datasets, computer systems are being improved to meet the needs of the applications and the large datasets. As used herein, the term big data means any collection of data that is too large to process using traditional data processing methods and computers. For example, more than a terabyte of data may be analyzed by a data processing application to determine marketing trends. Typical computer systems executing big data applications may incorporate 64 gigabytes of random access memory.
Following is a description of traditional computers and data flow in big data applications. Reference is now made to FIG. 1A, which is a schematic illustration of current computer components and interconnections. The main central processor and/or processing unit 111 (CPU) executes a data processing application using computer memory 112, which is accessed through a direct memory access (DMA) interface, a front side bus, and the like. As used herein, the term processing unit and/or processor means a processor of a computer, computerized device, computerized apparatus, and the like. As used herein, the term central processor means the main electronic integrated circuit that executes the instructions of a computer program, such as the main processor, a central processing unit, multi-core central processor, and the like. For example, the CPU is an Intel® Itanium® processor, an Intel® Xeon® processor, an Advanced Micro Devices Opteron™ processor, an IBM® zEC12 processor, and the like. The CPU 111 is connected to a platform input and/or output (I/O) controller hub 113, such as using a direct media interface. The platform I/O controller hub 113 is connected to computer peripherals such as a hard disk, a network interface, a keyboard, a mouse, and the like.
Big data applications transfer large quantities of data from the hard disk and/or network interface to be stored in the computer main memory and from main memory to the CPU for pre-processing and analysis.
Reference is now made to FIG. 1B and FIG. 1C, which are schematic illustrations of a standard data flow stage and a standard process graph in a data processing application, showing the data reduction steps. The data flow architecture comprises three stages: an I/O stage 403 where the big data 406, such as terabytes of data, are received from repositories, a memory stage 402 where the big data 406 are stored in main memory, and a CPU stage 401 for pre-processing and analysis. The CPU stage 401 applies a filter and aggregate process 404 to the big data 406 to extract a smaller dataset 407, such as a dataset of megabytes in size. The smaller dataset 407 is analyzed 405 to produce a structured dataset and/or results 408 stored in the memory during the memory stage 402. The structured dataset and results 408 may be sent during the I/O stage to a user and/or a computer device for further processing. The results 408 may again be pre-processed 404A and analyzed 405A, resulting in a smaller dataset 408A, such as a dataset size reduced to kilobytes of data. The CPU of standard computer architectures performs both the pre-processing and processing using the computer main memory.
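For illustration only, the three-stage data flow described above may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not an implementation taken from the figures: the function names, the record fields (region, amount, valid), and the synthetic records are all hypothetical, and the dataset is kept small so the example runs quickly. It shows the CPU performing both the filter and aggregate step and the analysis step on data that were first loaded in full into main memory.

```python
from collections import defaultdict

# All names and record fields below are hypothetical, chosen for illustration.

def io_stage(n_records):
    """I/O stage: yield raw records as if read from a repository (synthetic here)."""
    regions = ["north", "south", "east", "west"]
    for i in range(n_records):
        yield {"region": regions[i % 4], "amount": i % 100, "valid": i % 10 != 0}

def memory_stage(records):
    """Memory stage: materialize every record in main memory before processing."""
    return list(records)

def filter_and_aggregate(records):
    """CPU pre-processing: drop invalid records, then sum amounts per region."""
    totals = defaultdict(int)
    for record in records:
        if record["valid"]:
            totals[record["region"]] += record["amount"]
    return dict(totals)

def analyze(totals):
    """CPU analysis: reduce the aggregated dataset to a small structured result."""
    top_region = max(totals, key=totals.get)
    return {"top_region": top_region, "total": totals[top_region]}

big_data = memory_stage(io_stage(100_000))  # large input held in main memory
smaller = filter_and_aggregate(big_data)    # much smaller aggregated dataset
results = analyze(smaller)                  # final structured result

print(len(big_data), len(smaller), results)
```

In this sketch every record crosses from the I/O stage into main memory and then to the CPU, even though most of the data are discarded or collapsed by the filter and aggregate step, which mirrors the data reduction shown in the process graph.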