The Internet, mobile communications, navigation, online gaming, sensing technologies, and large-scale computing infrastructures produce large amounts of data every day. Big Data is data that exceeds the processing capacity of conventional database systems and the analytical capacity of traditional analysis methods because of its large volume and the speed at which it moves and grows. More companies now rely on Big Data to make real-time decisions to solve various problems. Current methods consume large amounts of computational resources, which are very costly, yet may still fail to meet the needs of real-time decision making based on the newest information, especially in the financial industry. Processing and analyzing Big Data efficiently, promptly, and cost-effectively presents a difficult challenge to data analysts and computer scientists.
Processing Big Data may include performing calculations on multiple data elements. When performing some statistical calculations on Big Data, the number of data elements to be accessed may be quite large. For example, when calculating an autocorrelation, a (potentially large) number of data elements may need to be accessed.
Further, some statistical calculations are recalculated as old data elements are removed from a Big Data set. Thus, the (potentially large) number of data elements may be accessed repeatedly. For example, an autocorrelation may be calculated for a computation window whose size n keeps decreasing as data elements are excluded from it. Every time a data element to be removed is accessed or received, that element is removed from the computation window. After a data element is removed from a computation window of size n, the n−1 data elements in the adjusted computation window are then all accessed to recalculate the autocorrelation.
Depending on the application, the computation window size n could be extremely large, so the data elements in a computation window may be distributed over a cloud comprising hundreds of thousands of computing devices. Re-performing an autocorrelation calculation on Big Data sets in traditional ways after some data changes results in slow response and a significant waste of computing resources.
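The traditional recalculation described above can be sketched as follows. This is a minimal illustration only, under several assumptions not stated in the text: the function names `autocorrelation` and `shrink_and_recompute` are hypothetical, the autocorrelation is taken to be the common sample definition at a given lag (lag-l covariance sum divided by the sum of squared deviations), the removed element is assumed to be the oldest one, and the window is held in a single in-memory list rather than distributed over many devices.

```python
def autocorrelation(window, lag):
    """Sample autocorrelation at `lag`, computed from scratch.

    Assumed definition: sum of lagged products of deviations from the
    mean, divided by the sum of squared deviations.
    """
    n = len(window)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(window)")
    # Each of the three sums below reads the elements of the window again.
    mean = sum(window) / n
    denom = sum((x - mean) ** 2 for x in window)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant window")
    numer = sum(
        (window[i] - mean) * (window[i - lag] - mean) for i in range(lag, n)
    )
    return numer / denom


def shrink_and_recompute(window, lag):
    """Remove one element (assumed: the oldest) and recompute everything.

    All n-1 remaining elements are accessed again, which is the cost
    the text attributes to the traditional approach.
    """
    adjusted = window[1:]
    return adjusted, autocorrelation(adjusted, lag)


if __name__ == "__main__":
    window = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(autocorrelation(window, lag=1))
    window, rho = shrink_and_recompute(window, lag=1)
    print(window, rho)
```

In this sketch the work per removal grows linearly with the number of elements left in the window, so repeating it for every removed element over a very large window is what makes the from-scratch approach slow and resource-intensive.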