In recent years, computing systems have changed significantly, as growing volumes of data and stalling processor speeds have required more and more applications to scale out to distributed systems. Today, various data sources, from the Internet to business operations, produce large volumes of data. The management and distribution of these large volumes of data, from storage to long-term archiving, has become a tedious task.
The growing number of organizations has led to a huge production of data, which in turn has created a need for fast and sophisticated data processing systems. Batch processing and streaming analysis of new real-time data sources are required to let organizations take timely action.
In traditional distributed systems, multiple threads pull data from multiple remote data centers for a given time period. For example, a data provider has a predefined Service Level Agreement (SLA) to deliver data to a consumer every minute. In such a case, for the first minute, a thread T1 pulls the data from a remote data center DC1. The thread T1 then waits for a thread T2 and any other threads pulling data before proceeding to the second minute. If this flow does not work properly, the predefined SLA is violated: thread T1 moves on to the second minute while T2 is still working on the first minute. A consumer looking at the data in the second minute might be deceived into believing that the data collected in the first minute is immutable, and might not get a complete snapshot of that data. Hence, such systems do not provide the consumer with a mechanism that offers a clean abstraction and a clean implementation.
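The waiting behavior described above can be illustrated with a minimal sketch, assuming Python threads and the standard-library threading.Barrier. The function name pull_windows, the thread and window counts, and the simulated delay are hypothetical and are not taken from any particular system. The sketch shows only the case in which the flow works as intended, where no thread enters the next minute until every thread has finished the current one.

```python
import threading
import time


def pull_windows(num_threads, num_windows):
    """Simulate threads that each pull one time window (e.g. one minute)
    and wait for all other threads before moving to the next window.

    Returns a list of (thread_id, window) tuples in completion order.
    """
    barrier = threading.Barrier(num_threads)  # reusable across windows
    lock = threading.Lock()
    log = []

    def worker(thread_id):
        for window in range(num_windows):
            # Hypothetical stand-in for pulling data from a remote data
            # center; higher-numbered threads are deliberately slower.
            time.sleep(0.005 * thread_id)
            with lock:
                log.append((thread_id, window))
            # Block until every thread has finished this window.
            barrier.wait()

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    for thread_id, window in pull_windows(num_threads=2, num_windows=3):
        print(f"thread {thread_id} finished window {window}")
```

If the barrier.wait() call is removed from this sketch, a faster thread can run ahead into a later window while a slower thread is still recording an earlier one, which corresponds to the situation in which T1 has moved to the second minute while T2 is still working on the first.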
In light of the above discussion, there is a need for a method and system that overcomes the above-stated problems.