1. Technical Field
The present teaching relates to methods, systems, and programming for data processing. Particularly, the present teaching is directed to methods, systems, and programming for data recovery in a data system.
2. Discussion of Technical Background
The advancement of the Internet has made a tremendous amount of information accessible to users located anywhere in the world. This introduces new challenges in processing “big data,” where a data set can be so large or complex that traditional data processing applications are inadequate. Distributed in-memory systems can offer high throughput and low latency, but they may suffer complete data loss when nodes fail.
A conventional approach to recovering from a data failure at a given node is to synchronously or asynchronously replicate that node's data to multiple (k) other nodes through a two-phase commit (2PC) protocol. The 2PC protocol requires several rounds of network communication for each write, which slows down the whole system. In addition, because each record is replicated into multiple (k) copies, system resources such as storage space are severely underutilized, with only 1/k of the consumed capacity holding unique data. This low utilization rate is especially problematic for in-memory systems because memory chips remain expensive.
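The two costs named above, extra network rounds per write and 1/k storage utilization, can be illustrated with a minimal Python sketch of synchronous 2PC replication. This is not the method of the present teaching. The class `Replica` and the functions `replicate_2pc` and `storage_utilization` are hypothetical names introduced for illustration, and the sketch assumes that each 2PC phase contacts all replicas in parallel, so one phase counts as one network round trip. Following the text, k is treated as the total number of copies kept for each record.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory replica node (illustration only)."""
    name: str
    store: dict = field(default_factory=dict)   # committed records
    staged: dict = field(default_factory=dict)  # records prepared but not yet committed

    def prepare(self, key, value):
        # Phase 1: stage the write and vote; this sketch always votes yes.
        self.staged[key] = value
        return True

    def commit(self, key):
        # Phase 2 (success): make the staged write visible.
        self.store[key] = self.staged.pop(key)

    def abort(self, key):
        # Phase 2 (failure): discard the staged write.
        self.staged.pop(key, None)


def replicate_2pc(replicas, key, value):
    """Replicate one record to every replica using two-phase commit.

    Returns (committed, rounds), where rounds counts coordinator-to-replica
    round trips under the assumption that each phase is sent to all
    replicas in parallel.
    """
    rounds = 1  # prepare round
    votes = [r.prepare(key, value) for r in replicas]
    rounds += 1  # commit-or-abort round
    if all(votes):
        for r in replicas:
            r.commit(key)
        return True, rounds
    for r in replicas:
        r.abort(key)
    return False, rounds


def storage_utilization(copies):
    # If every record is held as `copies` full copies, only 1/copies of the
    # memory consumed by those records holds unique data.
    return 1 / copies


if __name__ == "__main__":
    nodes = [Replica(f"node{i}") for i in range(3)]
    ok, rounds = replicate_2pc(nodes, "user:1", b"payload")
    print(ok, rounds)  # True 2
    print(storage_utilization(3))
```

Even in this simplified setting every write waits for two round trips before it is acknowledged, and storage utilization falls as 1/k as more copies are added for durability. A real 2PC deployment also has to handle coordinator failure, timeouts, and durable logging, which would add further latency.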
Therefore, there is a need to develop techniques to recover data in a data system to overcome the above drawbacks.