The rise of the computer age has resulted in increased access to personalized services online. As the cost of electronics and networks drops, many services that were previously provided in person are now provided remotely over the Internet. For example, banking services can now be provided entirely through a networked system, as users can instruct the system to complete deposits, withdrawals, and other transactions and receive notification that these transactions have been completed.
As more and more services are provided online, large amounts of data are generated continually. Much of this data needs to be saved for later use. For example, banking transactions, messages, search histories, browsing histories, and statistical analyses of data generally all need to be saved to be useful in the future. With so much data needing to be saved, storage systems need to be able to accommodate a large amount of data reliably. However, such systems are generally unable to guarantee that all of the storage components will operate completely free of errors and failures. As such, large storage systems often operate over a network to store multiple copies of important data at multiple locations. This improves the reliability and usefulness of a storage system. This data must be transferred to backup locations without data loss or corruption.
Large data stores also facilitate data recovery in case of a crash by storing a transaction log for a given database. Thus, each time data in the database is changed, the change is recorded in a transaction log. This allows all the changes to be stored in a relatively compact form. Then, if the database crashes, the transaction log can be used to rebuild a correct and up-to-date version of the database. This can be accomplished by reapplying all the changes in the correct order to the original data set.
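By way of illustration only, the rebuilding of a database from a transaction log might be sketched as follows. The LogEntry format, the rebuild function, and the sample account data are hypothetical and are not drawn from any particular storage system. The sketch assumes that each logged change carries a sequence number indicating its original order.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class LogEntry:
    """One recorded change (hypothetical format, assumed for illustration)."""
    sequence: int          # position of the change in the original order
    key: str               # record that was changed
    value: Optional[Any]   # new value, or None to mean the record was deleted


def rebuild(original: Dict[str, Any], log: List[LogEntry]) -> Dict[str, Any]:
    """Reapply every logged change, in sequence order, to a copy of the original data set."""
    state = dict(original)  # leave the original data set untouched
    for entry in sorted(log, key=lambda e: e.sequence):
        if entry.value is None:
            state.pop(entry.key, None)  # deletion; ignore keys already absent
        else:
            state[entry.key] = entry.value
    return state


# Hypothetical sample data: the log entries arrive out of order on purpose.
original = {"alice": 100, "bob": 50}
log = [
    LogEntry(2, "bob", 75),
    LogEntry(1, "alice", 80),
    LogEntry(3, "carol", 20),
    LogEntry(4, "alice", None),
]
recovered = rebuild(original, log)
print(recovered)
```

Sorting by sequence number before reapplying the changes reflects the requirement that the changes be reapplied in the correct order. Reapplying the same entries in a different order could yield a different, incorrect data set.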
The other major consideration is the speed with which data can be reliably stored. To improve the speed at which storage systems work, it is important to identify bottlenecks in the process. Once a bottleneck (e.g., a particular part of a system that delays the entire process) is identified, it can be removed or ameliorated. In traditional networked storage systems, one such bottleneck is the time needed to log and store data at the system that produced the data before beginning to distribute it.
Like reference numerals refer to the same or similar parts throughout the drawings.