The amount of data collected and stored is growing quickly, making it increasingly difficult to store and retrieve data with low latency. Typical server systems use one or more centralized servers to manage all of the data. While such systems have advantages, particularly for managing small data sets, they also have a number of disadvantages. One disadvantage of a centralized server is that it may not scale to Internet-scale demands from millions or billions of devices. Distributed server systems address this problem by spreading the data among numerous databases so that no single database holds a very large amount of data or carries a very large processing load. By distributing data in this fashion, each server maintains a much smaller subset of the data, so lookups and other operations can be performed much faster. The typical distribution model creates a web application tier and a separate database tier, allowing for better database scalability relative to centralized server systems. Compounding the size challenge, the data may arrive non-stop, twenty-four hours per day, every day of the year. Further compounding the challenge is the need to complete increasingly complex calculations in shorter periods of time to enable modern real-time analytics. Although distributed server systems allow for much better scalability of database systems, some drawbacks have yet to be addressed.
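One common way to spread data among numerous databases, as described above, is hash-based sharding: each record key is hashed, and the hash determines which database holds the record. The following is a minimal sketch of that idea; the shard names and the use of MD5 are illustrative assumptions, not part of the original text, and a real deployment would map shard names to actual database connections.

```python
import hashlib

# Hypothetical shard identifiers; in practice these would map to
# separate database servers, each holding a subset of the data.
SHARDS = ["db0", "db1", "db2", "db3"]

def shard_for(key: str) -> str:
    """Pick the shard responsible for a record key.

    Hashing the key distributes records roughly evenly, so no single
    database holds a very large amount of data, and a lookup only
    touches the one shard that owns the key.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

Because the mapping is deterministic, any web-tier server can compute which database to query without consulting a central coordinator, which is what lets the database tier scale out.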
For instance, when a database is taken offline, whether for maintenance or other reasons, the data stored at that database becomes unavailable. Current distributed server systems use frameworks such as Apache Hadoop™, which keep the data that was stored on an offline database available at all times. However, for many of these systems to work properly, multiple copies of all the data (three copies by default for Apache Hadoop™) may need to be maintained. With large data sets, these extra copies consume a great deal of storage space, and reducing the number of copies, even by one, can yield substantial savings. Storage space is not the only motivation for the Apache Hadoop™ architecture: the number of input-output operations a single machine can perform is drastically limited with current computing architectures, a limitation that the scale-out architecture of Apache Hadoop™ addresses by spreading those operations across many nodes.
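The storage cost of replication described above is simple to quantify: raw capacity grows linearly with the number of copies kept. The sketch below illustrates that arithmetic; the function name and the 100 TB figure are illustrative assumptions, not values from the original text.

```python
def raw_storage_tb(logical_tb: float, replication_factor: int) -> float:
    """Raw capacity needed to hold `logical_tb` of logical data when
    every block is stored `replication_factor` times."""
    return logical_tb * replication_factor

# Example: 100 TB of data at a replication factor of 3 (the Apache
# Hadoop default) requires 300 TB of raw capacity; dropping to two
# copies would free 100 TB.
```

This is why eliminating even one copy matters at scale, and also why a system that reduces copies must find another way to preserve both availability and the parallel input-output that replication provides.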