Distributed computing systems generate large volumes of data. Some systems store this data as a data log file (“log”), and a distributed computing system can have multiple such logs. To provide data reliability and durability, such systems can store multiple replicas of each log. Also, to manage the load on a log, some log storage services store the log in a distributed format. That is, they store different portions of the log at different storage systems. For example, a log storage service can split the storage of the log across three storage systems. The log storage service can have some mechanism for determining which records of the log are stored at which storage system. While such a mechanism can reduce the load on the log compared to storing the log on a single storage system, it is still inefficient. Such log storage services do not provide good write availability and/or are not able to tolerate spikes in writes.
Consider an example where the log is written to a small number of storage systems, e.g., three storage systems. If any of the three storage systems fails, the log storage service may not be able to write to the log, which can lead to data loss. Some log storage services deploy record placement applications that determine, based on a mathematical function, the storage system at which a record of the log is to be stored. The disadvantage of such log storage services is that the record placement application can be a single point of failure. If the record placement application crashes, the log storage service can fail. Even when it does not crash, the mathematical function on which the record placement application is based provides only a few choices, since the log is stored across a small number of storage systems, which does little to improve the write availability.
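The function-based record placement described above can be illustrated with a minimal sketch. It assumes a CRC32 hash taken modulo the number of storage systems; the description does not specify the mathematical function, so this choice is purely illustrative. The names `STORAGE_SYSTEMS`, `place_record`, and `write_record` are hypothetical.

```python
import zlib

# Hypothetical storage system identifiers; the description only says "three storage systems".
STORAGE_SYSTEMS = ["storage-a", "storage-b", "storage-c"]


def place_record(record_key: str, storage_systems=STORAGE_SYSTEMS) -> str:
    """Pick one storage system for a record with a deterministic function.

    Assumption: CRC32 modulo the number of storage systems stands in for the
    unspecified mathematical function.
    """
    index = zlib.crc32(record_key.encode("utf-8")) % len(storage_systems)
    return storage_systems[index]


def write_record(record_key: str, failed: set) -> bool:
    """Return True if the write can proceed, False if the chosen storage system is down.

    The function yields exactly one target per record, so there is no
    alternative placement to fall back on when that target has failed.
    """
    return place_record(record_key) not in failed


# Simulate the failure of one of the three storage systems.
keys = [f"record-{i}" for i in range(9000)]
failed = {"storage-b"}
rejected = sum(1 for key in keys if not write_record(key, failed))
print(f"{rejected} of {len(keys)} writes rejected while {sorted(failed)} is down")
```

In this sketch, every record that maps to the failed storage system is rejected, which is one way the write availability problem can show up when the set of storage systems is small and the placement function offers no other choice.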
Also, since the number of storage systems across which the log is stored is small, current log storage services may not be able to tolerate spikes in writes. If a large number of applications start writing to the log, the storage systems may become overloaded, causing a significant delay in writing the records to the log.