In computer-related technologies, especially in distributed systems in which many users perform many activities, a large amount of data is created or modified. The data, along with system-level and user-level actions, may be stored in a storage system for monitoring, financial, and/or analysis purposes. However, if the storage system fails, the data may be lost, which is not acceptable to the users of the distributed system. Accordingly, it is very important to record the details of the user actions in a log file and to persist the log file in one or more storage systems. The log file may also be persisted separately from the actual data.
Prior log management systems that persist log files are inefficient in a high data throughput environment, at least in terms of scalability, load balancing, log persistence latency, and consumption of computing resources. In these systems, the distribution of load among the writers assigned to persist the log files becomes complex and inefficient, especially as the number of computers in a computer network generating log files increases. Further, in some systems, generated logs may be identified by a source-specific category or group, which is used to identify and fetch the corresponding logs from the storage system. The scaling complexity results from the need to process and persist these categories independently in the storage system. In some log management systems, an election is performed among the writers for each shard, category, or group to choose a particular writer for persisting a particular log file (category or group) into the storage system. As the number of generated log categories increases, the election process becomes more complex and resource intensive.
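The growth in election cost described above can be illustrated with a minimal sketch. It assumes a deliberately simplified, hypothetical election in which every writer casts one ballot per category and the highest ballot wins; the names `elect_writer` and `run_elections` are illustrative and do not describe the protocol of any particular prior system.

```python
import random


def elect_writer(category, writers, rng):
    """Hypothetical election: each writer casts one random ballot for the
    category, and the writer with the highest ballot wins."""
    ballots = {writer: rng.random() for writer in writers}
    winner = max(ballots, key=ballots.get)
    # One ballot per writer is treated as one message exchanged.
    return winner, len(ballots)


def run_elections(categories, writers, seed=0):
    """Run one independent election per category and count total messages."""
    rng = random.Random(seed)
    assignment = {}
    messages = 0
    for category in categories:
        winner, cast = elect_writer(category, writers, rng)
        assignment[category] = winner
        messages += cast
    return assignment, messages


if __name__ == "__main__":
    writers = ["writer-1", "writer-2", "writer-3", "writer-4", "writer-5"]
    for count in (10, 100, 1000):
        categories = [f"category-{i}" for i in range(count)]
        _, messages = run_elections(categories, writers)
        print(count, "categories ->", messages, "ballots")
```

Under this assumption the number of ballots equals the number of categories multiplied by the number of writers, so the coordination work grows with every added category even when the set of writers is unchanged.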
Further, in prior log management systems that lack a central log management system, load distribution among the various writers is less effective because the overall load situation, that is, the load on each of the writers in relation to the others, is not known. Further, since the state of a distributed system keeps changing dynamically, log management techniques need to adapt to the changed state of the distributed system quickly and efficiently.
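The effect of not knowing the relative load on each writer can be sketched as follows. This is an illustrative comparison under stated assumptions, not a description of any specific system: the fixed rotation stands in for assignment made without load information, the least-loaded rule stands in for assignment made with a global view, and the function names and log sizes are hypothetical.

```python
from itertools import cycle


def assign_without_load_view(log_sizes, writers):
    """Assign logs in a fixed rotation, ignoring how loaded each writer is."""
    loads = {writer: 0 for writer in writers}
    for size, writer in zip(log_sizes, cycle(writers)):
        loads[writer] += size
    return loads


def assign_with_load_view(log_sizes, writers):
    """Assign each log to the currently least-loaded writer, which requires
    knowing the load on every writer relative to the others."""
    loads = {writer: 0 for writer in writers}
    for size in log_sizes:
        target = min(loads, key=loads.get)
        loads[target] += size
    return loads


def imbalance(loads):
    """Difference between the most and least loaded writers."""
    return max(loads.values()) - min(loads.values())


if __name__ == "__main__":
    log_sizes = [9, 1, 9, 1, 9, 1]  # hypothetical log sizes, arbitrary units
    writers = ["writer-1", "writer-2"]
    print(imbalance(assign_without_load_view(log_sizes, writers)))
    print(imbalance(assign_with_load_view(log_sizes, writers)))
```

With these sample sizes, the rotation places all of the large logs on one writer, whereas the load-aware rule spreads them more evenly; the gap between the two narrows or widens depending on the arrival pattern.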