Computing systems rely heavily on the data stored within their storage systems. For a variety of reasons, there is a need to ensure that the data is organized so that reading and writing operations can be more efficient. Some systems deal with this issue by packing data into larger units that are referred to as containers throughout this disclosure. Containers are examples of data structures used to organize and store data, where the data may or may not be related or sequential to one another. After a container is filled, it is written as a unit to storage. Containers have several advantages. For example, packing data into a container can reduce internal fragmentation, and writing the container as a unit is more efficient because it results in large write operations.
Ideally, segments or data from the same stream of data are packed together in the same container. This can be accomplished by associating each data stream with its own container. A data stream may be assigned to a file being written, assigned to a client system writing to the storage system, or assigned with other techniques. As the number of data streams in the computing system increases, this arrangement can become unmanageable and can require a prohibitive amount of memory because each stream has a container assigned to it. For example, 1,000 streams, each with a 1 MB container, would require about 1 GB of RAM. In cases where some data streams write slowly into their assigned containers, the unused space in those containers is allocated but effectively wasted.
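The memory cost of the per-stream arrangement can be checked with a short calculation. The following is only an illustrative sketch: the function name is hypothetical, and decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) are assumed.

```python
# Illustrative arithmetic only; the function name and unit constants are
# assumptions made for this sketch and do not come from the disclosure.
MB = 10**6  # decimal megabyte (assumed)
GB = 10**9  # decimal gigabyte (assumed)


def per_stream_container_memory(num_streams: int, container_bytes: int) -> int:
    """Bytes of RAM needed when every stream holds its own in-memory container."""
    return num_streams * container_bytes


total_bytes = per_stream_container_memory(1_000, 1 * MB)
print(total_bytes / GB)  # 1.0
```

Because the cost grows linearly with the number of streams, the same calculation for 100,000 streams would give about 100 GB under these assumptions.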
Instead of allocating a container for each data stream, containers can be shared, so that segments from multiple streams are packed together in the same container. Sharing a container among multiple data streams avoids the need for an in-memory (e.g., in RAM) container per stream and avoids the problem of wasted space associated with streams that write slowly or infrequently to their own containers.
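A shared container of this kind might be sketched as follows. This is a minimal, single-threaded illustration rather than a definitive implementation: the class names `SharedContainer` and `Packer`, the `written` list standing in for storage, and the policy of writing the container out when the next segment does not fit are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SharedContainer:
    """Hypothetical in-memory container shared by several data streams."""
    capacity: int
    used: int = 0
    segments: List[Tuple[int, bytes]] = field(default_factory=list)

    def try_append(self, stream_id: int, segment: bytes) -> bool:
        """Pack a segment if it fits; report whether it was packed."""
        if self.used + len(segment) > self.capacity:
            return False
        self.segments.append((stream_id, segment))
        self.used += len(segment)
        return True


class Packer:
    """Packs segments from any stream into one current container."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.current = SharedContainer(capacity)
        self.written: List[SharedContainer] = []  # stands in for storage

    def write(self, stream_id: int, segment: bytes) -> None:
        if len(segment) > self.capacity:
            raise ValueError("segment is larger than a container")
        if not self.current.try_append(stream_id, segment):
            # Assumed policy: the container counts as filled when the next
            # segment does not fit, and is then written as a unit.
            self.flush()
            self.current.try_append(stream_id, segment)

    def flush(self) -> None:
        if self.current.segments:
            self.written.append(self.current)
        self.current = SharedContainer(self.capacity)


packer = Packer(capacity=8)
packer.write(1, b"aaaa")  # stream 1
packer.write(2, b"bbbb")  # stream 2 shares the same container
packer.write(1, b"cc")    # does not fit, so the first container is written
```

In this sketch only one container is held in memory regardless of how many streams write, which is the property that distinguishes it from the per-stream arrangement.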
However, sharing a container between streams leads to additional concerns. For instance, some storage applications, such as data caching on solid-state drives (SSDs), can have a very high data stream count and very high write churn. If each data stream is a thread of execution, there can be high contention as the threads attempt to pack data into the same container simultaneously. In other words, several data streams may want to write into the same container at the same time. This can be handled either by queuing the requests so that a single packer thread writes to the container, or by acquiring a container lock so that only one data stream has access to the container at a time.
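The two conventional approaches described above can be illustrated with the following sketch. It is not taken from any particular system: the class names `LockedContainer` and `QueuedContainer` and the helper `run_streams` are hypothetical, each data stream is modeled as a thread, and container capacity and writing to storage are omitted for brevity.

```python
import queue
import threading


class LockedContainer:
    """Approach 1 (sketch): every stream takes a container lock to pack."""

    def __init__(self):
        self._lock = threading.Lock()
        self.segments = []

    def append(self, stream_id, data):
        # Only one stream can be inside this block at a time; all other
        # streams wait here when the container is contended.
        with self._lock:
            self.segments.append((stream_id, data))


class QueuedContainer:
    """Approach 2 (sketch): streams enqueue requests for one packer thread."""

    def __init__(self):
        self._requests = queue.Queue()
        self.segments = []
        self._packer = threading.Thread(target=self._run)
        self._packer.start()

    def append(self, stream_id, data):
        # The stream hands its data off and returns. In a real system this
        # hand-off would typically involve an extra copy of the data.
        self._requests.put((stream_id, data))

    def _run(self):
        # The packer thread is the only writer to the container.
        while True:
            item = self._requests.get()
            if item is None:  # sentinel: no more requests
                return
            self.segments.append(item)

    def close(self):
        self._requests.put(None)
        self._packer.join()


def run_streams(container, n_streams, writes_per_stream):
    """Run each hypothetical data stream as its own thread."""
    def stream(stream_id):
        for i in range(writes_per_stream):
            container.append(stream_id, f"{stream_id}:{i}".encode())

    threads = [threading.Thread(target=stream, args=(s,)) for s in range(n_streams)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


locked = LockedContainer()
run_streams(locked, n_streams=4, writes_per_stream=50)

queued = QueuedContainer()
run_streams(queued, n_streams=4, writes_per_stream=50)
queued.close()
```

Both variants pack every segment, but in the first the stream threads serialize on the lock, and in the second the data passes through an intermediate queue before it reaches the container.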
The problem with queuing requests is that it involves extra copying of the data between memory locations and has a corresponding performance cost. The problem with acquiring a container write-lock is that it leads to execution threads waiting because of lock contention. Systems and methods are needed to improve write throughput for packing data in a storage system.