A circular buffer may be described as any buffer that one or more users consume starting from a beginning location, continuing in a fixed order, and returning to the beginning location after reaching the end. For example, a circular buffer may be created from a linked list of data segments (also referred to as memory segments or segments) or from linear buffers that are linked to form a ring.
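For illustration only, the wrap-around behavior described above may be sketched as follows. The class name `RingBuffer`, the method `put`, and the capacity of four slots are hypothetical choices made for this example and are not taken from any particular system.

```python
class RingBuffer:
    """Fixed-capacity buffer whose write position wraps back to slot 0."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.next_index = 0  # next slot to be written

    def put(self, item):
        # Write into the current slot, then advance in a fixed order,
        # returning to the beginning location after the last slot.
        self.slots[self.next_index] = item
        self.next_index = (self.next_index + 1) % len(self.slots)


ring = RingBuffer(4)
for value in range(6):
    ring.put(value)

# Values 4 and 5 have overwritten the two oldest entries (0 and 1).
print(ring.slots)
```

In this sketch the oldest entries are silently overwritten once the buffer wraps, which is the behavior commonly wanted for trace data.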
Sometimes different users share a circular buffer. Users may be any type of consumer of the circular buffer, such as threads and processes. A thread may be described as the smallest sequence of programmed instructions that may be managed independently by a scheduler, which is typically a part of the operating system. When users access a shared circular buffer (as may be the case when tracing program code), a lock is obtained to prevent the users from stepping over each other and rendering the data in the circular buffer incoherent. On the performance path, obtaining a lock may be problematic in an environment in which the number of threads accessing the lock is large (e.g., in the hundreds). Even if only a small percentage of those threads compete for the lock, a lot of time may be spent waiting rather than accomplishing critical tasks.
In some conventional systems, the work performed while the lock is held is reduced to only what is critical (e.g., claiming the amount of space currently needed and not using that space until after the lock has been relinquished). This may improve performance, but, when a lot of threads are competing for that lock, there is still a high likelihood of threads starving and critical tasks being stalled.
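A minimal sketch of this "claim under the lock, use afterward" approach is given below. The names `SharedTraceBuffer`, `claim`, and `write` are hypothetical. The sketch assumes that each record is no larger than the buffer, and it does not guard against a slow writer still filling a region that a later claim has wrapped around to.

```python
import threading


class SharedTraceBuffer:
    """Shared circular byte buffer with a deliberately short critical section."""

    def __init__(self, capacity):
        self.data = bytearray(capacity)
        self.capacity = capacity
        self.head = 0  # offset of the next unclaimed byte
        self.lock = threading.Lock()

    def claim(self, size):
        # Critical section: only reserve space by advancing the head.
        with self.lock:
            start = self.head
            self.head = (self.head + size) % self.capacity
        return start

    def write(self, record):
        start = self.claim(len(record))
        # The copy happens after the lock has been relinquished.
        for offset, byte in enumerate(record):
            self.data[(start + offset) % self.capacity] = byte
        return start


buf = SharedTraceBuffer(16)
threads = [
    threading.Thread(target=buf.write, args=(bytes([i]) * 4,))
    for i in range(1, 5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Even with the copy moved outside the lock, every writer still passes through the same lock in `claim`, so contention grows with the number of competing threads.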
In some conventional systems, the circular buffer is split into as many circular sub-buffers as there are threads. Then, each thread uses its own sub-buffer. In this case, no lock is required, but significant skews in the consumption amongst the threads may lead to inefficient utilization of the total circular buffer. For example, some sub-buffers may wrap often, while other sub-buffers remain virtually empty.
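The utilization skew described above may be illustrated with the following sketch. The class `SubBuffer`, the sizes, and the workload in `records_per_thread` are assumptions chosen for illustration. The per-thread writes are simulated sequentially so that the result is deterministic; because each sub-buffer has a single owner, no lock appears.

```python
class SubBuffer:
    """Circular sub-buffer owned by exactly one thread, so no lock is needed."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.next_index = 0
        self.wraps = 0  # number of times the writer returned to slot 0

    def put(self, item):
        self.slots[self.next_index] = item
        self.next_index += 1
        if self.next_index == len(self.slots):
            self.next_index = 0
            self.wraps += 1


TOTAL_SLOTS = 16
NUM_THREADS = 4
sub_buffers = [SubBuffer(TOTAL_SLOTS // NUM_THREADS) for _ in range(NUM_THREADS)]

# Hypothetical skewed workload: thread 0 is busy, the others are nearly idle.
records_per_thread = [20, 1, 0, 1]
for thread_id, count in enumerate(records_per_thread):
    for n in range(count):
        sub_buffers[thread_id].put((thread_id, n))

wraps = [sb.wraps for sb in sub_buffers]
used = [sum(slot is not None for slot in sb.slots) for sb in sub_buffers]
print(wraps, used)
```

Under this assumed workload, the sub-buffer of thread 0 wraps five times and retains only its four newest records, while ten of the sixteen total slots are never written.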
In some conventional systems, a midway solution is to put the threads into small groups and divide the total circular buffer into as many sub-buffers as there are groups. The threads in each group use the associated sub-buffer, and each group of threads uses its own private lock, which reduces lock contention. If the threads are properly aggregated into groups, their cumulative production may exhibit smaller skews between groups and thus a better distribution of buffer utilization. While this technique may decrease the inefficiency in buffer utilization by averaging the production of each group's individual threads, having a “one size fits all” sub-buffer for each group of diversified threads with individually unpredictable production rates may still result in temporally localized inefficiencies. There will still be ebb and flow in utilization at the group level, thus creating skews.
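The grouping approach may be sketched as follows. The class `GroupBuffer`, the helper `buffer_for`, and the assignment of consecutive thread identifiers to the same group are hypothetical; a real system may aggregate threads into groups in a different manner. For brevity, this sketch holds the group lock for the whole write.

```python
import threading


class GroupBuffer:
    """Circular sub-buffer shared by one group and guarded by a private lock."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.next_index = 0
        self.lock = threading.Lock()  # contended only by this group's threads

    def put(self, item):
        with self.lock:
            self.slots[self.next_index] = item
            self.next_index = (self.next_index + 1) % len(self.slots)


NUM_THREADS = 8
GROUP_SIZE = 4
TOTAL_SLOTS = 16
num_groups = NUM_THREADS // GROUP_SIZE
group_buffers = [GroupBuffer(TOTAL_SLOTS // num_groups) for _ in range(num_groups)]


def buffer_for(thread_id):
    # Assumed static assignment: consecutive thread IDs share a group.
    return group_buffers[thread_id // GROUP_SIZE]


def worker(thread_id, count):
    for n in range(count):
        buffer_for(thread_id).put((thread_id, n))


threads = [
    threading.Thread(target=worker, args=(tid, 2)) for tid in range(NUM_THREADS)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each lock is now contended by at most `GROUP_SIZE` threads rather than all `NUM_THREADS`, but every group still receives a sub-buffer of the same fixed size regardless of how much its threads produce at any given time.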