The present invention relates in general to parallel data processing and in particular to sharing access to a memory resource among parallel synchronizable threads.
In data processing, it is often desirable to count the number of occurrences of each of various events or conditions. For example, the well-known radix sort algorithm depends on knowing the number of items of each type in a sample set of items to be sorted. Accordingly, most implementations of radix sort include a counting stage that counts the number of items of each possible type, followed by a sorting stage that uses the item counts to arrange the items in the desired order. Radix sort is widely used in applications ranging from physics modeling (e.g., for video games) to statistical analysis (e.g., in bioinformatics).
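By way of illustration only, the two-stage structure described above (a counting stage followed by a sorting stage) can be sketched as follows. This is a minimal single-threaded sketch, not any particular implementation; the names `counting_pass`, `radix_sort`, `digit_of`, `bits_per_digit` and `key_bits` are hypothetical and are introduced here only for the example, and the keys are assumed to be non-negative integers.

```python
def counting_pass(items, digit_of, num_types):
    """One stable pass: count the items of each type, then place them in order."""
    # Counting stage: one counter per possible type.
    counts = [0] * num_types
    for item in items:
        counts[digit_of(item)] += 1

    # An exclusive prefix sum turns the counts into the first output
    # position reserved for each type.
    offsets = [0] * num_types
    running = 0
    for t in range(num_types):
        offsets[t] = running
        running += counts[t]

    # Sorting stage: move each item to the next free position for its type.
    out = [None] * len(items)
    for item in items:
        t = digit_of(item)
        out[offsets[t]] = item
        offsets[t] += 1
    return out, counts


def radix_sort(items, bits_per_digit=4, key_bits=16):
    """Least-significant-digit radix sort of integers in [0, 2**key_bits)."""
    mask = (1 << bits_per_digit) - 1
    for shift in range(0, key_bits, bits_per_digit):
        # Each pass sorts on one digit; stability preserves earlier passes.
        items, _ = counting_pass(
            items, lambda x, s=shift: (x >> s) & mask, 1 << bits_per_digit
        )
    return items
```

In this sketch the sorting stage cannot begin until the counting stage has produced a complete count for every type, which is why the counting stage is of interest in what follows.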
Conventionally, to count occurrences of each of a number (m) of events or conditions over a sample set of items, a set of m counters is defined. Each item in the sample set is analyzed in turn to determine which of the m events or conditions has occurred, and the appropriate counter (or counters if the m events or conditions are not mutually exclusive) is updated.
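The conventional sequential approach can be sketched as below, including the case in which the m conditions are not mutually exclusive, so that a single item may update more than one counter. The function name `count_conditions` and the three example predicates are assumptions made purely for illustration.

```python
def count_conditions(items, conditions):
    """Count, over all items, how often each of the m conditions holds."""
    counters = [0] * len(conditions)  # one counter per condition (m in total)
    for item in items:                # each item is analyzed in turn
        for i, holds in enumerate(conditions):
            if holds(item):           # an item may satisfy several conditions
                counters[i] += 1
    return counters


# Hypothetical example with m = 3 conditions that can overlap.
example_conditions = [
    lambda x: x % 2 == 0,  # item is even
    lambda x: x % 3 == 0,  # item is a multiple of three
    lambda x: x > 10,      # item exceeds ten
]
```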
In a parallel processor, processing time can be reduced by processing multiple items from the sample set in parallel, e.g., using multiple processing threads, each of which processes a different subset of the sample set. If each thread has its own set of m counters, the threads can proceed independently of each other. When all items have been processed, a final tally for the entire sample set can be determined by summing over the corresponding counters from each per-thread set. This approach works well as long as sufficient memory space is available to maintain m counters per thread, i.e., a total number of counters equal to m multiplied by the number of threads. If the number of threads and/or the number m of events or conditions of interest becomes large enough, this approach is no longer practical.
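The per-thread-counter approach can be sketched as follows. This is an illustrative sketch only: it uses Python's standard `threading` module as a stand-in for the threads of a parallel processor, and the names `parallel_count`, `type_of` and `num_threads`, as well as the strided assignment of items to threads, are assumptions introduced for the example.

```python
import threading


def parallel_count(items, type_of, m, num_threads):
    """Count item types using a private set of m counters for each thread."""
    # Memory cost: num_threads * m counters in total.
    per_thread = [[0] * m for _ in range(num_threads)]

    def worker(tid):
        counters = per_thread[tid]              # touched only by this thread
        for item in items[tid::num_threads]:    # this thread's subset of items
            counters[type_of(item)] += 1        # no synchronization required

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()

    # Final tally: sum corresponding counters across the per-thread sets.
    totals = [sum(per_thread[t][k] for t in range(num_threads)) for k in range(m)]
    return totals, per_thread
```

Because no counter is ever written by more than one thread, the threads need no coordination until the final summation; the cost is the `num_threads * m` counters that must be held in memory at once.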
The total amount of memory required can be reduced by sharing counters among two or more threads. As is known in the art, where a counter is shared, care must be taken to ensure that only one thread at a time attempts to update the counter. One solution involves performing the counter update as an atomic operation: that is, one thread reads, modifies and writes back the counter value before any other thread can intervene. Atomic operations can be relatively slow, however. Further, with conventional atomic operations, the order in which different threads access the counter is not consistent, and lack of a consistent ordering among the threads can make it difficult or impossible to exploit parallel processing in certain algorithms, such as radix sort.
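A shared-counter arrangement of the kind just described can be sketched as below. Python has no hardware atomic add, so the sketch assumes a lock as a stand-in for an atomic read-modify-write; the class `SharedCounters`, its method `fetch_and_add`, and the function `count_shared` are hypothetical names used only for illustration.

```python
import threading


class SharedCounters:
    """A single set of m counters shared by all threads."""

    def __init__(self, m):
        self._values = [0] * m
        self._lock = threading.Lock()  # stand-in for a hardware atomic operation

    def fetch_and_add(self, k, amount=1):
        # Read, modify and write back before any other thread can intervene.
        with self._lock:
            old = self._values[k]
            self._values[k] = old + amount
        return old

    def snapshot(self):
        with self._lock:
            return list(self._values)


def count_shared(items, type_of, m, num_threads):
    """Count item types with only m counters in total, shared by all threads."""
    shared = SharedCounters(m)
    seen = [[] for _ in range(num_threads)]  # (type, prior value) per thread

    def worker(tid):
        for item in items[tid::num_threads]:
            k = type_of(item)
            seen[tid].append((k, shared.fetch_and_add(k)))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return shared.snapshot(), seen
```

The final counts are the same on every run, but the prior value that a given thread observes for a given update depends on how the threads happen to be scheduled. An algorithm that relies on those observed values, for example to assign output positions in a sort, therefore has no consistent ordering among the threads to depend on.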
It would therefore be desirable to provide improved techniques for controlling access to a memory resource that is shared among multiple concurrent threads.