A parallel application that includes several processes may perform input/output (I/O) collectively and in parallel. Frequently, the collective I/O may read or write an entire file, although each process accesses only a subset of the file contents. In some cases, the subset of the file accessed by one process may be interleaved with the subsets accessed by other processes. Further, the data may be arranged in memory for processing differently than in the file. Because of the potential complexity of data rearrangement and the need to synchronize the operation of multiple processes, it can be difficult to complete a collective I/O operation quickly.
One solution to this problem is data sieving. Data sieving is a way of combining multiple I/O requests into one request so as to reduce the effect of high I/O latency. Data sieving for reads involves each process independently reading large blocks from a file and extracting its own relevant data. For writes, each process participating in the data sieving must lock a range of the file (gaining exclusive access), read the previous file contents in that range, insert its own data, write the updated data to the file, and release the lock. These steps are repeated by each process until the entire file range being collectively written has been updated. The required locking limits parallelism, and hence the speed at which the file can be written, because other processes cannot access that portion of the file until the lock is released.
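The read and write steps described above can be sketched for a single process and a single block as follows. This is a minimal illustration, not a definitive implementation: the function names sieve_read and sieve_write are hypothetical, the sketch assumes a POSIX system where Python's fcntl.lockf provides byte-range locks, and it assumes the whole range fits in one block, whereas a real implementation would repeat the steps block by block.

```python
import fcntl
import os


def sieve_read(fd, extents):
    """Read one contiguous block covering all extents, then extract them.

    extents: list of (offset, length) pairs wanted by this process
    (hypothetical representation chosen for this sketch).
    """
    start = min(off for off, _ in extents)
    end = max(off + n for off, n in extents)
    block = os.pread(fd, end - start, start)  # one large read request
    return [block[off - start: off - start + n] for off, n in extents]


def sieve_write(fd, pieces):
    """Read-modify-write one contiguous block under an exclusive range lock.

    pieces: list of (offset, data) pairs owned by this process
    (hypothetical representation chosen for this sketch).
    """
    start = min(off for off, _ in pieces)
    end = max(off + len(data) for off, data in pieces)
    length = end - start
    # Lock the byte range; other processes block here until it is released.
    fcntl.lockf(fd, fcntl.LOCK_EX, length, start, os.SEEK_SET)
    try:
        # Read the previous contents so data owned by other processes survives.
        block = bytearray(os.pread(fd, length, start))
        block.extend(bytes(length - len(block)))  # pad if range passes EOF
        # Insert this process's own data into the block.
        for off, data in pieces:
            block[off - start: off - start + len(data)] = data
        # Write the updated block back with one large write request.
        os.pwrite(fd, block, start)
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN, length, start, os.SEEK_SET)
```

In this sketch the lock is held across the read, the merge, and the write, which is the interval during which other processes are excluded from that range of the file.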
It would therefore be beneficial to have a mechanism that reduces lock contention and improves the speed of parallel writes to a file by reducing the amount of time during which multiple processes using a data sieving algorithm attempt to update the same range of the file.