Traditionally, computer programs have been written as sequential programs, in which the code executes sequentially on a single conventional processor. However, because the performance of specialized processors that include multiple processing cores, such as graphics processing units (GPUs), continues to increase at a rapid rate, computer programs are increasingly being written to take advantage of such specialized processors. For example, computer programs are being written to include data parallel code, so that the same code may execute across multiple processing cores of a processor to operate on a set of data in parallel. Because such data parallel code is executed in parallel instead of sequentially, there may be no guarantee as to the order in which the processing cores will finish processing the set of data. Therefore, it may be desirable to synchronize the parallel execution to ensure that the multiple processing cores have finished operating on the set of data before the values of the data are used in any further operations.
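By way of illustration only, the following minimal sketch shows the pattern described above: the same code runs over disjoint portions of a data set in parallel, and a synchronization point ensures all workers have finished before the data is used further. It uses Python threads on a conventional processor as a stand-in for the processing cores of a specialized processor, so it illustrates the structure rather than GPU performance. The names `square_in_place` and `parallel_square_sum` and the parameter `num_workers` are hypothetical and are not taken from the text above.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def square_in_place(data, start, stop):
    """Hypothetical data parallel code: square one slice of the shared list."""
    for i in range(start, stop):
        data[i] = data[i] * data[i]


def parallel_square_sum(values, num_workers=4):
    """Square every element in parallel, then sum once all workers are done."""
    data = list(values)
    # Ceiling division; at least 1 so that an empty input does not yield a zero step.
    chunk = max(1, (len(data) + num_workers - 1) // num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # The same function is submitted for each disjoint slice of the data.
        futures = [
            pool.submit(square_in_place, data, start, min(start + chunk, len(data)))
            for start in range(0, len(data), chunk)
        ]
        # Synchronization point: the workers may finish in any order, so block
        # here until every one of them has completed.
        wait(futures)
        for future in futures:
            future.result()  # re-raise any exception raised inside a worker
    # Only now is it safe to use the values in a further operation.
    return sum(data)


if __name__ == "__main__":
    print(parallel_square_sum([1, 2, 3, 4, 5]))
```

Without the call to `wait`, the final sum could in principle be computed while some slices were still being processed, which is the hazard that synchronization is meant to prevent.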