Some graphics processing unit (GPU)-based applications must process input data whose overall size is too large to fit entirely into GPU device memory. One solution to this problem is to split the computation into multiple kernel runs, each of which processes one chunk of the data at a time. This approach works well for algorithms that process data sequentially, but can become problematic when random (read/write) access to the data is required (e.g., recursive ray tracing). Thus, there is a need for addressing this issue and/or other issues associated with the prior art.