Open Computing Language (OpenCL) is a framework maintained by the Khronos Group (accessible at khronos.org) for writing programs that execute across heterogeneous platforms, including graphics processing units (GPUs) and other processors. In OpenCL and related heterogeneous computing frameworks, Shared Local Memory (SLM) is a portion of the Level-3 cache that is dedicated to Execution Units (EUs) as local memory. SLM is shared by the work items within a single work group.
However, in some cases, many read and write operations occur between SLM and System Global Memory (SGM). For example, operations such as histogram generation use large numbers of work groups. In such cases, writing data from SLM to SGM takes a long time.
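The cost described above can be illustrated with a minimal host-side sketch in Python. It is not OpenCL code and does not model real hardware timing. It only simulates the pattern in which each work group builds a partial histogram in its own SLM and then writes every bin to SGM. The function name `histogram_by_work_group`, the parameters `num_bins` and `group_size`, and the write counter are assumptions introduced for illustration.

```python
def histogram_by_work_group(data, num_bins, group_size):
    """Simulate per-work-group histograms held in SLM and merged into SGM.

    Hypothetical model: lists stand in for SLM and SGM, and each element
    of `data` stands in for the input handled by one work item.
    """
    global_hist = [0] * num_bins  # stands in for a histogram buffer in SGM
    slm_to_sgm_writes = 0         # number of bin values written from SLM to SGM

    # Each slice of `group_size` elements represents one work group.
    for start in range(0, len(data), group_size):
        local_hist = [0] * num_bins  # stands in for this work group's SLM

        # Work items in the group update the shared local histogram.
        for value in data[start:start + group_size]:
            local_hist[value % num_bins] += 1

        # The work group flushes every bin of its local histogram to SGM.
        for b in range(num_bins):
            global_hist[b] += local_hist[b]
            slm_to_sgm_writes += 1

    return global_hist, slm_to_sgm_writes
```

In this model the number of SLM-to-SGM writes equals the number of work groups multiplied by the number of bins, so the write traffic grows with the work-group count even though the final histogram is the same.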