This relates generally to processor graphics, also known as graphics processing units.
Graphics processing units are responsible for generating images on displays. One computing language for use with processor graphics is the Open Computing Language (OpenCL), which is designed to work with many different platform models. Thus it is adaptable to many different architectures from different manufacturers.
OpenCL uses a platform model made up of processing elements, global memory, and localized programmable memories known as local memory. A processing element is a virtual scalar processor. A work-item is one of a collection of parallel executions of a kernel invoked on a device by a command. A work-item is executed by one or more processing elements as part of a work-group executing on a compute unit. A work-group is a collection of related work-items that execute on a single compute unit. The work-items in the group execute the same kernel and share local memory and work-group barriers.
A compute device is an OpenCL device having one or more compute units. A work-group executes on a single compute unit. A compute unit is composed of one or more processing elements and local memory.
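The relationship between work-items and work-groups can be illustrated with a minimal sketch in plain Python rather than OpenCL itself. It assumes a one-dimensional index space with no global offset and a global size that is an exact multiple of the work-group size; the function names are hypothetical and chosen only for illustration.

```python
def global_id(group_id, local_id, local_size):
    """Global index of a work-item, given its work-group and its
    position inside that group (1-D, no global offset assumed)."""
    return group_id * local_size + local_id


def work_groups(global_size, local_size):
    """Partition global_size work-items into work-groups of local_size.

    Returns a dict mapping each work-group id to the list of global ids
    of the work-items it contains.
    """
    if local_size <= 0 or global_size % local_size != 0:
        # Assumption of this sketch: the global size divides evenly.
        raise ValueError("global_size must be a multiple of local_size")
    groups = {}
    for gid in range(global_size):
        group_id, _local_id = divmod(gid, local_size)
        groups.setdefault(group_id, []).append(gid)
    return groups
```

For example, eight work-items with a work-group size of four fall into two work-groups, each of which would execute on a single compute unit.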
There are two types of barriers: a command-queue barrier and a work-group barrier. The OpenCL C programming language provides a built-in work-group barrier function. A kernel executing on a device can use it to synchronize the work-items in a work-group executing the kernel. All of the work-items of the work-group must execute the barrier construct before any of them is allowed to continue execution beyond the barrier.
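The effect of a work-group barrier can be approximated with ordinary threads. The sketch below is an emulation in Python, not OpenCL C: each thread stands in for one work-item, a shared list stands in for local memory, and `threading.Barrier` plays the role of the built-in barrier function. The function name and the values written are assumptions made for illustration.

```python
import threading


def run_work_group(local_size):
    """Emulate one work-group whose work-items exchange data
    through shared (local) memory across a barrier."""
    local_mem = [None] * local_size   # stands in for local memory
    result = [None] * local_size
    barrier = threading.Barrier(local_size)

    def work_item(local_id):
        # Phase 1: every work-item writes its own slot.
        local_mem[local_id] = local_id * 10
        # No work-item proceeds until all have reached this point,
        # so every slot is guaranteed to be written before any read.
        barrier.wait()
        # Phase 2: read the slot written by the neighboring work-item.
        neighbor = (local_id + 1) % local_size
        result[local_id] = local_mem[neighbor]

    threads = [threading.Thread(target=work_item, args=(i,))
               for i in range(local_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result
```

Without the barrier, a work-item could read its neighbor's slot before that neighbor had written it.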
A device is a collection of compute units. A command-queue is used to queue commands to a device. Examples of commands include executing kernels, or reading and writing memory objects.
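As a rough illustration of how commands might be queued to a device, the following Python sketch models an in-order command-queue. It is not the OpenCL API: the class, its method names, and the dictionary used as device memory are all hypothetical, and in-order execution is an assumption of the sketch.

```python
from collections import deque


class CommandQueue:
    """Toy in-order command-queue bound to one device's memory."""

    def __init__(self, device_memory):
        self.device_memory = device_memory  # dict: name -> list
        self.pending = deque()

    def enqueue_write(self, name, data):
        # Command: copy host data into a device memory object.
        self.pending.append(("write", name, list(data)))

    def enqueue_kernel(self, kernel, name):
        # Command: run kernel once per element of the memory object.
        self.pending.append(("kernel", kernel, name))

    def enqueue_read(self, name, out):
        # Command: copy a device memory object back to the host list.
        self.pending.append(("read", name, out))

    def finish(self):
        # Execute queued commands strictly in submission order.
        while self.pending:
            op, a, b = self.pending.popleft()
            if op == "write":
                self.device_memory[a] = b
            elif op == "kernel":
                buf = self.device_memory[b]
                for gid in range(len(buf)):
                    a(gid, buf)
            else:  # "read"
                b.extend(self.device_memory[a])


def double(gid, buf):
    """Example kernel: each work-item doubles one element."""
    buf[gid] *= 2
```

A host program in this model would enqueue a write, a kernel execution, and a read, and then call `finish()` to drain the queue.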
Global memory is a memory region accessible to all work-items executing in a context. It is accessible to the host using commands such as read, write and map. Local memory is a memory region associated with a work-group and accessible only by work-items in that work-group.
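The difference in visibility between the two memory regions can be sketched with a per-work-group sum. The Python below is a sequential emulation under stated assumptions: lists stand in for global and local memory, the input length is a multiple of the work-group size, and the function name is hypothetical.

```python
def group_sums(global_in, local_size):
    """Sum each work-group's slice of global memory via local memory."""
    if local_size <= 0 or len(global_in) % local_size != 0:
        # Assumption of this sketch: the input divides evenly.
        raise ValueError("input length must be a multiple of local_size")
    num_groups = len(global_in) // local_size
    global_out = [0] * num_groups      # visible to every work-group
    for group_id in range(num_groups):
        # A fresh buffer per work-group: no other group can see it.
        local_mem = [0] * local_size
        for local_id in range(local_size):
            gid = group_id * local_size + local_id
            local_mem[local_id] = global_in[gid]   # global -> local copy
        # On a real device a work-group barrier would be needed here
        # before any work-item reads slots written by the others.
        global_out[group_id] = sum(local_mem)      # local -> global result
    return global_out
```

Each work-group stages its data in its own local buffer and publishes only the final result to global memory, which is the usual reason the speed of local memory matters to kernel performance.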
A context is the environment in which the kernels execute and the domain in which synchronization and memory management are defined. A context includes a set of devices, the memory accessible to those devices, the corresponding memory properties and one or more command-queues used to schedule execution of kernels or operations on memory objects.
A host interacts with the context using the OpenCL application program interface (API). A kernel is a function defined in a program and executed on an OpenCL device.
Single instruction multiple data (SIMD) is a programming model where a kernel executes concurrently on multiple processing elements, each with its own data and a shared program counter. All processing elements execute a strictly identical set of instructions.
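The shared program counter can be made concrete with a small interpreter. This Python sketch is illustrative only: the two-instruction set (`add`, `mul`) and the function name are assumptions, and each entry of `lanes` stands in for the private data of one processing element.

```python
def run_simd(program, lanes):
    """Execute program in lockstep across all lanes.

    program: list of (opcode, operand) pairs.
    lanes:   one data value per processing element.
    """
    regs = list(lanes)   # each processing element has its own data
    pc = 0               # a single program counter shared by all lanes
    while pc < len(program):
        op, operand = program[pc]
        # The same instruction is applied to every lane in this step.
        if op == "add":
            regs = [r + operand for r in regs]
        elif op == "mul":
            regs = [r * operand for r in regs]
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc += 1
    return regs
```

Because there is only one program counter, no lane can be executing a different instruction from the others at any step.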
Local memory plays an important role in OpenCL kernels. However, in processor graphics with slower, lower-level caches (L3 caches), performance may be lower than in those with higher-speed local memories. Thus, contrary to the aim of the OpenCL API, performance may vary between different processor graphics based on the relative speeds of their local memories.