Current processing systems have multiple processing cores to provide parallel processing of computational tasks, which increases the speed of completing such tasks. In multi-core systems, it is desirable to perform multi-threading in order to accomplish parallel processing of programs. Multi-threading is a widespread programming and execution model that allows multiple software threads to exist within the context of a single process. These software threads share the resources of the multi-core system, but are able to execute independently. Applied to a single process, multi-threading enables parallel execution on a multi-core system. A multi-threaded program can therefore operate faster on computer systems that have multiple CPUs, CPUs with multiple cores, or across a cluster of machines, because the threads of the program naturally lend themselves to concurrent execution.
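The threading model described above can be illustrated with a minimal sketch. The function name `run_workers` and the strided work split are assumptions made for illustration only and do not come from the original text. Several threads inside one process write into a single shared list, yet each thread runs independently over its own slice of the work.

```python
import threading


def run_workers(values, num_threads=4):
    """Square each value using several threads that share one results list.

    Hypothetical helper for illustration; not taken from the original text.
    """
    results = [None] * len(values)  # shared by every thread in the process

    def worker(start):
        # Each thread takes a strided slice, so no two threads write the same slot.
        for i in range(start, len(values), num_threads):
            results[i] = values[i] * values[i]

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every thread has finished its slice
    return results
```

In standard CPython builds the global interpreter lock limits true parallelism for this kind of computation, so the sketch shows the shared-resource, independent-execution model rather than a measured speedup.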
However, programs executed over multiple cores are limited by the processing speed of the processing cores as well as by any conflicts over shared resources such as external memory in the form of RAM. Processing cores such as CPUs typically include high-speed internal memory, termed cache memory, which is used to speed up access to data and instructions used by the CPU. Memory caches avoid costly reads from RAM. They typically function by loading data that the CPU is likely to use next. Whether the data stored in the cache turns out to be useful can be unpredictable: the CPU first looks for needed data in the cache and, if the data is not there, then looks in RAM.
Memory latency, which is the length of time between the memory's receipt of a read request and its release of the data corresponding to the request, is a key consideration for any software program that attempts to run in an efficient manner. In a multi-threaded environment this consideration is even more critical, since the more threads that are running, the greater the likelihood that this latency will affect overall performance. This is especially true where two different threads are attempting to write to and/or read from a shared memory location in RAM. This latency is also a critical factor when considering the run-time cost of transferring state data from one location in program memory to another.
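One common way to reduce contention over a shared memory location can be sketched as follows. The function name `sum_with_local_accumulation` is hypothetical, and the technique shown (thread-local accumulation followed by a single locked update) is a general illustration, not a method described in the original text. Each thread works on private data and touches the shared location only once.

```python
import threading


def sum_with_local_accumulation(chunks):
    """Sum several chunks in parallel, writing to shared state once per thread.

    Hypothetical helper for illustration; not taken from the original text.
    """
    total = [0]              # shared location updated by every thread
    lock = threading.Lock()  # serializes the shared updates

    def worker(chunk):
        local = 0            # private to this thread, so no contention
        for x in chunk:
            local += x
        with lock:           # a single shared update per thread
            total[0] += local

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

The alternative of updating the shared total inside the inner loop would take the lock once per element, which multiplies the number of accesses to the shared location.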
CPU memory cache efficiency is another important consideration. As explained above, the cache memory is a fast, but extremely limited, local memory storage for a processor. Data from external memory is copied into the CPU cache for two main purposes. One is to hold the CPU instructions that are to be executed; this portion is commonly referred to as the instruction cache. The other is to hold the program state data on which those instructions operate; this portion is commonly referred to as the program data cache. While various CPUs may handle the memory cache differently, a common issue is that the instruction cache and the data cache are both extremely limited resources. It is therefore extremely important to utilize these caches in an efficient manner. In addition, the performance of both the instruction cache and the data cache is gated by how fast the processor can transfer data from main memory into the cache. In this way, the performance of a software program is tied to efficient instruction cache utilization, efficient data cache utilization, and the latency inherent in updating these resources. To efficiently communicate or transfer data and commands within a multi-threaded program, it is desirable to minimize the required updates to the instruction cache and the data cache.
Thus, there is a need for an efficient framework for communicating program state and commands to various functions of a program executed by a processing unit while minimizing required updates to the instruction cache and the data cache. There is a further need for a command module that binds delegates from a program to data in the form of payloads, so that the data can be accessed later for efficient execution by the processor. There is a further need for a command module that allows necessary resources to be accessible in the cache when program delegates are loaded.
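The idea of binding a delegate to a payload for later execution can be sketched in a few lines. All names here (`Command`, `CommandQueue`, `bind`, `flush`) are hypothetical and chosen only for illustration; this is not the command module itself, and Python gives no control over cache placement, so the sketch shows only the binding and deferred invocation.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class Command:
    """Pairs a delegate (a callable) with the payload it will operate on."""
    delegate: Callable[[Any], Any]
    payload: Any

    def execute(self) -> Any:
        # The payload travels with the delegate, so no extra lookup is needed here.
        return self.delegate(self.payload)


class CommandQueue:
    """Collects bound commands and runs them together, in binding order."""

    def __init__(self) -> None:
        self._commands: List[Command] = []

    def bind(self, delegate: Callable[[Any], Any], payload: Any) -> None:
        self._commands.append(Command(delegate, payload))

    def flush(self) -> List[Any]:
        results = [command.execute() for command in self._commands]
        self._commands.clear()  # the queue is empty after a flush
        return results
```

A caller might bind `len` to a string and `sum` to a list of numbers, then call `flush()` to run both delegates against their payloads in one pass.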