Recent trends in the computer industry, including a dramatic slowing of silicon technology scaling, the exhaustion of conventional micro-architecture techniques, and the growing prevalence of functions (e.g., integrated processor (IP) cores) requiring substantial computation of a particular type, are combining to make certain design integration approaches, such as multi-core implementations, increasingly attractive for system-on-chip (SOC) products.
Conventional inter-processor communication in a shared-memory multiprocessor is generally carried out using a cache coherence protocol that enables the correct sharing of data among multiple processors. Conventional techniques allow functional units, such as IP cores, to communicate via local memory and/or a register file system interconnected over a standard bus. However, these conventional techniques lack communication mechanisms and protocols that meet the communication needs of multi-core implementations. Approaches that achieve large reductions in average communication latency do so at the cost of protocols and systems that are too complex to be feasible. As a result, multi-core implementations using conventional techniques exhibit a large overhead for required communication traffic.
Other techniques make use of speculative execution to hide long latencies, e.g., control speculation, dependence speculation, speculative parallelization, speculative lock elision, and coherence decoupling.
A semaphore may be used to manage access to a shared resource in a given system. Conventionally, semaphore locations are static and fixed in hardware under software control. Semaphore locations are often accessed by master and/or slave devices with on-chip network ports that implement point-to-point or multi-point protocols between processors and interrupt-enabled memory-mapped devices connected via a hierarchy of dedicated universal interrupt controllers (UICs). Such mechanisms tend to be somewhat non-deterministic with respect to response time, and are often not flexible or scalable for evolving SOC functionality.
Further, conventional static semaphore multi-core systems can suffer from multi-threaded task overhead due to communication latency, context switching, and cold cache effects. In addition, because conventional semaphore locations are static, new software may no longer run in an optimum manner once the design is implemented in silicon.
Semaphores are conventionally known in the art for controlling access to shared resources in systems wherein a plurality of execution units, such as processing units and hardware engines, each require access to a shared resource. The shared resource is typically a memory space for storing information, which may include a single bit of data, a byte, or a large data structure. The shared resource could also be the processing resources of a processing unit. In multiprocessor systems, however, existing semaphore techniques are implemented at the main memory level. When semaphores are so implemented, caches used with the main memory may start to break down, which adversely affects processing efficiency.
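By way of illustration only, the general role of a semaphore described above may be sketched in software as follows. This is a minimal sketch, not the hardware mechanism discussed in this background: it assumes that Python threads stand in for the execution units and that a dictionary entry stands in for the shared memory space. The names run_units, gate, and shared are hypothetical and are introduced solely for this example.

```python
import threading


def run_units(num_units: int, increments: int) -> int:
    """Run num_units threads that each update one shared counter.

    Threads stand in for execution units (an assumption of this sketch).
    """
    gate = threading.Semaphore(1)  # binary semaphore: one holder at a time
    shared = {"value": 0}          # stands in for the shared memory space

    def unit() -> None:
        for _ in range(increments):
            with gate:             # acquire before touching the resource
                shared["value"] += 1
            # the semaphore is released on leaving the with-block

    threads = [threading.Thread(target=unit) for _ in range(num_units)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["value"]
```

Because each update is made only while the semaphore is held, no two units modify the shared location at the same time, and the final count equals the number of units multiplied by the number of increments per unit.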
Thus, there is a need in the art for efficient, high-bandwidth, and low-latency communication mechanisms between functional units on the same chip, without complicating the underlying coherence protocol that guarantees correctness.