The performance of a data processing system depends in large measure on its memory design. Because of the typically large difference in operating speed between the data processor and the memory, the data processor in many conventional data processing systems may spend 80% of its operating time waiting for memory accesses. Many conventional systems attempt to close the speed gap by using large multi-level cache arrangements. However, frequent cache misses increase the processor demand for accessing the slow memory, and the use of a cache can increase the data transfer time involved in memory accesses. Also, the number of cache misses increases with the length of the data processing application, thus causing a proportional increase in the time that is spent accessing memory.
Some conventional data processing systems use multiple processors (sometimes contained within a single chip) in combination with shared memory, to improve performance in parallel applications. However, sharing memory among the processors requires synchronization, which adds overhead. This synchronization overhead, together with the relatively slow memory operation mentioned above, can limit performance gains and scalability in multiprocessor arrangements. The same two limitations also hamper performance and scalability in conventional multi-threading systems (which use a single processor capable of executing parallel threads).
It is desirable in view of the foregoing to provide solutions that can alleviate the aforementioned problems associated with conventional data processing systems.