1. Field of the Invention
The present invention relates to memory systems for multi-threading. More particularly, the present invention relates to a system and method for avoiding memory hazards in a retrofitted multi-threaded CPU.
2. The Background Art
Modern microprocessors spend a significant fraction of their time waiting on cache misses. During this wait time, functional units and other pipeline resources in a CPU are mostly idle. By way of example, when the performance of the Sun UltraSPARC-II was analyzed using hardware counters and related tools, it was found that more than 50% of execution time was spent idle, waiting for the L2 cache or memory.
Multi-threading is one well-known technique that uses the CPU wait time to execute another program or other parts of the same program. Multi-threading gives a CPU the ability to share its resources among two or more threads without intervention from the operating system. For a CPU, a thread is the execution of instructions from a particular program.
The question of when to switch from one thread to another has been previously addressed. In the MIT Alewife project, the SPARC processor had the ability to switch threads on a cache miss. More recently, a threaded PowerPC has been designed by IBM/Northstar and Pulsar. Other machines, such as the Tera CPU developed by Tera Systems, also implement a flavor of thread switching on memory access. Each of these processors was designed from its inception as a multi-threading processor.
However, none of these prior art multi-threading processors was initially a CPU of single-threaded design that was later retrofitted to perform multi-threading.
The prior art also teaches the use of memory models to define the semantics of memory operations. The purpose of a memory model is to specify what constraints are placed on the order of memory operations. Memory models apply both to uniprocessors and to shared-memory multiprocessors. Formal memory models are necessary in order to precisely define the interactions between multiple processors and input/output devices in a shared-memory configuration.
By way of example, the SPARC-V9 architecture provides a model that specifies the behavior observable by software on SPARC-V9 systems. The SPARC-V9 architecture defines three different memory models: Total Store Order (TSO), Partial Store Order (PSO), and Relaxed Memory Order (RMO). The most restrictive of these memory models is the Total Store Order memory model. All SPARC-V9 processors must satisfy the Total Store Order model or a more strongly ordered model, e.g., Sequential Consistency, to ensure SPARC-V8 program compatibility.
The memory models specify the possible ordering relationships between memory reference instructions issued by a processor and the order and visibility of these instructions as seen by other processors. The memory model is intimately intertwined with the program execution model for instructions.
Typically, a CPU issues instructions which are collected, reordered, and then dispatched to an execution unit. Instruction reordering allows operations to be performed in parallel. The reordering of instructions is constrained to ensure that the results of program execution are the same as they would be if the instructions were performed in program order. Typically, the CPU is allowed to reorder instructions, provided the reordering does not violate any of the data flow constraints for registers or for memory.
The typical data flow constraints to avoid or prevent memory hazards include:
    1. An instruction cannot be performed until all earlier instructions that set a register it uses have been performed (read-after-write hazard; write-after-write hazard).
    2. An instruction cannot be performed until all earlier instructions that use a register it sets have been performed (write-after-read hazard).
    3. A memory-reference instruction that sets (stores to) a location cannot be performed until all previous instructions that use (load from) the location have been performed (write-after-read hazard).
    4. A memory-reference instruction that uses (loads) the value at a location cannot be performed until all earlier memory-reference instructions that set (store to) the location have been performed (read-after-write hazard).
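By way of illustration only, the four data flow constraints listed above can be sketched as a small Python check. The Instr record, its field names, and the hazards and can_reorder functions are hypothetical names introduced for this sketch and do not describe any particular processor. The sketch compares one earlier instruction with one later instruction and covers only the four constraints as stated.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    """Hypothetical record of what one instruction reads and writes."""
    name: str
    reg_reads: set = field(default_factory=set)
    reg_writes: set = field(default_factory=set)
    mem_reads: set = field(default_factory=set)   # locations loaded from
    mem_writes: set = field(default_factory=set)  # locations stored to


def hazards(earlier, later):
    """Return the hazards that force `later` to wait for `earlier`."""
    found = set()
    # Constraint 1: `later` uses a register that `earlier` sets.
    if earlier.reg_writes & later.reg_reads:
        found.add("register read-after-write")
    # Constraint 2: `later` sets a register that `earlier` uses.
    if earlier.reg_reads & later.reg_writes:
        found.add("register write-after-read")
    # Constraint 3: `later` stores to a location that `earlier` loads from.
    if earlier.mem_reads & later.mem_writes:
        found.add("memory write-after-read")
    # Constraint 4: `later` loads from a location that `earlier` stores to.
    if earlier.mem_writes & later.mem_reads:
        found.add("memory read-after-write")
    return found


def can_reorder(earlier, later):
    """Two instructions may be swapped only if no listed hazard links them."""
    return not hazards(earlier, later)


# Illustrative instructions (assumed operands, not taken from any real program).
ld = Instr("ld [A], r1", reg_writes={"r1"}, mem_reads={"A"})
add = Instr("add r1, r2, r3", reg_reads={"r1", "r2"}, reg_writes={"r3"})
st = Instr("st r4, [A]", reg_reads={"r4"}, mem_writes={"A"})
mov = Instr("mov r5, r6", reg_reads={"r5"}, reg_writes={"r6"})

print(hazards(ld, add))       # the add needs r1 from the load
print(hazards(ld, st))        # the store must not overtake the load of A
print(hazards(st, ld))        # the load must see the stored value of A
print(can_reorder(add, mov))  # independent instructions
```

In this sketch, a reordering stage would call can_reorder on each candidate pair and leave the pair in program order whenever any hazard is reported.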
Therefore, to achieve functional correctness for a single-threaded processor that is retrofitted to perform multi-threading, it would be beneficial to provide a system and method that avoids memory model hazards in the retrofitted multi-threaded processor.