The present application relates generally to an improved data processing apparatus and method and more specifically to version pressure feedback mechanisms for speculative versioning caches.
Speculative versioning caches are cache memory structures that are capable of storing multiple versions of a cache line to enable speculative execution of threads in a multithreading data processing environment. Speculative execution of threads is an optimization technique by which early execution of a thread, whose results may or may not be later needed, is performed so as to achieve greater performance should that thread's results be needed during the execution of the code, i.e. should the thread be transitioned from a speculative state to a non-speculative state in which the results are used. A speculative versioning cache is an extension of a typical cache, where the speculative versioning cache is capable of holding data which is accessible only to the hardware thread that wrote it. All modified annotated cache lines can be discarded atomically using a special command (Abort), or made architecturally visible to other threads using another command (Commit).
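By way of illustration only, the Abort and Commit behavior described above can be sketched in software as follows. This is a minimal model and not the hardware mechanism itself; the class name `SpeculativeCache`, its methods, and the use of dictionaries in place of cache lines are all assumptions introduced for this example.

```python
class SpeculativeCache:
    """Toy model: committed data plus private speculative versions per thread.

    All names here are hypothetical; dictionaries stand in for cache lines.
    """

    def __init__(self):
        self.committed = {}    # address -> architecturally visible value
        self.speculative = {}  # thread id -> {address -> speculative value}

    def write(self, thread_id, address, value):
        # A speculative store is buffered privately for the writing thread.
        self.speculative.setdefault(thread_id, {})[address] = value

    def read(self, thread_id, address):
        # A thread sees its own speculative version first, then committed data.
        own = self.speculative.get(thread_id, {})
        if address in own:
            return own[address]
        return self.committed.get(address)

    def commit(self, thread_id):
        # Make all of the thread's speculative versions visible in one step.
        self.committed.update(self.speculative.pop(thread_id, {}))

    def abort(self, thread_id):
        # Discard all of the thread's speculative versions in one step.
        self.speculative.pop(thread_id, None)
```

In this sketch a value written by one thread is invisible to every other thread until `commit` is called, and `abort` removes every version written by that thread at once.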
Depending on the mode, it is possible that data, written by a hardware thread while executing a speculative task, can also be accessed by other threads that are executing tasks that correspond to logical successor tasks. Thus, speculative versioning requires the tracking of the program order among multiple buffered versions of a memory location to guarantee certain sequential program semantics. First, a load must eventually read the value created by the most recent store to the same memory location. This requires that the load must be squashed and re-executed if it executes before the store and incorrectly reads the previous version. Moreover, this requires that all stores to the same memory location that follow the load in program order must be buffered until the load is executed. Second, a memory location must eventually have the correct version of data independent of the order of the creation of the versions. Consequently, the speculative versions of a location must be committed to the architected storage in program order.
Speculative versioning caches support speculative multithreading by providing the ability to store speculative versions of cache lines in association with the speculative threads. One example of a speculative versioning cache is described in Gopal et al., “Speculative Versioning Cache,” Proceedings of the 4th International Symposium on High-Performance Computer Architecture, Jan. 31 to Feb. 4, 1998, page 195. In this example of a speculative versioning cache, a private cache is provided for each processor, with the system being organized similarly to a snooping bus-based cache coherent symmetric multiprocessor (SMP). Memory references that hit in the private cache do not use the bus, as in an SMP. Task commits do not write back speculative versions en masse. Instead, each cache line is handled individually the next time it is accessed.
With the speculative versioning cache described in Gopal et al., programs are partitioned into fragments called tasks which form a sequence corresponding to their order in the dynamic instruction stream. A higher level control unit predicts the next task in the sequence and assigns it to a free processor for execution. Each processor executes the instructions in the task assigned to it and buffers the speculative state created by the task in its private cache. When a task mis-prediction is detected, the speculative state of the incorrectly predicted task and of all the tasks after it in the sequence is invalidated and the corresponding processors are freed. This is referred to as a task squash. The correct tasks in the sequence are then assigned for execution. When a task prediction has been validated, the task commits by copying its speculative buffered state to the architected storage, e.g., data cache. Tasks commit one by one in the program order. Once a task commits, its processor is free to execute a new task. Since the tasks commit in program order, tasks are assigned to the processors in program order.
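The task squash and in-order commit described above can be sketched, under stated assumptions, as follows. The class `TaskSequence` and its method names are hypothetical and introduced only for this example; a task is modeled as a dictionary holding an identifier and a buffer of speculative stores.

```python
from collections import deque


class TaskSequence:
    """Toy model of tasks held in program order (oldest first).

    Hypothetical names; each task buffers its speculative stores privately.
    """

    def __init__(self):
        self.active = deque()  # active tasks, oldest at the left
        self.architected = {}  # committed (architected) storage

    def assign(self, task_id):
        # Tasks are assigned in program order, so a new task is the youngest.
        self.active.append({"id": task_id, "buffer": {}})

    def store(self, task_id, address, value):
        for task in self.active:
            if task["id"] == task_id:
                task["buffer"][address] = value
                return
        raise KeyError(task_id)

    def squash_from(self, task_id):
        """Invalidate the given task and every later task; return freed ids."""
        ids = [task["id"] for task in self.active]
        freed = ids[ids.index(task_id):]
        for _ in freed:
            self.active.pop()  # buffered speculative state is dropped
        return freed

    def commit_oldest(self):
        """Commit only the oldest task, copying its buffer to architected storage."""
        task = self.active.popleft()
        self.architected.update(task["buffer"])
        return task["id"]
```

Because `commit_oldest` always removes the task at the head of the sequence, commits occur one by one in program order, and a squash never touches tasks older than the mis-predicted one.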
A task executes a load as soon as its address is available, speculating that stores from previous tasks in the sequence do not write to the same location. The closest previous version of the location is supplied to the load. A load that is supplied a version from a previous task is recorded to indicate a use before a potential definition. If a definition, e.g., a store to the same location from a previous task, occurs, the load was supplied with an incorrect version and memory dependence was violated.
When a task executes a store to a memory location, the store is communicated to all later active tasks in the sequence. When a task receives a new version of a location from a previous task, it squashes if a use before definition is recorded for that location, i.e. a memory dependence violation is detected. All tasks after the squashed task are also squashed, as on a task mis-prediction.
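The load and store handling described in the two preceding paragraphs can be sketched as follows. This is an illustrative model only: the `Task` class and the `load` and `store` functions are hypothetical names, tasks are held in a list indexed by program order, and the assumption that a later task with its own store to the location (and no recorded use before definition) shields younger tasks is a simplification made for this example.

```python
class Task:
    def __init__(self, order):
        self.order = order            # position in program order
        self.stores = {}              # address -> value written by this task
        self.used_before_def = set()  # addresses loaded before any own store


def load(tasks, memory, order, address):
    """Supply the closest previous version and record a use before definition."""
    task = tasks[order]
    if address in task.stores:
        return task.stores[address]   # the task's own version; nothing recorded
    task.used_before_def.add(address)
    for earlier in reversed(tasks[:order]):
        if address in earlier.stores:
            return earlier.stores[address]
    return memory.get(address)        # fall back to architected storage


def store(tasks, order, address, value):
    """Buffer the store and return the orders of any tasks that must squash."""
    tasks[order].stores[address] = value
    for later in tasks[order + 1:]:
        if address in later.used_before_def:
            # Memory dependence violation: this task and all later ones squash.
            return [t.order for t in tasks if t.order >= later.order]
        if address in later.stores:
            break  # assumed: a newer version exists, so younger tasks read that
    return []
```

For example, if the task at position 2 loads a location and the task at position 0 subsequently stores to it, `store` reports that the task at position 2 must be squashed.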
The oldest active task is non-speculative and can commit its speculative memory state, i.e. versions created by stores from this task, to architected storage. Committing a version involves logically copying the versions from the speculative buffers to the architected storage, e.g., data cache. When a task is squashed, the speculative state associated with a task is invalidated and not committed to architected storage.
The private caches of the various processors together constitute the speculative versioning cache. Each cache line of the private caches stores an address tag (Tag) that identifies the data that is cached, a valid bit (V) that identifies whether the cache line is valid or not, a dirty bit (S) that identifies whether a store to the cache line has occurred or not, a load bit (L) that identifies whether a task loads from the cache line before storing to the cache line occurs, a pointer (Pointer) that identifies the processor (or L1 cache) that has the next copy/version, if any, in a version ordering list (VOL) for the cache line, and the data itself (Data).
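The cache line fields listed above, and the way the Pointer fields chain the copies of a line into a version ordering list, can be sketched as follows. The `CacheLine` dataclass, the field types, and the `walk_vol` helper are assumptions introduced for illustration; the actual hardware encoding is not specified here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CacheLine:
    tag: int                       # Tag: identifies the cached data
    valid: bool = False            # V: line is valid
    store: bool = False            # S: a store to the line has occurred
    load: bool = False             # L: the task loaded before storing
    pointer: Optional[int] = None  # Pointer: processor holding the next version
    data: bytes = b""              # Data


def walk_vol(lines: Dict[int, CacheLine], head: int) -> List[int]:
    """Follow Pointer fields from `head`; return processor ids in VOL order."""
    order: List[int] = []
    seen = set()
    current: Optional[int] = head
    while current is not None and current not in seen:
        seen.add(current)          # guard against a malformed, cyclic list
        order.append(current)
        current = lines[current].pointer
    return order
```

Here `lines` maps a processor identifier to that processor's copy of one cache line, and `walk_vol` recovers the order of the versions by following each copy's pointer to the next.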
The speculative versioning cache uses combinational logic, referred to as the version control logic (VCL), that provides support for speculative versioning using the VOL. A processor request that hits in the private cache of the processor does not need to consult the VOL. Cache misses issue a bus request that is snooped by the private caches. The states of the requested cache line in each private cache and the VOL are supplied to the VCL. The VCL uses the bus request, the program order among the tasks, and the VOL to compute appropriate responses for each cache. Each cache line is updated based on its initial state, the bus request, and the VCL response.
With speculative execution of threads in a multithreading data processing environment, threads are permitted to execute until a dependency violation between two or more threads is encountered, e.g., a first thread executes a read of a memory location followed by a second thread, that is younger than the first thread, executing a write of the same memory location, or until a conflict is encountered in which two or more threads attempt to modify the state of the same portion of data in the cache or memory. Typically, at this point, one of the threads is permitted to persist its state while the other thread(s) must be squashed, i.e. all work performed by the thread that has not been persisted is rolled back or aborted. Such squashing of threads is significantly more expensive than a typical cache miss as it results in cancelling all of the work performed by a given speculative thread and possibly all of the work performed by any successor speculative threads as well.