1. Field
Methods and apparatuses consistent with exemplary embodiments relate to predicting a Read-After-Write (RAW) hazard, and more particularly, to a method and apparatus for dynamically sampling a RAW predictor to optimize prediction of the RAW hazard.
2. Description of Related Art
Processors commonly execute load and store instructions out of order to achieve higher performance. When a younger load instruction to an address in memory is executed before an older store instruction to that same address, and the store data is not correctly forwarded to the load instruction, a Read-After-Write (RAW) hazard occurs, in which the load instruction uses bad load data. When the RAW hazard occurs, a processor typically needs to repair the bad load data by performing a costly RAW resynchronization exception (RRE), in which all in-flight instructions younger than the store instruction are flushed and fetch is restarted for the instructions following the store instruction. The extra core clock cycles required to flush the operations and re-fetch the instructions are known as the RRE penalty.
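By way of illustration only, the condition that gives rise to a RAW hazard may be sketched as follows. The sketch assumes a simplified model in which no store-to-load forwarding takes place; the names `Op` and `find_raw_hazards` are hypothetical and are not part of any described apparatus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    seq: int   # position in program order (lower value = older instruction)
    kind: str  # "load" or "store"
    addr: int  # memory address accessed


def find_raw_hazards(exec_order):
    """Return (store_seq, load_seq) pairs in which a younger load executed
    before an older store to the same address.

    Assumption: store data is never forwarded, so every such pair is a hazard.
    """
    hazards = []
    executed_loads = []
    for op in exec_order:
        if op.kind == "load":
            executed_loads.append(op)
        else:
            # A store executing now: any already-executed younger load to the
            # same address has read bad data.
            for load in executed_loads:
                if load.addr == op.addr and load.seq > op.seq:
                    hazards.append((op.seq, load.seq))
    return hazards


store = Op(seq=0, kind="store", addr=0x100)
critical_load = Op(seq=1, kind="load", addr=0x100)
other_load = Op(seq=2, kind="load", addr=0x200)

# Out-of-order execution: both loads run ahead of the older store.
out_of_order = find_raw_hazards([critical_load, other_load, store])
# In-order execution: the store runs first, so no hazard arises.
in_order = find_raw_hazards([store, critical_load, other_load])
```

In this sketch, only the load to address 0x100 forms a hazard with the store; the load to address 0x200 is unaffected even though it also executed early.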
To avoid a costly RRE, a processor can employ a RAW Resynchronization Predictor (RRP) that trains on previous RREs, based on the address of the store instruction, and avoids such RREs in the future. When the trained RRP detects a store instruction address that has previously caused an RRE, the RRP can send an indication to block any younger load instructions from executing ahead of that store instruction, referred to as the resynchronization predicted store (RPS) instruction. Because no younger load instruction executes ahead of the RPS, no RRE can occur, and the RRE penalty is avoided. The younger load instructions are unblocked only once execution of the RPS is completed.
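The train-and-block behavior described above may be sketched, as a non-limiting example, as follows. The sketch assumes that the RRP is indexed by the instruction address of the store and that it is an unbounded set with no eviction or aging; the class `RawResyncPredictor` and the function `may_issue_load` are hypothetical names introduced only for this illustration.

```python
class RawResyncPredictor:
    """Toy RRP: remembers store instruction addresses that caused an RRE."""

    def __init__(self):
        self._trained = set()

    def train(self, store_pc):
        # Called when an RRE is taken for the store at this instruction address.
        self._trained.add(store_pc)

    def is_rps(self, store_pc):
        # True if this store is a resynchronization predicted store (RPS).
        return store_pc in self._trained


def may_issue_load(load_seq, pending_stores, rrp):
    """Return True if a load may execute now.

    pending_stores: list of (seq, pc) for stores that have not yet completed.
    A load is blocked while any older pending store is predicted as an RPS.
    """
    return not any(
        seq < load_seq and rrp.is_rps(pc) for seq, pc in pending_stores
    )


rrp = RawResyncPredictor()
pending = [(5, 0x40)]  # one incomplete store: program order 5, address 0x40

before_training = may_issue_load(7, pending, rrp)   # untrained: load may run
rrp.train(0x40)                                     # an RRE occurred for 0x40
after_training = may_issue_load(7, pending, rrp)    # younger load is blocked
older_load = may_issue_load(3, pending, rrp)        # older load is unaffected
after_completion = may_issue_load(7, [], rrp)       # RPS done: load unblocked
```

Note that the blocking decision in this sketch does not consult the load address at all, which is why every younger load is held back regardless of whether it conflicts with the store.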
Frequently, multiple younger load instructions are blocked by an RPS, and only one of those load instructions is to the same memory address as the RPS. This is the critical load instruction that needs to be blocked to avoid the RRE penalty, but all of the other load instructions, referred to as Non-Critical Loads (NCLs), are also blocked by the RPS. When an RPS blocks NCLs, performance is lost because those NCLs have to wait to execute. This is known as the NCL penalty. Typically, the performance gain from avoiding the RRE penalty is greater than the performance loss from the blocked NCLs, making the RRP worthwhile.
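The distinction between the critical load and the NCLs, and the resulting trade-off, may be illustrated with the following sketch. The cycle counts are hypothetical values chosen only for the example, and the simple additive cost model in `rrp_net_gain` is an assumption rather than a measured relationship.

```python
def classify_blocked_loads(rps_addr, younger_load_addrs):
    """Split the loads blocked by an RPS into critical loads and NCLs."""
    critical = [a for a in younger_load_addrs if a == rps_addr]
    non_critical = [a for a in younger_load_addrs if a != rps_addr]
    return critical, non_critical


def rrp_net_gain(rre_penalty, stall_per_ncl, num_ncl):
    """Cycles saved by blocking, for one occurrence of the RPS.

    Assumed model: blocking avoids one RRE penalty and costs a fixed stall
    for each blocked NCL. A positive result means the RRP was worthwhile.
    """
    return rre_penalty - stall_per_ncl * num_ncl


# Three younger loads are blocked; only the first targets the RPS address.
critical, non_critical = classify_blocked_loads(0x100, [0x100, 0x200, 0x300])

# Hypothetical costs: a 20-cycle RRE penalty versus 3 cycles per blocked NCL.
gain = rrp_net_gain(rre_penalty=20, stall_per_ncl=3, num_ncl=len(non_critical))
```

Under these assumed costs the gain is positive, which corresponds to the typical case in which the RRP is worthwhile.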
However, when instruction code is repeated in a loop, the execution behavior may differ slightly for each iteration. If no RRP were employed, it is possible that an RRE would occur on the first iteration of the instruction code, but not on any subsequent iteration. In that case, the RRE penalty without an RRP would be very small, because an RRE would occur only once. Nonetheless, if an RRP were used in this scenario, the RRP would train on the first RRE and block younger load instructions behind that same store instruction on all subsequent iterations, even though no further RREs would actually have occurred. Therefore, while there is no significant RRE penalty for the RRP to avoid, the RRP still causes an NCL penalty. In this case, the RRP negatively affects performance.
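The loop scenario described above may be sketched as a simple cycle count, by way of example only. The function `loop_cost`, the iteration count, and the penalty values are all hypothetical; the sketch assumes a fixed NCL penalty per iteration once the RRP has trained and that the RRP never untrains.

```python
def loop_cost(iterations, rre_iterations, rre_penalty, ncl_penalty, use_rrp):
    """Total penalty cycles over a loop.

    rre_iterations: set of iteration indices on which an RRE would occur if
    the younger loads were not blocked.
    """
    trained = False
    cycles = 0
    for i in range(iterations):
        if use_rrp and trained:
            # Loads are blocked behind the RPS: no RRE, but NCLs stall.
            cycles += ncl_penalty
        elif i in rre_iterations:
            # Loads ran ahead and the hazard occurred: pay the RRE penalty.
            cycles += rre_penalty
            trained = True  # only consulted when use_rrp is True
    return cycles


# Hypothetical values: 100 iterations, 20-cycle RRE, 2-cycle NCL penalty.
# Case 1: an RRE would occur only on the first iteration.
once_without_rrp = loop_cost(100, {0}, 20, 2, use_rrp=False)
once_with_rrp = loop_cost(100, {0}, 20, 2, use_rrp=True)

# Case 2: an RRE would occur on every iteration.
every = set(range(100))
always_without_rrp = loop_cost(100, every, 20, 2, use_rrp=False)
always_with_rrp = loop_cost(100, every, 20, 2, use_rrp=True)
```

With these assumed values, the RRP increases the total penalty from 20 to 218 cycles when the RRE would have occurred only once, but reduces it from 2000 to 218 cycles when the RRE would have occurred on every iteration.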