Processors or computing systems generally make use of pipelined architectures. In computing, a pipeline is often a set of data processing elements (e.g., execution units, functional unit blocks (FUBs), combinatorial logic blocks (CLBs), etc.) connected in series, where the output of one element or pipeline stage is the input of the next pipeline stage. The stages of a pipeline are often executed in parallel or in time-sliced fashion. This generally allows a computer to execute several instructions substantially in parallel or in near-parallel, as a second instruction may be started in the first pipeline stage as soon as the first instruction has exited that first pipeline stage, even though the first instruction has not yet completed all of the pipeline stages. This pseudo-parallelism greatly increases the speed at which a group of instructions may complete, despite the instructions' dependence upon each other (e.g., the second instruction may rely upon the result of the first instruction, etc.).
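By way of a non-limiting illustration, the overlap described above may be sketched as follows. The sketch assumes an idealized pipeline in which every stage takes exactly one cycle and no stalls or hazards occur; the function names are hypothetical and chosen only for this example.

```python
def sequential_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed if each instruction finishes every stage before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed in an ideal pipeline: fill it once, then complete one instruction per cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


def pipeline_schedule(num_instructions: int, num_stages: int) -> dict:
    """Map each cycle to the (instruction, stage) pairs active in that cycle.

    Instruction i enters stage 0 in cycle i, i.e., as soon as instruction
    i - 1 has exited that stage.
    """
    schedule = {}
    for instr in range(num_instructions):
        for stage in range(num_stages):
            cycle = instr + stage
            schedule.setdefault(cycle, []).append((instr, stage))
    return schedule
```

Under these assumptions, four instructions on a five-stage pipeline may complete in 8 cycles rather than 20, and in cycle 1 the first instruction occupies stage 1 while the second instruction occupies stage 0.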
Further, many instructions involve the reading (load) of data from, or the writing (store) of data to, a memory. Often the data is stored in a cache system. A computer's cache system is generally a tiered system of increasingly smaller but faster memory components that each store a sub-set of the data stored in the larger but slower next tier. If the desired piece of data is found in the smallest, fastest cache, the instruction completes without incident. However, if the data is not in the sub-set stored in the smallest, fastest cache, the data must be retrieved from the next tier in the system (and so on), and this often causes delays and other complications.
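The tiered lookup described above may be illustrated by the following minimal sketch. The class name, the latency figures, and the policy of filling every faster tier on a miss are assumptions made for this example only; capacity limits and eviction are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class CacheLevel:
    name: str
    latency_cycles: int                           # hypothetical access cost
    contents: dict = field(default_factory=dict)  # address -> value


def load(address, levels, backing_memory, memory_latency_cycles):
    """Search tiers from fastest to slowest.

    Returns (value, total_cycles, name_of_tier_that_supplied_the_data).
    """
    total = 0
    for i, level in enumerate(levels):
        total += level.latency_cycles
        if address in level.contents:
            value = level.contents[address]
            # Assumed fill policy: copy the data into every faster tier.
            for faster in levels[:i]:
                faster.contents[address] = value
            return value, total, level.name
    # Missed in every cache tier: fall back to the backing memory.
    total += memory_latency_cycles
    value = backing_memory[address]
    for level in levels:
        level.contents[address] = value
    return value, total, "memory"
```

With hypothetical latencies of 4 cycles for the first tier and 12 cycles for the second, a load that misses the first tier but hits the second costs 16 cycles in this sketch, while a repeat of the same load costs only 4.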
Processors commonly rely on performing load (read) and store (write) instructions out of order to achieve higher performance. If the loads and stores are to different memory addresses (i.e., for different pieces of data), this may occur without problems and may speed the overall execution of the program executed by the processor.
However, occasionally when a younger load (read) instruction (i.e., one later in program order) is executed before an older store (write) instruction to the same memory address, the load (read) may return incorrect or out-of-date data. This is generally known as a pipeline hazard or, more specifically, a Read-After-Write (RAW) hazard. Generally, when this occurs, processors need to repair the bad load data by performing a costly RAW resynchronization exception (RRE). Often, in order to repair this, all in-process instructions younger than the store (write) are flushed from the processor's pipeline (i.e., all the work done on any instructions after the store, including the load, is discarded). All of the instructions after the store are then restarted, as the work previously performed on them was incorrect or suspect. This event is frequently costly due to the extra clock cycles it takes to flush or discard instructions, re-fetch or re-start them, and then re-perform them. This is often referred to as the RRE penalty.
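The hazard and its repair may be illustrated with the following simplified sketch. It is not a model of any particular processor: the names MemOp, execute, and ages_to_flush are hypothetical, stores are assumed to update memory immediately when executed, and only the RAW case (a younger load executed ahead of an older store to the same address) is checked.

```python
from dataclasses import dataclass


@dataclass
class MemOp:
    age: int        # position in program order; a smaller value is older
    kind: str       # "load" or "store"
    address: int
    value: int = 0  # used by stores only


def execute(ops_in_execution_order, memory):
    """Execute ops in the given (possibly out-of-order) sequence.

    Returns (load_results, violating_load_ages), where load_results maps a
    load's age to the value it read.
    """
    memory = dict(memory)   # work on a copy of the initial memory state
    load_results = {}
    executed_loads = []
    violations = []
    for op in ops_in_execution_order:
        if op.kind == "load":
            load_results[op.age] = memory.get(op.address, 0)
            executed_loads.append(op)
        else:
            # A younger load to the same address has already executed,
            # so it read data that this older store should have replaced.
            for ld in executed_loads:
                if ld.age > op.age and ld.address == op.address:
                    violations.append(ld.age)
            memory[op.address] = op.value
    return load_results, violations


def ages_to_flush(program, store_age):
    """Ages of all instructions younger than the store; these would be flushed and restarted."""
    return [op.age for op in program if op.age > store_age]
```

In this sketch, executing a load of address 0x40 ahead of an older store to 0x40 returns the stale value and is reported as a violation, whereas hoisting a load of a different address ahead of the same store is not. The list returned by ages_to_flush corresponds to the work that would be discarded and redone, which is the source of the RRE penalty.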