FIG. 1 shows a generic processing core 100 that is believed to describe many different types of processing core architectures, such as Complex Instruction Set Computing (CISC), Reduced Instruction Set Computing (RISC) and Very Long Instruction Word (VLIW) architectures. The generic processing core 100 of FIG. 1 includes: 1) a fetch unit 103 that fetches instructions (e.g., from cache and/or memory); 2) a decode unit 104 that decodes instructions; 3) a schedule unit 105 that determines the timing and/or order of instruction issuance to the execution units of the execution stage 106 (notably, the schedule unit is optional); 4) an execution stage 106 having execution units that execute the instructions (typical instruction execution units include branch execution units, integer arithmetic execution units (e.g., ALUs), floating point arithmetic execution units (e.g., FPUs) and memory access execution units); and 5) a retirement unit 107 that signifies successful completion of an instruction. Notably, the processing core 100 may or may not employ microcode 108. In the case of micro-coded processors, the micro-ops are typically stored in a non-volatile machine readable medium (such as a Read Only Memory (ROM)) within the semiconductor chip on which the processor is constructed, and cause the execution units within the processor to perform the desired function called out by the instruction.
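
By way of illustration only, the flow of an instruction through the units described above may be sketched in software as follows. The sketch is a simplified model and not a description of any actual hardware: the tuple instruction format, the register names, the three arithmetic opcodes, and all function names (`fetch`, `decode`, `schedule`, `execute`, `retire`, `run`) are assumptions introduced here. The schedule step is modeled as an in-order pass-through, reflecting that the schedule unit is optional, and microcode 108 is not modeled.

```python
from dataclasses import dataclass

# Hypothetical toy instruction: opcode, destination register, two immediate operands.
@dataclass
class Instr:
    op: str
    dest: str
    a: int
    b: int

# Hypothetical "memory" holding raw (undecoded) instructions.
PROGRAM = [("add", "r1", 2, 3), ("mul", "r2", 4, 5), ("sub", "r3", 9, 1)]

# Stand-ins for integer arithmetic execution units of the execution stage (106).
EXEC_UNITS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def fetch(memory, pc):
    """Fetch unit (103): return the raw instruction at pc, or None past the end."""
    return memory[pc] if pc < len(memory) else None

def decode(raw):
    """Decode unit (104): turn a raw tuple into a structured instruction."""
    op, dest, a, b = raw
    return Instr(op, dest, a, b)

def schedule(decoded):
    """Schedule unit (105): optional; modeled here as in-order issue."""
    return list(decoded)

def execute(instr):
    """Execution stage (106): dispatch to the unit that handles the opcode."""
    return EXEC_UNITS[instr.op](instr.a, instr.b)

def retire(regs, instr, result):
    """Retirement unit (107): commit the result to architectural state."""
    regs[instr.dest] = result

def run(memory):
    """Drive every instruction in memory through the five stages in order."""
    regs = {}
    decoded = []
    pc = 0
    while (raw := fetch(memory, pc)) is not None:
        decoded.append(decode(raw))
        pc += 1
    for instr in schedule(decoded):
        retire(regs, instr, execute(instr))
    return regs

if __name__ == "__main__":
    print(run(PROGRAM))
```

In this model, replacing the body of `schedule` with a reordering policy would correspond to a core whose schedule unit changes the order of instruction issuance, while leaving the other stages unchanged.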