FIG. 1 shows the architecture of a multi-core processor 100 within a computing system. As observed in FIG. 1, the processor includes: 1) multiple processing cores 101_1 to 101_N; 2) an interconnection network 102; 3) a last level caching system 103; 4) a memory controller 104; and 5) an I/O hub 105. Each of the processing cores contains one or more instruction execution pipelines for executing program code instructions, such as vector instructions, including any of the instructions discussed above. The interconnection network 102 serves to interconnect each of the cores 101_1 to 101_N to each other as well as to the other components 103, 104, 105. The last level caching system 103 serves as a last layer of cache in the processor 100 before instructions and/or data are evicted to system memory 108.
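By way of illustration only, the role of the last level caching system 103 as the final layer of cache before system memory 108 can be sketched in software. The sketch below is a toy model and not a description of the actual circuitry: the class name `LastLevelCache`, the least-recently-used replacement policy, the capacity of two lines, and the treatment of every line as dirty are all assumptions made for the example and are not taken from FIG. 1.

```python
from collections import OrderedDict


class LastLevelCache:
    """Toy model of a last level cache in front of system memory.

    Assumptions (not from the figure): LRU replacement, and every
    cached line is treated as dirty, so each eviction writes back.
    """

    def __init__(self, capacity, system_memory):
        self.capacity = capacity            # number of lines the cache holds
        self.lines = OrderedDict()          # address -> data, oldest first
        self.system_memory = system_memory  # dict standing in for memory 108

    def write(self, address, value):
        if address in self.lines:
            self.lines.move_to_end(address)  # mark as most recently used
        self.lines[address] = value
        if len(self.lines) > self.capacity:
            # Evict the least recently used line to system memory.
            victim, data = self.lines.popitem(last=False)
            self.system_memory[victim] = data

    def read(self, address):
        if address in self.lines:
            self.lines.move_to_end(address)
            return self.lines[address]
        # Miss: fetch from system memory and install the line.
        value = self.system_memory.get(address)
        self.write(address, value)
        return value


system_memory = {}
llc = LastLevelCache(capacity=2, system_memory=system_memory)
llc.write(0x10, "a")
llc.write(0x20, "b")
llc.write(0x30, "c")  # cache is full, so the line at 0x10 is evicted
```

In this model, a line leaves the processor only when the last level cache runs out of room, which mirrors the ordering described above: eviction to system memory occurs after the last layer of cache.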
The memory controller 104 reads/writes data and instructions from/to system memory 108. The I/O hub 105 manages communication between the processor and “I/O” devices (e.g., non-volatile storage devices (such as hard disk drive devices and/or non-volatile memory devices) and/or network interfaces). Port 106 stems from the interconnection network 102 to link multiple processors so that systems having more than N cores can be realized. Graphics processing unit (GPU) 107 performs graphics computations. Power management unit 109 controls, often in cooperation with power management software, the power dissipation of the processor. Other functional blocks of significance (e.g., phase locked loop (PLL) circuitry) are not depicted in FIG. 1 for convenience.
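The connectivity described for FIG. 1 can be summarized, purely as an illustrative sketch, by a small adjacency map. The function name `build_processor_100`, the string labels, and the choice of N = 4 are hypothetical and chosen only for the example. Because the description does not state how the GPU 107 and the power management unit 109 attach to the other blocks, the sketch lists them as components without assigning them any links.

```python
def build_processor_100(num_cores):
    """Return (components, links) for a toy model of processor 100.

    links maps a block to the blocks it connects to, following only
    the connections stated in the description of FIG. 1.
    """
    cores = [f"core_101_{i}" for i in range(1, num_cores + 1)]
    components = cores + [
        "interconnect_102",
        "llc_103",
        "memory_controller_104",
        "io_hub_105",
        "port_106",
        "gpu_107",  # attachment point not stated; left unlinked
        "pmu_109",  # attachment point not stated; left unlinked
    ]
    links = {
        # The interconnect joins the cores to each other and to 103-105;
        # port 106 stems from it to reach other processors.
        "interconnect_102": cores + [
            "llc_103",
            "memory_controller_104",
            "io_hub_105",
            "port_106",
        ],
        # The memory controller reads/writes system memory 108.
        "memory_controller_104": ["system_memory_108"],
        # The I/O hub talks to storage devices and network interfaces.
        "io_hub_105": ["io_devices"],
        # Port 106 links to additional processors (more than N cores).
        "port_106": ["other_processors"],
    }
    return components, links


components, links = build_processor_100(num_cores=4)
core_count = sum(1 for c in components if c.startswith("core_101_"))
```

Keeping the map limited to stated connections avoids implying a bus topology that the figure description does not confirm.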