1. Field of the Invention
The present invention relates to a parallel data path architecture that improves the energy efficiency of a processor.
2. Discussion of Related Art
A data path is a key block for arithmetic operations and signal processing, and it largely determines the performance of a processor (MPU/MCU/DSP). In general, a data path executes a series of tasks: it processes data and reads and writes the processed data. For example, the data path fetches, decodes, and executes instructions. Many architectures have been proposed to improve processor performance. In particular, the parallel pipeline architecture is widely employed because it can increase the number of instructions executed per cycle (IPC), thereby improving processor performance.
The parallel pipeline architecture, which is commonly used to improve the performance of the data path of a processor (MPU/MCU/DSP), can be categorized into a single instruction multiple data (SIMD) architecture and a multiple instruction multiple data (MIMD) architecture. The SIMD architecture processes multiple data elements using a single instruction, whereas the MIMD architecture processes multiple data elements using multiple instructions. The SIMD architecture can be further classified into a superscalar architecture, in which the processor searches for instructions that can be executed concurrently and executes them during operation, and a very long instruction word (VLIW) architecture, in which a compiler combines instructions that can be executed concurrently into a single long instruction that is executed every cycle.
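To make the superscalar/VLIW distinction concrete, the following is a simplified Python model of the search for concurrently executable instructions. It is a sketch under stated assumptions, not a description of any real processor: the `Instr` record, the register names, and the greedy in-order grouping rule are hypothetical, and only register read-after-write and write-after-write dependences are considered. A superscalar processor would perform this kind of search in hardware while the program runs; a VLIW compiler would perform a comparable search once, ahead of time, and encode each group as one long instruction.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical simplified instruction: one destination, some source registers."""
    op: str
    dest: str
    srcs: Tuple[str, ...]


def group_independent(stream: List[Instr], width: int) -> List[List[Instr]]:
    """Greedily pack an in-order instruction stream into groups of at most
    `width` instructions with no read-after-write or write-after-write
    dependence inside a group, so each group could execute in one cycle."""
    groups: List[List[Instr]] = []
    current: List[Instr] = []
    written: Set[str] = set()  # registers written by the current group
    for ins in stream:
        hazard = ins.dest in written or any(s in written for s in ins.srcs)
        if current and (len(current) == width or hazard):
            groups.append(current)  # close the group; start the next cycle
            current, written = [], set()
        current.append(ins)
        written.add(ins.dest)
    if current:
        groups.append(current)
    return groups


program = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("mul", "r4", ("r5", "r6")),
    Instr("sub", "r7", ("r1", "r4")),  # needs the results of add and mul
    Instr("add", "r8", ("r2", "r5")),
]
for cycle, group in enumerate(group_independent(program, width=2)):
    print(cycle, [i.op for i in group])
```

With a width of 2, the model issues `add` and `mul` in cycle 0 and `sub` and the second `add` in cycle 1, because `sub` cannot share a cycle with the instructions whose results it reads.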
FIG. 1 is a block diagram of a conventional VLIW instruction format, and FIG. 2 is a block diagram of a data path architecture using the VLIW instruction format shown in FIG. 1. As shown in FIGS. 1 and 2, in a processor using the conventional VLIW instruction format, one or more instructions issued from a program memory 10 are combined into a VLIW instruction, which is transmitted to a dispatch unit 12. The dispatch unit 12 splits the VLIW instruction into individual instructions so that at least two execution units 14 and 16 can execute them in parallel. According to the executed instructions, the processor reads data from, or writes data to, a data memory 18. Because the individual instructions combined into the VLIW instruction must be distinguishable from one another, the instruction-processing unit of this architecture becomes complicated.
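The dispatch flow of FIG. 2 can be sketched as a small Python model. This is a hedged illustration only: the four-field slot encoding, the `add`/`load`/`store`/`nop` opcodes, and the names `dispatch` and `execute_word` are assumptions made for the example. Each VLIW word carries exactly one slot per execution unit, the dispatch step splits the word, and all units read their operands before any result is written back, which models execution in the same cycle.

```python
from typing import Dict, List, Tuple

# Hypothetical slot encoding for illustration: (opcode, dest, src_a, src_b).
Slot = Tuple[str, str, str, str]
NOP: Slot = ("nop", "", "", "")
N_UNITS = 2  # two execution units, corresponding to units 14 and 16


def dispatch(word: Tuple[Slot, ...]) -> List[Slot]:
    """Split one VLIW word into one operation per execution unit
    (the role played by dispatch unit 12 in this model)."""
    if len(word) != N_UNITS:
        raise ValueError("a VLIW word must carry exactly one slot per unit")
    return list(word)


def execute_word(word: Tuple[Slot, ...], regs: Dict[str, int], mem: List[int]) -> None:
    """Execute all slots of one word as if in the same cycle: every unit
    reads its operands first, and results are written back afterwards."""
    pending = []  # (kind, location, value) write-backs
    for op, dest, a, b in dispatch(word):
        if op == "add":
            pending.append(("reg", dest, regs[a] + regs[b]))
        elif op == "load":  # dest <- mem[regs[a]]
            pending.append(("reg", dest, mem[regs[a]]))
        elif op == "store":  # mem[regs[a]] <- regs[b]
            pending.append(("mem", regs[a], regs[b]))
        elif op != "nop":  # a nop slot leaves its unit idle for the cycle
            raise ValueError(f"unknown opcode: {op}")
    for kind, where, value in pending:
        if kind == "reg":
            regs[where] = value
        else:
            mem[where] = value


regs = {"r0": 0, "r1": 2, "r2": 5, "r3": 0}
mem = [10, 20, 30, 40]  # stands in for data memory 18
execute_word((("load", "r3", "r0", ""), ("add", "r2", "r1", "r2")), regs, mem)
execute_word((("store", "", "r1", "r3"), NOP), regs, mem)
print(regs, mem)
```

In the second word, the second slot holds a `nop`, so one execution unit does no useful work in that cycle; this is the kind of unused hardware that the related-art discussion associates with wasted power.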
As described above, in the conventional VLIW architecture, the instructions are intricate and the decoder that decodes them is complicated, so the hardware is also very complex. In addition, a highly efficient compiler is essential for packing instructions into a very long instruction that can be executed in one cycle. Further, because it is difficult to tailor an application program to the VLIW architecture, some hardware function units are not sufficiently utilized, and these underutilized units increase power dissipation.
As in the conventional VLIW architecture, in a conventional superscalar architecture the number of hardware function units grows as the number of parallel processing units is increased to improve performance. Moreover, even when instruction-level parallelism (ILP) is exploited, the hardware function units cannot be fully utilized, and these underutilized units increase power dissipation.
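The utilization argument of the two preceding paragraphs can be illustrated with simple arithmetic. The Python sketch below assumes a hypothetical four-instruction sequence in which `sub` consumes the results of `add` and `mul`, so no machine can finish it in fewer than two cycles, and compares a 2-wide and a 4-wide issue schedule. It only counts occupied and idle slots; it does not model power, and the idle (`nop`) slots merely stand in for underutilized function units.

```python
from typing import Sequence


def ipc(schedule: Sequence[Sequence[str]]) -> float:
    """Useful (non-nop) instructions completed per cycle; one row per cycle."""
    useful = sum(op != "nop" for row in schedule for op in row)
    return useful / len(schedule) if schedule else 0.0


def utilization(schedule: Sequence[Sequence[str]]) -> float:
    """Fraction of function-unit slots that do useful work."""
    useful = sum(op != "nop" for row in schedule for op in row)
    slots = sum(len(row) for row in schedule)
    return useful / slots if slots else 0.0


# The same hypothetical four instructions on a 2-wide and a 4-wide machine.
two_wide = [["add", "mul"], ["sub", "add"]]
four_wide = [["add", "mul", "nop", "nop"], ["sub", "add", "nop", "nop"]]

for name, sched in (("2-wide", two_wide), ("4-wide", four_wide)):
    print(name, "IPC =", ipc(sched), "utilization =", utilization(sched))
```

For this sequence, doubling the function units from two to four leaves IPC at 2.0 while utilization falls from 1.0 to 0.5, which is the situation the preceding paragraphs link to increased power dissipation.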
For the above reasons, the conventional SIMD superscalar and SIMD VLIW architectures can improve processor performance but suffer from very high power dissipation.