1. Technical Field
The present invention relates to a compiling method, and more particularly, to a method for providing intrinsic support for a digital signal processing (DSP) processor with a very long instruction word (VLIW) architecture and distributed register files.
2. Background
To handle increasing multimedia workloads, most modern processors implement single-instruction-multiple-data (SIMD) computing in the form of multimedia extensions. Recent multimedia extensions can operate on multiple data elements held in 128-bit or 256-bit vector registers. Similarly, digital signal processing (DSP) processors with very long instruction word (VLIW) architectures are often equipped with sub-word instructions to accelerate sub-word data processing. Although the vector widths of VLIW DSP processors, usually 32 bits, are short compared to those of general-purpose processors, they are sufficient for image and audio/video processing in embedded systems. In addition to sub-word instructions, the functional units of a VLIW DSP processor can be used to process multiple data streams in parallel. For instance, a five-way issue VLIW DSP processor with two multiplication units can issue up to two multiplications or five ordinary operations per cycle. This SIMD capability, obtained through parallel instruction issue, can be extended by increasing the number of functional units. However, a centralized register file makes it difficult to keep adding functional units, because the wiring between the register file and the functional units consumes increasing silicon area and power. Therefore, many embedded VLIW DSP processors adopt distributed register files (DRF), which reduce wire connections by grouping functional units into clusters and making register files private to clusters, and in some cases to individual functional units.
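The sub-word processing described above can be illustrated in portable C. The following is a minimal sketch, not the instruction set of any particular DSP: the function name `add4x8` is hypothetical, and the lane layout (four unsigned 8-bit values packed into one 32-bit word, with wraparound on overflow) is an assumption chosen for illustration. A sub-word add instruction would typically perform the equivalent work in a single operation.

```c
#include <stdint.h>

/*
 * Hypothetical helper (name and lane layout are assumptions):
 * adds four unsigned 8-bit lanes packed into 32-bit words.
 * Each lane wraps modulo 256 and never carries into its neighbor.
 */
static uint32_t add4x8(uint32_t a, uint32_t b)
{
    /* Add the low 7 bits of every lane; any carry stops at bit 7 of that lane. */
    uint32_t low7 = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);

    /* Top bit of each lane is a XOR b XOR carry-in; the carry-in already sits in low7. */
    uint32_t msb = (a ^ b) & 0x80808080u;

    return low7 ^ msb;
}
```

For example, `add4x8(0x01FF7F10u, 0x01010101u)` yields `0x02008011`: the lane holding `0xFF` wraps to `0x00` without disturbing the adjacent lane.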
FIG. 1 shows schematic views of a VLIW DSP processor with a centralized register file on the left side and a VLIW DSP processor with distributed register files on the right side. As shown in FIG. 1, although the DRF design contributes to scalability, it sacrifices programmability and performance because of the reduced data (or register) accessibility. In addition, data access is a critical concern in the DRF design, because sharing data between functional units may incur a communication overhead of one or more cycles. Accordingly, it is difficult for compilers to employ the functional units for parallel data stream processing. Therefore, there is a need for a compiler that recognizes and supports intrinsic information in a user-provided program.