1. Field
One or more example embodiments of the present disclosure relate to a processor and an operating method of the processor and, more particularly, to a processor and an operating method of the processor that support a coarse-grained array mode and a very long instruction word (VLIW) mode.
2. Description of the Related Art
Generally, in consideration of performance and cost, a data memory structure of a processor may be configured to incorporate an L1 memory having a small size and a relatively high speed within the processor, and to use a source outside of (i.e., external to) the processor, such as a system dynamic random access memory (DRAM), and the like, as a memory having a larger size and a relatively low speed.
FIG. 1 illustrates a configuration of a processor 100 supporting a coarse-grained array mode and a very long instruction word (VLIW) mode according to conventional art.
Referring to FIG. 1, the processor 100 supporting the coarse-grained array mode and the VLIW mode according to the conventional art may include a core 110, a data memory controller 120, and a scratch pad memory 130.
The core 110 of the processor 100 according to the conventional art may have a structure in which a number of functional units (FUs) are disposed in a grid pattern, and may obtain enhanced performance in the coarse-grained array mode by readily performing operations in parallel in the FUs.
Among software codes, the processor 100 according to the conventional art may process, through the coarse-grained array mode, a recurring routine that is performed using a loop and that takes the form of successively reading a value in an input data array, performing an operation, and writing a result value to an output data array. Accordingly, a data memory access pattern in the coarse-grained array mode may usually correspond to a sequential access pattern. In a case of the sequential access pattern, a temporal/spatial locality may be low. Thus, when a cache memory is used as an L1 data memory, the area cost relative to the storage capacity may increase, a miss rate may increase, and performance may deteriorate.
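The loop form described above can be illustrated with a minimal sketch. The function name `run_loop`, the multiply-and-add operation, and the recorded index lists are assumptions chosen for illustration only; they do not appear in the conventional art. The sketch merely records which array elements are touched, to show that the reads and writes proceed in sequential order.

```python
def run_loop(inp, gain, offset):
    """Hypothetical loop routine: read inp[i], operate, write out[i]."""
    out = [0] * len(inp)
    reads, writes = [], []          # element indices touched, in order
    for i in range(len(inp)):
        reads.append(i)             # successive read from the input data array
        out[i] = inp[i] * gain + offset
        writes.append(i)            # result written to the output data array
    return out, reads, writes


out, reads, writes = run_loop([1, 2, 3, 4], gain=2, offset=1)
print(out)      # results of the per-element operation
print(reads)    # indices are visited once each, in increasing order
```

Each element is visited exactly once, so data brought in for one iteration is not reused by a later iteration.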
To enable the coarse-grained array mode to exhibit the best efficiency, the scratch pad memory 130, which has a low area cost per unit capacity, may be suitable for the data memory structure so that relatively large input and output data arrays may be accommodated.
However, since the coarse-grained array mode may accelerate only a loop operation portion, a general routine other than the loop operation may be executed in the VLIW mode.
Since the VLIW mode may use only a portion of the plurality of FUs, its performance in executing operations in parallel may be relatively low. However, since the VLIW mode may perform a general software code, a function call, and the like in addition to the loop operation, the VLIW mode may be an essential function for the processor to fully execute a single software code.
Since a stack access, a global variable access, and the like may occur without restriction during execution of code in the VLIW mode, the data memory access pattern may have a relatively high temporal/spatial locality.
To enable the VLIW mode to exhibit the best efficiency, the cache memory, capable of enhancing performance using locality and reducing an external memory bandwidth, may be suitable for an L1 data memory structure.
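The difference between the two access patterns may be made concrete with a small simulation. The following is a minimal sketch, assuming a direct-mapped cache with 16-byte lines and 8 lines by default; the function `miss_rate`, the cache parameters, and both address traces are hypothetical and are not taken from the conventional art. The streaming trace reads each word once, while the stack-like trace revisits a small frame repeatedly.

```python
def miss_rate(trace, line_bytes=16, num_lines=8):
    """Miss rate of a direct-mapped cache over a list of byte addresses."""
    stored = [None] * num_lines     # line number currently held in each slot
    misses = 0
    for addr in trace:
        line = addr // line_bytes   # which memory line the address falls in
        index = line % num_lines    # the single slot that line can occupy
        if stored[index] != line:
            stored[index] = line    # fetch the line, evicting the old one
            misses += 1
    return misses / len(trace)


# Streaming pattern: 1024 four-byte words, each read exactly once.
streaming = [4 * i for i in range(1024)]
# Stack-like pattern: 16 words in one 64-byte frame, revisited 100 times.
stack_like = [4 * (i % 16) for i in range(1600)]

print(miss_rate(streaming))                  # one miss per 4-word line
print(miss_rate(streaming, num_lines=64))    # a larger cache does not help
print(miss_rate(stack_like))                 # only the first touches miss
```

In this sketch the streaming trace still hits on the remaining words of each fetched line, but no line is ever reused, so added cache capacity yields no benefit. The stack-like trace misses only on its first touches, which is the behavior that makes a cache attractive for the VLIW mode.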
The processor 100 according to the conventional art may include only the scratch pad memory 130 as the L1 memory. Thus, in the processor 100 according to the conventional art, both a shared section in which a variable used in the coarse-grained array mode is stored and a local/stack section in which a variable used in the VLIW mode is stored may be included in the scratch pad memory 130. In this instance, the core 110 according to the conventional art may access the scratch pad memory 130 through the data memory controller 120 based on an execution mode to be executed, that is, one of the coarse-grained array mode and the VLIW mode.
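The arrangement in which both sections reside in the scratch pad memory 130 can be sketched as a simple address lookup. The address ranges, the `Mode` enumeration, and the function `route_access` below are assumptions made for illustration; the conventional art does not specify a memory map.

```python
from enum import Enum


class Mode(Enum):
    CGA = "coarse-grained array"
    VLIW = "very long instruction word"


# Hypothetical address ranges for the two sections.
SHARED = range(0x0000, 0x4000)        # variables used in the CGA mode
LOCAL_STACK = range(0x4000, 0x8000)   # variables used in the VLIW mode


def route_access(mode, addr):
    """Return (target memory, section) for an access issued by the core."""
    if addr in SHARED:
        section = "shared"
    elif addr in LOCAL_STACK:
        section = "local/stack"
    else:
        raise ValueError(f"address {addr:#x} is outside the L1 memory")
    # With only one L1 memory, the mode does not change the target.
    return "scratch pad memory 130", section


print(route_access(Mode.CGA, 0x0100))
print(route_access(Mode.VLIW, 0x4100))
```

Whichever mode the core is in, the access lands in the same memory, which is the property the following paragraph relies on.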
Thus, in the processor 100 according to the conventional art, the core 110 may access the scratch pad memory 130 at all times regardless of the execution mode of the core 110. When accesses from sources other than the core 110 arrive at the scratch pad memory 130 through a bus slave at the same time as accesses from the core 110, an execution performance of the scratch pad memory 130 may deteriorate.
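The effect of simultaneous accesses can be illustrated with a toy cycle count. This is a minimal sketch under stated assumptions: the memory is modeled as having a single port, an external access through the bus slave is assumed to take priority in any cycle where both sides request the port, and the function `cycles_to_finish` is hypothetical. None of these details is specified in the conventional art.

```python
def cycles_to_finish(core_accesses, bus_busy_cycles):
    """Cycles the core needs to complete its accesses on a single-ported memory.

    bus_busy_cycles lists the cycles in which an external access through the
    bus slave occupies the port (assumed to have priority over the core).
    """
    busy = set(bus_busy_cycles)
    cycle = 0
    done = 0
    while done < core_accesses:
        if cycle not in busy:
            done += 1               # port is free: the core access completes
        cycle += 1                  # otherwise the core waits one cycle
    return cycle


print(cycles_to_finish(8, []))          # no external traffic
print(cycles_to_finish(8, [1, 3, 5]))   # three cycles lost to bus accesses
```

In this model every cycle taken by an external access delays the core by one cycle, because the core depends on the same memory in both execution modes.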