1. Field
One or more embodiments relate to a method of managing memory, and more particularly, to a method of managing memory in a multiprocessor system on chip (MPSoC).
2. Description of the Related Art
As higher performance of embedded systems is continuously demanded, integrating more and more processors onto a system on chip (SoC) is unavoidable.
A SoC is a chip that can operate as a self-contained unit, that is, a chip that contains a complete system. While a computer on a chip includes all hardware components necessary for processing instruction code, a SoC may include that computer together with all other necessary electronic parts. For example, a SoC used for communication may include a microprocessor, a digital signal processor (DSP), random access memory (RAM) and read only memory (ROM). In general, a SoC allows a system to be small and its assembly process to be simple. Accordingly, dual or quad processors, DSPs, RAM devices and ROM devices can be integrated onto a single chip.
FIG. 1 is a block diagram illustrating a structure of a conventional multiprocessor system on chip (MPSoC).
Referring to FIG. 1, the MPSoC includes a SoC 100 including four central processing units (CPUs) 110, 120, 130 and 140, a DSP 150, a reconfigurable processor (RP) 160 and a plurality of local static random access memories (SRAMs) 111, 121, 131, 141, 151 and 161 which respectively correspond to the CPUs 110, 120, 130 and 140, the DSP 150 and the RP 160, and a dynamic random access memory (DRAM) 170.
In the MPSoC, access from the local SRAMs 111, 121, 131, 141, 151 and 161 to the DRAM 170, which is the main memory, is key to solving problems such as time delay and power consumption.
FIGS. 2A through 2C are diagrams for illustrating various conventional methods of allocating a scratch pad memory 200 in the MPSoC illustrated in FIG. 1.
Referring to FIGS. 2A through 2C, the scratch pad memory 200, a main memory 210 and a plurality of tasks A through D 220, 230, 240 and 250 are illustrated.
The scratch pad memory 200 is a high-speed SRAM managed by software, for example, an application or a compiler. The scratch pad memory 200 is used in order to optimize access of data and instruction code.
In general, the scratch pad memory 200 is on-chip data memory. The address space of the scratch pad memory 200 is separate from the address space of the off-chip memory, but the scratch pad memory 200 and the off-chip memory are connected to the same address bus and data bus.
Data stored in the scratch pad memory 200 can be accessed promptly. However, data stored in the off-chip memory requires a relatively long time to be accessed.
The main difference between a conventional cache memory and the scratch pad memory 200 is that the scratch pad memory 200 always guarantees a single-cycle access time, while a cache memory cannot easily guarantee a short access time due to cache misses. Thus, time-sensitive data in a real-time system is stored in the scratch pad memory 200. Dataflow of the cache memory is controlled by hardware, not by an application, and the speed of the dataflow depends on how accurately cache lines are formed.
On the other hand, software is used to read data from or write data to the scratch pad memory 200.
The main memory 210 is off-chip memory such as DRAM or synchronous dynamic random access memory (SDRAM). In the MPSoC, the main memory 210 is used as lower-level memory beneath the SRAM, which includes the scratch pad memory 200.
The above-described memory structure is used because memory close to a CPU has low capacity, operates at high speed, and has a high cost, while memory far from the CPU has high capacity, operates at low speed, and has a low cost.
Furthermore, the access time of the scratch pad memory 200 is ten to a thousand times shorter than the access time of the main memory 210. Therefore, the performance of the whole system can be improved by fetching data or instruction code from the scratch pad memory 200.
Accordingly, when the CPU fetches data or instruction code from memory, first, the CPU checks if the data or the instruction code exists in the scratch pad memory 200. If the data or the instruction code exists in the scratch pad memory 200, the CPU fetches the data or the instruction code from the scratch pad memory 200. If not, the CPU has to fetch the data or the instruction code from the main memory 210.
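The fetch order described above can be sketched as follows. This is a minimal illustration only: the function name fetch, the dictionary-based memories and the latency values (chosen as an assumed 100-fold gap within the range mentioned above) are hypothetical and are not taken from the conventional MPSoC itself.

```python
# Hypothetical latencies in cycles; the 100x gap is an assumed value
# inside the "ten to a thousand times" range stated in the text.
SPM_LATENCY = 1
MAIN_LATENCY = 100


def fetch(address, spm, main_memory):
    """Return (value, cycles) for one fetch, checking the scratch pad first."""
    if address in spm:
        # The data or instruction code exists in the scratch pad memory.
        return spm[address], SPM_LATENCY
    # Otherwise it has to be fetched from the main memory.
    return main_memory[address], MAIN_LATENCY


main_memory = {addr: addr * 2 for addr in range(8)}
spm = {0: main_memory[0], 1: main_memory[1]}  # contents placed by software

value_hit, cycles_hit = fetch(1, spm, main_memory)
value_miss, cycles_miss = fetch(5, spm, main_memory)
```

In this sketch, the fetch of address 1 is served by the scratch pad, while the fetch of address 5 falls through to the main memory and pays the longer assumed latency.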
The methods illustrated in FIGS. 2A through 2C involve allocating variables or data of each of the tasks A through D 220, 230, 240 and 250 to physical address space of the scratch pad memory 200.
FIG. 2A is a diagram for illustrating a static allocation method of the scratch pad memory 200. In the static allocation method, variables of the statically generated task A 220 are allocated to the scratch pad memory 200, and wider space is allocated to the task B 230, which is also statically generated and is used more frequently than the task A 220.
However, the static allocation method cannot reflect locality. It either cannot be applied to the dynamically generated task C 240 or has to reserve a fixed space for the task C 240 at all times, and it cannot be applied to the dynamically loaded task D 250.
Here, locality is the phenomenon in which, when a user program is executed, not all instructions in the program are used evenly; instead, some instructions are used intensively. Locality is divided into temporal locality and spatial locality. The task C 240 is executed upon a user's selection, and the task D 250 is executed by loading source code of the application through a network or the like.
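The static allocation of FIG. 2A may be pictured with the following sketch, which is only an illustration under stated assumptions: the function static_partition, the proportional-weight rule, the 4096-byte scratch pad size and the weights 1 and 3 are all hypothetical and merely stand in for "wider space for the more frequently used task".

```python
def static_partition(spm_size, task_weights):
    """Split spm_size bytes among tasks in proportion to fixed weights.

    Returns {task: (start, size)}. The layout is decided once, before
    run time, and never changes afterwards.
    """
    total = sum(task_weights.values())
    layout, start = {}, 0
    for task, weight in task_weights.items():
        size = spm_size * weight // total
        layout[task] = (start, size)
        start += size
    return layout


# Task B is assumed to be used three times as often as task A.
layout = static_partition(4096, {"A": 1, "B": 3})
```

Because the layout is fixed in advance, a task that appears only at run time, such as the task C 240 or the task D 250, has no entry in the layout unless space was reserved for it beforehand.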
FIG. 2B is a diagram for illustrating a dynamic allocation method based on a compiler. The dynamic allocation involves managing the scratch pad memory 200 by swapping out data from the scratch pad memory 200 to the main memory 210 and by swapping in data from the main memory 210 to the scratch pad memory 200.
Details of the dynamic allocation method are disclosed in “Dynamic Allocation for Scratch-Pad Memory Using Compile-Time Decisions,” ACM Trans. Embedded Computing Systems, Vol. 5, No. 2, pp. 472-511, May 2006, by S. Udayakumaran et al. However, this method fixes the memory space for each task and thus cannot reflect locality, always has to allocate fixed space in order to be applied to the dynamically generated task C 240, cannot be applied when the scratch pad memory 200 has to serve a large number of dynamically generated tasks, and cannot be applied to the dynamically loaded task D 250.
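The swap-in and swap-out operations described above can be sketched as follows. This is not the algorithm of the cited paper; the ScratchPad class, its capacity counted in variables and the hand-written transfer points are hypothetical and only show the kind of transfer code that would be placed at program points chosen ahead of time.

```python
class ScratchPad:
    """Toy scratch pad whose capacity is counted in variables."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = {}  # variable name -> current value

    def swap_out(self, name, main_memory):
        """Copy a variable back to main memory and free its slot."""
        main_memory[name] = self.resident.pop(name)

    def swap_in(self, name, main_memory):
        """Copy a variable from main memory into the scratch pad."""
        if len(self.resident) >= self.capacity:
            raise MemoryError("scratch pad full; swap something out first")
        self.resident[name] = main_memory[name]


main_memory = {"x": 1, "y": 2, "z": 3}
spm = ScratchPad(capacity=2)
spm.swap_in("x", main_memory)
spm.swap_in("y", main_memory)
spm.resident["x"] += 10          # update made while x is resident
spm.swap_out("x", main_memory)   # write the updated x back
spm.swap_in("z", main_memory)    # reuse the freed slot for z
```

The transfer points in this sketch are fixed in the program text, which mirrors the limitation noted above: a task that is generated or loaded at run time has no such transfer points prepared for it.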
FIG. 2C is a diagram for illustrating a split management method of the scratch pad memory 200. The split management method is performed based on code that a compiler inserts according to profiling results. Here, a memory allocation unit 260 allocates space in the scratch pad memory 200 to the tasks A through D 220, 230, 240 and 250 in accordance with the size of the necessary variables or data and the memory access frequency of each of the tasks A through D 220, 230, 240 and 250.
Details of the split management method are disclosed in “Shared Scratch-Pad Memory Space Management,” IEEE ISQED, 2006, by O. Ozturk et al. However, this method cannot be applied widely to various tasks because it allows only loop-based accesses. Furthermore, high overhead is incurred.
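An allocation that takes both the data size and the access frequency of each task into account, as the memory allocation unit 260 is described as doing, might be sketched as follows. The greedy accesses-per-byte rule, the function allocate and all of the sizes and access counts are assumptions made for illustration; they are not the procedure of the cited paper.

```python
def allocate(spm_size, tasks):
    """Pick tasks for the scratch pad by accesses per byte (greedy).

    tasks maps a task name to (size_bytes, access_count).
    Returns the names placed in the scratch pad, in placement order.
    """
    order = sorted(
        tasks,
        key=lambda name: tasks[name][1] / tasks[name][0],
        reverse=True,
    )
    placed, free = [], spm_size
    for name in order:
        size = tasks[name][0]
        if size <= free:  # skip tasks that no longer fit
            placed.append(name)
            free -= size
    return placed


# Hypothetical profile data: (size in bytes, number of accesses).
profile = {"A": (512, 100), "B": (256, 400), "C": (512, 300), "D": (256, 40)}
placed = allocate(1024, profile)
```

With these assumed figures, the tasks B and C are placed first because they have the most accesses per byte, the task A no longer fits, and the task D takes the remaining space.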