The present invention relates to a semiconductor device having a processor, in a broad sense, with a memory replacement mechanism.
In a conventional memory-mounted processor, an increase in the scale of a program mounted thereon leads directly to an increase in the amount of memory mounted thereon, so that the problems of higher cost and lower operating speed are encountered.
To solve the problems, there has been proposed a structure provided with a memory replacement mechanism in which a low-cost, large-scale memory, though operating at a low speed, is used to compose a main memory, while small-capacity memories are mounted on a processor. In this structure, a program is executed by performing replacement between the small-capacity memories and the main memory. It has been the mainstream approach to use so-called cache memories for such a conventional memory replacement mechanism.
A description will be given herein below of a conventional basic cache memory.
Memory references made when a processor fetches an instruction or refers to data are localized to a limited memory region when viewed over a unit time, which property is termed the locality of reference of a program. A technology has been known which enables higher-speed memory access by utilizing the locality of reference, i.e., by causing the region to which frequent memory references are localized to reside in a buffer memory which has a capacity smaller than that of the main memory and is capable of high-speed operation. The smaller-capacity buffer memory is generally termed a cache memory, and the data transfer for causing frequently referenced data to reside in the cache memory is executed by hardware.
Referring to FIG. 27, a conventional semiconductor device which is a combination of a processor with a cache memory will be described herein below.
As shown in FIG. 27, the conventional semiconductor device with the cache memory has: a processor 201; a large-capacity main memory 202 which is accessible at a low speed by the processor 201; and a cache memory 204 connected to the main memory 202 with a DMA controller 203 interposed therebetween to be accessed by the processor 201.
The cache memory 204 is partitioned into lines A, B, C, and D each having a capacity of about several tens of bytes. Each of the lines A to D is provided with a tag 205 for holding information on an address and the like in a one-to-one correspondence. A plurality of instructions or data sets are normally stored in each of the lines.
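The line-and-tag arrangement described above may be sketched, purely as an illustrative software model, as follows; the line size of 32 bytes, the field names, and the class name are assumptions for illustration, not values taken from the figure.

```python
# Hypothetical sketch of a small cache such as cache memory 204:
# four lines (A to D), each paired one-to-one with a tag that holds
# the address information for the data currently resident in the line.
LINE_SIZE = 32   # assumed capacity of "about several tens of bytes"
NUM_LINES = 4    # lines A, B, C, and D

class CacheLine:
    def __init__(self):
        self.tag = None                    # address information (tag 205)
        self.data = bytearray(LINE_SIZE)   # holds instructions or data sets

cache = [CacheLine() for _ in range(NUM_LINES)]
```

An empty tag here models a line that does not yet hold valid data; real tags would also carry validity and other control information.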
The conventional semiconductor device also has a tag comparator 206 for comparing an address to which the processor 201 has issued an access request with the address held by each of the tags 205 and, if they do not match, requesting the DMA controller 203 to replace the data at the address as the result of the comparison.
The following is a brief description of the operation of the conventional semiconductor device thus constructed.
(Step 1)
The processor 201 issues a memory access request.
(Step 2)
A memory address requested by the processor 201 is reported to the tag comparator 206 and the tag comparator 206 examines whether or not the requested address is included in the addresses in the tags 205.
(Step 3)
If the requested address is included in any of the tags 205, the processor 201 proceeds to access the cache memory 204. In this case, the access is established and completed, so that Step 4 and the steps subsequent thereto are not performed. The state in which the requested address is held in the cache memory 204 is termed a cache hit or simply a hit. The cache hit rate influences the efficiency with which a program is processed: if the cache hit rate is high, the average memory access time is reduced, so that the processing efficiency is increased.
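The influence of the hit rate on the average memory access time can be illustrated with simple arithmetic; the cycle counts below are assumptions chosen for illustration, not values stated in the description.

```python
# Illustrative arithmetic only: the cycle counts are assumed values,
# representing a fast cache access and a slow main-memory access.
CACHE_ACCESS = 1          # assumed cycles for a cache hit
MAIN_MEMORY_ACCESS = 50   # assumed cycles when the main memory is accessed

def average_access_time(hit_rate):
    """Average memory access time as a weighted mix of hit and miss costs."""
    return hit_rate * CACHE_ACCESS + (1.0 - hit_rate) * MAIN_MEMORY_ACCESS

# A higher hit rate reduces the average access time:
# average_access_time(0.95) -> 3.45 cycles
# average_access_time(0.50) -> 25.5 cycles
```

This is why, as noted above, an unpredictable hit rate makes the processing efficiency itself unpredictable.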
(Step 4)
If the requested address is not included in any of the tags 205, the tag comparator 206 selects a proper line from the cache memory 204 based on priorities, which will be described later, and generates information on the result of the comparison for performing a rewrite operation. The state in which the address is not held is termed a cache mishit or simply a mishit. As an algorithm for determining priorities in comparing the tags, an LRU (Least Recently Used) process has been known commonly. In this process, the one of the lines A to D that was least recently referenced is selected as the replacement target. Besides, a FIFO (First In First Out) process, a Random process, and the like are known. There is also a process which embeds information used as a criterion for determining priorities in the instruction code of the processor 201 and determines the priorities based on that information (see, e.g., Japanese Laid-Open Patent Publication No. HEI 6-59977).
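The LRU selection among the lines A to D can be sketched as follows; the use of explicit timestamps is an illustrative device (real hardware typically keeps ordering bits instead), and the function and variable names are assumptions.

```python
# Hedged sketch of LRU victim selection: the line referenced longest
# ago (least recently used) becomes the replacement target.
def select_lru_victim(last_used):
    """last_used maps a line name to the time of its most recent
    reference; return the line to be replaced under the LRU policy."""
    return min(last_used, key=last_used.get)

# Example: line B was referenced longest ago, so it is the victim.
history = {"A": 40, "B": 10, "C": 35, "D": 22}
```

A FIFO process would instead track insertion order, and a Random process would pick a victim without reference history at all.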
(Step 5)
The information on the result of the comparison generated by the tag comparator 206 is reported to the DMA controller 203, and one line of data including the requested address is transferred from the main memory 202 to the cache memory 204.
(Step 6)
After the data transfer to the cache memory 204 is completed, the processor 201 accesses the cache memory 204.
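The sequence of Steps 1 to 6 above may be summarized in a single software model; this is an illustrative sketch only (the class, its methods, and the 32-byte line size are assumptions), not a description of the actual hardware.

```python
# Illustrative model of Steps 1 to 6: the tag comparison (Steps 2-3),
# LRU victim selection on a mishit (Step 4), the one-line transfer from
# the main memory (Step 5), and the completed access (Step 6).
LINE_SIZE = 32  # assumed line capacity

class SimpleCache:
    def __init__(self, main_memory, num_lines=4):
        self.main_memory = main_memory      # models main memory 202
        self.tags = [None] * num_lines      # models tags 205
        self.lines = [None] * num_lines     # models lines A to D
        self.last_used = [0] * num_lines    # reference times for LRU
        self.clock = 0
        self.hits = 0
        self.misses = 0

    def access(self, address):              # Step 1: access request
        self.clock += 1
        line_tag = address - (address % LINE_SIZE)   # line-aligned address
        if line_tag in self.tags:                    # Steps 2-3: tag compare, hit
            index = self.tags.index(line_tag)
            self.hits += 1
        else:                                        # Step 4: mishit, LRU victim
            index = min(range(len(self.tags)),
                        key=lambda i: self.last_used[i])
            # Step 5: transfer one line from the main memory
            # (performed by the DMA controller 203 in the figure)
            self.tags[index] = line_tag
            self.lines[index] = self.main_memory[line_tag:line_tag + LINE_SIZE]
            self.misses += 1
        self.last_used[index] = self.clock
        return self.lines[index][address % LINE_SIZE]  # Step 6: cache access
```

For example, a first access to address 5 misses and fills a line, after which an access to address 6 in the same line hits without any transfer.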
However, the foregoing conventional semiconductor device has the problem that its circuit scale is likely to increase. In the cache memory 204, the tags 205 for holding information on addresses and the like are provided in one-to-one correspondence with the lines A to D each having a capacity of several tens of bytes.
Although the number of the tags 205 shown in FIG. 27 is four, the number of the tags 205 provided in a large-scale integrated circuit is on the order of 1000, so that the scale of the peripheral circuit is increased significantly compared with that of an SRAM (Static Random Access Memory). In addition, tag comparators 206 equal in number to the tags 205 are constantly operating, so that not only the area but also the power consumption is increased significantly.
In the case of using the cache memory 204, it is difficult to predict the cache hit rate. Even when a given program is executed, the cache hit rate varies depending on the content of the program under processing, so that the degradation of the processing efficiency is unpredictable.
These problems are particularly conspicuous in the case where an embedded processor is used and a real-time property should be guaranteed, as in development in fields which necessitate lower power consumption, such as mobile equipment.