The present invention relates to an information processing apparatus and, more particularly, to an information processing apparatus which uses a memory-integrated microprocessor or the like.
Generally, the main components which determine the performance of a computer system are the processor and the memory. Improvements in semiconductor technology are doubling the operating frequency of the processor every two years. However, the access speed of the DRAM or similar memory which constitutes the main memory (hereinafter called the main memory or simply the memory) has not improved as quickly as the speed of the processor. A cache memory is therefore used to make up for the difference between the processor speed and the memory speed.
The cache memory utilizes the temporal and spatial locality of memory accesses. That is, the following two facts are utilized: (1) data which has been accessed once is highly likely to be accessed again, and (2) data near the data which has been accessed is also highly likely to be accessed.
When the speed difference between the processor and the memory becomes large, the memory access time becomes relatively large. Therefore, the performance of the processor does not improve in proportion to the improvement in its operating frequency. To raise the performance of the processor, the memory access time must be shortened. In a processor equipped with a cache memory, when the data to be accessed is in the cache (a hit), the main memory is not accessed. The main memory is accessed only when the data is not in the cache (a miss), so the effective average access time to the main memory is as follows.
(average memory access time) = (miss rate) × (refill time)   (1)
As shown in this formula, to shorten the memory access time, it is necessary to reduce the miss rate or the refill time. The refill time means the time required for the cache refill processing, which reads data from the main memory on a cache miss and stores it into the corresponding cache line.
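As a rough illustration of formula (1), the following minimal sketch evaluates the average memory access time for a few cases. The miss rates and refill times (in cycles) are hypothetical numbers chosen only for illustration and do not come from any measurement.

```python
def average_memory_access_time(miss_rate, refill_time):
    """Formula (1): average access time = miss rate x refill time."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return miss_rate * refill_time


# Hypothetical values: 5 % miss rate, 40-cycle refill.
baseline = average_memory_access_time(0.05, 40)

# Halving the miss rate or halving the refill time gives the same improvement.
fewer_misses = average_memory_access_time(0.025, 40)
faster_refill = average_memory_access_time(0.05, 20)

print(baseline, fewer_misses, faster_refill)
```

Because the two factors are multiplied, either one can be attacked independently, which is why the text that follows treats the refill time as its own target.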
Placing the main memory and the processor on the same chip is one technique for shortening the average memory access time. Since the main memory and the processor are mounted on the same chip, the following advantages are obtained. First, since it is unnecessary to go outside the chip when accessing the main memory, the access does not pass through the input/output buffers. Second, since parasitic capacitance such as the wiring capacitance outside the chip is reduced, the access time can be shortened. Third, the bit width of the memory can be widened. When the memory is located outside the processor chip, the bit width of the memory which can be accessed at once cannot be widened because of the constraint on the number of pins of the processor chip, so one refill operation is generally realized by several memory accesses. In contrast, when the memory is located in the same chip, there is no such pin constraint, and the data size which can be accessed at once can be widened to the line size of the cache, so one refill operation can be performed by one memory access. Therefore, the memory access time can be shortened.
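The relation between the accessible width and the number of memory accesses per refill can be sketched as follows. The function name and the widths used (8 bytes per access for an off-chip memory, 32 bytes per access for an on-chip memory) are assumptions made only for this illustration.

```python
def accesses_per_refill(line_size_bytes, access_width_bytes):
    """Number of memory accesses needed to fill one cache line (ceiling division)."""
    if line_size_bytes <= 0 or access_width_bytes <= 0:
        raise ValueError("sizes must be positive")
    return -(-line_size_bytes // access_width_bytes)


# Hypothetical off-chip memory: pin count limits each access to 8 bytes.
off_chip = accesses_per_refill(32, 8)

# Hypothetical on-chip memory: the access width equals the 32-byte line size.
on_chip = accesses_per_refill(32, 32)

print(off_chip, on_chip)
```

With these assumed widths, the off-chip refill needs four accesses while the on-chip refill needs one.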
The typical line size of the cache memories adopted in present commercial processors is about 16 B (bytes) or 32 B. A cache memory with a line size of 512 B, which utilizes the memory bit width that can be widened as mentioned above, has been proposed (Document [1]: Ashley Saulsbury, Fong Pong and Andreas Nowatzyk, "Missing the Memory Wall: The Case for Processor/Memory Integration," in Proc. International Symposium on Computer Architecture, pp. 90-101, May 1996).
When the line size is enlarged in this way, not only the data at the address to be accessed but also the surrounding data within the line are taken into the cache memory at the same time, and as a result, a prefetch effect is obtained. Especially in the case of the instruction cache, most accesses are sequential accesses to a continuous memory area, so the cache miss rate is greatly reduced. In the case of the data cache, however, the effect is not consistent: for one application the miss rate may be greatly improved, whereas for another application the miss rate may become higher.
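The inconsistent effect of a large line size can be illustrated with a small simulation. The sketch below assumes a direct-mapped cache of 1 KB capacity, which is a simplification chosen for this example and is not taken from the documents cited above; the two access patterns are likewise hypothetical.

```python
def count_misses(addresses, line_size, capacity=1024):
    """Count misses in a direct-mapped cache (assumed organization)."""
    num_lines = capacity // line_size
    tags = [None] * num_lines          # block number held by each line
    misses = 0
    for addr in addresses:
        block = addr // line_size      # which memory block the address is in
        index = block % num_lines      # which cache line that block maps to
        if tags[index] != block:
            tags[index] = block        # refill the line
            misses += 1
    return misses


# Sequential pattern, similar to instruction fetch: every 4th byte of 4 KB.
sequential = list(range(0, 4096, 4))

# Scattered pattern: four addresses 544 bytes apart, visited ten times over.
scattered = [k * 544 for k in range(4)] * 10

seq_small = count_misses(sequential, 32)    # 128 misses
seq_large = count_misses(sequential, 512)   # 8 misses
sca_small = count_misses(scattered, 32)     # 4 misses
sca_large = count_misses(scattered, 512)    # 40 misses
```

Under these assumptions the large line cuts the misses of the sequential pattern from 128 to 8, but for the scattered pattern the cache holds only two large lines, the four addresses keep evicting one another, and the misses rise from 4 to 40.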
In a processor in which the main memory is mounted on the chip, the main memory capacity on the chip is fixed and cannot be increased later. However, it is necessary to be able to increase the main memory capacity of the computer system. The ways of increasing the main memory capacity include adding a memory chip outside the chip and adding a memory-integrated processor (Document [1]; Document [2]: Murakami et al., "The memory-multiprocessor integrated ASSP (Application-Specific Standard Product) architecture: PPRAM", IEICE technical report, ICD96-13, April 1996).
From the viewpoint of the processor which accesses the memory, the accessed memory is located outside the chip both when a memory chip is added outside the chip and when a memory-integrated processor is added, so the two cases can be regarded as the same. Therefore, the added chip is hereinafter assumed to be an external memory chip even when a memory-integrated processor is added. FIG. 1 shows the structure of the computer system in this case.
The memory-integrated processor is composed of a processor core 101, an internal memory 102, a bus interface unit 109, an instruction cache 110, and a data cache 111, as shown in the figure. The memory-integrated processor is connected to an external I/O unit 108 and an external memory 107 through the bus interface unit 109.
The contents of the internal memory 102 in the chip and of the external memory 107 added outside the chip are temporarily stored in the cache memories 110 and 111 and are accessed from the processor core 101.
In the DRAM-integrated processor chip described in Document [3] (Toru Shimizu et al., "A Multimedia 32 b RISC Microprocessor with 16 Mb DRAM," in Proc. International Solid-State Circuits Conference, pp. 216-217, Feb. 1996) and Document [4] (Okumura et al., "the 32-bits microprocessor containing 16-Mbits DRAM", IEICE technical report, ICD96-7, April 1996), the line size of the cache memory is 32 B, which is equal to the line size of a processor in which DRAM is not integrated. That is, this chip does not exploit the fact that the cache line size can be enlarged by integrating the main memory. Also, this chip has two modes of using the on-chip cache. In one mode, when only the internal memory is used without an external memory, the on-chip cache operates as a data/instruction cache of the internal memory. In the other mode, when an external ROM is used as the instruction memory, the on-chip cache operates as an instruction cache for the external ROM. That is, the single cache in the chip is used as the cache of both the internal memory and the external memory.
As mentioned above, in a conventional system using a memory-integrated processor chip, the capacity of the main memory is extended by adding an external memory or another memory-integrated processor chip. In that case, the data transferred from the added external memory would be stored in the cache which is used for the internal memory. To utilize the internal memory effectively, a quite large line size must be used. On the other hand, since the number of pins which can physically be used is limited, the bit width for transferring data from the external memory cannot be widened. Therefore, it takes a long time to transfer one line of data from the external memory to a cache whose line size has been enlarged for the internal memory, and the performance of the processor falls as a result. Conversely, when the cache line size is made small to match the bandwidth of the external memory, the large input/output width of the internal memory cannot be sufficiently utilized.
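The trade-off of a single shared line size can be put in numbers with a minimal sketch. The internal access width of 512 bytes and the external access width of 8 bytes are hypothetical values assumed only for this example, as are the names used.

```python
INTERNAL_WIDTH = 512   # bytes per access to the internal memory (assumed)
EXTERNAL_WIDTH = 8     # bytes per access to the external memory (assumed)


def shared_line_cost(line_size):
    """Cost of using one line size for both memories."""
    external_transfers = -(-line_size // EXTERNAL_WIDTH)   # ceiling division
    internal_width_used = min(line_size, INTERNAL_WIDTH) / INTERNAL_WIDTH
    return {
        "external_transfers": external_transfers,
        "internal_width_used": internal_width_used,
    }


# Line size matched to the internal memory: external refills become long.
large_line = shared_line_cost(512)

# Line size matched to the external memory: internal width is mostly idle.
small_line = shared_line_cost(32)

print(large_line)
print(small_line)
```

With these assumed widths, a 512 B line needs 64 transfers from the external memory, while a 32 B line uses only one sixteenth of the internal access width.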
The object of the present invention is to provide an information processing apparatus capable of reducing the average memory access time for both an internal memory having a large bandwidth and an external memory having a small bandwidth.
To achieve the above object, an information processing apparatus of the present invention comprises a plurality of memory devices having different bandwidths and a cache memory storing the data of the memory devices, and, when refilling data to the cache memory, refills data of a size according to the bandwidth of the memory device concerned.
That is, the size of the data to be read by one refill (the refill size) can be determined for each memory device according to the bandwidth (the data transfer capability) of that memory device. When a plurality of caches exist, the cache which can be refilled at the refill size of each memory device is selected, and data from the corresponding memory device is stored in the selected cache.
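The selection of a cache by refill size might be sketched as follows. The address map, the refill sizes of 512 B and 32 B, and all class and function names are hypothetical and are introduced only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryDevice:
    name: str
    base: int          # first address of the device (assumed address map)
    size: int          # capacity in bytes
    refill_size: int   # bytes read per refill, chosen from the bandwidth


@dataclass(frozen=True)
class Cache:
    name: str
    line_size: int     # bytes per cache line


# Hypothetical configuration: wide internal memory, narrow external memory.
DEVICES = [
    MemoryDevice("internal", 0x0000_0000, 0x0100_0000, 512),
    MemoryDevice("external", 0x0100_0000, 0x0100_0000, 32),
]
CACHES = [Cache("large-line cache", 512), Cache("small-line cache", 32)]


def device_for(address):
    """Find the memory device whose address range contains the address."""
    for device in DEVICES:
        if device.base <= address < device.base + device.size:
            return device
    raise ValueError(f"address {address:#x} is not mapped")


def cache_for(address):
    """Select the cache whose line size equals the device's refill size."""
    device = device_for(address)
    for cache in CACHES:
        if cache.line_size == device.refill_size:
            return cache
    raise LookupError(f"no cache matches refill size {device.refill_size}")


print(cache_for(0x0000_0100).name)   # address in the internal memory
print(cache_for(0x0100_0040).name)   # address in the external memory
```

Keeping the refill size as a property of the device, rather than of the cache, mirrors the statement that the size is determined for each memory device.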
The procedure for accessing a memory device is as follows. Here, it is assumed that the address to be accessed and a data size are given. First, whether or not the given address exists in the cache is examined. If the address exists, the data is read from the cache and transferred to a destination register of the processor core. If the address does not exist in the cache, the line in the cache in which the data is to be stored is first determined. When the data of that line may not coincide with the contents of the memory device (that is, when the line has been rewritten after its refill), the data stored in the line is written back to the memory device. Next, the data at the given address is read from the memory device and refilled into the line. The size of the refilled data is predetermined according to the data transfer capability of the accessed memory device. Finally, from among the refilled data, the data of the given data size at the given address is transferred to the destination register.
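The access procedure above can be sketched as a small write-back cache model. The direct-mapped organization, the class and attribute names, and the restriction that an access does not cross a line boundary are assumptions of this sketch, not details given in the text; the line size stands for the refill size predetermined for the memory device.

```python
class WriteBackCache:
    """Direct-mapped write-back cache in front of one memory device (sketch)."""

    def __init__(self, memory, line_size, num_lines):
        self.memory = memory              # bytearray standing in for the device
        self.line_size = line_size        # refill size chosen for the device
        self.num_lines = num_lines
        self.tags = [None] * num_lines    # block number held by each line
        self.dirty = [False] * num_lines  # line rewritten after its refill?
        self.data = [bytearray(line_size) for _ in range(num_lines)]
        self.refills = 0
        self.write_backs = 0

    def _ensure_line(self, address):
        block = address // self.line_size
        index = block % self.num_lines    # line in which the data is stored
        if self.tags[index] != block:     # miss
            if self.tags[index] is not None and self.dirty[index]:
                # The line may differ from the device, so write it back first.
                old = self.tags[index] * self.line_size
                self.memory[old:old + self.line_size] = self.data[index]
                self.write_backs += 1
            base = block * self.line_size
            self.data[index][:] = self.memory[base:base + self.line_size]
            self.tags[index] = block
            self.dirty[index] = False
            self.refills += 1
        return index, address % self.line_size

    def read(self, address, size):
        """Return `size` bytes at `address` (must not cross a line boundary)."""
        index, offset = self._ensure_line(address)
        return bytes(self.data[index][offset:offset + size])

    def write(self, address, payload):
        """Store `payload` at `address` in the cache and mark the line dirty."""
        index, offset = self._ensure_line(address)
        self.data[index][offset:offset + len(payload)] = payload
        self.dirty[index] = True


memory = bytearray(range(256))
cache = WriteBackCache(memory, line_size=32, num_lines=2)
first = cache.read(4, 4)      # miss: the line holding address 4 is refilled
second = cache.read(8, 4)     # hit: same line, no refill
cache.write(4, b"\xff")       # hit: the line becomes dirty
third = cache.read(68, 4)     # miss on the same line: write back, then refill
```

In this run the first read causes one refill, the second read and the write hit in the cache, and the last read evicts the rewritten line, so the modified byte reaches the memory device only at that point.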
By refilling data of a size according to the bandwidth of the memory device to be accessed when refilling data to the cache memory, the average memory access time can be reduced for both the internal memory having the large bandwidth and the external memory having the small bandwidth, and the processing performance can be improved.
The structure for refilling the cache memory with data of a size according to the bandwidth of the accessed memory device can be realized not only by providing a plurality of cache memories with different cache line sizes, but also by using a cache memory having a changeable cache line size, and so on.
As explained above, according to the present invention, data of a size according to the bandwidth of the accessed memory device is refilled in a refill operation to the cache memory. Therefore, the refill time of the cache memory becomes short, and the average memory access time for the external memory is reduced, so that the performance of the processor can be improved.
Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out hereinafter.