This invention relates to an information processing apparatus having a cache memory for holding a copy of the contents of a main memory, and to an information processing process.
In a micro-processor, a small-capacity, high-speed cache memory is placed in the vicinity of the processor to speed up frequently performed memory accesses and thereby reduce the overall run time.
However, if large-scale data is to be handled, data transfer between the cache memory and the main memory occurs frequently, so that frequently used data may be expelled from the cache memory, lowering the performance. Moreover, since plural data arranged in succession on the main memory are usually grouped into a line and the data are exchanged with the main memory on a line basis, there are occasions wherein unneeded data are introduced into the cache memory. This reduces the effective cache memory capacity and lowers the performance further.
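The line-granularity effect described above can be illustrated with a small calculation. The following is only a sketch: the line size, element size and stride are assumed values that do not appear in this disclosure, and elements are assumed to be aligned so that none straddles a line boundary.

```python
def line_utilization(line_bytes, elem_bytes, stride_elems, n_elems):
    """Return the fraction of fetched bytes that the program actually uses.

    Assumes elements are aligned and never straddle a cache line.
    All parameters are illustrative, not taken from the disclosure.
    """
    touched_lines = set()
    for i in range(n_elems):
        addr = i * stride_elems * elem_bytes   # byte address of element i
        touched_lines.add(addr // line_bytes)  # line that holds this address
    used = n_elems * elem_bytes                # bytes the program needs
    fetched = len(touched_lines) * line_bytes  # bytes brought in line by line
    return used / fetched


# Consecutive 8-byte elements fill every 64-byte line that is fetched.
dense = line_utilization(line_bytes=64, elem_bytes=8, stride_elems=1, n_elems=64)
# With a stride of 8 elements, each fetched line carries one useful element.
sparse = line_utilization(line_bytes=64, elem_bytes=8, stride_elems=8, n_elems=64)
```

Under these assumptions the strided access uses one eighth of the fetched bytes, which is the reduction in effective cache capacity referred to above.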
In practice, in the field of supercomputers, the cache memory is not used when accessing large-scale data, and the data is instead introduced directly into a register from the main memory. However, in this case, a large number of registers are required in order to hide the comparatively long latency of data transfer between the main memory and the register.
For example, in the vector supercomputer SX-4, manufactured by NEC, a technique is used in which a vector register of a large capacity is provided in the processor and, if large-scale data is to be accessed by the processor, the data is directly introduced from the main memory to the register and the processor executes computations on the data in the vector register. Although the cache memory is used for data accessing from a scalar processor, the vector processor does not access data in the cache memory (see M. Inoue, K. Owada, T. Furui and M. Katagiri, Hardware of the SX-4 Series, NEC Technical Journal, Vol. 48, No. 11, pp. 13 to 22, 1995).
In a treatise by Nakamura et al. (H. Nakamura, H. Imori and K. Nakazawa, Evaluation of Pseudo Vector Processor Based on a Register Window, Transactions of the Information Processing Society of Japan, Vol. 34, No. 4, pp. 669 to 679, 1993), there is disclosed a technique of efficiently executing computations for scientific applications in a micro-processor. Here, plural register sets are provided; during execution of the computations, data of the main memory is directly loaded into one register set, and data of another register set is directly stored into the main memory. Here again, the cache memory is by-passed in accessing the large-scale data, and data is directly exchanged between the register and the main memory.
The above-described techniques by-pass the cache memory in accessing large-scale data, so as to transfer data directly between the main memory and the register, and provide a large number of registers in the processor in order to hide the longer latency of data transfer between the main memory and the register, thereby avoiding the lowering of performance ascribable to cache memory characteristics. These techniques are herein collectively termed a first conventional technique.
In the JP Patent Kokai JP-A-8-286928, there is disclosed a technique of efficiently loading discretely arrayed data in a cache memory. Here, the data arranged discretely on the main memory with a certain regularity is collected by an address conversion device. The collected data are re-arrayed as a sequence of data having consecutive addresses. Since the program has access to the re-arrayed data, the cache memory may hold only data required by the processor, so that it may be expected to realize efficient use of the cache memory and efficient use of the bandwidth between the cache memory and the main memory. This technique is herein termed a second conventional technique.
However, various problems have been encountered in the course of the investigations toward the present invention. For instance, the first conventional technique has an inconvenience that a unit must be provided in the processor to bypass the cache memory and a large number of registers need to be provided for hiding the effect of the memory latency. Therefore, specialized, dedicated microprocessor components are needed, all of which enlarge the circuit.
In the second conventional technique, the following three problems are not resolved.
The first problem is that of cache pollution, i.e., data subsequently to be used is expelled from the cache when data is loaded on a large scale. If the expelled data is again requested by the processor, cache mis-hit occurs, such that the processor must again request data outside the large block of data just loaded. If this occurs frequently, the data transfer between the processor and the main memory device could become excessive and exceed the bandwidth of the channel, thus possibly lowering the performance of the processor system.
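The cache pollution described above can be reproduced with a toy model. The sketch below assumes a fully associative cache with least-recently-used replacement; the replacement policy, the capacity and the line numbers are assumptions chosen for illustration and are not specified in this disclosure.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with LRU replacement (an assumption)."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()  # keys are line numbers, oldest first
        self.misses = 0

    def access(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)        # hit: mark most recently used
        else:
            self.misses += 1                    # mis-hit: load the line
            self.lines[line] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # expel least recently used


def hot_misses_after_stream(capacity, hot_lines, stream_len):
    """Count mis-hits on re-used lines after a large block is streamed in."""
    cache = LRUCache(capacity)
    for line in hot_lines:                  # warm the cache with re-used data
        cache.access(line)
    for line in range(1000, 1000 + stream_len):
        cache.access(line)                  # large-scale load through the cache
    before = cache.misses
    for line in hot_lines:                  # the processor asks for them again
        cache.access(line)
    return cache.misses - before
```

With a capacity of 8 lines and 4 re-used lines, a stream of 4 lines leaves the re-used lines in place, whereas a stream of 100 lines expels all of them, so that every subsequent access to them is a mis-hit.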
The second problem is that, while data copying from the main memory to the cache memory occurs on a cache line basis, this data copying is necessarily started by a cache mis-hit, thus increasing the average latency from the time the processor requests data until the data becomes available.
The third problem is that, while a processor core has a low-speed interface on the main memory side and a high-speed interface on the cache memory side, the copying from the main memory to the cache memory occurs via a cache memory controller in the processor core, so that the copying speed is governed by the low-speed interface between the main memory and the processor core.
It is therefore an object of the present invention to provide an information processing apparatus and process in which pre-fetch/post-store between the main memory data and the cache memory is realized efficiently.
Other objects of the present invention will become apparent in the entire disclosure.
An information processing apparatus according to an aspect of the present invention has a main memory device, a cache memory for holding a copy of the main memory device, and a processor including a cache memory controller designed to supervise data in the cache memory as the apparatus refers to and updates control information and address information in the cache memory. The information processing apparatus includes a pre-fetch unit designed to transfer data in the main memory device to the cache memory without referring to or updating the control information and the address information.
The processor has a physical address space area, inclusive of a specified physical space area associated with a specified area on the cache memory in a one-for-one correspondence. The pre-fetch unit directly transfers data between the cache memory and the main memory device, under a command for memory copying for the specified physical space area without obstructing execution of processing by the processor.
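The one-for-one correspondence between the specified physical space area and the specified area on the cache memory can be pictured as a fixed offset translation. The sketch below is illustrative only: `PF_BASE`, `PF_SIZE` and `CACHE_AREA_OFFSET` are hypothetical values, and the disclosure does not state that the correspondence is implemented in this way.

```python
PF_BASE = 0x8000_0000       # hypothetical start of the specified physical area
PF_SIZE = 16 * 1024         # hypothetical size of that area in bytes
CACHE_AREA_OFFSET = 0x4000  # hypothetical start of the specified cache area


def phys_to_cache(addr):
    """Map a physical address in the specified area to its cache location."""
    if not (PF_BASE <= addr < PF_BASE + PF_SIZE):
        raise ValueError("address lies outside the specified physical area")
    return CACHE_AREA_OFFSET + (addr - PF_BASE)


def cache_to_phys(offset):
    """Inverse mapping: each cache location has exactly one physical address."""
    if not (CACHE_AREA_OFFSET <= offset < CACHE_AREA_OFFSET + PF_SIZE):
        raise ValueError("offset lies outside the specified cache area")
    return PF_BASE + (offset - CACHE_AREA_OFFSET)
```

Because the mapping is fixed and invertible, a memory copy command addressed to the specified physical area identifies its location on the cache memory without any lookup of the control or address information.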
The cache memory has a first input/output port and a second input/output port. The first input/output port is connected to the main memory device via the cache memory controller of the processor. The second input/output port is connected to the main memory device via the pre-fetch unit.
The pre-fetch unit copies consecutive data, or discrete data arrayed at a fixed interval, on the main memory device into consecutive areas on the cache memory. Alternatively, the pre-fetch unit copies a sequence of data located at addresses on the main memory device specified by pointers into consecutive areas on the cache memory, the pointers being consecutive data, or discrete data arrayed at a fixed interval, on the main memory device. As a further alternative, the pointers are data arranged consecutively in specified areas on the cache memory.
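The copy modes of the pre-fetch unit can be modelled in software as gather operations. The following is a behavioural sketch only: memories are represented as Python lists indexed by word address, and the function names are hypothetical and do not correspond to any interface named in this disclosure.

```python
def prefetch_strided(main_mem, base, stride, count):
    """Gather `count` words spaced `stride` apart into a consecutive area.

    A stride of 1 corresponds to consecutive data on the main memory.
    """
    return [main_mem[base + i * stride] for i in range(count)]


def prefetch_indirect(main_mem, pointers):
    """Gather the words whose main-memory addresses are given by `pointers`."""
    return [main_mem[p] for p in pointers]


# Usage: main-memory word at address a holds the value 100 + a.
main_mem = [100 + a for a in range(64)]

# Fixed-interval data packed into a consecutive cache area.
cache_area = prefetch_strided(main_mem, base=2, stride=5, count=4)

# Pointers already held consecutively (for instance on the cache memory).
via_pointers = prefetch_indirect(main_mem, [9, 3, 40])
```

Where the pointers themselves lie at a fixed interval on the main memory, the two functions can be composed: the pointers are first gathered with `prefetch_strided` and the result is passed to `prefetch_indirect`.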
An information processing apparatus according to a second aspect of the present invention has a main memory device, a cache memory holding a copy of the main memory device, and a processor including a cache memory controller designed to supervise data in the cache memory as the apparatus refers to and updates the control information and the address information in the cache memory. The information processing apparatus includes a post-store unit designed to transfer data in the cache memory to the main memory device without referring to or updating the control information and the address information.
The processor has a physical address space area, inclusive of a specified physical space area associated with a specified area on the cache memory in a one-for-one correspondence. The post-store unit directly transfers data between the cache memory and the main memory device, under a command for memory copying for the specified physical space area, without obstructing execution of processing by the processor.
The cache memory has a first input/output port and a second input/output port, with the first input/output port being connected to the main memory device via the cache memory controller of the processor and with the second input/output port being connected to the main memory device via the post-store unit.
The post-store unit copies data arrayed in consecutive areas on the cache memory into addresses specified at a fixed interval on the main memory device. Alternatively, the post-store unit copies a sequence of data arrayed in consecutive areas on the cache memory into addresses on the main memory device specified by pointers, the pointers being consecutive data, or discrete data arrayed at a fixed interval, on the main memory device. As a further alternative, the pointers are data arranged consecutively in specified areas on the cache memory.
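The post-store unit performs the reverse operation, which can be modelled as a scatter. As before, this is a behavioural sketch with memories represented as Python lists indexed by word address; the function names are hypothetical.

```python
def poststore_strided(main_mem, cache_area, base, stride):
    """Scatter consecutive cache words to main memory at a fixed interval."""
    for i, value in enumerate(cache_area):
        main_mem[base + i * stride] = value


def poststore_indirect(main_mem, cache_area, pointers):
    """Scatter consecutive cache words to the addresses given by `pointers`."""
    if len(pointers) != len(cache_area):
        raise ValueError("one pointer is needed per word to be stored")
    for address, value in zip(pointers, cache_area):
        main_mem[address] = value


# Usage: write three words back at addresses 1, 5 and 9.
main_mem = [0] * 16
poststore_strided(main_mem, [1, 2, 3], base=1, stride=4)
```

In this model only the addressed words of the main memory are overwritten; the words lying between them are left untouched.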
According to a third aspect of the present invention, there is provided an information processing apparatus which includes a processor and a pre-fetch/post-store circuit. The processor is designed to set a specified block of a plurality of blocks making up a cache memory as a pre-fetch/post-store cache area, to set a specified physical main memory space area of plural physical main memory space areas associated with the pre-fetch/post-store cache area as a pre-fetch/post-store physical space area, and to set portions of the plural physical main memory space areas other than the pre-fetch/post-store physical space area as an access-inhibited area, so as to control the pre-fetch/post-store cache area as a cache area dedicated to the pre-fetch/post-store physical space area. The pre-fetch/post-store circuit is designed to re-array a sequence of data existing consecutively or discretely on the main memory device as a sequence of consecutive data on the pre-fetch/post-store cache area directly, without interposition of a cache controller of the cache memory. An application program employing the sequence of data existing consecutively or discretely on the main memory device is structured to access the re-arrayed data.
The pre-fetch/post-store circuit re-writes the sequence of data on the pre-fetch/post-store cache area directly in consecutive or discrete sites on the main memory device, without interposition of a cache controller of the cache memory, after an end of use by the application program of the re-arrayed data.
More specifically, the information processing apparatus of the present invention includes a cache memory having plural input/output ports, and a pre-fetch/post-store circuit capable of having direct access only to the cache data information, without referring to or updating the control/address information in the processor core. A specified physical address area is procured as a pre-fetch/post-store area, and the specified physical address area is mapped to the specified area on the cache memory in a one-for-one correspondence under software control. The pre-fetch/post-store circuit causes data to be transferred directly between the main memory and the cache memory under a command from the processor core.
The pre-fetch/post-store circuit has the function of copying data arrayed in consecutive or discrete areas on the main memory into consecutive areas on the cache memory, and of copying data arranged in consecutive areas on the cache memory into consecutive or discrete areas on the main memory. In particular, in the configuration of copying data arrayed in discrete areas on the main memory into consecutive areas on the cache memory, there is no need to load unneeded data into the cache memory, so that the cache memory can be used efficiently.
According to a further aspect of the present invention, there is provided an information processing process, typically using the aforementioned apparatus.
The process is characterized by comprising pre-fetching data in said main memory device by transferring the data in said main memory device to said cache memory without referring to or updating said control information and said address information.
Other features of the process are mentioned in the claims, the entire disclosure relating to the process being herein incorporated by reference thereto.
Also the process aspects are fully disclosed in association with the disclosure relating to the apparatus.
Thus, according to the present invention, the sequence of data arrayed consecutively or discretely on the main memory device for use by the application program can be copied into specified consecutive areas on the cache by the pre-fetch/post-store circuit outside the processor core, without referring to or updating the information managed by the cache memory controller. Likewise, the data arranged in specified consecutive areas on the cache can be written into consecutive or discrete areas on the main memory device by the pre-fetch/post-store circuit outside the processor core, without referring to or updating the information managed by the cache memory controller.