The present invention relates to an information processing system which comprises a processor for performing arithmetic operation, a memory and a memory controller for performing control over the memory and more particularly, to a prefetch function in an information processing system which uses an embedded processor as a processor.
FIG. 13 shows an arrangement of a general information processing system as a prior art. A processor 1 and a memory controller 2 are connected by a system bus 110, the memory controller 2 and a memory 3 are connected by a memory bus 111, and the memory controller 2 and another system are connected by an IO bus (not shown). The processor 1 of the present system includes an on-chip cache (which will be referred to as the L1 cache, hereinafter) 12, and an L2 cache 14 connected to the system bus 110. The memory controller 2 performs connection control not only over the memory 3 and L2 cache 14 but also over the other system. The operation of the processor 1 of reading an instruction code (which operation will be referred to as fetch, hereinafter) is summarized as follows. The processor 1 issues a memory access request to the memory controller 2 via an instruction processing part 11 and the system bus 110. The memory controller 2, in response to the request, reads an instruction code from the L2 cache 14 or memory 3 and transmits it to the processor 1. The access size between the processor 1 and memory 3 is determined by the L1 cache 12, so that code is read from the memory 3 in units of the line size, which is the management unit of the L1 cache 12. Most processors are usually equipped with, in addition to an L1 cache, an L2 cache provided outside the processor core as a relatively high-speed memory. The word 'cache' as used herein refers to a memory which stores an instruction code once it has been accessed, so that a later re-access to the same code can be served at high speed. In order to perform arithmetic operations, the processor also accesses not only such instruction codes but also various sorts of data including operands, as well as external registers. Such data are also stored in a cache in some cases.
Such a technique is already implemented in many systems including a personal computer as a typical example.
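The line-size granularity of the fetch described above can be illustrated with a minimal sketch. The 32-byte line size and the function names `line_range` and `lines_for_fetch` are assumptions made for illustration only; the text does not specify them.

```python
LINE_SIZE = 32  # bytes per L1 cache line (assumed value, not given in the text)


def line_range(address, line_size=LINE_SIZE):
    """Return (start, end) of the cache line containing `address`; `end` is exclusive."""
    start = address - (address % line_size)
    return start, start + line_size


def lines_for_fetch(address, length, line_size=LINE_SIZE):
    """Start addresses of every line that must be read from memory to cover
    the bytes [address, address + length). `length` must be at least 1."""
    first, _ = line_range(address, line_size)
    last, _ = line_range(address + length - 1, line_size)
    return list(range(first, last + line_size, line_size))
```

Under these assumptions, a 4-byte fetch at address 0x1234 still reads the whole line starting at 0x1220, and an 8-byte fetch at 0x123C straddles two lines and reads both.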
In an information processing system, in addition to the arithmetic operation performance of a processor, the performance of reading an instruction code from a memory into the processor is also important. The delay from the processor's access request to its receipt of the requested data is known as the access latency. In recent years, the core performance of processors has improved remarkably, but the capability of the memory to supply instruction code has not kept pace. When the access latency becomes non-negligible due to the performance difference between the two, the processor stalls and cannot deliver its full performance, so that the memory system becomes a bottleneck in the system. This access latency problem occurs not only for instruction fetches but also for accesses to data and register operands.
Conventional methods for improving an access latency include first to fourth methods which follow.
The first improvement method is to improve the performance of the system bus. In order to improve the performance of the system bus, it becomes necessary to widen the bus and raise its operating frequency. However, such improvement is difficult because of the following problems: (1) the devices need too many pins to connect to the system bus, and (2) noise such as crosstalk arises.
The second improvement method is to speed up the memory. For the speed-up of the memory, it is conceivable to speed up the operation of the memory per se and also to use a cache as the memory. However, a high-speed memory such as a high-speed SRAM or a processor-exclusive memory is expensive, which undesirably increases the cost of the entire system. Meanwhile, the cache has the following problems inherent in its principle. That is, the cache becomes effective only after data has once been accessed, and is highly useful only when the data is accessed repetitively. In particular, a program to be executed on a so-called embedded processor tends to have a low locality of references, so that the re-use frequency of an instruction code is low and the cache memory cannot work effectively. The instruction code then has to be read out directly from the memory, for which reason this method cannot make the most of the high-speed feature of the cache. Further, the high-speed memory used for a cache, such as a high-speed SRAM or a processor-exclusive memory, is expensive. Although the price/performance ratio of memory has been improving, employing the latest high-speed memory involves high costs. Systems have also come to demand an increasingly large memory capacity in recent years, so that the cost increase becomes a serious problem.
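The dependence of the cache on re-use can be illustrated with a minimal sketch. The function `hit_rate`, the idealized cache of unlimited capacity, and the two access traces are assumptions made for illustration and are not taken from any measured program.

```python
def hit_rate(line_addresses):
    """Fraction of accesses served by an idealized cache of unlimited capacity.
    A line hits only if the same line has been accessed before."""
    if not line_addresses:
        return 0.0
    seen = set()
    hits = 0
    for line in line_addresses:
        if line in seen:
            hits += 1
        else:
            seen.add(line)  # first access: must be read from memory
    return hits / len(line_addresses)


# Hypothetical traces, expressed as cache line numbers.
straight_line = list(range(100))  # 100 distinct lines, each executed once
tight_loop = [0, 1, 2, 3] * 25    # 4 lines re-executed 25 times

straight_rate = hit_rate(straight_line)  # 0.0: no line is ever re-used
loop_rate = hit_rate(tight_loop)         # 0.96: only the first pass misses
```

Even with unlimited capacity, the cache in this model gives no benefit on the straight-line trace, which corresponds to the low-locality case described above.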
The third improvement method is to employ a so-called Harvard architecture, in which accesses to the instruction code and to data are separated. In other words, a bus for exclusive use in instruction code access and another bus for exclusive use in data access are provided in the processor. The Harvard architecture can be employed for the L1 cache, but employing it for the system bus requires two bus channels and therefore many device pins to connect the system bus.
The fourth improvement method is, prior to issuance of a fetch request for an instruction code from an arithmetic operation part in a processor, to read the instruction code in advance (prefetch) from a memory into a memory within the processor. Details of the prefetch are disclosed in U.S. Pat. No. 5,257,359. The publication discloses that an instruction decoder in the arithmetic operation part decodes and analyzes a required instruction code to thereby predict the instruction code to be accessed next and to read that instruction code in advance. In general, the prefetch is effective when the instruction supply ability or rate of the processor is higher than the instruction execution rate thereof. However, since the prefetch within the processor is carried out through the system bus, the system bus becomes a bottleneck. Further, for the same reason, this prefetch contends with other external accesses such as operand accesses, so that a sufficient effect cannot be expected.
The effect of the prefetch generally depends on the characteristics of the instruction code to be executed. The inventor of the present application has paid attention to the fact that an embedded program to be executed on an embedded type processor contains many flows which collectively process an access to operand data placed in a peripheral register or memory and a comparison judgement and, on the basis of the judgement result, select the next processing; that is, the program contains many constructs such as "IF ~ THEN ~ ELSE ~", for instance, in C language. In the collective processing of operand data access and comparison judgement, the program is processed highly sequentially and tends to have a low locality of references as already mentioned above. In the processing of selecting the next processing based on the judgement result, on the other hand, a branch typically takes place once per processing unit of several to several tens of steps. That is, the embedded program is characterized by (1) a highly sequential processing property and (2) many branches. In the case of such a program code, the access latency can be reduced by prefetching the instruction code several to several tens of steps ahead of the instruction code currently being executed. However, since the within-processor prefetch of the instruction code several to several tens of steps ahead, as mentioned in the above fourth improvement method, causes the system bus to be occupied by the prefetch memory access, an operand access is forced to wait on the system bus. This disadvantageously causes the processor to stall.
It is therefore an object of the present invention to reduce an access latency from the issuance of a memory read request by a processor to a response thereto. Another object of the invention is to prevent reduction of an effective system bus performance caused by an increase in the access latency.
In accordance with an aspect of the present invention, in order to attain the above object, there is provided an information processing system in which a memory controller is connected with the processor via a first bus and connected with a memory via a second bus, and a buffer memory is provided in the memory controller. A control circuit operates, before a memory access from the processor is carried out, to estimate the address likely to be accessed next on the basis of addresses accessed in the past, and to prefetch into the buffer memory the data stored in an address area contiguous to that address and having a data size of twice or more the access unit of the processor.
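The prefetch into the buffer memory described above can be illustrated with a minimal software model. This is a sketch under stated assumptions, not the circuit of the invention: the class name `PrefetchingController`, the 32-byte access unit, and the simple sequential estimate (the next access is assumed to follow the most recent one) are all hypothetical choices made for illustration.

```python
ACCESS_UNIT = 32    # bytes per processor access (assumed: one L1 line)
PREFETCH_UNITS = 2  # prefetch size in access units; the text calls for two or more


class PrefetchingController:
    """Toy model of a memory controller with a prefetch buffer.
    Addresses passed to read() are assumed to be aligned to the access unit."""

    def __init__(self, memory, access_unit=ACCESS_UNIT, prefetch_units=PREFETCH_UNITS):
        self.memory = memory          # bytes-like backing store (models the memory)
        self.access_unit = access_unit
        self.prefetch_units = prefetch_units
        self.buffer = {}              # unit start address -> prefetched data
        self.buffer_hits = 0          # requests served from the buffer
        self.demand_reads = 0         # requests that had to wait for the memory
        self.prefetch_reads = 0       # units read ahead over the second bus

    def _read_memory(self, address):
        return bytes(self.memory[address:address + self.access_unit])

    def _prefetch_after(self, address):
        # Assumed estimate: the next access continues sequentially after `address`.
        window = {}
        for i in range(1, self.prefetch_units + 1):
            unit = address + i * self.access_unit
            if unit + self.access_unit > len(self.memory):
                break                 # do not prefetch past the end of memory
            if unit in self.buffer:
                window[unit] = self.buffer[unit]   # already buffered, keep it
            else:
                window[unit] = self._read_memory(unit)
                self.prefetch_reads += 1
        self.buffer = window          # units outside the window are discarded

    def read(self, address):
        """Serve one processor access, then prefetch the following units."""
        if address in self.buffer:
            data = self.buffer[address]
            self.buffer_hits += 1
        else:
            data = self._read_memory(address)
            self.demand_reads += 1
        self._prefetch_after(address)
        return data
```

In this model, sequential reads at addresses 0, 32, 64 and 96 cost one demand read followed by three buffer hits, while a branch to a distant address misses the buffer and falls back to a demand read. Because the prefetch traffic stays between the controller and the memory, it does not occupy the bus between the processor and the controller.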
In another information processing system, a memory controller is connected with the processor via a first bus and connected with a memory via a second bus, a prefetching buffer memory is provided in the memory controller, the memory and the memory controller are mounted on the same chip, and the operating frequency of the second bus is set to be higher than that of the first bus.
In a further information processing system, a memory controller is connected with the processor via a first bus and connected with a memory via a second bus, a prefetching buffer memory is provided in the memory controller, the memory and the memory controller are mounted on the same chip, and the bus width of the second bus is set to be larger than that of the first bus.
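The effect of a faster or wider second bus can be illustrated with a short calculation. The bus widths and frequencies below are hypothetical figures chosen for illustration, and the function `peak_bandwidth` assumes one transfer per clock cycle; none of these values appear in the text.

```python
def peak_bandwidth(bus_width_bits, frequency_hz):
    """Peak transfer rate in bytes per second, assuming one transfer per clock."""
    return (bus_width_bits * frequency_hz) // 8


# Hypothetical first bus (processor to memory controller).
first_bus = peak_bandwidth(64, 100_000_000)      # 800,000,000 bytes/s

# Hypothetical second bus (memory controller to memory), on the same chip.
faster_second = peak_bandwidth(64, 200_000_000)  # higher frequency: 1,600,000,000 bytes/s
wider_second = peak_bandwidth(128, 100_000_000)  # larger width: 1,600,000,000 bytes/s
```

Under these assumed figures, either variant doubles the rate at which the prefetching buffer memory can be filled relative to the rate at which the processor can drain it over the first bus.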
Other means for attaining the above objects as disclosed in the present application will be obvious from the explanation in connection with embodiments which follow.