1. Field of the Invention
This invention relates generally to computer systems that process instructions at a high rate of speed and more particularly relates to computer systems that utilize a combination of an instruction cache memory and consecutive transfer memory, with separate instruction and data paths, to improve the computer system's instruction processing capabilities.
2. Description of the Related Art
The instruction processing capabilities of prior art computer systems were generally limited by the relatively slow speed of their central processing units (CPUs) as compared with the higher speed of traditionally available memories. Today, CPUs, also referred to hereinafter as instruction processors, are typically as fast as, if not faster than, their companion memory systems. Accordingly, the cycle time of the memory system has become the limiting factor with respect to optimal resource utilization.
In an attempt to provide computer architectures which support high speed processing applications, well known computer systems have been developed which employ a relatively small proportion of expensive, very high speed memory, commonly referred to as cache memory, in combination with a larger, lower cost memory that has a slower random access time. Caches that may contain both instructions and data are called "combined caches". Caches may alternatively be designed to contain only instructions or only data, and are then called, respectively, an "instruction cache" and a "data cache". The principal purpose of the instruction cache is to provide a vehicle for supplying the processor with instructions faster than they could otherwise be had via accessing the slower main memory over separate memory cycles. Cache memory systems are known that, for example, commonly operate in the 30 ns range, while the slower main memory referred to hereinbefore typically has a cycle time that is on the order of 150 ns, i.e., is approximately five times slower than cache memory. However, cache memory, because of the aforesaid cost factor, may comprise as little as a fraction of one percent of total memory.
Traditional computer systems have used cache memories to maintain a copy of the most recently used instructions fetched from slower main memory. When instruction fetches are required, the processor first looks in the cache for the instruction. This is accomplished by matching the instruction address with a tag that is stored in the cache alongside the saved instruction. If the comparison succeeds, the instruction is fetched from the cache, and the processor can continue. This is generally referred to as a "cache hit". If there is no copy of the required instruction in the cache (i.e. no tag matches the required address), a "cache miss" is signaled, and the instruction is fetched from main memory. The processor then has to wait for the instruction to be returned. When the instruction is returned, the cache stores it for future use, with a tag to indicate the address of where the instruction came from.
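The tag-matching lookup described above can be illustrated with a minimal sketch. It assumes a direct-mapped organization (one possible arrangement; the text does not specify one), and the names InstructionCache, fetch, and main_memory are hypothetical, introduced only for illustration.

```python
class InstructionCache:
    """Illustrative direct-mapped instruction cache (an assumed organization)."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        # Each slot holds either None or a (tag, instruction) pair.
        self.slots = [None] * num_slots

    def fetch(self, address, main_memory):
        """Return (instruction, hit) for the given instruction address."""
        index = address % self.num_slots   # slot selected by the low part of the address
        tag = address // self.num_slots    # tag identifies which address occupies the slot
        entry = self.slots[index]
        if entry is not None and entry[0] == tag:
            return entry[1], True          # cache hit: tag matches the address
        # Cache miss: fetch from the slower main memory, then save the
        # instruction with its tag for future use (overwriting the old entry).
        instruction = main_memory[address]
        self.slots[index] = (tag, instruction)
        return instruction, False


# Hypothetical main memory contents, keyed by instruction address.
memory = {0: "LOAD", 1: "ADD", 4: "JUMP"}
cache = InstructionCache(num_slots=4)
first = cache.fetch(0, memory)    # miss: instruction comes from main memory
second = cache.fetch(0, memory)   # hit: instruction comes from the cache
```

In this sketch, addresses 0 and 4 map to the same slot, so fetching address 4 overwrites the instruction saved for address 0, and a later fetch of address 0 misses again.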
Because the cache is of limited size, when a cache miss occurs, and the instruction is fetched from memory and stored into the cache, some other instruction already in the cache will be overwritten. The decision of which cache location(s) is (are) to be overwritten is generally made by using a replacement algorithm.
The selection of such an algorithm will obviously affect system performance. However, because the same performance tradeoffs apply to both traditional caches and the cache used in accordance with the teachings of the invention (hereinafter disclosed), and because these performance characteristics are not germane to the teaching of the invention per se, they are not discussed further herein. Furthermore, since replacement algorithms are well known to those skilled in the art, including such techniques as overwriting the oldest block of information stored in the cache memory, overwriting the statistically least used data stored in cache memory, etc., these algorithms likewise will not be further discussed herein except to the extent they affect managing the disclosed memory system.
Most modern mainframe computers employ an instruction cache or combined caches in combination with a slower main memory. Examples include the Digital Equipment Corporation (DEC) VAX, the DEC PDP-11 and the Data General MV 8000.
Other computer systems are known which combine an instruction cache with a slower memory having sequential transfer characteristics. Examples of such systems are the Fairchild CLIPPER micro-computer and the Zilog Z80,000.
It has been recognized that, in computer systems having separate instruction and data paths, it is possible to take advantage of memories having sequential transfer characteristics to improve system performance. Such memories can perform sequential transfers much faster than the time required for separate memory cycle accesses. A sequential transfer is initiated by sending a normal address to the memory system and initiating a read. The access time for this read is the characteristic nonsequential access time. Again, in standard semiconductor dynamic memories, this access time is of the order of 150 ns. Successive instruction fetches can then occur until either a branching instruction (jump, conditional jump, call, return, or other sequence modifying instruction) is executed, or an interrupt occurs. Execution of a new sequence of sequential instructions then begins. These sequential transfer memories are also synonymously referred to hereinafter as "memories optimized for sequential transfers".
The separate instruction and data paths referred to hereinbefore are necessary to avoid collisions of requests for instructions, and data read/write transfers that would otherwise break the sequential nature of the transfer of instructions.
After the first instruction is fetched, the memory system can be instructed to fetch sequential instructions. The access time for these fetches in standard semiconductor dynamic memory systems is of the order of 50 ns. Such speeds approximate cache memory speeds.
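The effect of the two access times can be shown with a short worked sketch. It uses the 150 ns and 50 ns figures given above and assumes, for illustration only, that instructions occupy consecutive integer addresses; the function name fetch_time_ns and the example address streams are hypothetical.

```python
INITIAL_ACCESS_NS = 150     # nonsequential (first) access time from the text
SEQUENTIAL_ACCESS_NS = 50   # access time for each following sequential fetch


def fetch_time_ns(addresses):
    """Total fetch time for a stream of instruction addresses.

    Assumption: an address exactly one greater than the previous address is a
    sequential fetch; anything else (the first fetch, or a branch target)
    pays the full initial access time.
    """
    total = 0
    previous = None
    for address in addresses:
        if previous is not None and address == previous + 1:
            total += SEQUENTIAL_ACCESS_NS
        else:
            total += INITIAL_ACCESS_NS
        previous = address
    return total


straight_line = fetch_time_ns([100, 101, 102, 103])  # 150 + 3 * 50 = 300 ns
with_branch = fetch_time_ns([100, 101, 200, 201])    # 150 + 50 + 150 + 50 = 400 ns
```

Under these assumptions, a single branch within four fetches adds 100 ns, the difference between one initial access and one sequential access.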
Prior art systems which use cache memory and sequential transfer memory together, although experiencing improved performance with respect to cache plus nonsequential transfer memories, still suffer from not making the most effective possible use of the cache resource. This is because, as indicated hereinbefore, consecutive transfer memory systems have different access times depending on the mode of access, e.g., 150 ns for initial fetches versus 50 ns for sequential fetches. Therefore, to save all new references in the cache memory and continually check for cache hits would be a gross waste of resources when the sequential transfer memory is "up to speed".
It would be desirable to be able to use the cache memory resource in conjunction with a sequential transfer memory in a way that eliminates the processor waiting time associated with the long time (e.g. 150 ns) required to access sequential transfer memory following instruction discontinuities, e.g., following branch instructions. Once the sequential memory is back "up to speed", i.e., operating in the 50 ns range, the cache resource could then conceptually be kept in reserve to potentially speed up processing in the face of subsequent branch instructions.
In addition, it has been recognized, in accordance with the desire expressed hereinabove, that RISC (reduced instruction set computer) architectures may be particularly well suited to benefit from the use of the cache resource in conjunction with a sequential transfer memory. This is because: (1) RISC in-line code has been statistically shown to have relatively fewer branch instructions as compared with code running on other computer systems; and (2) RISC architectures lend themselves to fixed length instruction codes (though this is not invariably the case) which can easily be translated into the size of an optimal cache memory instruction block, and which in turn can be used to feed the instruction processor while the sequential transfer memory is performing its relatively long, initial access.
The current commercial availability of sequential transfer memory chips and other sequential transfer memory systems, which in certain modes of operation are as effective (in terms of speed) as the cache resource, further suggests the desirability of optimizing the use of the cache resource along the lines expressed hereinabove.