Any processing element must receive its instructions from some form of code memory. Ideally, when the processing element is ready to execute the next instruction, that instruction would always be immediately available. For computers with relatively slow processing elements whose speeds are matched to those of their main memories, the instructions can be drawn from main memory without the processor waiting an undue amount of time. For systems with faster processing elements, some form of intermediate code memory is needed to match main memory and processor speeds. The problem is further complicated by the presence of subprogram calls and branch instructions in the code, since these make the location of the next instruction unpredictable. Without subprogram calls and branch instructions, the intermediate code memory between a main memory and a processing element could be a simple first-in, first-out queue. Because of these instructions, however, the queue would need to be flushed and reloaded every time the processor encounters one of them, which would not be very satisfactory. The longer the queue, the more time it would take to reload it.
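The cost of flushing such a queue can be illustrated with a minimal sketch. It assumes a toy instruction encoding (tuples of the form ("op",) and ("jump", target)) and a queue that prefetches strictly sequential addresses; the names run_with_fifo and PROGRAM are hypothetical and do not come from any particular machine.

```python
from collections import deque

def run_with_fifo(program, queue_len, max_steps=1000):
    """Run a toy program through a sequential FIFO prefetch queue.

    Returns (instructions executed, queue flushes, words fetched from main memory).
    The encoding ("op",) / ("jump", target) is an assumption made for illustration.
    """
    queue = deque()   # addresses already prefetched, oldest first
    next_fetch = 0    # next sequential address to prefetch
    pc = 0
    executed = flushes = fetched = 0
    while pc < len(program) and executed < max_steps:
        # Keep the queue full of sequentially prefetched addresses.
        while len(queue) < queue_len and next_fetch < len(program):
            queue.append(next_fetch)
            next_fetch += 1
            fetched += 1
        addr = queue.popleft()
        assert addr == pc  # the head of the queue is always the next instruction
        instr = program[addr]
        executed += 1
        if instr[0] == "jump":
            # The prefetched successors are now useless: flush and reload.
            pc = instr[1]
            queue.clear()
            next_fetch = pc
            flushes += 1
        else:
            pc += 1
    return executed, flushes, fetched

# Twelve instructions; the one at address 2 jumps forward to address 8.
PROGRAM = [("op",)] * 12
PROGRAM[2] = ("jump", 8)

print(run_with_fifo(PROGRAM, 4))  # (7, 1, 10): 3 prefetched words discarded
print(run_with_fifo(PROGRAM, 8))  # (7, 1, 14): 7 prefetched words discarded
```

In this toy run the same seven instructions are executed in both cases, but the longer queue discards more prefetched words at the flush, which is consistent with the observation that a longer queue takes more time to reload.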
One form of intermediate memory is a cache, such as those defined by Harold Stone in High-performance Computer Architecture (Addison-Wesley, 1987), which can be used to compensate for the mismatch between the main memory and processor speeds. Caches are fast, small, and relatively expensive memories that are placed between main memory and the processing element. They take advantage of the fact that loops cause instruction execution to be concentrated within a small area of memory; the cache stores this small area and can therefore deliver instructions to the processing element at average speeds approaching that of the cache memory. When an instruction requested by the processor is located in the cache, a hit occurs and the instruction is supplied to the processing element within the access time of the cache. Only when there is a miss (i.e., the instruction is not located in the cache) does the instruction have to be retrieved from main memory, which has a much longer access time. The larger the cache, the more likely it is that the cache can completely contain a program's loops, and the less likely it is that a miss will occur. However, when a miss does occur, the access time is dictated by the access time of main memory. Since cache memories do not anticipate instruction execution and do not understand subprogram calls and branch instructions, a miss often occurs during subprogram calls or branch operations.
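The effect of loop size on the average access time can be sketched with a small simulation. The following is an illustration only, assuming a fully associative cache with least-recently-used replacement, one instruction per cache line, and access times of 1 cycle for the cache and 10 cycles for main memory; none of these parameters is taken from Stone or from any specific machine, and the function name is hypothetical.

```python
from collections import OrderedDict

def average_access_time(trace, cache_lines, t_cache=1, t_main=10):
    """Return (hits, misses, average access time) for an instruction address trace.

    Assumed model: fully associative, LRU replacement, one instruction per line.
    A hit costs t_cache cycles; a miss costs t_main cycles.
    """
    cache = OrderedDict()  # keys are cached addresses, least recently used first
    hits = misses = 0
    for addr in trace:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            misses += 1
            cache[addr] = None
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    total = hits * t_cache + misses * t_main
    return hits, misses, total / len(trace)

# A loop of 8 instructions executed 100 times.
LOOP_TRACE = list(range(8)) * 100

print(average_access_time(LOOP_TRACE, cache_lines=8))  # (792, 8, 1.09)
print(average_access_time(LOOP_TRACE, cache_lines=4))  # (0, 800, 10.0)
```

When the loop fits in the cache, only the first pass misses and the average approaches the cache access time. When the cache is too small for the loop, this replacement policy evicts each instruction before it is reused, and every access pays the main memory access time.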
The present invention is designed to cooperate with the invention in the application Rules and Apparatus for a Code RAM that Buffers Prefetched Instruction Sequences by Glenn A. Gibson, filed Jan. 10, 1988, as application Ser. No. 07/144,948, now U.S. Pat. No. 4,876,642, issued 10 Oct. 1989. This invention and the previous invention, referred to as the code buffer, are placed between the processing element and main memory to make instructions immediately available to the processing element a very high percentage of the time. To function properly, the code buffer must receive its instructions inline (i.e., without subprogram calls or returns in the instruction stream), and the present invention satisfies this requirement by maintaining its own last-in/first-out (LIFO) stack of subprogram return addresses. The present invention is also capable of prefetching entire code segments (such as subprograms) from main memory while it is supplying instructions to the processing element through the code buffer. The present invention and the code buffer achieve their effectiveness by having the small but very fast code buffer retain all loops of small to moderate length, while the present invention supplies inline instructions to the code buffer and prefetches entire code segments from main memory.
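The idea of producing an inline instruction stream with a LIFO stack of return addresses can be sketched as follows. This is a simplified illustration and not the logic of the present invention: the instruction encoding (("call", target), ("ret",), ("halt",), and single-element tuples for ordinary operations), the list-based memory, and the name inline_stream are all assumptions made for the example.

```python
def inline_stream(memory, start, max_instructions=1000):
    """Follow calls and returns with a LIFO stack; deliver only ordinary instructions.

    Returns a list of (address, operation) pairs with no calls or returns in it.
    The tuple encoding and the "halt" marker are hypothetical.
    """
    return_stack = []  # LIFO stack of subprogram return addresses
    delivered = []     # the inline stream handed on to the code buffer
    pc = start
    while len(delivered) < max_instructions:
        instr = memory[pc]
        kind = instr[0]
        if kind == "call":
            return_stack.append(pc + 1)  # remember where to resume
            pc = instr[1]                # continue fetching in the subprogram
        elif kind == "ret":
            if not return_stack:
                break                    # return with an empty stack: stop
            pc = return_stack.pop()      # resume after the most recent call
        elif kind == "halt":
            break
        else:
            delivered.append((pc, kind))
            pc += 1
    return delivered

MEMORY = [
    ("load",),     # 0  main program
    ("call", 5),   # 1
    ("add",),      # 2
    ("halt",),     # 3
    ("nop",),      # 4  never reached
    ("mul",),      # 5  subprogram A
    ("call", 8),   # 6
    ("ret",),      # 7
    ("sub",),      # 8  subprogram B
    ("ret",),      # 9
]

print(inline_stream(MEMORY, 0))
# [(0, 'load'), (5, 'mul'), (8, 'sub'), (2, 'add')]
```

The nested call from subprogram A to subprogram B is unwound in last-in/first-out order, and the delivered stream contains only ordinary instructions, which is the form of input the code buffer is described as requiring.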
The use of a LIFO stack to store return addresses is not new, but normally the stack is put in the processing element or the data memory. The history of using a LIFO stack for this purpose is outlined by Cosgrove et al. in U.S. Pat. No. 4,399,507. Cosgrove et al. even placed a copy of the address at the top of the stack in a register in the instruction fetch phase of the processing element's pipeline, so that return instructions could be detected early in the pipeline and have the return address immediately available. This prevented the instruction pipeline from being emptied by return branches. The present invention goes much further by placing the whole stack in the code memory hierarchy and even providing for the subprogram calls and returns to be executed by intermediate code memory logic that monitors the instruction stream.