1. Field of the Invention
This invention relates generally to memory access management for a central processing unit (CPU) in a reduced instruction set computer (RISC). More particularly, this invention relates to an apparatus and method for implementing a decoding system in an overlapping windowed register file whereby the decoding process can be performed expeditiously with simply structured logic circuits.
2. Description of the Prior Art
The speed of data retrieval and the complexity of both the circuit and the associated data retrieval software are two inter-related critical design considerations which, if not properly managed, may prevent a high performance central processing unit (CPU) from achieving a high data access rate to the memory. The memory of a computer is often organized in a hierarchical manner wherein the `top level memory` is the one which is most directly accessible to the CPU. Usually, the data which are most frequently used by the CPU are stored in this `top level memory`.
For the CPU of a reduced instruction set computer (RISC), a register file is often used to construct the top level memory because register mode instructions are highly efficient data access instructions and the data can therefore be retrieved at a very high access rate. FIG. 1 shows the organization of a register file 1 which is partitioned into a plurality of fixed-size, overlapping `windows`, e.g., window A (2) and window B (4), wherein each `window` provides access to the CPU (not shown) when it is `visible`. Not all registers are accessible to the CPU at any given time. Generally, only one window is accessible, i.e., visible, and that window is denoted as the `current window` (6). The current window 6 is selected by the CPU, which makes the selection by generating a window number that is then decoded by a register file decoder 8 to point to the selected window and to utilize that window as the current window. The CPU is meanwhile executing a plurality of instructions. A register number 10 is selected by the instructions and is likewise processed by the decoder 8 to select a register in the current window 6 selected by the CPU.
FIG. 1 shows that some registers belong to two different windows but have a different register number in each window. Register r.sub.0 in window A is register r.sub.3 in window B. Such registers are referred to as overlapping registers. Some registers belong to only one window and they are referred to as `local` registers 12. Registers r.sub.1, r.sub.2 and r.sub.3 are local registers 12 in window A and registers r.sub.0, r.sub.1, and r.sub.2 are local registers 12 in window B. In addition, the register file structure for a RISC CPU further comprises a plurality of global registers (not shown in FIG. 1, see FIG. 2) which belong to all windows and can be accessed at any given time by the CPU. The use of an overlapping window architecture in configuring a RISC register file has many associated benefits that will become clear from the discussion below. More details are disclosed in `RISC I & II Architecture and Pipeline` in `Reduced Instruction Set Computer Architecture for VLSI` by Manolis G. H. Katevenis, MIT Press 1985.
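The aliasing described for FIG. 1 can be illustrated with a minimal sketch. The window size of four registers and the single overlapping register follow the description above; the base offsets in `WINDOW_BASE` and the function name `physical_index` are hypothetical and are chosen only so that register r.sub.0 of window A and register r.sub.3 of window B resolve to the same physical slot.

```python
# Illustrative model of the FIG. 1 layout (not the actual decoder circuit).
WINDOW_SIZE = 4                   # registers visible in one window
OVERLAP = 1                       # registers shared by two adjacent windows
STRIDE = WINDOW_SIZE - OVERLAP    # distance between adjacent window bases

# Assumed base offsets: window B starts at slot 0, window A one stride later.
WINDOW_BASE = {"B": 0, "A": STRIDE}


def physical_index(window: str, reg: int) -> int:
    """Map a (window, register number) pair to one physical register slot."""
    if not 0 <= reg < WINDOW_SIZE:
        raise ValueError("register number out of range for this window")
    return WINDOW_BASE[window] + reg


# The overlapping register has two names, one in each window.
shared = physical_index("A", 0)
print(shared == physical_index("B", 3))

# The local registers of the two windows never collide.
locals_a = {physical_index("A", r) for r in (1, 2, 3)}
locals_b = {physical_index("B", r) for r in (0, 1, 2)}
print(locals_a.isdisjoint(locals_b))
```

In this model a register is identified by a window base plus a register number, so an overlapping register is simply a slot that two different (window, register) pairs reach.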
Each time the CPU executes a procedure call, the window number is updated. Meanwhile, the local registers in that window are allocated by the compiler in advance for that procedure call so that no other window can access them. The extra activities of saving and restoring these registers are therefore not necessary during execution, which simplifies the CPU memory management process and thus increases the processing speed of the system. In addition, the windows are organized in a stack configuration wherein a `parent` procedure writes all the calling arguments into the overlapping registers, which are then automatically accessible by a `child` procedure called by the parent procedure. The passing of arguments from the calling procedures to the called procedures is thus streamlined, without additional data passing management to keep track of how and when to retrieve data from different memory locations in calling a procedure. The use of the overlapping registers also eliminates the requirement of writing the returning-PC (processing code) to indicate the processing status and the returning values from the child to the parent procedure.
Generally, the registers in the overlapping window architecture are of fixed size, which allows a simple and fast AND-OR decoding for converting the selections made by the CPU and the instructions into a set of window and register numbers. A special NMOS decoder may be used which is significantly faster than the general OR_AND_INVERTER decoder.
This type of memory management is a `procedure nesting` scheme. In theory, the depth of this type of procedure nesting is virtually unbounded; in practice, it is limited by the physical constraints of the CPU. The number of registers and windows in a CPU is typically quite small. The overlapping window register files thus allow only the few most recent procedure calls to be held at the top of the nesting stack. Older activation records must be saved in memory. Conceptually, the actual organization of the overlapping window register files is not an infinite stack but rather a circular buffer for the top of the stack only, with the data stored in the rest of the stack maintained in the memory.
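The circular-buffer behavior described above can be sketched as a toy model. The class `WindowStack`, its methods `call` and `ret`, and the spill policy are hypothetical; the sketch treats each window as an independent frame, ignores the register overlap between adjacent windows, and assumes that the window number increases on a call, none of which is stated in the text.

```python
class WindowStack:
    """Toy model: the newest frames live in on-chip windows, older ones in memory."""

    def __init__(self, num_windows: int = 8) -> None:
        self.num_windows = num_windows
        self.windows = [{} for _ in range(num_windows)]  # on-chip frames
        self.cwp = 0         # current window number
        self.resident = 1    # how many windows hold live frames
        self.memory = []     # spilled frames, oldest first

    @property
    def current(self) -> dict:
        """The frame of the currently visible window."""
        return self.windows[self.cwp]

    def call(self) -> None:
        """Advance to the next window, spilling the oldest frame if all are in use."""
        nxt = (self.cwp + 1) % self.num_windows
        if self.resident == self.num_windows:
            # Buffer is full: the next slot holds the oldest live frame.
            self.memory.append(self.windows[nxt])
        else:
            self.resident += 1
        self.windows[nxt] = {}
        self.cwp = nxt

    def ret(self) -> None:
        """Step back one window, restoring the caller's frame from memory if needed."""
        prev = (self.cwp - 1) % self.num_windows
        if self.resident > 1:
            self.resident -= 1
        elif self.memory:
            self.windows[prev] = self.memory.pop()
        else:
            raise RuntimeError("return from the outermost procedure")
        self.cwp = prev


stack = WindowStack(num_windows=4)
stack.current["depth"] = 0
for depth in range(1, 7):        # nest six calls into four windows
    stack.call()
    stack.current["depth"] = depth
print(len(stack.memory))         # frames that had to be saved in memory
```

With four windows and six nested calls, the three oldest frames end up in `memory` and are restored one at a time as the procedures return.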
FIG. 2 shows a circular stack buffer comprising register files 20 organized into eight windows, i.e., w.sub.1,w.sub.2, . . . w.sub.8. At any given time, a program can address 32 registers including eight `ins` registers, eight `locals` registers, eight `outs` registers, and eight `global` registers (as is clearly denoted in FIG. 2). The eight `global` registers are addressable from any window. The eight `outs` of one window are also the eight `ins` of the adjacent window. Although an instruction can address twenty-four windowed registers and eight global registers, a single window, excluding the global registers, actually comprises sixteen registers, i.e., eight `ins` and eight `locals`. The overlapping nature of the register windows can be used to pass information quickly between the overlapping `ins` and `outs` in two adjacent windows for a multi-tasking operation which is often encountered under the working environment of UNIX. There is no need to read and write these common data as they are simply shared by allowing access to the common addressable memory locations.
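The sharing of `outs` and `ins` between adjacent windows in FIG. 2 can be sketched as follows. Several details are assumptions made only for illustration and are not given in the text: windows are numbered 0 to 7 rather than w.sub.1 to w.sub.8, register numbers 0 to 7 are taken as the globals, 8 to 15 as the `outs`, 16 to 23 as the `locals` and 24 to 31 as the `ins`, and the `outs` of window w are taken to be the `ins` of window (w + 1) modulo 8. The function name `physical_register` is hypothetical.

```python
NUM_WINDOWS = 8
GROUP = 8                    # registers per group (ins, locals, outs, globals)
WINDOW_STRIDE = 2 * GROUP    # each window owns its ins and its locals


def physical_register(window: int, reg: int) -> tuple:
    """Resolve (window number, register number) to a physical location.

    Returns ("global", i) or ("windowed", i); the register-number layout
    is an assumption of this sketch.
    """
    if not 0 <= window < NUM_WINDOWS:
        raise ValueError("window number out of range")
    if not 0 <= reg < 4 * GROUP:
        raise ValueError("register number out of range")
    group, offset = divmod(reg, GROUP)
    if group == 0:   # globals: the same storage from every window
        return ("global", offset)
    if group == 1:   # outs: stored as the ins of the adjacent window
        return ("windowed", ((window + 1) % NUM_WINDOWS) * WINDOW_STRIDE + offset)
    if group == 2:   # locals: private to this window
        return ("windowed", window * WINDOW_STRIDE + GROUP + offset)
    return ("windowed", window * WINDOW_STRIDE + offset)   # ins


# An `out` of window 0 and the matching `in` of window 1 are one register.
print(physical_register(0, 8) == physical_register(1, 24))
# The buffer is circular: the outs of window 7 are the ins of window 0.
print(physical_register(7, 15) == physical_register(0, 31))
```

Under these assumptions a value written by one procedure to one of its `outs` is read by the procedure in the adjacent window from the matching `in` without any copy.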
However, precisely because there are overlapping registers, wherein a single overlapping register can have two register numbers in two different windows, the decoding process to convert the selections made by the CPU and the calling procedure into the actual addresses pointing to a specific window and register becomes more complicated. A two level decoding circuit is required to perform the decoding process. In a typical conventional decoding system, a current window and a current register in that current window must first be determined. Since there are overlapping registers, an overlapping register decoding circuit must then be used to determine the selected register if the register in the selected window is determined to be an overlapping register. The decoding process is therefore more time consuming, and the two level decoding circuit is more expensive to manufacture and occupies a greater area of the precious `real estate` on an IC chip near the CPU.
More specifically, a conventional addressing scheme as utilized by current RISC designers can be described as follows by referring to FIG. 2. Since there are eight windows, a current window is usually represented by a current window pointer (CWP) in the form of CWP(2:0), where CWP has three bits, i.e., bit zero to bit two, for pointing to any one of the eight windows. Similarly, since each window has thirty-two registers, the address of a register is represented by Rs(4:0), where Rs has five bits, i.e., bit zero to bit four, for pointing to each of the thirty-two registers in each window. A total of eight bits are used for pointing to a specific register. In theory, a total of two hundred and fifty-six registers are addressable by this eight-bit representation; however, due to the overlapping, these eight bits are used to address only one hundred and thirty-six registers.
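The register counts quoted above can be checked with a short calculation. The figures of eight windows, eight registers per group, three CWP bits and five Rs bits come from the description; the variable names are chosen only for this illustration.

```python
NUM_WINDOWS = 8
REGS_PER_GROUP = 8   # size of each of the ins, locals, outs and global groups
CWP_BITS = 3         # CWP(2:0)
RS_BITS = 5          # Rs(4:0)

# Number of distinct values the concatenated address {CWP, Rs} can take.
address_space = 2 ** (CWP_BITS + RS_BITS)

# Each window contributes its own ins and locals; its outs are the ins of
# the adjacent window, and the globals are counted once for all windows.
windowed_registers = NUM_WINDOWS * (2 * REGS_PER_GROUP)
physical_registers = windowed_registers + REGS_PER_GROUP

# Addresses that name a register already reachable through another address.
redundant_addresses = address_space - physical_registers

print(address_space, physical_registers, redundant_addresses)
```

The eight-bit address therefore spans 256 combinations while only 136 distinct registers exist, so 120 of the combinations are aliases of registers that also have another address.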
The inefficiency of this addressing technique can be appreciated from the simple observation that the conventional addressing scheme has to sequentially process more bits than may be necessary in identifying the window and register selected by the CPU. Valuable resources and processing time are thus wasted, and the system performance is adversely affected by the slow data access rate caused by the inefficiency of this address decoding scheme.
Therefore, a need still exists in the art of RISC system design to improve the decoding algorithm and circuit implementation for the overlapping window register file such that these limitations and inefficiencies can be eliminated or reduced.