1. Field of the Invention
The present invention relates to a data processing apparatus, particularly to a data processing apparatus using a register window method, and further to such an apparatus and a method that alleviate the instruction execution interlocking associated with register reading.
2. Description of the Related Art
Today, with the arrival of an information society, the amount of data to be processed has dramatically increased. Accordingly, the demand for data processing capability has been increasing, and various proposals have been presented for higher processor speeds (for example, Japanese Patent Application Laid-open No. Shou 63-271553, Japanese Patent Application Laid-open No. Hei 07-253884 and Japanese Patent Application Laid-open No. Hei 07-281897).
In recent years, register window methods have been proposed for the architecture of general purpose registers (for example, Japanese Patent Application Laid-open No. Hei 07-253884 and Japanese Patent Application Laid-open No. Hei 07-281897). This method utilizes a plurality of register sets, i.e., register windows (hereinafter called “window”), thereby eliminating the need to store registers in memory or fetch them from memory when calling or returning from a subroutine. However, this method has had a problem in that the execution unit cannot read register data at high speed as the number of windows increases dramatically.
Subsequently, a method was proposed in which the read and write time losses stemming from the size of the register set or the like are reduced by retaining the currently referenced window of the general purpose register (hereinafter called GPR) in a work register, so that the copy of the currently referenced window serves as a cache.
However, with an architecture retaining only the one currently referenced window as the work register, data must be transferred to the work register every time the window is switched. In this instance, because the data to be referred to is not yet present, a subsequent instruction cannot be executed until the data transfer completes.
This constraint causes a very significant performance shortfall, especially when a large number of instructions are processed simultaneously in a data processing apparatus adopting an out-of-order execution method, in which the instruction execution order is changed so that instructions are processed in the order in which they become processable, independently of the program execution sequence.
In a data processing apparatus using the out-of-order execution method, many instructions are stored in a buffer and an executable instruction among them is executed in an order altered from the program sequence, thereby improving an instruction throughput.
However, the restraint described above makes it impossible to reorder instructions across a window switching, and therefore all the subsequent instructions stored in the buffer have to be held there, rendering the out-of-order execution method ineffective.
In an attempt to solve the problems as described above, a method has been proposed in which a plurality of windows are retained in the work registers as illustrated by FIG. 1A.
FIG. 1A illustrates the architecture of a popular register window. A general purpose register window set (GPR) 100 retains n windows (numbered from 0 to n−1, where n>0), each containing one local register, one of each of the in and out registers, and one global register. The local register includes a plurality of entries, as do the in/out registers and the global register.
The work register (hereinafter called WR) 102 retains k windows (where k>0), each containing one local register (hereinafter called “local”), one of each of the in and out registers (hereinafter called “in/out”, “in”, or “out”) and one global register (hereinafter called “global”). A data transmission control apparatus 101 controls the type of data to be transmitted and the transmission timing for transmission from the GPR to the WR. In FIG. 1A, the WR can retain data for one window. With this architecture, an execution unit 103 can read from the work register 102, and the read-out time can therefore be shortened.
FIG. 1B shows the relationship between a popular GPR and a JWR. Referring to FIG. 1B, the operation of a jointed window register (hereinafter called JWR) is now described. The general purpose register window set (GPR) 100 is logically connected in a ring and managed by a current window pointer (hereinafter called CWP). Within the GPR (one GPR corresponds to 8 windows in the example shown in FIG. 1B), a JWR consists of 3 windows, i.e., CWP−1, CWP and CWP+1. The JWR further retains the global registers, which can be used independently of the CWP, a window-specific local register for each window, and the in and out registers, which overlap between windows. Note that one window consists of a local register and in/out registers. Referring to FIG. 1B, for instance, the in 0, local 0 and out 0 constitute one window; the in 1, local 1 and out 1 constitute another window; and likewise, skipping several windows in between, the in 7, local 7 and out 7 constitute yet another window.
Note that the out 0 and the in 1 share the same segment, which is shared between the CWP−1 and the CWP windows. The same holds for the out 1 and in 2, the out 2 and in 3, the out 3 and in 4, the out 4 and in 5, the out 5 and in 6, the out 6 and in 7, and the out 7 and in 0. Also note that the local is the register which only the current window can refer to, whereas the global is the register which any window can refer to regardless of switching.
Further note that each of the in/out 0 through 7, the local 0 through 7, and the globals (consisting of the global for normal 110, the global for MMU 111, the global for interrupt 112 and the global for alternate 113) has registers for 8 entries.
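The overlap between adjacent windows described above can be sketched in a few lines of Python. This is an illustrative model only: the function name `window_segments` and the convention of labelling each shared segment by the index of the in register it serves are assumptions made here and do not appear in FIG. 1B.

```python
NUM_WINDOWS = 8  # windows in the GPR ring in the FIG. 1B example


def window_segments(w, num_windows=NUM_WINDOWS):
    """Return the physical segments that make up window w.

    Assumed labelling: ("inout", i) is the segment seen as "in i" by
    window i and as "out i-1" by the preceding window in the ring.
    """
    w %= num_windows
    return {
        "in": ("inout", w),
        "local": ("local", w),                    # private to window w
        "out": ("inout", (w + 1) % num_windows),  # shared with window w+1
    }


def shares_segment(a, b, num_windows=NUM_WINDOWS):
    """True when the out of window a is the same segment as the in of window b."""
    return window_segments(a, num_windows)["out"] == window_segments(b, num_windows)["in"]
```

Under this labelling, `shares_segment(0, 1)` and `shares_segment(7, 0)` both hold, the latter reflecting the ring connection, while the local segments of any two different windows are always distinct.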
In FIG. 1B, the area A shows the JWR area corresponding to the CWP being equal to 1, and the area B shows the additional JWR area newly required when the CWP is incremented from 1. A window switching is performed by incrementing or decrementing the CWP, or by rewriting it with an arbitrary, discrete value.
When the CWP is incremented or decremented, the JWR comprises the three windows corresponding to the CWP after its movement. Expressed relative to the CWP before the movement, these are CWP (the in 1, local 1 and out 1 in the case of FIG. 1B), CWP+1 (the in 2, local 2 and out 2 in the case of FIG. 1B) and CWP+2 (the in 3, local 3 and out 3 in the case of FIG. 1B) after an increment, and CWP−2, CWP−1 and CWP after a decrement.
As two out of the three windows are already retained in the JWR, only one set of data needs to be transmitted from the GPR to the JWR, i.e., the local and in/out registers for one window. The data transmission control apparatus 101 transmits the data from the GPR 100 to the JWR 102 (for instance, the set of data for area B is transmitted when switching from CWP to CWP+1 in the case of FIG. 1B).
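The set of data transmitted on an increment or decrement can be illustrated with a small sketch. The function `segments_for_step` and the segment labelling are hypothetical, and the sketch reflects one reading of area B: only the local of the newly adjacent window and the one in/out segment not already held in the JWR are fetched.

```python
def segments_for_step(cwp, step, n=8):
    """Segments to fetch from the GPR when the CWP moves by one window.

    Assumed labelling: ("inout", i) is "in i", which is also "out i-1".
    Before the move, the JWR is taken to hold windows cwp-1, cwp and cwp+1.
    """
    if step == +1:
        w = (cwp + 2) % n          # window newly needed after incrementing
        # Its in segment is the out of window cwp+1 and is already held.
        return [("local", w), ("inout", (w + 1) % n)]
    if step == -1:
        w = (cwp - 2) % n          # window newly needed after decrementing
        # Its out segment is the in of window cwp-1 and is already held.
        return [("local", w), ("inout", w)]
    raise ValueError("step must be +1 or -1")
```

For the FIG. 1B example, `segments_for_step(1, +1)` yields the local 3 and the segment shared by the out 3 and in 4, which is how area B is interpreted here.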
As described above, by retaining the window adjacent to the one currently referred to, as the work register set (JWR), the subsequent instruction for continuous window switching can be executed without waiting for the data transmission.
In this instance, the data reading for the post-switching CWP can be executed without waiting for the data transmission from the GPR, hence without a time loss since CWP−1 and CWP+1 are already retained in the JWR.
Now, when the CWP is rewritten with an arbitrary discrete value, the JWR for the post-switching CWP again consists of three windows, i.e., CWP−1, CWP and CWP+1. If the change is discrete, there is no guarantee that the JWR retains the data for the post-switching CWP. For this reason, the local, in/out and global registers required for constituting the three windows for the post-switching CWP are all transmitted from the GPR to the JWR (an operation called load_cwp).
In this instance, because the data cannot be referred to, neither a data reading for the post-switching CWP nor a further incrementing or decrementing of the CWP can be executed; execution is started on the renewed JWR only after the data transmission completes.
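The difference between the three kinds of window switching can be summarized in a short sketch. The names `SwitchKind`, `SwitchPlan` and `plan_switch` are hypothetical, and the model works at the granularity of whole windows; it only records which windows come from the GPR and whether subsequent instructions must wait.

```python
from dataclasses import dataclass
from enum import Enum, auto


class SwitchKind(Enum):
    INCREMENT = auto()
    DECREMENT = auto()
    DISCRETE = auto()  # CWP rewritten with an arbitrary value (load_cwp)


@dataclass
class SwitchPlan:
    new_cwp: int
    windows_to_load: list  # windows whose data must come from the GPR
    load_globals: bool     # are the global registers transmitted as well?
    interlock: bool        # must later instructions wait for the transfer?


def plan_switch(cwp, kind, value=None, n=8):
    """Describe the transfer implied by a window switch (illustrative only)."""
    if kind is SwitchKind.INCREMENT:
        new = (cwp + 1) % n
        # Windows new-1 and new are already in the JWR; only new+1 is missing.
        return SwitchPlan(new, [(new + 1) % n], False, False)
    if kind is SwitchKind.DECREMENT:
        new = (cwp - 1) % n
        return SwitchPlan(new, [(new - 1) % n], False, False)
    if value is None:
        raise ValueError("a discrete switch needs a target CWP value")
    new = value % n
    # Nothing is guaranteed to be present, so all three windows are reloaded.
    return SwitchPlan(new, [(new - 1) % n, new, (new + 1) % n], True, True)
```

In this sketch, only the discrete case sets `interlock`, mirroring the text: an increment or decrement finds the post-switching window already in the JWR, whereas load_cwp must finish before execution resumes.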
Also, if a trap (described below) occurs during process switching or process execution, requiring a value or an operation on which the normal operation does not depend, or requiring the register values at the time of the trap occurrence to be retained for restarting the processing, trap processing is executed by switching the global register 110, i.e., the global for normal, to a trap processing-specific register (while the in/out and local registers, on which a part of the normal operation depends, continue to be referred to).
The global registers disposed for trap processing are a register (global for MMU (memory management unit)) 111 for processing an error occurring at memory access, a register (global for interrupt) 112 for processing an interrupt commanded by software, and a register (global for alternate) 113 for other trap processing.
And, after completing a trap processing, global registers are switched from the trap processing registers to the normal processing registers, and the normal processing is resumed.
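The switching of global registers around a trap can be sketched as a bank selector. The class and method names below (`GlobalSet`, `Globals`, `enter_trap`, `leave_trap`) are hypothetical; the sketch only illustrates that the normal globals remain untouched while a trap-specific set is active.

```python
from enum import Enum


class GlobalSet(Enum):
    NORMAL = "normal"        # global for normal 110
    MMU = "mmu"              # global for MMU 111
    INTERRUPT = "interrupt"  # global for interrupt 112
    ALTERNATE = "alternate"  # global for alternate 113


class Globals:
    ENTRIES = 8  # each global set holds 8 entries

    def __init__(self):
        self.sets = {s: [0] * self.ENTRIES for s in GlobalSet}
        self.active = GlobalSet.NORMAL

    def read(self, index):
        return self.sets[self.active][index]

    def write(self, index, value):
        self.sets[self.active][index] = value

    def enter_trap(self, trap_set):
        """Select a trap processing-specific global set."""
        if trap_set is GlobalSet.NORMAL:
            raise ValueError("trap processing needs a trap-specific set")
        self.active = trap_set

    def leave_trap(self):
        """Return to the normal globals so that normal processing resumes."""
        self.active = GlobalSet.NORMAL
```

A value written to the normal globals before `enter_trap(GlobalSet.MMU)` is read back unchanged after `leave_trap()`, regardless of what the trap handler wrote to its own set in between.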
Now, a trap is described as follows. A trap is, in other words, exception handling in which, if another event occurs during the normal processing, the normal processing is interrupted to handle the event and is resumed once the event has been handled. For example, an error may occur during the normal processing that requires the processing to be repeated or data to be corrected.
Referring to FIG. 1B, data are written in an initialized GPR as windows are switched; repeating this eventually reaches a threshold at which the window has gone around one full turn and all the registers have become full. In this instance, switching the window further would destroy the data already written therein. In such a case, windows are freed by transferring the written data to the main memory and are then reassigned. A trap occurring at window switching in this way is called a “window-trap.”
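The window-trap condition can be illustrated with a deliberately simplified ring model. The class `WindowRing`, its fields, and the choice to spill exactly one window per trap are assumptions made for this sketch; the overlap between in and out registers is ignored here.

```python
class WindowRing:
    """Simplified ring of register windows with spill-to-memory on overflow."""

    def __init__(self, n=8):
        self.n = n
        self.cwp = 0
        self.live = 1              # windows currently holding live data
        self.windows = [None] * n  # stand-in payload for each window
        self.spilled = []          # stand-in for the main memory

    def write(self, value):
        self.windows[self.cwp] = value

    def switch_next(self):
        """Advance the CWP; return True if a window-trap had to be handled."""
        trapped = False
        nxt = (self.cwp + 1) % self.n
        if self.live == self.n:
            # The ring has gone around one turn: the next window still holds
            # the oldest live data, so save it to memory before reusing it.
            self.spilled.append(self.windows[nxt])
            self.windows[nxt] = None
            self.live -= 1
            trapped = True
        self.cwp = nxt
        self.live += 1
        return trapped
```

With a four-window ring, the first three calls to `switch_next()` proceed without a trap; the fourth finds every window occupied, moves the oldest window's data to `spilled`, and only then reuses that window.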
Switching a JWR at a trap occurrence is either accompanied by a CWP change, or is not accompanied by a CWP change and only requires switching the global registers. In the event of a reset, all the data in the JWR are cleared for initialization, and therefore all the register data including a set of CWPs must be transmitted from the GPR.
Also, as described above, at a trap occurrence stemming from a window operation (window-trap) such as a situation in which the window being transferred thereto has gone around one turn and hence become inoperable, a trap processing must be done for that window, necessitating switching the CWP along with the global registers.
However, at a trap occurrence other than the one described above, since the CWP is not changed in the trap processing, basically only the global registers need to be transmitted to the JWR. Currently, however, the whole data for three windows, i.e., the local, in/out and global registers, are transmitted through load_cwp at every discrete register switching, including a trap; extraneous cycles are therefore required for transmitting essentially unnecessary data at a trap occurrence that needs no CWP change.
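The size of this overhead can be estimated with a rough count. The figures below rest on assumptions that are not stated in the text: that three overlapping windows span three locals and four in/out segments, that a single global set of 8 entries is transmitted, and that every segment has 8 entries as in FIG. 1B. The function names are hypothetical.

```python
ENTRIES_PER_SEGMENT = 8  # entries in each in/out, local and global segment


def entries_via_load_cwp():
    """Entries moved when all data for three windows are transmitted."""
    local_segments = 3   # one local per window
    inout_segments = 4   # three overlapping windows span four in/out segments
    global_segments = 1  # assumption: only the global set being switched to
    total_segments = local_segments + inout_segments + global_segments
    return total_segments * ENTRIES_PER_SEGMENT


def entries_needed_for_trap(cwp_changes):
    """Entries that actually have to reach the JWR for a trap."""
    if cwp_changes:
        # A window-trap moves the CWP, so the full set is assumed necessary.
        return entries_via_load_cwp()
    # Without a CWP change, only the global registers are switched.
    return 1 * ENTRIES_PER_SEGMENT
```

Under these assumptions, a trap without a CWP change needs 8 entries while load_cwp moves 64, so seven eighths of the transmitted data would be unnecessary in that case.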
Generally, because the normal operation of an instruction executed after a trap occurrence cannot be guaranteed until the necessary set of data has been assembled, subsequent instruction execution must be suspended (hereinafter called “interlocked”) until the data loading completes. The time during which the instruction execution is suspended (hereinafter called “interlocking”) has adversely affected CPU performance.