The present invention relates to a vector architecture information processing equipment, and more particularly to a vector scatter instruction control circuit.
On a vector architecture information processing equipment, memory area data accessed by a vector instruction is not usually entered in a cache.
The reason is that locality of reference generally does not apply well to data accessed by a vector instruction, so that such data, if entered in a cache memory, is swapped out almost immediately by other cache line data and the cache hit ratio decreases.
Also, vector architecture information processing equipment provides vector-based memory access instructions, such as the VST (vector store) and VLD (vector load) instructions, in which the memory access addresses are defined by a start address and a distance between successive vector data elements to be accessed.
The VLD instruction loads data from memory into a vector data storage area called a "vector register", which is made up of a plurality of words arranged in a vector unit, in accordance with the memory access addresses defined as described above.
Conversely, VST instruction stores data from a register into memory.
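The start-address-plus-distance addressing of the VLD and VST instructions can be illustrated with a minimal sketch. It assumes a word-addressed memory modeled as a Python list; the function names `vld` and `vst` and their parameters are hypothetical and chosen only for illustration.

```python
def vld(memory, start, distance, length):
    """Load `length` words, reading element i from address start + i * distance."""
    return [memory[start + i * distance] for i in range(length)]


def vst(memory, start, distance, vreg):
    """Store element i of the vector register at address start + i * distance."""
    for i, value in enumerate(vreg):
        memory[start + i * distance] = value


# Toy word-addressed memory (assumption: one list slot per word).
memory = list(range(100, 116))
vx = vld(memory, start=2, distance=3, length=4)       # reads words 2, 5, 8, 11
vst(memory, start=0, distance=4, vreg=[7, 7, 7, 7])   # writes words 0, 4, 8, 12
```

Because every address follows from the start address, the distance, and the vector length, the full set of accessed addresses is known as soon as the instruction is issued.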
In the case of the VST instruction, the addresses accessed by the instruction can be determined at the instruction issue stage. It is therefore relatively easy to improve performance by allowing an instruction that follows the VST instruction, such as a VLD instruction or a scalar load instruction, to be executed ahead of the VST instruction.
On the other hand, with so-called "list vector" instructions, such as the VGT (vector gather) and VSC (vector scatter) instructions, data stored in vector registers arranged in the vector unit is used as the memory addresses to be accessed. The memory addresses are therefore identified only after the instruction reaches the vector unit, and the addresses are generally random.
For the sake of better understanding of the present invention, a list vector instruction will be described with reference to FIG. 8.
First, as shown in FIG. 8(a), the VGT (vector gather) instruction loads data from memory in such a way that the memory data at the address VA (n) held in an element of a vector register Vy is loaded into the corresponding element of the vector register Vx.
As shown in FIG. 8(b), the VSC (vector scatter) instruction stores data into memory in such a way that each element of the vector register Vx is stored into the memory area whose address VA (n) is held in the corresponding element of the vector register Vy.
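The gather and scatter behavior described above can be sketched as follows. This is an illustrative model only: memory is assumed to be a word-addressed Python list, and the function names `vgt` and `vsc` are hypothetical.

```python
def vgt(memory, vy):
    """Vector gather: element n of the result is memory[VA(n)], with VA(n) = vy[n]."""
    return [memory[addr] for addr in vy]


def vsc(memory, vx, vy):
    """Vector scatter: element n of vx is written to memory[VA(n)], with VA(n) = vy[n]."""
    for value, addr in zip(vx, vy):
        memory[addr] = value


memory = [0] * 8
vy = [6, 1, 4]                        # address list held in vector register Vy
vsc(memory, vx=[10, 20, 30], vy=vy)   # scatter Vx to the listed addresses
gathered = vgt(memory, vy)            # gather the same addresses back
```

The addresses come from the contents of Vy rather than from the instruction operands, which is why they cannot be known before the vector register is read.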
In contrast to vector memory access instructions, locality of reference generally applies to data accessed by a scalar memory access instruction. For this reason, a system is usually adopted in which data accessed by the scalar memory access instruction is stored in a cache memory to hide memory access latency.
When a vector memory access instruction is issued to write data into memory on vector architecture information processing equipment equipped with a cache as described above, cache invalidation must be executed to ensure cache consistency in case an address to be accessed is entered in the cache. The cache invalidation process stalls any cache access instruction that follows the vector memory access instruction, which is one of the primary causes of performance degradation.
A cache invalidation process differs between VST (vector store) instruction and VSC (vector scatter) instruction.
In the case of the VST instruction, the start address and the distance are determined when the instruction is issued, so that relatively high-speed cache invalidation can be realized from these two values. Furthermore, since the memory access start address and end address of the VST instruction can be calculated promptly, a scalar LD (load) instruction that follows the VST instruction may be controlled to be executed ahead of the VST instruction if no address coincidence is detected between the two instructions.
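As a rough illustration of why the VST case is easy, the accessed range can be computed at issue time and compared with the address of a following scalar load. The sketch below assumes word addressing and a known vector length of at least one; the function names are hypothetical, and the check is deliberately conservative in that any address between the start and end addresses is treated as a coincidence, even if it falls between strided elements.

```python
def vst_address_range(start, distance, length):
    """Return (lowest, highest) address touched by a strided vector store."""
    last = start + (length - 1) * distance
    # A negative distance walks downward, so order the two bounds explicitly.
    return min(start, last), max(start, last)


def load_may_bypass_vst(load_addr, start, distance, length):
    """True if a following scalar load falls outside the VST address range."""
    low, high = vst_address_range(start, distance, length)
    return not (low <= load_addr <= high)
```

For example, a store of 8 elements starting at address 100 with a distance of 4 covers addresses 100 through 128, so a load from address 200 may be issued ahead of it while a load from address 116 may not.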
On the other hand, in the case of the VSC (vector scatter) instruction, an address to be accessed is determined only after it is read from a vector register and, in addition, the address values are random. It is therefore necessary to send an invalidation address from the vector unit to a cache invalidation control unit (see 4 in FIG. 1) in the scalar unit to invalidate cache data that matches the invalidation address.
As a result, all memory access instructions that follow VSC instruction cannot be issued until this cache invalidation processing is completed. This causes degradation of performance.
This problem will be described in more detail with reference to FIGS. 6 and 7.
First, to make the description easier to understand, the LDS instruction, which is a scalar load (cache access) instruction, will be described with reference to FIG. 7. As with the VGT/VSC instructions, the LDS instruction comprises four fields: OPC (operation code) and operands X, Y, and Z. The memory access address is calculated as Ry+Rz, and the resultant data M (Ry+Rz), read from the memory area at address Ry+Rz, is stored into register Rx.
In FIG. 6(a), after the VST (vector store) instruction is issued, the cache is invalidated and, almost at the same time, data is written from the vector register into memory.
The LDS instruction following the VST instruction may be issued even while the cache is being invalidated, unless the access address of the LDS instruction overlaps with that of the VST instruction.
On the other hand, referring to FIG. 6(b), with the VSC (vector scatter) instruction, cache invalidation is performed when vector processing starts and an invalidation address is sent. In addition, since the addresses to be accessed are not known immediately after the VSC instruction is issued and are random, an LDS instruction that follows the VSC instruction is kept waiting in a hold state until cache invalidation is completed.
As described above, all memory access instructions that follow the VSC instruction cannot be issued before cache invalidation is completed and this causes performance degradation.
In view of the foregoing, it is an object of the present invention to provide a vector architecture processing equipment that prevents a following instruction from being delayed because of cache invalidation of a vector scatter instruction and that executes the following instruction before the vector scatter instruction to improve performance.
To achieve the above object, in accordance with one aspect of the present invention, there is provided a circuit comprising:
means for detecting whether an overlap exists between an address to be accessed by an area-specified vector scatter instruction, which specifies a range of memory access address, and an address to be accessed by a memory access instruction that follows the area-specified vector scatter instruction; and
means for holding the following memory access instruction for which address coincidence is detected.
In accordance with one aspect of the present invention, there is provided a circuit for controlling a vector scatter instruction, wherein an area-specified vector scatter instruction specifying scattered areas is provided in an instruction set, comprising:
an address coincidence detection unit detecting if an address to be accessed by the area-specified vector scatter instruction overlaps with an address to be accessed by a memory access instruction that follows the vector scatter instruction; and
a hold control unit holding the memory access instruction that follows the vector scatter instruction if the addresses overlap.
In accordance with another aspect, there is provided vector architecture information processing equipment comprising:
a vector scatter address coincidence detection unit including:
registers for storing an area start address and an area end address of an area-specified vector scatter instruction in which the area start address and the area end address are specified; and
a circuit for checking if an address to be accessed by a memory access instruction following the area-specified vector scatter instruction is within a scatter area defined by the area start address and the area end address specified by the area-specified vector scatter instruction, and for outputting an address conflict signal if the address to be accessed by the following memory access instruction is within the scatter area, wherein an instruction issue control unit comprises a hold control circuit for holding said following memory access instruction upon receipt of an address coincidence signal emitted from said vector scatter address coincidence detection unit.
In accordance with another aspect, the present invention provides a vector architecture information processing equipment comprising:
an instruction issue control unit decoding an instruction data to direct an instruction operation;
a cache control unit receiving an address from said instruction issue control unit to control a cache;
a vector unit, on receipt of an execution directive when a vector instruction is issued from said instruction issue control unit sending write vector data to a memory and sending a cache invalidation address, if the vector instruction is an area-specified vector scatter instruction specifying an area start address and an area end address of a scatter area;
a cache invalidation control unit receiving the cache invalidation address from said vector unit to invalidate the cache; and
a vector scatter address conflict detection unit, on receipt of the area start address and the area end address of the area-specified vector scatter instruction from a register block accessed by said instruction issue control unit when the area-specified vector scatter instruction is issued from said instruction issue control unit,
detecting if an area specified by the area start address and the area end address overlaps with an address area to be accessed by a memory access instruction following the area-specified vector scatter instruction to activate an address coincidence signal for sending said signal to said instruction issue control unit if an address overlap is detected,
wherein said instruction issue control unit comprises a hold control circuit that holds the following memory access instruction in response to the activated address conflict signal from said vector scatter address conflict detector.
The hold control circuit preferably does not hold the following memory access instruction if the address coincidence signal from said vector scatter address coincidence detection unit is inactive. The hold control circuit holds the following memory access instruction until a cache invalidation end notification is received from said cache invalidation control unit.
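The hold behavior described above can be modeled with a short sketch. This is a simplified software analogy of the hold control circuit, not a description of the actual hardware; the class name `HoldControl` and its method names are hypothetical.

```python
class HoldControl:
    """Toy model: hold a following memory access only while a scatter's cache
    invalidation is pending AND the address coincidence signal is active."""

    def __init__(self):
        self.invalidation_pending = False

    def issue_scatter(self):
        # An area-specified vector scatter instruction has been issued.
        self.invalidation_pending = True

    def invalidation_end(self):
        # Cache invalidation end notification has been received.
        self.invalidation_pending = False

    def must_hold(self, coincidence_signal):
        # Inactive signal: the following instruction is not held.
        return self.invalidation_pending and coincidence_signal


hc = HoldControl()
hc.issue_scatter()
held_on_conflict = hc.must_hold(True)     # overlapping address: held
held_no_conflict = hc.must_hold(False)    # no overlap: may be issued
hc.invalidation_end()
held_after_end = hc.must_hold(True)       # invalidation finished: released
```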
The vector scatter address coincidence detection unit preferably comprises:
a first comparator that compares the address to be accessed by the following memory access instruction with the area start address specified by the area-specified vector scatter instruction and, if the address to be accessed by the following memory access instruction is equal to or larger than the area start address, outputs an active signal;
a second comparator that compares the address to be accessed by the following memory access instruction with the area end address specified by the area-specified vector scatter instruction and, if the address to be accessed by the following memory access instruction is equal to or smaller than the area end address, outputs an active signal; and
a logical circuit that activates and outputs the address conflict signal if the output signals from both said first comparator and said second comparator are active.
An operand of the area-specified vector scatter instruction includes a predetermined field for specifying two registers in which the scatter area start address and the scatter area end address are respectively held, said two registers being included in said register block.
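The two comparators and the logical circuit listed above amount to an inclusive range check, which can be sketched in software as follows. The function and variable names are hypothetical, and the addresses are assumed to be plain integers.

```python
def address_conflict(access_addr, area_start, area_end):
    """Model of the detection logic: active only when the accessed address
    lies within the scatter area [area_start, area_end], bounds included."""
    first_comparator = access_addr >= area_start    # equal to or larger than start
    second_comparator = access_addr <= area_end     # equal to or smaller than end
    return first_comparator and second_comparator   # logical AND of both outputs
```

With a scatter area of 0x1000 to 0x1FFF, for instance, accesses to 0x1000 and 0x1FFF activate the signal, while accesses to 0x0FFF and 0x2000 do not.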
Still other objects and advantages of the present invention will become readily apparent to those skilled in this art from the following detailed description, wherein only the preferred embodiment of the invention is shown and described, simply by way of illustration of the best mode contemplated of carrying out this invention. As will be realized, the invention is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the invention. Accordingly, the drawing and description are to be regarded as illustrative in nature, and not as restrictive.