1. Field of the Invention
The present invention relates generally to an arrangement for controlling the issue timing of an instruction which includes data retrieval from a RAM (Random Access Memory) provided in a vector processor, and more specifically to such an arrangement via which vector data can be read out from the RAM more rapidly than with a known technique.
2. Description of Related Art
Vector processing has proven to be an effective approach to speeding up the processing of a large number of vectors using pipelined units which perform arithmetic operations on uniform, linear arrays of data values. A vector implies a linear collection of N variables (N is a positive integer) or a data structure that consists of an ordered set of elements. Throughout the instant disclosure, the terms "vector" and "vector data" are used interchangeably with the same meaning.
It is a common practice to provide, within a vector processor, a vector memory for temporarily storing vector data to be referred to in subsequent vector processing. The vector memory takes the form of a RAM (Random Access Memory) which is divided both physically and logically into a plurality of memory units as discussed later.
Before turning to the present invention it is deemed advantageous to describe a known arrangement with reference to FIGS. 1-4.
Referring to FIG. 1, there are schematically shown a RAM 10, a memory unit selector 12 and a time slot adjuster 14. The RAM 10 is physically divided into four (for example) memory units MU0-MU3 each of which includes a plurality of vector element storage sections.
As shown, the memory unit MU0 includes vector element memory sections VD0(0), VD0(4), VD1(0), . . . , VDn(4), while the memory unit MU1 includes memory sections VD0(1), VD0(5), VD1(1), . . . , VDn(5). Similarly, the memory unit MU2 includes memory sections VD0(2), VD0(6), VD1(2), . . . , VDn(6) while the memory unit MU3 includes memory sections VD0(3), VD0(7), VD1(3), . . . , VDn(7).
The vector element memory sections are logically divided into a plurality of memory blocks, viz., VD0(0)-VD0(7), VD1(0)-VD1(7), . . . , VDn(0)-VDn(7) which are sometimes simply depicted by VD0, VD1, . . . , VDn, respectively.
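The interleaving described above can be sketched in a few lines of Python. This is only an illustrative model, assuming four memory units and eight elements per memory block as in FIG. 1; the function names are hypothetical and do not appear in the disclosure.

```python
NUM_UNITS = 4   # memory units MU0-MU3, as in FIG. 1
ELEMENTS = 8    # elements per memory block, VDk(0)-VDk(7)

def memory_unit_of(element):
    """Index of the memory unit that physically holds element `element`."""
    return element % NUM_UNITS

def sections_in_unit(unit, blocks):
    """Sections VDk(i) held by one memory unit, listed block by block."""
    return [
        f"VD{k}({i})"
        for k in range(blocks)
        for i in range(ELEMENTS)
        if memory_unit_of(i) == unit
    ]

# MU0 holds elements 0 and 4 of every block, MU1 holds 1 and 5, and so on.
mu0 = sections_in_unit(0, blocks=2)
```

In this model a memory block VDk is the logical view (all eight elements of one vector), whereas a memory unit MUj is the physical view (every element whose index leaves remainder j when divided by four).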
As shown in FIG. 1, each of memory blocks VD0-VDn is shared by the memory units MU0-MU3 and stores an incoming vector. In more specific terms, a vector which has undergone time slot adjustment (if necessary) at the adjuster 14 using a time slot adjust signal, is stored in one of the memory blocks VD0-VDn using write addresses under the control of write enable signals. The write addresses and write enable signals are applied to the memory units MU0-MU3 from a vector processor controller (not shown).
Elements of a vector, which are stored in one of the memory blocks VD0-VDn, are retrieved using read addresses applied to the memory units MU0-MU3. That is, the vector elements are successively derived from the RAM 10 in a predetermined order using the memory unit selector 12 which selects the memory units MU0-MU3 under the control of a memory unit select signal applied thereto.
The arrangement of FIG. 1 is operatively coupled to a crossbar (not shown) from which vectors are applied to the RAM 10. Vectors retrieved from the RAM 10 are also transferred to the crossbar. A crossbar is not directly concerned with the instant disclosure and also is well known in the art and, hence, further descriptions thereof will be omitted.
FIG. 2 shows timing charts which schematically illustrate the write/read operations in connection with the memory blocks VD0-VD1 (or memory units MU0-MU3).
A first row (A) of FIG. 2 depicts reference time slots ". . . , T1, T2, T3, T4, . . . , T12, T13, . . ." which are used to control overall operations of a vector processor. Time slots in a second row (B) of FIG. 2 are arranged in a manner identical with those in the first row (A) but are illustrated for the convenience of describing the read/write operations of the memory blocks VD0-VD1 (or memory units MU0-MU3) shown in FIG. 1. As shown, time slots in the row (B) are numbered 0, 1, 2 and 3. The reason why the time slots (B) are numbered 0, 1, 2 and 3, is that the RAM 10 is divided into four memory units MU0-MU3 in this particular case. Time slots in a row (C) of FIG. 2 are numbered in the same manner as the reference time slots (B) and are applied to the memory unit MU0. On the other hand, time slots in rows (E), (G) and (I), which are respectively applied to the memory units MU1, MU2 and MU3, are respectively numbered such that the preceding ones thereof (viz., time slots (C), (E) and (G)) are shifted to the right by one time slot.
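The slot numbering of rows (B), (C), (E), (G) and (I) can be modelled as follows. This is a hedged sketch: the `phase` parameter, which fixes which reference time slot carries slot number 0, is an assumption because the figure is not reproduced here, and the function names are hypothetical.

```python
NUM_UNITS = 4   # the RAM is divided into four memory units MU0-MU3

def reference_slot(t, phase=0):
    """Slot number of row (B) at reference time slot T<t> (phase is assumed)."""
    return (t + phase) % NUM_UNITS

def unit_slot(unit, t, phase=0):
    """Slot number applied to MU<unit> at T<t>.

    Each unit's row is the preceding unit's row shifted right by one
    time slot, so MU<unit> sees what row (B) showed `unit` slots earlier.
    """
    return reference_slot(t - unit, phase)

# At any reference time the four units see four different slot numbers.
slots_at_t5 = [unit_slot(u, 5) for u in range(NUM_UNITS)]
```

A consequence of the one-slot shift is that, at any given reference time, each slot number is applied to exactly one memory unit.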
FIG. 2 illustrates the manner in which eight elements of a vector are written into the memory block VD0(0)-VD0(7) using time slots 0. Each of the capital letters W in rows (D), (F), (H) and (J) indicates a write operation of a vector into the memory units MU0-MU3 (FIG. 1). On the other hand, a vector stored in the memory block VD1(0)-VD1(7) is read out of the memory units MU0-MU3 using time slots 2. Each of the capital letters R within rectangles indicates a read operation of the vector from the memory units MU0-MU3.
In FIG. 2, it is understood that the write and read instructions are implemented in parallel. These instructions can correctly be executed in that the memory blocks VD0 and VD1 respectively accessed by the write and read instructions are different from each other.
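The parallel execution can be checked with a small model. The sketch below assumes that one vector element is accessed per time slot, that a memory unit is accessed when its own slot number equals the slot assigned to the instruction, and that reference time 0 carries slot number 0; all names are hypothetical.

```python
NUM_UNITS = 4   # memory units MU0-MU3
ELEMENTS = 8    # elements per vector

def local_slot(unit, t):
    """Slot number seen by MU<unit> at reference time t (row (B) shifted right by `unit`)."""
    return (t - unit) % NUM_UNITS

def schedule(slot, earliest):
    """Return {time: unit} for an eight-element access that uses time slot `slot`."""
    start = earliest
    while local_slot(0, start) != slot:   # wait until MU0 sees the assigned slot
        start += 1
    plan = {}
    for i in range(ELEMENTS):
        unit = i % NUM_UNITS              # element i resides in MU(i mod 4)
        t = start + i
        assert local_slot(unit, t) == slot
        plan[t] = unit
    return plan

write_plan = schedule(slot=0, earliest=0)   # write into VD0 using time slots 0
read_plan = schedule(slot=2, earliest=0)    # read from VD1 using time slots 2

# Times at which both accesses would address the same memory unit.
conflicts = [t for t in write_plan if read_plan.get(t) == write_plan[t]]
```

Under these assumptions two accesses that use different slot numbers never address the same memory unit in the same time slot, which is why the two instructions can overlap.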
However, according to a known technique, if an instruction including a RAM read operation issues immediately after an instruction including a RAM write operation wherein both instructions are directed to the same memory block, the issuance of the instruction including a RAM read operation should be inhibited until the instruction including a RAM write operation is completely finished. The waiting time imposed on the RAM read operation undesirably lessens the overall operation efficiency of a vector processor.
An instruction which includes a RAM write or read operation may be called a RAM write or read instruction merely for the convenience of description.
The above mentioned problem inherent in the known technique will further be discussed with reference to FIGS. 3-5.
Reference is made to FIG. 3, wherein a known arrangement for controlling issue of RAM read/write instructions is illustrated in block diagram form.
The arrangement of FIG. 3, denoted by an instruction issue timing controller 29, is interconnected between an instruction controller and a vector processor controller both of which are not directly concerned with the present invention and hence are not shown in the instant disclosure for the sake of brevity.
Throughout the remainder of the instant disclosure, each of the memory blocks VD0-VDn which is designated by a RAM read or write instruction is sometimes called an "entry number".
The arrangement of FIG. 3 is generally provided with an instruction issue indicator 30, an entry number coincidence determiner 32 and a read-out time slot controller 34.
The instruction issue indicator 30 includes an instruction register 36 and an entry number register 38. An instruction, applied from the instruction controller (not shown), is stored in the instruction register 36. On the other hand, the register 38 is arranged to store an entry number which is accompanied by the instruction stored in the register 36.
It is assumed that: (a) two instructions including RAM write and read operations (depicted by first and second instructions) are successively applied to the arrangement of FIG. 3 and (b) the entry number of the first instruction is VD0 (viz., VD0(0)-VD0(7)).
The instruction issue indicator 30 further includes an instruction decoder 40 which decodes the instruction stored in the register 36. If the decoder 40 ascertains that the instruction includes a RAM write operation, it supplies a write instruction controller 42 with a logic 1 (for example). The controller 42 further receives a flag bit from a flag register 44 of the entry number coincidence determiner 32 and also receives an available time slot indicating signal (2 bits) from a time slot flag register 46. Contrarily, if the decoder 40 determines that the instruction stored in the register 36 includes a RAM read operation, the decoder 40 applies a logic 1 to a read instruction controller 47.
In FIG. 3, only one entry number coincidence determiner (denoted by numeral 32) is provided merely for the convenience of simplifying the disclosure. However, in order to effectively achieve multiple accesses to the RAM 10 (FIG. 1), it is a common practice to provide two or more entry number coincidence determiners which are respectively assigned to multiple paths to the RAM 10 and each of which is configured in exactly the same manner as the determiner 32.
The flag register 44 holds a flag bit which changes a logic state from 0 to 1 in the event that the controller 29 generates information which indicates issue timing of a RAM write instruction, as will be referred to later. The flag register 44 retains a logic 1 until the RAM write operation is completed. The flag register 44 initially stores a flag bit assuming a logic 0 and accordingly, the write instruction controller 42 is advised that any other RAM write operation is not presently implemented using the determiner 32.
The time slot flag register 46 includes, in this particular case, four one-bit registers 46a-46d which are respectively assigned to the time slots 0, 1, 2 and 3 and each of which stores a logic 0 if the corresponding time slot is available. In the event that a plurality of time slots are simultaneously available, the time slot with the smallest number is first selected and used. It is assumed that the flag register 46 indicates that all the time slots (viz., 0-3) are free at this time. Thus, the write instruction controller 42 is informed that the time slot 0 should be used.
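The selection rule of the time slot flag register 46 (a logic 0 marks an available slot, and the smallest-numbered available slot is chosen) can be sketched as below. The function name and the `None` return for the case where no slot is free are assumptions made for illustration; the disclosure does not describe that case.

```python
def select_time_slot(flags):
    """Return the smallest slot number whose flag is 0.

    `flags` models the one-bit registers 46a-46d: 0 means the slot is
    available, 1 means it is in use. Returns None when no slot is free
    (an assumed convention, not taken from the disclosure).
    """
    for slot, busy in enumerate(flags):
        if busy == 0:
            return slot
    return None

all_free = select_time_slot([0, 0, 0, 0])      # every slot available
after_write = select_time_slot([1, 0, 0, 0])   # slot 0 taken by a write
```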
Subsequently, the write instruction controller 42 issues a control signal A (assuming logic 1) over a line 48. The control signal A is applied to the flag register 44 which, in response to the control signal A, changes the logic state thereof from 0 to 1. Further, the control signal A is applied to an entry number retainer (viz., register) 50 which, in response to the control signal A, stores the entry number held in the register 38.
Still further, the control signal A is applied, via an OR gate 49, to an instruction issue indicator 54, a time slot indicator 56 and an entry number indicator 58 each of which takes the form of a register.
In response to the generation of the control signal A, the three indicators 54, 56 and 58 store the following information. That is, the indicator 54 stores a logic 1 which is applied from the instruction decoder 40 and which indicates the RAM write instruction in this instance. The time slot indicator 56 receives the time slot 0 (viz., the time slot with the smallest number among the available time slots) from the write instruction controller 42 and stores same therein. Further, the entry number indicator 58 stores the entry number VD0 applied from the entry number register 38. When the entry number stored in the register 38 is transferred to the blocks 50 and 58, the entry number register 38 no longer stores the entry number VD0 applied thereto.
Subsequently, the pieces of the information stored in the indicators 54, 56 and 58 are applied to the vector processor controller (not shown). In other words, the vector processor controller coupled to the instruction issue timing controller 29 is informed of the issue timing of the above mentioned RAM write instruction. The outputs of the blocks 54 and 56, depicted by "B" and "C", are applied to a decoder 57 which changes the content of the time slot flag register 46a from "0" to "1".
The above mentioned operations, that the controller 29 (viz., the FIG. 3 arrangement) receives the RAM write instruction and then generates the information from the indicators 54, 56 and 58, are implemented within one time slot.
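The write-issue sequence described above can be summarised as a small state update. This is a rough behavioural sketch rather than a description of the circuit: the class and function names are hypothetical, and returning `None` when the flag register 44 already holds a logic 1 or no slot is free is an assumption.

```python
class IssueState:
    """Registers of the FIG. 3 arrangement that a write issue touches."""
    def __init__(self):
        self.write_flag = 0              # flag register 44
        self.retained_entry = None       # entry number retainer 50
        self.slot_flags = [0, 0, 0, 0]   # time slot flag registers 46a-46d

def issue_write(state, entry):
    """Issue a RAM write; return the values stored in indicators 54, 56 and 58."""
    if state.write_flag == 1:
        return None                      # assumed: determiner already in use
    free = [s for s, busy in enumerate(state.slot_flags) if busy == 0]
    if not free:
        return None                      # assumed: no time slot available
    slot = free[0]                       # smallest-numbered available slot
    state.write_flag = 1                 # control signal A sets flag register 44
    state.retained_entry = entry         # control signal A latches the entry number
    state.slot_flags[slot] = 1           # decoder 57 marks the slot as in use
    return (1, slot, entry)              # indicators 54, 56 and 58

state = IssueState()
issued = issue_write(state, "VD0")
```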
It is assumed that: (a) a RAM read instruction is applied to the instruction issue timing controller 29 at the time slot which immediately follows the time slot wherein the above mentioned RAM write instruction is applied and (b) the entry number is VD1 (viz., VD1(0)-VD1(7)).
The RAM read instruction is stored in the instruction register 36, while the entry number VD1 is stored in the entry number register 38. The instruction decoder 40 specifies the RAM read instruction and applies a logic 1 to the read instruction controller 47. A comparator 60 compares the entry number VD1 stored in the register 38 and the entry number VD0 retained in the entry number retainer 50. (The entry number VD0 has been stored during the preceding RAM write instruction and preserved in the register 50.) In this instance, the comparator 60 issues a logic 0 in that the two entry numbers are not identical. Accordingly, an AND gate 62 issues a coincidence signal D which assumes a logic 0 indicative of a mismatch and which is applied to the read instruction controller 47.
It is further assumed that, when the RAM read instruction controller 47 receives a logic 1 from the instruction decoder 40, the flag section 46a holds a logic 1 in that the preceding RAM write operation has not yet been completed. Thus, the flag register 46 applies time slot 1 to the read instruction controller 47 and also to a read-out time slot determiner 64. At the present time, time slot 1 is the time slot having the smallest number among the three available slots 1-3. A time slot counter 66 cyclically generates slot numbers 0, 1, 2, and 3 in combination with an adder 68. The output of the counter 66 is applied to the determiner 64 and the read instruction controller 47. If the read-out time slot determiner 64 detects coincidence between the available time slot applied from the flag register 46 and the slot number applied from the counter 66, the determiner 64 applies a logic 1 to the read instruction controller 47.
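The cyclic counter and the slot comparison can be modelled as below. This is a minimal sketch assuming four time slots and an initial count of 0; the class and function names are hypothetical.

```python
class TimeSlotCounter:
    """Models the time slot counter 66 together with the adder 68."""
    def __init__(self, num_slots=4, start=0):
        self.num_slots = num_slots
        self.value = start

    def tick(self):
        """Return the current slot number, then advance by one (wrapping around)."""
        current = self.value
        self.value = (self.value + 1) % self.num_slots
        return current

def slots_coincide(available_slot, counter_value):
    """Logic 1 when the available slot equals the slot number from the counter."""
    return 1 if available_slot == counter_value else 0

counter = TimeSlotCounter()
sequence = [counter.tick() for _ in range(6)]
matches = [slots_coincide(1, s) for s in sequence]   # available slot is 1
```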
Since the data read entry number VD1 differs from the entry number VD0 which is used by the RAM write instruction, the read operation can correctly be implemented independently of the above mentioned RAM write operation. Accordingly, the read instruction controller 47 issues a logic 1, over a line 51, which is applied to the indicators 54, 56 and 58 via the OR gate 49.
Similar to the aforesaid RAM write instruction, in response to the issuance of a logic 1 from the controller 47, the three indicators 54, 56 and 58 store the following information. That is, the indicator 54 stores a logic 1 which is applied from the instruction decoder 40 and which indicates the RAM read instruction in this instance. It should be noted that the indicator 54 stores a logic 1 which is the same logic state as in the RAM write instruction. However, the vector processor controller (not shown) coupled to the FIG. 3 arrangement is able to determine that the logic 1 stored in the indicator 54 indicates the issue timing of the RAM read instruction. On the other hand, the time slot indicator 56 receives the time slot 1 from the read instruction controller 47 and stores same therein. This means that the time slot 1 will be used for reading a vector out of the entry number VD1 of the RAM memory 10. Further, the entry number indicator 58 stores the entry number VD1 applied from the entry number register 38.
Subsequently, the pieces of the information stored in the indicators 54, 56 and 58 are applied to the vector processor controller (not shown). Thus, the vector stored in the entry number VD1 is retrieved therefrom.
Contrarily, if the above mentioned RAM read instruction is to retrieve the vector which is stored in the same entry number VD0 as utilized by the RAM write instruction, the comparator 60 detects the coincidence between the entry numbers (viz., VD0s) applied from the register 38 and the entry number retainer 50. Accordingly, the comparator 60 generates a logic 1. Further, if the preceding RAM write instruction has not yet been completed, the flag register 44 still assumes a logic 1. Thus, the AND gate 62 supplies the read instruction controller 47 with the coincidence signal D assuming a logic 1. In such a case, the read instruction controller 47 does not generate a logic 1 over the line 51 until receiving a logic 0 from the AND gate 62. In other words, none of the three indicators 54, 56 and 58 generates a control signal indicative of issuance of the RAM read instruction until the preceding RAM write instruction is completely executed.
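The gating of a RAM read instruction by the coincidence signal D can be expressed as simple Boolean logic. The sketch below is an illustration under assumptions: the function names are hypothetical, and treating the logic 1 from the determiner 64 (`slot_ok`) as a further condition for issue is an interpretation of the description rather than a stated fact.

```python
def coincidence_signal(read_entry, retained_entry, write_flag):
    """Signal D: comparator 60 output ANDed with flag register 44 (AND gate 62)."""
    match = 1 if read_entry == retained_entry else 0
    return match & write_flag

def may_issue_read(is_read, read_entry, retained_entry, write_flag, slot_ok):
    """True when the read instruction controller 47 may drive a logic 1 on line 51."""
    d = coincidence_signal(read_entry, retained_entry, write_flag)
    return bool(is_read and slot_ok and d == 0)

# Write to VD0 still in progress (flag register 44 holds a logic 1).
different_block = may_issue_read(1, "VD1", "VD0", write_flag=1, slot_ok=1)
same_block = may_issue_read(1, "VD0", "VD0", write_flag=1, slot_ok=1)
after_completion = may_issue_read(1, "VD0", "VD0", write_flag=0, slot_ok=1)
```

The second case is the one that stalls: the read is held back for the whole duration of the write, even though individual elements of the block may already have been written.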
The operations of the instruction issue timing controller 29 are further discussed with reference to FIGS. 4 and 5.
FIG. 4 shows timing charts which schematically illustrate successive execution of the following instructions:
VADD VD0 ← VR0+VR1
VMDA VR2 ← VD1
The instruction VADD implies operations in which two vectors stored in registers VR0 and VR1 (not shown) are added and then the sum obtained is written into the memory block VD0 (viz., VD0(0)-VD0(7)). On the other hand, the instruction VMDA indicates operations in which a vector stored in the memory block VD1 is read therefrom and then applied to a register VR2 (not shown). The registers VR0, VR1 and VR2 are provided in an external arrangement (not shown).
In FIG. 4, "PPT" is an abbreviation for "Pre-Process Time" which is a time duration from issuance of a RAM read instruction to an actual data read operation from one of the memory blocks VD0-VDn or from the registers such as VR0-VR2. The "PPT" is a constant value determined when designing a vector processor. On the other hand, "FUT" is an abbreviation for "Function Unit Time" which means an execution time period and may assume different values depending on instructions to be executed. It is assumed that the "PPT" and "FUT" respectively correspond to the time periods of 3 and 7 time slots in the instant disclosure.
The information which controls issue of the instruction VADD, is generated from the indicators 54, 56 and 58 (FIG. 3) at reference time slot T2. After three time slots of "PPT", the contents of the registers VR0 and VR1 are read out and then added. Subsequently, after seven time slots of "FUT", the computing result (viz., sum obtained) is written into an appropriate register (not shown in FIG. 3). Timing of generating vector elements of the sum obtained is illustrated in FIG. 4. Thereafter, the sum is written into the memory block VD0 using time slots 0.
In FIG. 4, the instruction VMDA is applied to the controller 29 at reference time slot T4. The memory block VD1 from which vector data is to be retrieved differs from the memory block VD0 into which the preceding instruction VADD stores the vector data. Therefore, the controller 29 issues the information which controls issue of the instruction VMDA at the same time slot T4 at which the controller 29 receives the instruction VMDA. After three time slots of PPT, a vector stored in the memory block VD1 is read out using time slot 1 (viz., the time slot with the smallest number among available slots 1-3). Following this, the elements of the vector retrieved from VD1 are successively stored into the register VR2 (not shown).
FIG. 5 shows timing charts for discussing successive execution of the following instructions:
VADD VD0 ← VR0+VR1
VMDA VR2 ← VD0
The instruction VADD is executed in exactly the same manner as in FIG. 4 and hence this instruction VADD will not be further discussed for the sake of brevity. The other instruction VMDA reads the vector out of the memory block VD0 and then writes same into the register VR2. It should be noted that the two instructions VADD and VMDA utilize the same memory block VD0.
As discussed in connection with FIG. 3, if a RAM read instruction is to be executed on the same memory block as the preceding RAM write instruction, the former instruction is executed after the latter instruction is completely finished. As illustrated in FIG. 5, even if the RAM read instruction is applied to the controller 29 (FIG. 3) at reference time slot T3, the controller 29 is unable to generate the information, which indicates issue timing of the RAM read instruction, until reference time slot T23. The RAM read operation is carried out, from reference time slot T27, using time slots 0 in that time slot 0 is the smallest numbered one among the slots 0-3 which are rendered available when the execution of the instruction VADD is finished.
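The cost of the stall in FIG. 5 can be put into numbers with a rough timing model. The sketch assumes that the data read begins at the first occurrence of the assigned slot number at or after the issue time plus PPT. The `PHASE` constant is an assumption chosen only so that the figures quoted above (issue at T23, read from T27 using time slot 0) are reproduced, and the function names are hypothetical.

```python
NUM_SLOTS = 4
PPT = 3     # pre-process time in time slots, as stated in the text
PHASE = 1   # assumed alignment between reference time slots and slot numbers

def slot_at(t):
    """Slot number carried by reference time slot T<t> under the assumed phase."""
    return (t + PHASE) % NUM_SLOTS

def first_read_time(issue_t, slot):
    """First reference time slot at which the read can start using `slot`."""
    t = issue_t + PPT
    while slot_at(t) != slot:
        t += 1
    return t

applied_t = 3    # RAM read instruction applied to the controller 29 (FIG. 5)
issued_t = 23    # issue information actually generated (FIG. 5)
issue_delay = issued_t - applied_t          # slots lost waiting for the write
read_start = first_read_time(issued_t, slot=0)
```

Under this model the read instruction of FIG. 5 waits 20 time slots before it is issued, which is the delay that the discussion above identifies as the problem of the known arrangement.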
As discussed above in detail, the known arrangement has encountered the problem in that, if a RAM read instruction is to be executed on the same memory block as the preceding RAM write instruction, the issuance of the RAM read instruction is undesirably delayed until the RAM write instruction is completely executed. Accordingly, it is highly desirable if the RAM read instruction can be issued as soon as possible without waiting for the completion of the RAM write instruction.