In the field of supercomputer development, efforts have been made to develop various types of high speed computers that can deliver extremely high levels of performance. One such type of high speed computer employs a data flow concept and is therefore referred to as a data flow computer.
Data flow computers have architectures that are completely different from the architectures of conventional control flow computers. In a data flow computer the sequence in which instructions are executed (i.e., instruction sequencing) is solely based upon the availability of data. In other words, an instruction is enabled or executed when all of its operands are available. This is in contrast to a typical control flow computer in which instructions are executed when "pointed to" by the contents of a centralized program counter. The flow of control travels from one instruction to the "next" instruction as directed by the program counter in the control flow computer. In a data flow computer, however, the flow of control is exactly the path that the data travels.
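The data-driven firing rule described above can be illustrated with a minimal sketch. The function name `run_dataflow`, the dictionary-based graph representation, and the sample expression are assumptions made for illustration only; they are not taken from any particular data flow machine.

```python
def run_dataflow(graph, inputs):
    """Fire each instruction as soon as all of its operands are available.

    graph:  instruction name -> (function, list of operand names)
    inputs: operand name -> initial value
    """
    values = dict(inputs)   # data that is currently available
    pending = dict(graph)   # instructions that have not fired yet
    fired = []              # order in which instructions actually fired
    progress = True
    while pending and progress:
        progress = False
        for name, (fn, operands) in list(pending.items()):
            # An instruction is enabled only when every operand is present.
            if all(op in values for op in operands):
                values[name] = fn(*(values[op] for op in operands))
                fired.append(name)
                del pending[name]
                progress = True
    return values, fired


# (a + b) * (a - b), deliberately listed with the final multiply first.
graph = {
    "prod": (lambda x, y: x * y, ["sum", "diff"]),
    "sum": (lambda x, y: x + y, ["a", "b"]),
    "diff": (lambda x, y: x - y, ["a", "b"]),
}
values, fired = run_dataflow(graph, {"a": 5, "b": 3})
print(values["prod"], fired)
```

No program counter appears in the sketch: the multiply is listed first but fires last, because its operands become available only after the add and the subtract have produced their results.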
Another aspect of the data flow computer is that it has no concept of storage or memory for variables, although some form of storage is still provided. Eliminating variable storage prevents the use of shared variables. This restriction keeps the instruction sequencing tied solely to data dependency and independent of any time ordering.
Various models have been developed for data flow computers. One prior model of data flow computer employed a tagged token data-flow architecture ("TTDA"). In a prior TTDA computer, data is carried in the form of a token. Each token is tagged with an identifier that indicates its logical ordering. Instruction sequencing is determined by associatively matching data-bearing tokens in a waiting-matching unit. In the prior TTDA computer, a waiting-matching unit (i.e., W-M unit) is provided in each of a plurality of processing elements (i.e., PEs). The W-M unit is an associative memory with a controller. The W-M unit matches tokens with identical tags and stores tokens whose mates have not yet arrived. When an incoming token arrives at a W-M unit of a PE, the tag of the token is associatively compared with the tags of all the previously unmatched tokens stored in the W-M unit. If the tag of the incoming token matches the tag of a stored token, the stored token is extracted from the associative memory and the data portions of both matching tokens are delivered to an arithmetic logic unit ("ALU"). The storage location in the associative memory for the matching stored token is then free to store another incoming token. If a match does not occur for the incoming token, the token is then stored in the W-M unit.
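The matching behavior of the W-M unit can be modeled in software along the following lines. This is only a behavioral sketch: the class name `WaitingMatchingUnit`, the `(tag, data)` tuple used for a token, the first-free-slot placement policy, and the raised error standing in for a full unit are all assumptions. The loop over slots stands in for a comparison that the associative memory performs on all stored tags at once.

```python
class WaitingMatchingUnit:
    """Behavioral model of a waiting-matching unit (hypothetical names)."""

    def __init__(self, capacity):
        # Each slot holds a (tag, data) token or None when free.
        self.slots = [None] * capacity

    def receive(self, tag, data):
        """Return an operand pair on a match, or None if the token is stored."""
        # Compare the incoming tag with every stored, unmatched token.
        for i, stored in enumerate(self.slots):
            if stored is not None and stored[0] == tag:
                self.slots[i] = None       # location is freed for reuse
                return (stored[1], data)   # data portions go to the ALU
        # No mate has arrived yet: store the token in the first free slot.
        for i, stored in enumerate(self.slots):
            if stored is None:
                self.slots[i] = (tag, data)
                return None
        # Assumption: a full unit is reported as an error in this model.
        raise RuntimeError("W-M unit is full of unmatched tokens")


wm = WaitingMatchingUnit(capacity=2)
r1 = wm.receive("t1", 10)   # no mate yet, stored
r2 = wm.receive("t2", 7)    # no mate yet, stored
r3 = wm.receive("t1", 32)   # matches the stored "t1" token
print(r1, r2, r3, wm.slots)
```

After the third call, the slot that held the `"t1"` token is free while the `"t2"` token is still waiting, which is the state in which a later unmatched token would be written ahead of an older one.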
The W-M unit typically needs to have relatively large storage capacity. This is due to the fact that a token stored in the W-M unit is removed only when matched to an incoming token. If the W-M unit becomes completely filled with unmatched tokens, the computer will immediately deadlock.
The speed of the W-M unit should typically match the computation speed of the ALU so that the ALU does not have to wait for matched token pairs. Since the most often executed arithmetic instructions typically require two operands, the speed of the W-M unit is required to be twice that of the ALU.
FIG. 1 illustrates a prior content addressable and reentrant memory (CARM) cell 10 used in the associative memory of a prior waiting-matching unit of a data flow computer. Typically, the associative memory of the prior W-M unit includes a plurality of such CARM cells arranged in an array. The unmatched tokens are stored initially in an ordered sequence. Once a match occurs, the matched token is fetched out and its memory location is freed to store a next unmatched incoming token. This action destroys the time ordering of the stored tokens. The associative memory is content addressable.
In FIG. 1, transistors 11 through 16 form a random access memory (RAM) cell 25. Transistors 11 and 12 are P-channel transistors and transistors 13-16 are N-channel transistors. Transistors 17 through 20 form a CMOS exclusive NOR gate circuit 27. RAM cell 25 is coupled to a word line 28 and bit lines 24 and 26. Bit lines 24 and 26 are driven in a complementary fashion. Exclusive NOR gate circuit 27 is coupled to a sense line 22 and to RAM cell 25. Exclusive NOR gate circuit 27 is used to perform the bit-wise associative comparison.
During a typical read or write operation, CARM cell 10 operates as a typical RAM cell. Word line 28 is asserted and bit lines 24 and 26 are sensed or driven to read or write RAM cell 25.
During an associative comparison, sense line 22 is charged to the power supply Vdd potential. Bit lines 24 and 26 are driven from a data register (not shown) by a bit of the data, as in a write operation. Word line 28 is, however, kept at a logical low voltage, thus turning off transistors 15 and 16. This preserves the content stored in cell 25 during the associative comparison. If the content stored in cell 25 does not match the bit information driven on bit lines 24 and 26, either transistors 17 and 19 or transistors 18 and 20 of exclusive NOR gate circuit 27 conduct to discharge sense line 22 to ground. If a match occurs, sense line 22 maintains the Vdd voltage.
When a match occurs for all cells along word line 28 driven by all bits of the data stored in the data register, sense line 22 maintains the Vdd voltage. This indicates a match between the data in the data register and the data word along word line 28. The address of the matching data word is then applied to external circuitry.
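The associative comparison can be summarized at the logic level as follows. The sketch models only the behavior described above (a precharged sense line that any mismatching cell discharges); it does not model the transistor circuit, and the function names and sample bit patterns are illustrative assumptions.

```python
def cell_matches(stored_bit, search_bit):
    """Exclusive NOR: true when the stored bit equals the driven bit."""
    return stored_bit == search_bit


def sense_line(stored_word, search_word):
    """Return True if the sense line stays at Vdd, i.e. every bit matches."""
    line_high = True  # sense line is precharged to Vdd
    for stored_bit, search_bit in zip(stored_word, search_word):
        if not cell_matches(stored_bit, search_bit):
            line_high = False  # a mismatching cell discharges the line
    return line_high


def associative_search(memory, search_word):
    """Return the addresses of all stored words that match the search word."""
    return [addr for addr, word in enumerate(memory)
            if sense_line(word, search_word)]


memory = [(1, 0, 1, 1), (0, 0, 1, 0), (1, 0, 1, 1)]
hits = associative_search(memory, (1, 0, 1, 1))
misses = associative_search(memory, (1, 1, 1, 1))
print(hits, misses)
```

The result of the search is a list of addresses rather than stored data, which mirrors the statement that the address of the matching word is applied to external circuitry.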
Disadvantages are, however, associated with the W-M unit of a data flow computer employing the prior CARM cells. One disadvantage is that the prior associative memory comprising the CARM cells does not maintain the FIFO logical ordering of the stored tokens upon a successful associative comparison operation. When an incoming token is associatively matched with a stored token in the prior associative memory, the stored token is fetched out for execution with the incoming token and the location is freed to store other incoming tokens that have not found their mates in the associative memory. When an unmatched incoming token is stored in that location, the FIFO ordering of the stored tokens is therefore destroyed.
In order to maintain the FIFO ordering of the remaining stored tokens in the W-M unit, the prior associative memory would need to reorganize the remaining stored tokens after each successful associative matching operation. This would involve shifting the next younger token into the freed memory location. The same shifting process repeats until there is no freed memory location between two stored tokens. This takes a large amount of time when the storage capacity of the associative memory is relatively large, which in turn reduces the speed of the W-M unit.
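The cost of this reorganization can be seen in a small sketch that counts the shifts needed after one token is removed. The list layout (oldest token first, `None` for an empty location) and the function name `remove_and_compact` are assumptions chosen for illustration.

```python
def remove_and_compact(tokens, index):
    """Remove the token at `index`, then shift younger tokens down by one.

    tokens: list ordered oldest first, with None marking an empty location.
    Returns the number of shift operations that were needed.
    """
    shifts = 0
    tokens[index] = None  # the matched token is fetched out
    i = index
    # Move each younger token into the freed location in front of it.
    while i + 1 < len(tokens) and tokens[i + 1] is not None:
        tokens[i] = tokens[i + 1]
        tokens[i + 1] = None
        shifts += 1
        i += 1
    return shifts


store = ["A", "B", "C", "D", None, None]
shifts = remove_and_compact(store, 1)  # token "B" finds its mate
print(store, shifts)
```

In this model the number of shifts equals the number of tokens younger than the one removed, so in the worst case it grows in proportion to the number of stored tokens.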
Another disadvantage of the prior associative memory is that it outputs the address of the matched token, not the data values associated with the matched token. This requires that another random access memory be used to hold the data values. This typically incurs further delays in receiving the matched data values because of the additional access time of the second memory and of delays caused by device input and output buffers.
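The two-step access implied here can be sketched as follows. The names `tags`, `data_ram`, `match_address`, and `fetch_matched_value`, along with the sample contents, are hypothetical and serve only to show that a second lookup is needed once the associative memory has produced an address.

```python
tags = ["t7", "t3", "t9"]   # associative memory: holds the tags only
data_ram = [70, 30, 90]     # separate RAM: holds the data values


def match_address(tag):
    """First access: the associative memory yields only an address."""
    return tags.index(tag) if tag in tags else None


def fetch_matched_value(tag):
    """Second access: the address is used to read the separate data RAM."""
    addr = match_address(tag)
    if addr is None:
        return None
    return data_ram[addr]


value = fetch_matched_value("t3")
print(value)
```

Each successful match therefore costs one associative search plus one ordinary memory read before the data value is available.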