1. Field of the Invention
This invention relates to simulation of logic systems. In particular, it relates to the high-speed simulation of logic circuits by a large number of parallel, specialized processing units.
2. Background Art
Logic circuits in computer systems and related products have become increasingly large and complex. As a result, the initial design and fabrication have become increasingly lengthy and costly. Although many efforts are made to eliminate errors, it is no longer feasible to test the design only after the circuit has been fabricated. Accordingly, in recent years there has been increasing effort in design verification using computer modeling of the logic circuits before the circuit is actually embodied in hardware. The errors referred to here arise from the interaction of logic circuits which are assumed to operate correctly as separate entities but which may produce poor or incorrect results when combined.
Simulation has become a central part of verification methodology for circuit design. Applications span a wide spectrum, from early specifications to explore different architectural possibilities to the final stages of manufacturing test generation and fault coverage evaluation. For a long time, computer programs for use on a general purpose computer have been known which simulate such logic circuits. However, as the number of gates on a single chip has reached into the range of hundreds of thousands to millions, these purely software simulators have required excessive amounts of computer time. For example, Hitachi has stated in the 23rd (1986) Design Automation Proceedings that 85% of the computer time used in the design of the M-68X high-end mainframe was spent on design verification. See Ohno et al, "Principles of Design Automation System for Very Large Scale Computer Design", pages 354-359. Equally important, the ability to fully simulate a design often has significant impact on product schedule.
One approach used to overcome the excessive resource problem for full system simulation has been to build a hardware model of the design, essentially by hand wiring circuit boards with discrete components. Once wired, the hardware model can emulate the desired circuit very quickly. However, a hardware model is itself costly and time consuming to build.
Another approach, which has found widespread acceptance, is a specialized logic simulation machine, referred to here as a simulation engine. There are numerous hardware machines in existence for simulation, with different capacity, performance and applications. A survey is provided by Blank in a technical article entitled "A Survey of Hardware Accelerators Used in Computer-Aided Design" appearing in IEEE Design and Test of Computers, February 1984, at pages 21-38. At one end, there are work station co-processors, such as the Sprintor from the Zycad Corporation of Minneapolis, Minnesota, which can be a cost effective simulation alternative for relatively small systems, up to a maximum of a few tens of thousands of logic gates, running a small volume of simple test cases. At the other end, there are special main frame attached processors for applications significantly larger than one million gates. Typically these large machines run long, complex architectural level (i.e., assembly language) programs. Examples of such high end processors are SDE from Zycad, the HAL from NEC and the Yorktown Simulation Engine from IBM. The HAL is described by Koike et al in an article entitled "HAL: A High-Speed Logic Simulation Machine" appearing in IEEE Design & Test, October 1985 at pages 61-73. The Yorktown Simulation Engine is described by Denneau in a technical article entitled "The Yorktown Simulation Engine" appearing in ACM IEEE 19th Design Automation Conference Proceedings, 1982, by Kronstadt et al in an article entitled "Software Support for the Yorktown Simulation Engine" appearing in the same proceedings and by Pfister in an article entitled "The Yorktown Simulation Engine Introduction" also appearing at pages 51-64 in the same proceedings. Beece et al described EVE, a production level development of the Yorktown Simulation Engine, in a paper entitled "The IBM Engineering Verification Engine" published by the Information Processing Society of Japan on June 10, 1987.
A similar simulation engine is described by Cocke et al in U.S. Pat. No. 4,306,286. The principles of operation of the Yorktown Simulation Engine, which is based on that patent, will now be described. Note the description will be somewhat simplified but should illustrate how the basic simulation paradigm operates.
Consider the logic circuit of FIG. 1. This circuit consists of four NAND gates labelled by their outputs: 1, 2, 3 and 4. There are two inputs to the circuit, labelled 5 and 6, and the one output of the circuit is the output of NAND gate 4. Note that the label of each gate is also used to label its output state or net.
In the Yorktown Simulation Engine, each gate, in this case each of the four NAND gates, is represented as a word in what is called an "instruction memory" (see FIG. 2). The state of every net in the circuit, including the gate outputs and circuit inputs, is represented in what is called a "data memory" (see FIG. 3). Note that there is a one-to-one correspondence between the locations of the data memory and the instruction memory, that is, the first instruction memory location contains NAND gate 1 and the first data memory location contains the output of NAND gate 1, etc. The fields of the instruction memory define the function of each gate, in this case all NANDs, and which nets are used as inputs for the gate, specified by the address in the data memory of the appropriate nets. For example, addresses 5 and 6 in the ADR.sub.1 and ADR.sub.2 fields of the instruction memory at address 1 indicate that nets 5 and 6 are inputs to gate 1.
The operation of the Yorktown Simulation Engine is very simple: each instruction is executed in order, without branches. The evaluation of a single instruction, called a "minor cycle", consists of fetching the specified operands from the data memory, performing the indicated logical operation (in this case, always NAND), and then storing the result back into the data memory. The single evaluation of all instructions in sequence, that is, a single pass through the contents of the instruction memory, is called a "major cycle".
In FIG. 3, the different columns show the state of the data memory after each minor cycle of the program of FIG. 2. Values which are undefined (uninitialized) at a given time are shown by a "*". Note that because the gates were rank-ordered, a single pass was sufficient to provide the correct circuit output.
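The minor cycle and major cycle just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: FIG. 1 and FIG. 2 are not reproduced here, so only the inputs of gate 1 (nets 5 and 6) come from the text, and the wiring of gates 2 through 4 is a hypothetical rank-ordered netlist chosen for the example. The names `instruction_memory`, `major_cycle` and `simulate` are likewise invented for this sketch.

```python
# Hypothetical instruction memory: address -> (function, operand addresses).
# Only gate 1 (inputs 5 and 6) is taken from the text; gates 2-4 are assumed.
instruction_memory = {
    1: ("NAND", (5, 6)),
    2: ("NAND", (5, 1)),  # assumed wiring
    3: ("NAND", (1, 6)),  # assumed wiring
    4: ("NAND", (2, 3)),  # assumed wiring
}


def nand(*bits):
    """Two-valued NAND: output is 0 only when every input is 1."""
    return 0 if all(bits) else 1


def major_cycle(instructions, data_memory):
    """One pass through the instruction memory, in order, without branches."""
    for address in sorted(instructions):
        function, operands = instructions[address]
        # Minor cycle: fetch operands from the data memory ...
        values = [data_memory[a] for a in operands]
        # ... evaluate the function (only NAND occurs in this sketch) ...
        assert function == "NAND"
        # ... and store the result at the instruction's own address.
        data_memory[address] = nand(*values)
    return data_memory


def simulate(in5, in6):
    """Load the circuit inputs, run one major cycle, return the net 4 output."""
    data_memory = {5: in5, 6: in6}
    major_cycle(instruction_memory, data_memory)
    return data_memory[4]
```

Because the assumed gates are rank-ordered, every operand is already defined when it is fetched, so one major cycle suffices. With this particular assumed wiring the circuit happens to compute the exclusive-OR of nets 5 and 6.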
The process just described is well known. What differentiated the Yorktown Simulation Engine was the computational mechanism, illustrated in FIG. 4. This computing system will be called a logic unit. An instruction memory 10 is programmed with the program illustrated in FIG. 2. A data memory 12, corresponding to FIG. 3, is initially loaded with the circuit input values, inputs 5 and 6 in the example. A function evaluator 14 performs the actual evaluation of the instructions. That is, the instructions in the instruction memory 10 are sequentially executed. The operator type is sent to the function evaluator 14. The four operand addresses ADR.sub.1 -ADR.sub.4 are sent to the data memory 12 which then sends the values of the four operands to the function evaluator 14. The function evaluator 14 evaluates the function according to the logic type and sends the result back to the data memory 12, where it is stored at the destination address corresponding to that instruction. In the program of FIG. 2, the destination address is equal to the instruction number. Thereafter, the succeeding instruction is executed in the same way. Because of a limited number of logic types and of operand values, the function evaluator 14 can be implemented by a memory, addressed by the operator type and operand values. That is, the computing system of FIG. 4 is very simple and can be made to operate very fast.
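The remark that the function evaluator can be implemented by a memory can be illustrated as follows. This is a sketch under assumptions, not the actual organization of the Yorktown Simulation Engine: the opcode numbering in `OPCODES`, the address layout, and the restriction to two-valued operands are all choices made for brevity here.

```python
from itertools import product

# Hypothetical opcode encoding; the real machine's logic types are not given here.
OPCODES = {"AND": 0, "OR": 1, "NAND": 2, "NOR": 3}


def reference(name, bits):
    """Plain definition of each function, used only to fill the table."""
    if name == "AND":
        return int(all(bits))
    if name == "OR":
        return int(any(bits))
    if name == "NAND":
        return int(not all(bits))
    return int(not any(bits))  # NOR


def table_address(code, a, b, c, d):
    """Concatenate the operator type and the four operand bits into an address."""
    return (code << 4) | (a << 3) | (b << 2) | (c << 1) | d


def build_function_memory():
    """Precompute every (operator, operands) combination once."""
    memory = [0] * (len(OPCODES) << 4)
    for name, code in OPCODES.items():
        for bits in product((0, 1), repeat=4):
            memory[table_address(code, *bits)] = reference(name, bits)
    return memory


FUNCTION_MEMORY = build_function_memory()


def evaluate(name, a, b, c, d):
    """Evaluation is a single memory read; no logic is computed at run time."""
    return FUNCTION_MEMORY[table_address(OPCODES[name], a, b, c, d)]
```

With four operators and four one-bit operands the table has only 64 entries, which suggests why a small, fast memory can serve as the evaluator.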
The logic circuit presented as an example in FIG. 1 is unduly simple since it does not allow for feedback in the circuit. For instance, the circuit shown in FIG. 5 has the very simple instruction program shown in FIG. 6. However, because of feedback this program needs to operate a number of times since on the first major cycle some of the gate inputs are undefined. Even on subsequent major cycles some outputs have not assumed stable values. The data memory is shown by the columns of FIG. 7, which unlike FIG. 3 shows the data values only at the end of each major cycle of executing all the instructions.
An important point is that undefined values need to be explicitly represented so that two bits are required to represent a single bit of data. Another point is that an undefined input can sometimes produce a defined output. For instance, a NAND gate with one low input and one undefined input produces a high output regardless of the value of the undefined input.
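A three-valued NAND makes the point concrete. The two-bit codes below are a hypothetical encoding chosen for this sketch; the text states only that two bits are needed per value, not which codes are used.

```python
# Hypothetical two-bit encoding of the three signal states.
LOW, HIGH, UNDEF = 0b00, 0b01, 0b10


def nand3(a, b):
    """NAND over the values low, high and undefined."""
    if a == LOW or b == LOW:
        return HIGH   # a low input forces the output high, whatever the other input is
    if a == HIGH and b == HIGH:
        return LOW    # both inputs known and high
    return UNDEF      # the output still depends on an undefined input
```

Here `nand3(LOW, UNDEF)` is defined (high), whereas `nand3(HIGH, UNDEF)` remains undefined because the output would follow the inverse of the unknown input.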
The simulation for the circuit of FIG. 1 is called a rank-ordered simulation because the ordering of the gates is crucial. The multiple major cycle simulation for the circuit of FIG. 5 is called a unit-delay simulation because each major cycle corresponds to one unit of gate delay. The speed of the unit-delay simulation can be increased by the use of an A/B data memory 15, as shown for the logic unit in FIG. 8. Such an A/B data memory 15 has replicated halves. On one major cycle, one of the halves, say the A half, serves as the source of operand values while the B half serves as the destination of the instruction results. On the next major cycle, the roles of the A and B halves are reversed. The advantage of an A/B data memory 15 is that the fetching of the next set of operand values does not have to await the storing of the result of the previous instruction. That is, the operation can be made highly pipelined and the cycle time can be the unit delay of the logic and memory used in the simulation engine.
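The alternating roles of the A and B halves can be sketched as a double-buffered loop. FIG. 5 is not reproduced here, so the circuit below is a hypothetical stand-in: a latch of two cross-coupled NAND gates (nets 1 and 2) with inputs on nets 3 and 4. The names `program`, `unit_delay` and the `"*"` marker for undefined values are assumptions of this sketch.

```python
LOW, HIGH, UNDEF = 0, 1, "*"


def nand3(a, b):
    """Three-valued NAND; a low input forces a high output."""
    if a == LOW or b == LOW:
        return HIGH
    if a == HIGH and b == HIGH:
        return LOW
    return UNDEF


# Hypothetical feedback circuit: gate 1 = NAND(3, 2), gate 2 = NAND(4, 1).
program = {1: (3, 2), 2: (4, 1)}


def unit_delay(inputs, max_cycles=10):
    """Run major cycles until the gate outputs stop changing.

    Returns the (net 1, net 2) values at the end of each major cycle.
    """
    a_half = {1: UNDEF, 2: UNDEF, **inputs}
    b_half = dict(a_half)            # circuit inputs are present in both halves
    source, dest = a_half, b_half
    history = []
    for _ in range(max_cycles):
        for address, (x, y) in program.items():
            # Every fetch reads the source half; every store writes the other
            # half, so no fetch waits on a store from the same major cycle.
            dest[address] = nand3(source[x], source[y])
        history.append((dest[1], dest[2]))
        if all(dest[g] == source[g] for g in program):
            break                    # outputs are stable
        source, dest = dest, source  # reverse the roles of the halves
    return history
```

For example, `unit_delay({3: LOW, 4: HIGH})` returns `[(1, '*'), (1, 0), (1, 0)]`: net 2 is still undefined after the first major cycle, becomes defined on the second, and the third confirms that the values are stable.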
The power of the simulation engine can be markedly increased if there is replication of the simulation engines shown in FIG. 8. The problem arises, however, that a larger circuit cannot be broken down into smaller circuits which do not interact. That is, there must be provided communication between the logic units to exchange the results of the gate function evaluations. A plurality of parallel logic units 20, each illustrated in FIG. 9, provide such communication across a switch unit illustrated in FIG. 10.
The logic unit 20 is similar to that described before. However, the operand address must be expanded so that it can refer either to locally generated data, generated in the same logic unit 20 and stored in the data memory 14, or to data generated in another logic unit 20 but stored in an A/B input memory 22. The operand addresses must be provided to a four-section data selector 24 so that only four data values are presented to the function evaluator 14. The result of the logic evaluation is stored in the data memory 14 and is also presented to the switch unit on an output data line 25. Simultaneously, the switch unit is presenting data on an input data line 26 to this logic unit 20, which data has originated from some other logic unit. This data is stored in the input memory 22. A switch select memory 28 is stored with the designations of other logic units 20 from which data is to be obtained. The logic unit designation is provided to the switch unit on a switch point line 32. The instruction memory 10, the data receiving parts of the data memory 14 and of the input memory 22 and the switch select memory 28 are synchronously addressed by a counter 30. Therefore, on the n-th minor cycle, the n-th instruction in the instruction memory 10 is executed and the result is presented both to the data memory 14 and to the switch unit via the output data line 25. Then also the n-th location of the data receiving part of the input memory 22 receives data on the data input line 26 via the switch unit from the logic unit 20 designated by the n-th location of the switch select memory 28.
The switch unit 34, illustrated in FIG. 10, is a cross-point switch receiving the data output lines 25 from all the logic units in the horizontal direction and selectively connecting them to the data input lines 26 in the vertical direction. The switch points in each column are controlled by one of the switch point lines 32. Therefore, one of the data output lines 25 can be connected to more than one of the data input lines 26.
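The exchange of data through the switch unit can be modeled roughly as follows. This is a behavioral sketch only: the list-based representation and the names `switch_minor_cycle`, `run_major_cycle`, `results` and `switch_select` are assumptions, and the per-unit results are supplied as precomputed values rather than evaluated from instructions.

```python
def switch_minor_cycle(outputs, selects):
    """Cross-point switch for one minor cycle.

    outputs[u] is the value on logic unit u's output data line.
    selects[u] is the unit designated by unit u's switch point line.
    Returns the value delivered on each unit's input data line.
    """
    return [outputs[source] for source in selects]


def run_major_cycle(results, switch_select):
    """Fill every unit's input memory over one major cycle.

    results[u][n]       : value unit u produces on minor cycle n
    switch_select[u][n] : unit that unit u listens to on minor cycle n
    """
    units = len(results)
    minor_cycles = len(results[0])
    input_memory = [[None] * minor_cycles for _ in range(units)]
    for n in range(minor_cycles):  # the shared counter addresses all memories
        outputs = [results[u][n] for u in range(units)]
        selects = [switch_select[u][n] for u in range(units)]
        received = switch_minor_cycle(outputs, selects)
        for u in range(units):
            input_memory[u][n] = received[u]  # n-th input memory location
    return input_memory
```

Since each column of the switch selects independently, two units may designate the same source on the same minor cycle, which models one output line fanning out to several input lines. Each unit nevertheless receives exactly one value per minor cycle.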
The above description is incomplete and a complete understanding of the circuitry and operation of the simulation engine requires reference to the Cocke et al patent and the Yorktown Simulation Engine papers.
The Cocke et al simulation engine offers a potential increase in speed by a factor of N, where N is the number of logic units operating in parallel. Furthermore, because of the A/B memories and statically configured connectivity, the inter-unit communication does not severely affect the speed of the individual units.
However, there are limitations to the Cocke et al simulation engine. The total capacity of the machine is the product of the number of gates simulated per logic unit times the number of logic units in the machine. The performance of the machine is determined by the number of gates simulated per logic unit. Since the interconnection network of the machine is an NxN cross-point switch, which has limited size, the total number of processing units is limited. Once the maximum sized switch has been built in a given technology, 256 or 512 at present, the only way to increase capacity is to simulate more gates per logic unit, which degrades performance. The Cocke et al simulation engine further is limited in the interconnectivity of the circuits simulated by each of the separate logic units. A logic unit can receive only one data value from any other processor for each minor cycle. Therefore, each logic unit can have on average only one operand per instruction which originates from another logic unit. As the number of logic units increases, this condition becomes more difficult to satisfy.
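The trade-off between capacity and performance can be put in numbers. The sketch below assumes one gate per instruction and one instruction per minor cycle, as in the description above, and assumes gates are spread evenly over the logic units; the function name and the gate counts in the example are illustrative, not figures from any actual machine.

```python
import math


def minor_cycles_per_major_cycle(total_gates, logic_units):
    """Instructions each logic unit must execute per major cycle,
    assuming gates are divided evenly among the units."""
    return math.ceil(total_gates / logic_units)
```

With the switch size fixed at 256 units, `minor_cycles_per_major_cycle(1_000_000, 256)` is 3907 and `minor_cycles_per_major_cycle(2_000_000, 256)` is 7813, so doubling the simulated circuit roughly doubles the length of every major cycle.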
Therefore, although the Cocke et al simulation engine offers greatly increased speed, its fundamental limitations are being pressed by current levels of size and complexity in logic systems.