The present invention relates to a logic simulation machine, which is a special purpose, highly parallel computer for the gate level simulation of logic. The logic simulation machine may operate in combination with a host computer and a local computer which are used to provide loading functions and to analyze the results of the simulation. The logic simulation machine includes a plurality of separate basic processors and a control processor interconnected by a switch.
Logic technologies such as very large scale integrated circuits and Josephson technology provide significant improvements in cost/performance and reliability. However, they have disadvantages in that their fault diagnosis is more difficult than with previous technologies and the engineering rework cycles needed to correct faults in logic design are greatly lengthened. These disadvantages exact great economic penalties for design errors and omissions and place a greater emphasis on the goal of completely verifying a design in advance of engineering models.
One technique for providing design verification is simulation; however, this approach has certain disadvantages. It lacks the absoluteness of static verification or any other technique actually proving correctness. The presence of errors, not their absence, is all testing can show, and it is expensive in computer resources and time consuming. Even with high-level software simulation, it is not feasible to run even short hardware diagnostic programs.
However, if the cost of simulation is decreased drastically and the speed and capacity are increased by orders of magnitude, the situation is altered radically. Since an entire processor can be simulated, far more stringent verification is possible through execution of substantial software tests. Also, logic can be tested while embedded in a standard processor design, simplifying test sequence creation and effectively providing personal engineering models. Other advantages also arise. Thus, simulation of faults can be used to derive and verify manufacturing and field tests much more economically.
U.S. Pat. No. 4,306,286, issued Dec. 15, 1981, to Cocke et al. and assigned in common with the present application, describes a logic simulation machine composed of a plurality of parallel processors and a control processor which is capable of simulating a large variety of logic functions. The present invention is an improvement upon the logic simulation machine described in the Cocke et al. patent. As such, the logic simulation machine of Cocke et al. will hereinafter be described in detail.
The logic simulation machine of the Cocke et al. patent is a special purpose, highly parallel computer for the gate level simulation of logic. It provides logic simulation speeds far beyond those of earlier software logic simulators. The embodiment thereof to be described includes thirty-one processors which simulate one gate delay for 31K gates.
Since that logic simulation machine is not a general purpose computer, it must be used as a device attached to a computer which can perform for it functions such as compilation, input/output control, etc. The system in which the logic simulation machine is used may, for instance, contain two computers in addition to the logic simulation machine.
The two other computers used with the logic simulation machine may be an IBM System/370 "host" computer and a local computer connected as an interface between the logic simulation machine and the 370 host computer. The local computer may be an IBM Series/1 Model 5 minicomputer. Although two general purpose computers are shown in the following description, in alternative embodiments their functions may be performed by one general purpose computer such as the IBM 801. The functions performed by the two general purpose computers are to load the logic simulation machine with data and instructions and to analyze the results that the logic simulation machine has obtained in a manner known in the data processing art.
More particularly, the System/370 host computer provides large computation and file support functions such as user interface control, command parsing, EXEC execution, result display, etc., compilation of logic simulation machine code and input test sequences, file storage and management, and communication with the local computer. The local computer provides fast turn-around functions, such as control of logic simulation machine execution, e.g., single-cycle execution, communication with the host computer, simulation of large storage arrays (control store, main memory, etc.), application of test input sequences, capture of test output results and insertion/removal of logic faults in the fault simulation mode.
Information passed between the logic simulation machine and the host computer is not interpreted by the local computer. The host computer compilation generates information in a form which is directly usable by the logic simulation machine and which can be transmitted through the local computer with no change.
The local computer and the host computer are standard machines controlled by programs; their contribution to the system is therefore conventional. Also, it is possible for the logic simulation machine to have its instructions and data loaded by manual means and its results analyzed by manual means.
Referring to FIG. 1, the logic simulation machine of the Cocke et al. patent is shown in block diagram form. The machine includes a plurality of basic processors, the number of which may vary although thirty-one processors are shown as an example. The thirty-one basic processors are connected to a thirty-second processor, referred to as a control processor, through an inter-processor switch. The plurality (thirty-one) of basic processors are the computing engines of the logic simulation machine; they simulate the individual gates of the design. All the basic processors run in parallel, each simulating a portion of the logic, and each basic processor can simulate up to 1024 single output functions. Because the basic processors run in parallel, increasing their number does not decrease the simulation rate, but may, in alternative embodiments, be used to increase it.
There is one control processor (processor 32 in FIG. 1) provided in a logic simulation machine. It provides overall control and input/output facilities. Responding to I/O commands from the local computer (Series/1), the control processor performs the functions of starting and stopping the basic processors, loading the basic processors with instructions and data, and transferring input and output data between the basic processors and the local computer, re-ordering the data for simpler processing by the local computer. In addition, the control processor interrupts the local computer in response to events occurring during the simulation. Such events include the end of the simulation, requests for array simulation within the local computer, and the occurrence of user-defined break-points.
There is one inter-processor switch 33 in the logic simulation machine. It provides communication among the thirty-one basic processors and between them and the control processor 32. Its primary purpose is to communicate simulated logic signals from the basic processor generating them to the basic processor using them. In addition, it provides communication between the basic processors 1-31 and the control processor 32 for loading the basic processors, transferring inputs and outputs to the local computer, etc.
In the next section of this description the basic processors 1-31, inter-processor switch 33 and control processor 32 of the logic simulation machine are described on a block diagram level with reference to FIG. 1, then a more detailed description is presented with reference to the schematic drawings of FIGS. 2 through 8.
Basic processors (1 through 31 in FIG. 1) are the computing engines of the logic simulation machine: each simulates the individual gates of a portion of the logic. The simulation results are also communicated among the various processors.
The data on which a basic processor operates represent logic signal values. Each datum can represent three values: logical 0, logical 1, and undefined. "Undefined" indicates that the represented signal could be either logical 0 or logical 1. The three values are coded using two bits per datum as follows:
BIT 0    BIT 1    VALUE
-----    -----    ---------
0        0        logical 0
1        0        logical 1
0        1        undefined
1        1        undefined
Either of the two "undefined" combinations may be initially loaded into a basic processor, and a basic processor may produce either as a result during simulation.
Since bit 1 distinguishes the undefined combinations, it is referred to as "the undefined bit." Since bit 0 distinguishes logical 0 from logical 1, it is referred to as "the value bit."
The use of 00 as logical 0 and 10 as logical 1 is a convention; the reverse could be used. However, the use of combinations 01 and 11 to represent undefined values is not a convention; it is built into the basic processor hardware.
The data representation described above is uniformly used throughout the logic simulation machine to represent logic signals.
As illustrated in FIG. 1, each basic processor such as processor 1 has a plurality of internal memories with a logic unit 34 connecting them. Two of these memories are two identical logic data memories which alternately assume one of two roles; that of the current signal value memory 35 and that of the next signal value memory 36. For a clearer explanation of the logic simulation machine, the functions of the logic data memories will be described in terms of these roles.
The current and next signal value memories 35 and 36 contain logic signal representations. Both have 1024 locations, each holding one signal.
The data in current signal value memory 35 are the logic signal values that are currently present in the simulation. The logic unit updates those values, placing the results in the next signal value memory.
The process of updating all the signal values is called a major cycle. The simulation proceeds in units of major cycles, each of which corresponds to a single gate delay. At the conclusion of each major cycle, the logic simulation machine may halt; if it does not, the former next signal value memory is designated to be the current signal value memory (and vice versa) and another major cycle is performed. (It may be noted at this point that the major cycle as used in the Cocke et al. machine is later redefined for purposes of the present invention as a "work cycle", which need not correspond to a fixed time period in simulation. This will be discussed below in more detail.)
Another component of the basic processor of FIG. 1 is the instruction memory 202. The logic unit 34 uses the instruction memory 202 in computing updated logic signal values. The instruction memory has 1024 locations, each containing a single logic simulation machine instruction corresponding to a single 1-output, 5-input gate.
Each logic simulation machine instruction contains a function code field, referred to as the opcode, and five address fields. The function code specifies the logic function to be performed, e.g., AND, NOR, XOR, etc.; this is discussed in more detail hereinafter. The five address fields specify input connections to a gate.
To perform a major cycle, the logic unit 34 sequences through instruction memory 202 in address order, executing each instruction by computing the specified logic function on the five specified values from current signal value memory 35. The result of each instruction is placed in next signal value memory 36 at the address equal to the address of the instruction (which represents a gate). For example, the instruction at address X has its result (representing the gate's output) placed at next signal value memory 36 address X; and the gate's output one gate delay earlier resides at current signal value memory 35 address X.
Each execution of an instruction by the logic unit is referred to (somewhat informally) as a minor cycle.
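By way of illustration only, a major cycle over the two signal value memories may be sketched as follows. The two-gate circuit, the dictionary form of the instruction memory, and the convention that locations without an instruction simply keep their value are all assumptions of this sketch; None stands for "undefined".

```python
def nand(values):
    """Three-valued NAND; None stands for an undefined signal."""
    if any(v == 0 for v in values):
        return 1                     # a 0 input forces the output to 1
    if any(v is None for v in values):
        return None
    return 0


def major_cycle(instructions, current):
    """Execute every instruction once and return the next signal value memory."""
    nxt = list(current)              # assumption: uninstructed locations keep their value
    for addr, (func, inputs) in instructions.items():     # one minor cycle each
        nxt[addr] = func([current[a] for a in inputs])    # reads come only from `current`
    return nxt


# Hypothetical circuit: gate 1 = NAND(5, 6), gate 2 = NAND(1, 5).
instructions = {1: (nand, [5, 6]), 2: (nand, [1, 5])}
mem = [None] * 7
mem[5], mem[6] = 1, 1                # primary inputs held at locations 5 and 6

after1 = major_cycle(instructions, mem)      # gate 1 becomes defined
after2 = major_cycle(instructions, after1)   # gate 2 becomes defined one delay later
```

Because every read is taken from the current memory and every write goes to the next memory, the order in which the loop visits the instructions does not affect the result.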
It is important to note that instructions can be executed in any order, i.e., their placement in the instruction memory is arbitrary. This is true because updated values are placed in a separate memory, and there are no branch, test, etc., instructions. This freedom of ordering is also significant for communication between basic processors, as will be discussed later.
Instructions have fields other than the function code and 5 addresses. These fields are used to perform "dotted" logic and to simulate gates with more than 5 inputs. When these fields are used, instruction execution order is no longer completely arbitrary. These fields are discussed in later sections.
The operation of a basic processor of FIG. 1 will be described using, as an example, the circuit shown in FIG. 2, which includes four NAND gates.
In FIG. 2, the numbers on the output sides of the gates are the locations in instruction memory of the instructions representing the gate. They are also the locations in current and next signal memory holding the simulated gate outputs. Inputs are assumed to come from locations 5 and 6. (The numbers above the gates represent delay times through the gates. These will be discussed in more detail below in the Description of the Preferred Embodiments. For the present discussion, and as is universally true in the logic simulation machine of Cocke et al., a unit gate delay is assumed for both high-to-low and low-to-high signal transitions.)
The instruction memory contents required for simulation are shown (simplified) in the table of FIG. 3.
Addresses 3 through 5 of each instruction in FIG. 3 are left blank because they are unused in this example; in practice, they might be set to addresses containing constant logical 1's (because the gates are NAND gates).
The table shown in FIG. 4 lists the contents of current signal value memory over successive major cycles, starting with signal values undefined (shown as asterisks). The gradual extinction of undefined values shows how logic values propagate through the gates. It should be noted that gate 2 output is fully defined at cycle 2, since a NAND gate with a 0 input has an output of 1 independent of its other inputs.
When a simulation does not require all of the instruction memory locations, the logic unit may execute fewer than the maximum of 1024 instructions per major cycle. This shortens each major cycle, increasing the simulation speed.
The major cycle length is controlled by a minor cycle count register to be described in more detail hereinafter, which contains the address of the last instruction to be executed in each major cycle (plus a "skew" value). There is a single minor cycle count register for the entire logic simulation machine; it controls the major cycle length in every basic processor.
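Use of the minor cycle count register to control major cycle length is what allows an increase in the number of basic processors to increase the simulation speed: with the logic spread over more processors, each processor holds fewer instructions, and the major cycle can be shortened accordingly.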
The logic functions specified in the logic simulation machine instructions are defined by the contents of another basic processor memory, the function memory 37 shown in FIG. 1. The relation of the function memory 37 to other basic processor elements is illustrated in FIG. 1.
Each distinct logic function used in a basic processor during a simulation is defined by the contents of a single location in function memory 37. The function code (opcode) of each instruction is the address in function memory 37 of the function's definition.
In the initial implementation of the logic simulation machine, the function memory 37 has 1024 locations. Each location contains 64 bits, one for each truth table entry of a 6-input switching function. (The sixth input is used in the simulation of gates with more than five inputs, described in a later section.) The truth table values in the function memory are 0 and 1; "undefined" values are generated by the logic unit in response to "undefined" input values. For example, assume that all the inputs to an AND function are undefined except for one. If that defined input is 1, the output is undefined. If that defined input is 0, the output is defined and equal to 0.
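By way of illustration only, the evaluation of a 64-entry truth table in the presence of undefined inputs may be sketched as follows. The bit ordering of the table and the rule used here (the output is defined only when every possible completion of the undefined inputs gives the same result) are assumptions of this sketch; the text above specifies only the behavior in the AND example.

```python
from itertools import product


def make_table(fn):
    """Build a 64-entry truth table (packed into an int) for a 6-input function."""
    table = 0
    for idx in range(64):
        bits = [(idx >> i) & 1 for i in range(6)]   # assumed bit ordering
        if fn(bits):
            table |= 1 << idx
    return table


def evaluate(table, inputs):
    """inputs: six values, each 0, 1, or None (undefined)."""
    unknown = [i for i, v in enumerate(inputs) if v is None]
    results = set()
    for fill in product((0, 1), repeat=len(unknown)):
        bits = list(inputs)
        for pos, b in zip(unknown, fill):
            bits[pos] = b                            # try one completion
        idx = sum(b << i for i, b in enumerate(bits))
        results.add((table >> idx) & 1)
    # Defined only if all completions agree; otherwise undefined.
    return results.pop() if len(results) == 1 else None


AND6 = make_table(all)
```

With this rule, a single defined 0 input makes the AND output 0, while a single defined 1 input leaves it undefined, in agreement with the example above.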
It is to be noted that since each instruction's function code selects an arbitrary location in function memory 37, there is no necessary one-to-one correspondence between instruction memory 202 and function memory 37 locations. Furthermore, there is no requirement that the function memory 37 have the same number of locations as the instruction memory 202 and the signal value memories. There must, however, be a one-to-one correspondence between instruction addresses in the instruction memory and the address into which its result is stored.
Gates of more than five inputs (extended functions) are simulated using facilities internal to a basic processor's logic unit. A diagram of the relevant internal structure appears in FIG. 5.
The function evaluation element of the logic unit computes the result of applying a function (truth table) to logic values. On each minor cycle (instruction execution) the output of function evaluation is stored in the logic unit accumulator. There is a sixth logic value input to the function evaluation element. The data presented to this input may be either the previous instruction's result (logic unit accumulator contents) or the contents of an immediate data field in each instruction. The choice is determined by each instruction's immediate select flag: 0 selects the logic unit accumulator contents, and 1 selects the immediate data field. The small box labelled X in the figure represents this choice of input.
A gate with 5 (or fewer) inputs is represented by a single instruction with an immediate select flag of 1. The function definition used must either ignore the constant sixth input or allow it to be some value that will not affect the result when only 5 inputs are used. For example, an immediate logical 0 allows a 6-input OR function definition to be used to simulate 5-input OR gates.
A gate with more than 5 inputs must be represented by two or more successive instructions. The second through last instructions all use the preceding instruction's results (logic unit accumulator contents) as their sixth input. Prior instruction results are stored in the next signal value memory, but do not correspond to elements of the simulated machine.
For example, suppose a 15-input NOR gate is to be simulated. Assuming its inputs come from locations 101 through 115, an appropriate instruction sequence is shown in the table of FIG. 6.
The first instruction in the table selects an immediate logical 0 as its sixth input. The other two use the previous instruction's output as the sixth input, so their immediate fields are irrelevant (indicated by X's in the table). The functions shown (two ORs followed by a NOR) cause the last instruction's output to be the NOR of all 15 inputs.
It is to be noted that no other instructions may intervene in a sequence of instructions computing a function of more than five inputs in this manner, since they would destroy the logic unit accumulator contents. This method of simulating extended functions corresponds to a functional decomposition that is easy to perform for the most common logic primitives: AND, OR, NAND, NOR, EXOR, etc. For those primitives, the decomposition needed follows directly from the associativity of AND, OR and EXOR, e.g., the decomposition used in the fifteen input NOR example above was: NOR(A,B,C,D,E,F, . . .) = NOR(OR(A,B,C,D,E),F, . . .).
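By way of illustration only, the chaining of instructions through the logic unit accumulator for the fifteen input NOR may be sketched as follows. The tuple layout of an instruction and the function names are assumptions of this sketch, only two-valued signals are modeled, and the storing of each intermediate result in next signal value memory is omitted.

```python
def or6(values):
    """6-input OR on two-valued signals."""
    return int(any(values))


def nor6(values):
    """6-input NOR on two-valued signals."""
    return int(not any(values))


def run_sequence(sequence, memory):
    """sequence: list of (function, five addresses, immediate_select, immediate_value)."""
    accumulator = 0
    for func, addrs, imm_select, imm_value in sequence:
        # Flag 1 selects the immediate field; flag 0 selects the previous result.
        sixth = imm_value if imm_select == 1 else accumulator
        accumulator = func([memory[a] for a in addrs] + [sixth])
    return accumulator


# Two ORs followed by a NOR, with inputs taken from locations 101 through 115.
NOR15 = [
    (or6,  [101, 102, 103, 104, 105], 1, 0),     # immediate logical 0 as sixth input
    (or6,  [106, 107, 108, 109, 110], 0, None),  # sixth input: previous result
    (nor6, [111, 112, 113, 114, 115], 0, None),  # sixth input: previous result
]

all_zero = {a: 0 for a in range(101, 116)}
result_all_zero = run_sequence(NOR15, all_zero)           # NOR of fifteen 0s
result_one_high = run_sequence(NOR15, {**all_zero, 108: 1})
```

Inserting any unrelated instruction between the three would overwrite the accumulator in this model as well, which is why the sequence must be contiguous.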
For more general functions, the needed decomposition is more difficult to find, if it exists at all. Simulation of such more general functions can be done in a straight-forward fashion by use of the logic simulation machine facilities for "dotted" logic presented in the next section.
"Dotted" (or "wired," or "wire-tied") logic, performed in hardware by directly connecting gate outputs, can be simulated by use of logic unit elements labelled Dotting Logic and Dot Accumulator in FIG. 7. These elements are controlled by three flags in each instruction: the SAVE FOR DOT flag, the DOT SELECT flag, and the DOT FUNCTION flag.
When SAVE FOR DOT is 1, the output of the logic unit for the instruction is stored in the Dot Accumulator. Otherwise, the instruction does not modify the Dot Accumulator.
When DOT SELECT is 1, the output of the logic unit, which is the value stored in next signal value memory, is a function (AND or OR) of the current Dot Accumulator contents and the output of the current instruction (logic unit accumulator contents). This final value may be saved in the Dot Accumulator by using the SAVE FOR DOT flag.
DOT FUNCTION defines the "dotted logic" function performed: DOT FUNCTION=0 selects AND, and DOT FUNCTION=1 selects OR (assuming the convention that 00 is logical 0 and 10 is logical 1; the opposite convention reverses the DOT FUNCTION meanings). DOT FUNCTION is active only when DOT SELECT is 1.
As an example, reference is made to FIG. 8, which shows a 3-way collector dot ("dotted OR"). The numbers near the gates are instruction memory addresses for instructions representing the gates; the numbers at the inputs are addresses of the input data, and the number at the output is the address where the final dotted result is placed in next signal value memory.
The table of FIG. 9 shows instructions implementing the simulation of dotting for the example. Unused inputs have been left blank for clarity, and the immediate select and immediate value fields are left out also since they are not relevant to the example.
The first instruction in the table just saves its result. Its DOT FUNCTION flag is immaterial (indicated by an X) since its DOT SELECT flag is 0. The second instruction ANDs its result with the saved first instruction's result and saves that. (Note that AND is used since the common term "wired OR" actually refers to the opposite logic convention.) The third instruction also ANDs its result with the saved dotting result; its output is the final dotted logic result, so it is not saved for further dotting.
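By way of illustration only, the action of the three dotting flags may be sketched as follows. The tuple layout and function name are assumptions of this sketch, the gate results are supplied directly as hypothetical two-valued inputs, and the flag settings follow the description of the three instructions given above.

```python
def run_dotted(instructions):
    """instructions: list of (gate_result, save_for_dot, dot_select, dot_function).

    Returns the value each instruction would store in next signal value memory.
    """
    dot_acc = None                   # must be saved before the first DOT SELECT=1
    outputs = []
    for result, save, select, function in instructions:
        if select == 1:
            # DOT FUNCTION: 0 selects AND, 1 selects OR.
            out = (dot_acc | result) if function == 1 else (dot_acc & result)
        else:
            out = result             # DOT FUNCTION is ignored when DOT SELECT is 0
        if save == 1:
            dot_acc = out            # SAVE FOR DOT stores the final output
        outputs.append(out)
    return outputs


def three_way_dot(g1, g2, g3):
    """Flag settings as described for the three instructions of the example."""
    return run_dotted([
        (g1, 1, 0, 0),               # save only; DOT FUNCTION immaterial
        (g2, 1, 1, 0),               # AND with saved value, then save
        (g3, 0, 1, 0),               # AND with saved value; final result, not saved
    ])[-1]
```

In this sketch the final output is 1 only when all three gate results are 1, which is the AND behavior the example relies on.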
Note that several instructions can intervene between two whose outputs are to be "dotted" together, providing they do not alter the dot accumulator. This allows the simulation of "wired logic" between gates of more than five inputs.
The dotted logic facilities of the logic simulation machine can also be used in simulating gates with more than 5 inputs. This is particularly useful for simulating gates implementing complex functions since a decomposition into product-of-sums or sum-of-products form can be the basis of the representation used. Individual instructions perform the first level of the decomposition (the several sums or products), and the dotting logic is used to perform the second level (the single outer level product or sum).
It was previously stated that the order of instructions in the instruction memory was not relevant. This is clearly not true for sequences of instructions using the logic unit accumulator and dot accumulator. However, such sequences are typically needed only to simulate a small minority of a device's logic, and a sequence as a whole can be arbitrarily positioned in instruction memory.
The primary function of the inter-processor switch 33 of FIG. 1 is to communicate instruction results from the basic processors, delivering them to basic processors using them. That function will now be described.
The inter-processor switch 33 connects all the basic processors 1-31 and the control processor 32. This is illustrated in FIG. 1.
The communication of results between processors makes use of additional memories within each basic processor as shown in FIG. 1. The function of these memories in providing inter-processor communication through the switch is described below.
As illustrated in FIG. 1, each basic processor has two additional internal logic data memories. These are identical to the current and next signal value memories 35 and 36 previously discussed. Like the signal value memories 35 and 36, these additional memories alternately assume one of two roles, that of the current signal input memory 38 and that of the next signal input memory 39. Their functions will be described in terms of these roles. The actual data memories have been called the A-IN memory and the B-IN memory.
Like the signal value memories 35 and 36, the current and next signal input memories 38 and 39 contain representations of logic signals. Both have 1024 locations, each holding one signal.
The data in current signal input memory 38 are logic signal values that are currently present in the simulation, and were generated by other basic processors. This data is selected by use of ADDRESS SOURCE flags in each basic processor instruction. Each of the five addresses in an instruction has an associated ADDRESS SOURCE flag. When it is 0, the address references data in current signal value memory; when 1, the address references data in current signal input memory. Thus any or all of the data used in computing a logic function can come from other processors.
In the course of a major cycle, updated values are obtained from the inter-processor switch 33 and placed in the next signal input memory 39, and at the end of each major cycle, the former next signal input memory is designated to be the current signal input memory (and vice versa).
The switch select memory 40 of each basic processor has 1024 locations, each containing the address of a basic processor. The inter-processor switch 33 uses the switch select memory 40 to place updated logic signal values in the next signal input memory 39 as follows. The result of each instruction, the value stored in next signal value memory 36, is also sent to the inter-processor switch 33. The basic processors are synchronized: every basic processor executes its Kth instruction at the same time. Thus, all processors' results are sent to the inter-processor switch 33 simultaneously, in the order of their addresses in next and current signal value memories 36 and 35.
The switch select memory 40 and next signal input memory 39 are also stepped through in address order, synchronized with instruction execution. At each minor cycle, the switch sends to each basic processor the current output of the basic processor addressed by the current switch select memory 40 location. This output is placed in the current location in next signal input memory 39. Thus, if a basic processor's switch select memory 40 has Q in location Z, it receives the Zth output of basic processor Q; this is placed in location Z of its next signal input memory 39.
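By way of illustration only, the action of the inter-processor switch during one minor cycle may be sketched as follows. The dictionary representation of the memories, the processor numbers, and the signal values are assumptions of this sketch.

```python
def route_minor_cycle(z, outputs, switch_select, next_signal_input):
    """Deliver the results of minor cycle z through the switch.

    outputs[p]           : value produced by processor p at minor cycle z
    switch_select[p][z]  : processor whose output p receives at minor cycle z
    next_signal_input[p] : next signal input memory of processor p
    """
    for p in switch_select:
        source = switch_select[p][z]
        next_signal_input[p][z] = outputs[source]   # stored at location z


# Hypothetical contents for three processors at minor cycle 49.
switch_select = {1: {49: 3}, 2: {49: 1}, 3: {49: 2}}
outputs = {1: 0, 2: 1, 3: 0}
next_in = {1: {}, 2: {}, 3: {}}
route_minor_cycle(49, outputs, switch_select, next_in)
```

Each processor receives exactly one value per minor cycle in this model, while a single output may be delivered to several receivers at once.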
The tables shown in FIGS. 11 and 12 show the switch select memory 40 and instruction memory 202 contents providing the required communication for the circuit shown in FIG. 10, assuming the allocation to processors shown in FIG. 12. The numbers in FIG. 12 are instruction/data locations corresponding to the gates and signal lines shown. Unused elements have been left out of the instructions for clarity.
Since processor 3 needs processor 2's 49th output, the table of FIG. 11 shows a 2 in processor 3's 49th switch select memory location. This places the needed value in the 49th location of processor 3's next signal input memory 39, so the table of FIG. 11 shows processor 3's 18th instruction accessing that location with its second address. The other table elements are derived similarly. It should be noted that at minor cycle 18, processor 1 simultaneously transmits and receives, sending data to processors 2 and 3 and obtaining it from processor 3.
Suppose a basic processor needs data generated in two other processors, and they generate it at the same minor cycle (same instruction location). The needed communication cannot be performed, since a basic processor can receive the output of only one other processor at each minor cycle.
However, instruction execution order is arbitrary (except for extended functions and "dotted" logic), so instructions can be ordered in instruction memory 202 to avoid such conflicts. The problem of discovering such an ordering is called the scheduling problem. The scheduling problem must be solved by the logic simulation machine compiler for each device to be simulated. Just as physical components must be placed and wired, simulated logic must be partitioned among the processors and scheduled.
Partitioning and scheduling are readily achieved in the logic simulation machine. Communication can be scheduled even when extremely simple partitioning is used, such as placing the first N gates in processor 1, the next N in processor 2, etc. Even examples containing substantial use of the logic unit accumulator may be successfully scheduled using this simple partitioning.
The control processor 32 of the logic simulation machine provides two functions of particular interest: organizing signal values into functional groups so that they can be read from or written into the logic simulation machine with a single local computer (Series/1) input/output operation, and halting the logic simulation machine and interrupting the local computer when any of a group of selected signals is set to specified values.
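By way of illustration only, the conflict that scheduling must avoid may be sketched as a simple check. This is not the compiler's scheduling method, which is not described here; the function name, the data layout, and the signal names are assumptions of this sketch.

```python
def find_conflicts(placement, needs):
    """Report receivers that would need two switch deliveries in one minor cycle.

    placement: signal name -> (producing processor, instruction slot)
    needs:     consuming processor -> set of signal names it reads
    """
    conflicts = []
    for consumer, signals in needs.items():
        by_slot = {}
        for s in sorted(signals):
            producer, slot = placement[s]
            if producer == consumer:
                continue                         # local data does not use the switch
            by_slot.setdefault(slot, []).append(s)
        for slot, sigs in sorted(by_slot.items()):
            if len(sigs) > 1:                    # two producers, same minor cycle
                conflicts.append((consumer, slot, sigs))
    return conflicts


# Hypothetical case: processor 3 needs "a" and "b", both produced at slot 18.
needs = {3: {"a", "b"}}
clashing = {"a": (1, 18), "b": (2, 18)}
rescheduled = {"a": (1, 18), "b": (2, 20)}       # "b" moved to another slot

before = find_conflicts(clashing, needs)
after = find_conflicts(rescheduled, needs)
```

Moving one of the producing instructions to a different location removes the conflict, which is possible because instruction order is otherwise arbitrary.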
The first function provides for efficient application of input sequences and gathering of data by the local computer. The halting function is the basic mechanism for informing the local computer of events in the simulation such as user-defined events or requests for array read or write. The control processor including these two functions is discussed as follows.
The control processor 32 contains two counters which are of use in controlling overall logic simulation machine execution.
The control processor also provides general logic simulation machine control functions, such as starting the logic simulation machine, halting it, etc. These are provided via commands from the local computer which utilize the control processor in a transparent manner.
The control processor contains six memories referred to as a switch select memory, an input signal memory, an output signal memory, an input permutation memory, an output permutation memory, and an event mask.
The characteristics and functions of each of these memories are described below.
The switch select memory of the control processor and its connection to the inter-processor switch 33 are identical in configuration and operation to switch select memory 40 of a basic processor. Data is presented to the inter-processor switch 33 from the control processor 32 each minor cycle, and the control processor's switch select memory determines the basic processor from which it receives data each minor cycle.
The input and output signal memories serve as a sink and a source, respectively, of logic data communicated between the control processor 32 and the basic processors 1-31 via the inter-processor switch 33. Both have 1024 locations, each holding a single signal.
In contrast to other signal data memories in the logic simulation machine, no internal logic simulation machine action ever reads the contents of the input signal memory; it is read only into the local computer main storage via its input/output operations. Similarly, no internal action ever alters the contents of the output signal memory; it is loaded only from the local computer main storage via its input/output operations. In addition, no swapping of these memories occurs between major cycles.
The function of the input and output permutation memories is to permute the transmission order of values in the input and output signal memories respectively. Each of the memories contains 1024 locations of 10 bits each; an address in the associated signal memory is contained in each location.
Every major cycle, the input permutation memory is cycled through in address order, synchronized with basic processor instruction execution. The address in the current input permutation memory location is used as the address in input signal memory where data currently received from the inter-processor switch is placed. The output permutation memory is cycled through in the same manner. The contents of each location is used as the address in the output signal memory from which data is sent to the inter-processor switch 33.
This permutation of signal order allows data to be functionally grouped in the output signal memory, minimizing the local computer input/output operations needed to alter it. Such functional grouping might otherwise be impossible to achieve, due to the requirements of scheduling inter-processor data transfers through the inter-processor switch 33. For example, a set of test inputs can be positioned in contiguous locations of the output signal memory and stored there with a single local computer input/output operation; appropriate output permutation memory contents can then feed that data to the switch at the minor cycles to which they have been scheduled for conflict-free inter-processor communication.
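The role of the two permutation memories can be illustrated with a minimal sketch. It is not the circuitry of the embodiment; the function names, the use of Python lists for the memories, and the particular minor cycles chosen in the example schedule are assumptions made only for illustration.

```python
SIZE = 1024  # locations per signal or permutation memory, as stated in the text


def send_major_cycle(output_signal, output_perm):
    """Return the value presented to the switch on each minor cycle.

    On minor cycle t the location output_perm[t] of the output signal
    memory is read and sent to the inter-processor switch.
    """
    return [output_signal[output_perm[t]] for t in range(len(output_perm))]


def receive_major_cycle(received, input_perm, input_signal):
    """Store the value received on minor cycle t at address input_perm[t]."""
    for t, value in enumerate(received):
        input_signal[input_perm[t]] = value
    return input_signal


# Four test inputs stored contiguously (one input/output operation would load them).
output_signal = [0] * SIZE
output_signal[0:4] = [1, 2, 3, 1]

# Hypothetical conflict-free schedule: the minor cycle at which each input is sent.
schedule = [10, 200, 7, 512]
output_perm = [SIZE - 1] * SIZE          # unused cycles read an idle location
for location, minor_cycle in enumerate(schedule):
    output_perm[minor_cycle] = location

sent = send_major_cycle(output_signal, output_perm)

# On the input side, values arriving at cycles 10 and 200 are grouped at 0 and 1.
input_perm = list(range(SIZE))
input_perm[10], input_perm[0] = 0, 10
input_perm[200], input_perm[1] = 1, 200
input_signal = receive_major_cycle(sent, input_perm, [0] * SIZE)
```

In this sketch the contiguous layout in the signal memories is independent of the minor cycles at which the values cross the switch, which is the property the permutation memories are intended to provide.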
The event mask in control processor 32 allows the logic simulation machine to halt in response to events in the simulation itself, i.e., the setting of some simulated signal(s) to selected value(s).
The event mask contains 1024 locations of 4 bits each. Each of the bits corresponds to an individual value of the 2-bit code for simulated signal values: the first bit corresponds to 00, the second to 01, the third to 10, and the last to 11.
The event mask, in parallel with the output and input permutation memories of control processor 32, is cycled through in address order as part of a major cycle. As each signal value is received from the inter-processor switch 33, it is matched against the contents of the current event mask location. If that location contains a "1" corresponding to the signal value, the simulator is halted at the end of the current major cycle and an interrupt is presented to the local computer.
It should be noted that event mask locations correspond to signal values in the order they are received from the switch, not in the (permuted) order of storage in input signal memory.
Since the event mask can have more than a single "1" in each location, any of several values of a particular signal can be made to cause the simulator to halt. For example, the simulator could be halted when a particular signal is set to anything except logical 0.
Since the logic simulation machine only halts at the end of a major cycle, thus ensuring consistency of all simulated signal values, more than one signal value can match the event mask settings before the halt and interrupt, effectively causing a halt for several reasons simultaneously. For this reason, the local computer is given no direct indication as to which signal value caused the halt. Instead, controlling software in the local computer must read the contents of the input signal memory to determine which signal value(s) caused the halt. This implies another use of the input permutation memory: all the simulated signal values which could cause a halt can be grouped in input signal memory and thus read by a single local computer input/output operation.
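The event mask matching described above can be sketched as follows. This is an illustration under stated assumptions rather than the event logic of the embodiment: the "first" bit of a mask location is taken to be the most significant of its 4 bits, the code 00 is taken to represent logical 0, and the function names are hypothetical.

```python
def mask_selects(mask, value):
    """Return True if the 4-bit mask has a 1 in the position for the 2-bit value.

    Assumption: the first mask bit (for value 00) is the most significant bit.
    """
    return bool((mask >> (3 - value)) & 1)


def halt_after_major_cycle(received, event_mask):
    """Scan one major cycle; report whether the machine halts at its end.

    received[t] is the signal value arriving from the switch on minor
    cycle t, matched against event_mask[t] (order of receipt, not the
    permuted order of storage).  The scan never stops early, because the
    halt takes effect only at the end of the major cycle.
    """
    event_latch = False
    for t, value in enumerate(received):
        if mask_selects(event_mask[t], value):
            event_latch = True
    return event_latch


# Halt when the signal received on minor cycle 3 is anything except logical 0
# (assuming 00 encodes logical 0).
event_mask = [0b0000] * 1024
event_mask[3] = 0b0111

quiet = [0] * 1024
active = list(quiet)
active[3] = 2
```

With these inputs, `halt_after_major_cycle(quiet, event_mask)` reports no halt and `halt_after_major_cycle(active, event_mask)` reports a halt.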
The control processor also contains two identical counters called level counter 1 and level counter 2, each 16 bits long. They can be loaded via local computer input/output commands, and are decremented each major cycle. When either reaches 0, the local computer is interrupted.
These counters can be used for various purposes. For example, one can count the major cycles (gate delays) per logic cycle of the simulated device, giving the Series/1 an interrupt when it is time to gather an output vector and apply a new input vector. The other can count the total number of major cycles a simulation is to run.
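The use of the two level counters in the example just given can be sketched as follows. The class and function names are hypothetical, the reloading of the first counter by software after each interrupt is an assumption, and a load value of 0 is excluded because the text does not say how it would behave.

```python
class LevelCounter:
    """A 16-bit down counter that signals an interrupt on reaching 0."""

    def __init__(self, count):
        if not 0 < count <= 0xFFFF:
            raise ValueError("count must fit in 16 bits and be nonzero")
        self.count = count

    def tick(self):
        """Decrement once per major cycle; return True when 0 is reached."""
        self.count -= 1
        return self.count == 0


def simulate(delays_per_logic_cycle, total_major_cycles):
    """Return the major cycles at which vectors are exchanged, and the last cycle."""
    per_cycle = LevelCounter(delays_per_logic_cycle)
    total = LevelCounter(total_major_cycles)
    vector_cycles = []
    major = 0
    while True:
        major += 1
        if per_cycle.tick():
            vector_cycles.append(major)                       # gather/apply vectors
            per_cycle = LevelCounter(delays_per_logic_cycle)  # assumed software reload
        if total.tick():
            return vector_cycles, major


vector_cycles, last_cycle = simulate(5, 12)
```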
A more detailed description of the logic simulation machine shown in block diagram form in FIG. 1 will now be provided. FIGS. 13A, 13B, 13C and 13D illustrate the entire logic simulation machine and FIG. 13E is an illustration of waveforms used in the description of FIGS. 13A through 13D.
Referring to FIGS. 13A and 13B, and as previously stated, there are thirty-one processors numbered from number 1 to number 31, as also shown in FIG. 1. Processor number 31 is shown in diagrammatic form on FIGS. 13A and 13B.
Also as previously stated, the actual number of the just mentioned processors is not important. There could be more processors or fewer. The number of thirty-one processors is chosen as one example of a practical machine embodiment of the invention. In addition to processors number 1 through 31, there is control processor 32, also shown in FIG. 1, which is shown on FIGS. 13C and 13D. Control processor 32 will be described in detail later and, as previously stated, is used mainly to accumulate the results and to provide the control pulses which are needed for the entire logic simulation machine of this embodiment.
In FIG. 13A, the address counter 200 is used to supply addresses to the instruction memory 202. Memory 202 has provision for 1024 words. This number is only used as an example, and a greater or lesser number of words may be used. Actually, in the operation of the described machine, not all of the 1024 words in the instruction memory 202 need be used, as will be understood from the later description.
From the instruction memory 202, words are read into the instruction register 204. Because of the pipelined structure of the machine, a second instruction register 206 is needed, into which the same word is read by the P-2 gate pulse also illustrated in FIG. 13E. The register 206 then acts as an input register to the "logic unit memory" 208 and to the memories labelled A, B, A IN and B IN. The left-hand section of register 206 holds five operand addresses. The middle section of this register holds some control bits and the right-hand portion holds the operation code. The operation code in the right-hand portion of register 206 acts as an address to the logic unit memory to place a word in the "logic unit memory register" section of register 210. The control section is passed from register 206 to register 210 by the P-1 gate pulse which is also shown in FIG. 13E. The A and B memories and their sections A IN and B IN are special memories and will be described in more detail later. At this point in the description, it can be mentioned that either the A and A IN or the B and B IN memories can be read and placed in the "logic unit input register" of register 210. As previously described, and as will be explained in more detail later, there are "minor cycles" and "major cycles" in the logic simulation machine operation. During a "minor cycle" one instruction in the instruction memory 202 on FIG. 13A is read and processed. A "major cycle" can consist of 1024 of these "minor cycles". The A, B, A IN and B IN memories are used alternately. For example, during one major cycle the A and A IN memories may be read from, while in the next successive major cycle the B and B IN memories are read from.
In the next "major cycle" the B and B IN memories will be read. The switching between the A and A IN and the B and B IN memories is accomplished by the switching mechanism indicated by the dotted line 212 on FIG. 13A. If it is assumed that, in a certain major cycle, the A and A IN memories are read out of, then in this same major cycle the B and B IN memories can be written into. When the A and A IN memories are read out of, these memories are regarded as a single memory. The A and A IN memories are both addressed by the low order ten bits of the operand in register 206, while the high order bit of this same operand selects which of the two (A or A IN) memories is actually read. In other words, the operands in register 206 are eleven bits long. When the A, A IN, B or B IN memories are written into, each of these memories is regarded as a separate memory and is addressed by the ten bits contained in the address register 214 on FIG. 13A. These memories will be described in much more detail later in the description.
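The read and write addressing of the A, A IN, B and B IN memories can be summarized in a short sketch. The class and method names are hypothetical, and the sketch assumes that a high order operand bit of 1 selects the IN memory, which the text does not state.

```python
WORDS = 1024  # words per memory, each holding one 2-bit signal value


class SignalMemories:
    """One group of A, A IN, B and B IN memories with a swap switch."""

    def __init__(self):
        self.banks = {name: [0] * WORDS for name in ("A", "A_IN", "B", "B_IN")}
        self.read_side = "A"   # the side read during the current major cycle

    def swap(self):
        """Exchange the read and write sides between major cycles."""
        self.read_side = "B" if self.read_side == "A" else "A"

    def _write_side(self):
        return "B" if self.read_side == "A" else "A"

    def read(self, operand):
        """Read with an 11-bit operand: ten address bits plus one select bit."""
        address = operand & 0x3FF
        use_in = (operand >> 10) & 1           # assumed: 1 selects the IN memory
        name = self.read_side + ("_IN" if use_in else "")
        return self.banks[name][address]

    def write_local(self, address, value):
        """Write a locally computed result using a 10-bit address."""
        self.banks[self._write_side()][address & 0x3FF] = value

    def write_from_switch(self, address, value):
        """Write a value arriving from the inter-processor switch."""
        self.banks[self._write_side() + "_IN"][address & 0x3FF] = value


mem = SignalMemories()
mem.write_local(7, 3)          # lands in B while A is being read
mem.write_from_switch(7, 1)    # lands in B IN
before_swap = mem.read(7)
mem.swap()
after_swap = (mem.read(7), mem.read((1 << 10) | 7))
```

Values written during one major cycle become visible to reads only after the swap, which is what allows all simulated signals to advance consistently by one major cycle.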
Referring again to FIG. 13A, information in register 210 is transferred to register 216 by the P-2 gate pulse. This permits the information in register 216 to be applied to the "logic unit first step" 218, which will be described in more detail later. The output of the "logic unit first step" 218 and also the control bits numbered "11", "12" and "13" are transferred to register 220 on FIG. 13B by the P-1 gate pulse. From register 220 these data are gated to register 222 by the gate pulse P-2. Register 222 serves as the input register for the "logic unit second step" 224. The "logic unit second step" 224 will be described in more detail later. From this unit, the results are gated to the logic unit output register 226 by the pulse P-1.
From register 226, information is gated to register 228 by the P-2 gate pulse. These delay registers are necessary in the machine because of the "pipelining" used in the design. From register 228 it can be seen that data has two paths. One is back to the processor via cable 230 so that information can be written into either the A or B memories according to the setting of the switches represented by reference character 212. The other path for data is via cable 232 which goes to the inter-processor switch. Depending on the setting of the gates in the inter-processor switch 33 shown in FIG. 1, information can be gated from any processor back to itself or to any other processor in the group from processor number 1 through processor number 32. When this information comes in from the inter-processor switch 33 it always goes into either the A IN or the B IN memories. The inter-processor switch will be described in more detail later.
Reference should next be made to FIGS. 13C and 13D for the description of the control processor 32. Control processor 32 permutes and accumulates information produced by the other thirty-one processors, keeps track of "events" and contains all the pulse generation equipment used to control the whole machine of this embodiment. On FIG. 13C, the "address counter" 234 is used to address the "permuter instruction memory" 236 which again can contain up to 1024 words. This memory is read by the P-1 gate pulse and the memory word is placed in the register 238. From register 238 the information is gated to register 240 by the P-2 gate pulse. Register 240 controls the reading and writing of the A and A IN memories and also provides an input to the "event logic" which is shown in detail on FIG. 18.
The A and A IN memories are used differently in processor number 32 than in the other thirty-one processors. The data is always read out of memory A and goes by cable 242 to the inter-processor switch. The data from the switch is written into memory A IN. In FIG. 13C, it will be noted that data coming in from the inter-processor switch 33 on cable 244 is not only applied to the memories but also serves as an input to the event logic.
Referring to the right side of FIG. 13D, the logic simulation machine of this embodiment is started by a "start" pulse applied to the lead 246. This pulse extends through the OR circuit 248 to turn "on" single shot device 250. This produces the CL-1 pulse shown in FIG. 13E which is used to reset the address counters and to trigger the "swap" switch. The "swap" switch is used to switch the A and B memories in the various processors. The CL-1 pulse is also used to set the event latch 260 to its 0 state and also to reset to 0 the count up counter 264. After the CL-1 pulse disappears, the delay circuit 252 will have an output to turn "on" single shot device 254 in order to produce the CL-2 pulse. The CL-2 pulse sets the flip flop 256 to its 1 state thus turning on the pulse generator 258. The pulse generator 258 produces the P-1, P-2, and P-3 gate pulses in succession as shown by the timing chart of FIG. 13E. The P-1 and P-2 gate pulses are used to step the pipelining of the machine. The P-3 gate pulse is used to test the output of the compare unit 266. The total count register 268 is initially set for the total number of minor cycles required.
A "minor" cycle can be considered to be the time required to produce a single train of three gate pulses, in other words, the time to produce a single train consisting of a P-1 gate pulse, a P-2 gate pulse and a P-3 gate pulse. The total count register 268 is set to a number which is equal to the total number of "minor" cycles plus the number of cycles required to run the last data through the pipeline. The count up counter 264 is incremented each "minor" cycle by the P-1 gate pulse. When the content of the count up counter 264 is equal to that of the total count register 268, a pulse will be produced by the compare unit 266 which extends to the AND circuit 270 which is tested each "minor" cycle by the P-3 gate pulse. The pulse produced by the AND circuit 270 at this time extends to flip-flop device 256 to reset it to its 0 state thus turning off the pulse generator 258. The same pulse is also applied to gate 262 in order to test the event latch 260. If the latch 260 is still in its 0 state, a pulse will be produced on wire 272, which extends through OR circuit 248 to again turn "on" single shot device 250 in order to start a new major cycle. If the event latch 260 is in its 1 state, a pulse will appear on lead 274 to signal the end of operations. This timing chart of FIG. 13E will be understood better when the detailed circuits shown in the remaining figures are described.
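The sequencing just described (CL-1, CL-2, the P-1/P-2/P-3 pulse trains, the compare unit and the event latch) can be followed in a behavioural sketch. It models only the order of events, not the circuits; the function name, the `event_detected` callback standing in for the event logic, and the `max_major_cycles` guard are assumptions introduced for illustration.

```python
def run_machine(total_count, event_detected, max_major_cycles=1000):
    """Run major cycles until the event latch is found set.

    total_count      -- contents of the total count register 268
    event_detected   -- hypothetical callback (major, minor) -> bool that
                        stands in for the event logic setting latch 260
    max_major_cycles -- guard against an endless run (not in the text)
    Returns the number of major cycles executed.
    """
    major = 0
    while major < max_major_cycles:
        # CL-1: reset the counters, swap memories, clear the event latch.
        count_up = 0
        event_latch = False
        major += 1
        # CL-2: flip-flop 256 is set and the pulse generator starts.
        generator_on = True
        while generator_on:
            count_up += 1                        # P-1 steps the count up counter
            if event_detected(major, count_up):  # event logic may set the latch
                event_latch = True
            # P-2 advances the pipeline (not modelled here).
            if count_up == total_count:          # P-3 tests the compare unit
                generator_on = False             # flip-flop 256 reset
        if event_latch:
            return major                         # lead 274: end of operations
        # otherwise wire 272 starts a new major cycle
    return major


cycles_run = run_machine(4, lambda major, minor: major == 3 and minor == 2)
```

Note that in this sketch, as in the text, an event detected in the middle of a major cycle does not stop the pulse generator; the latch is examined only after the count up counter reaches the total count.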
Reference should next be made to FIGS. 14A and 14B (arranged as illustrated in FIG. 14) which show the detailed circuitry for the A, B, A IN and B IN memories. These are the memories shown on the right portion of FIG. 13A. These memories are grouped into five groups of four each. One group is shown in detail on FIG. 14A. Each memory has a capacity of 1024 words and each word is two bits long. In the beginning, before the logic simulation machine is started, all memories are loaded with initialized data.
The five operand addresses existing in the register 204 in FIG. 13A are fed to the five memory sections in FIG. 13A, and therefore five stored values are presented to register 210 in FIG. 13A each time a read-access is performed on the memories. Depending on the setting of the switch 276 at the left side of FIG. 14A, the memories A or A IN will be read or the memories B or B IN will be read. At this point, it is important to note that the A or A IN memories or the B or B IN memories are read as a single memory.
The lower order ten bits of the operand section of the processor instruction memory 204 are used to address both A and A IN or both B and B IN. The highest order bit (the eleventh bit of this operand section) is used to select either the normal or the IN sections. The important consideration at this point is that when these memories are read, the A and the A IN memories or the B and B IN memories are read as a single unit. This is not true when the memories are written into, as will be described later. When these just mentioned memories are written into, they are addressed separately by the address counter 214 (shown at the upper left of FIG. 14A) which has only ten bits. The switches represented by the reference character 212 on FIG. 13A are shown in FIG. 14A by the gates 278, 280, 282, and 284. These gates are controlled by the switch 276. If switch 276 is in its "0" position, the gates 278 and 284 will be enabled, thus permitting the memories A and A IN to be read from or the memories B and B IN to be written into.
The read and write pulses are applied to the memories on FIG. 14A by the P-1 pulse through gate 286.
Reference should next be made to FIG. 15 which shows the A and A IN memories for the control processor 32. These memories are similar to those in the other processors except that they are used in a much simpler manner. Information is always read out of the A memory and always written into the A IN memory. Reading and writing of the memory is done by the P-1 pulse which is applied to the gate 288.
Reference should next be made to FIGS. 16A and 16B, arranged as shown in FIG. 16, which show details of the inter-processor switch which is indicated in FIG. 1 and by the dotted lines on the left side of FIGS. 13C and 13D. The inter-processor switch has its own address counter 290 which supplies addresses to the switch memory 294; the addressed word is read by the P-1 gate pulse and later transferred to register 296 by the P-2 gate pulse. There are thirty-two sections of five bits each in register 296 and each one of these sections is decoded into one of thirty-two leads. The groups of leads from each one of these decoders are applied to gates such as 298, 300, 302 and 304. These gates are pulsed each "minor" cycle by the P-1 gate pulse. In this manner, pulses appear on cables such as 306, 308, 310, and 312 to enable gates such as 314, 316 through 318 and 320. An examination of the cables and the gates on FIG. 16B will indicate that data coming from any processor can be gated to itself or to any other processor by the P-1 gate pulse. It is believed that the operation of the inter-processor switch may now be understood by one skilled in the art and that no further explanation is needed for FIGS. 16A and 16B.
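The selection performed by the inter-processor switch in one minor cycle can be sketched as follows. The sketch assumes that each five-bit section names the source processor for one destination, that processors are indexed 0 through 31 (processor number minus one), and that section 0 occupies the low order bits of the word; none of these layout details is given in the text, and the function names are hypothetical.

```python
PROCESSORS = 32   # thirty-one basic processors plus the control processor
FIELD_BITS = 5    # each section selects one of thirty-two sources


def pack_selects(selects):
    """Pack thirty-two 5-bit source selections into one switch memory word."""
    if len(selects) != PROCESSORS or any(not 0 <= s < PROCESSORS for s in selects):
        raise ValueError("expected 32 selections in the range 0..31")
    word = 0
    for dest, src in enumerate(selects):
        word |= src << (FIELD_BITS * dest)
    return word


def unpack_selects(word):
    """Split a switch memory word back into its thirty-two sections."""
    mask = (1 << FIELD_BITS) - 1
    return [(word >> (FIELD_BITS * dest)) & mask for dest in range(PROCESSORS)]


def route(outputs, word):
    """Return the value delivered to each processor for one minor cycle."""
    return [outputs[src] for src in unpack_selects(word)]


# Each processor receives from its right-hand neighbour; the last one wraps around.
selects = [(dest + 1) % PROCESSORS for dest in range(PROCESSORS)]
outputs = [index % 4 for index in range(PROCESSORS)]   # 2-bit signal values
delivered = route(outputs, pack_selects(selects))
```

Because every destination has its own section, a processor can select itself as its source, and several destinations can select the same source in the same minor cycle.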
Reference should next be made to FIGS. 17A and 17B, arranged as illustrated in FIG. 17. FIG. 17A shows how the information in the register 216, which is also shown on FIG. 13A, is applied to the logic unit first step and how the control bits are directed to gates which suitably control the logic. From FIG. 17A, the results of the logic unit first step are passed along with certain of the control bits in register 216 (FIG. 13A) to the logic unit second step shown in FIG. 17B. The Boolean logic equations for the logic unit second step are shown in FIG. 17 and it is believed that there will be no difficulty in understanding the operation of this unit.
The logic unit first step shown in FIG. 17A carries out specific logic functions. The logic unit first step accepts the inputs $X_1, X_1', X_2, X_2', \ldots, X_6, X_6'$ and $m_0^{0}, m_0^{1}, m_0^{2}, \ldots, m_0^{31}$ and $m_1^{0}, m_1^{1}, m_1^{2}, \ldots, m_1^{31}$ as shown in FIG. 17A. Using conventional logic circuits such as AND gates, OR gates and inverters arranged in a manner well known to those skilled in the art, the following logic operations are performed by the logic unit first step:

$$
\begin{aligned}
a &= X_1 + X_1', & a' &= X_1 + X_1', \\
b &= X_2 + X_2', & b' &= X_2 + X_2', \\
c &= X_3 + X_3', & c' &= X_3 + X_3', \\
d &= X_4 + X_4', & d' &= X_4 + X_4', \\
e &= X_5 + X_5', & e' &= X_5 + X_5'.
\end{aligned}
$$

Using the $a, a', b, b', \ldots, e'$ so produced, the logic performs the following operations:

$$
\begin{aligned}
FP_{0} &= abcde, & FP_{1} &= abcde', & FP_{2} &= abcd'e, & FP_{3} &= abcd'e', \\
FP_{4} &= abc'de, & FP_{5} &= abc'de', & FP_{6} &= abc'd'e, & FP_{7} &= abc'd'e', \\
FP_{8} &= ab'cde, & FP_{9} &= ab'cde', & FP_{10} &= ab'cd'e, & FP_{11} &= ab'cd'e', \\
FP_{12} &= ab'c'de, & FP_{13} &= ab'c'de', & FP_{14} &= ab'c'd'e, & FP_{15} &= ab'c'd'e', \\
FP_{16} &= a'bcde, & FP_{17} &= a'bcde', & FP_{18} &= a'bcd'e, & FP_{19} &= a'bcd'e', \\
FP_{20} &= a'bc'de, & FP_{21} &= a'bc'de', & FP_{22} &= a'bc'd'e, & FP_{23} &= a'bc'd'e', \\
FP_{24} &= a'b'cde, & FP_{25} &= a'b'cde', & FP_{26} &= a'b'cd'e, & FP_{27} &= a'b'cd'e', \\
FP_{28} &= a'b'c'de, & FP_{29} &= a'b'c'de', & FP_{30} &= a'b'c'd'e, & FP_{31} &= a'b'c'd'e'.
\end{aligned}
$$

Using the FP values thus generated and the $m_0^{0}, m_0^{1}, \ldots, m_0^{31}$ and $m_1^{0}, m_1^{1}, \ldots, m_1^{31}$ inputs, the following steps are carried out:

$$
\begin{aligned}
GB_{0} &= (m_0^{0})(FP_{0}), & GB_{1} &= (m_0^{1})(FP_{1}), & &\ldots, & GB_{30} &= (m_0^{30})(FP_{30}), & GB_{31} &= (m_0^{31})(FP_{31}), \\
GC_{0} &= (m_0^{0})(FP_{0}), & GC_{1} &= (m_0^{1})(FP_{1}), & &\ldots, & GC_{30} &= (m_0^{30})(FP_{30}), & GC_{31} &= (m_0^{31})(FP_{31}), \\
GD_{0} &= (m_1^{0})(FP_{0}), & GD_{1} &= (m_1^{1})(FP_{1}), & &\ldots, & GD_{30} &= (m_1^{30})(FP_{30}), & GD_{31} &= (m_1^{31})(FP_{31}), \\
GE_{0} &= (m_1^{0})(FP_{0}), & GE_{1} &= (m_1^{1})(FP_{1}), & &\ldots, & GE_{30} &= (m_1^{30})(FP_{30}), & GE_{31} &= (m_1^{31})(FP_{31}).
\end{aligned}
$$

Using the above values:

$$
\begin{aligned}
GF &= GB_{0} + GB_{1} + \cdots + GB_{30} + GB_{31}, \\
GG &= GC_{0} + GC_{1} + \cdots + GC_{30} + GC_{31}, \\
GH &= GD_{0} + GD_{1} + \cdots + GD_{30} + GD_{31}, \text{ and} \\
GI &= GE_{0} + GE_{1} + \cdots + GE_{30} + GE_{31}.
\end{aligned}
$$
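The structure of the FP product terms and of the OR of gated terms can be illustrated by a small sketch. It takes the values of a, a', b, b', ..., e, e' as given Boolean inputs and does not model how they are derived from the X inputs; the function names and the example table are hypothetical.

```python
def product_terms(literals):
    """Return the thirty-two FP product terms.

    literals is a list of five (plain, primed) Boolean pairs for a..e.
    Term k uses the primed literal wherever the corresponding bit of k
    (a is the high order bit, e the low order bit) is 1.
    """
    terms = []
    for k in range(32):
        value = True
        for position, (plain, primed) in enumerate(literals):
            bit = (k >> (4 - position)) & 1
            value = value and (primed if bit else plain)
        terms.append(value)
    return terms


def gated_or(m, terms):
    """OR together the terms m[k] AND FP[k], in the manner of GF."""
    return any(mk and fk for mk, fk in zip(m, terms))


# Example: a, c, d, e true and b' true, so only the term ab'cde (index 8) holds.
literals = [(True, False), (False, True), (True, False), (True, False), (True, False)]
fp = product_terms(literals)

table = [False] * 32
table[8] = True                 # hypothetical table with a 1 only at entry 8
result = gated_or(table, fp)
```

When exactly one literal of each pair is true, exactly one product term is true and the OR simply reads out the corresponding table entry; if both literals of a pair are true, several terms are true and the OR combines several entries.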
Reference is made to FIG. 18 which shows the details of the event logic of FIG. 13C. The event logic is shown on FIG. 13C with one of its inputs coming from the mask field in register 240 and its other input from the cable 244 which comes from the inter-processor switch. As shown on FIG. 18, the data from the mask field in register 240 combines with the data from the inter-processor switch to produce an output of the OR circuit 322 which provides an output on line 324 in order to set the event latch 260 to its 1 state. In that case, it is said that an "event" has occurred. As mentioned before, the detection of an "event" causes a signal to appear on lead wire 274 on FIG. 13D which signals the end of operations.
In other words, no more major cycles are needed. The results of the operations which now exist in the A, B, A IN or B IN memories in processors 1 through 31 and in the A IN memory in processor 32 can be analyzed in any suitable manner. Such analyzing of this data in the machine is done by the host computer in conjunction with the local computer which acts as an interface between the host computer and the logic simulation machine.
Although the logic simulation machine of the Cocke et al. patent as described above is capable of accurately simulating a wide variety of predetermined logic functions, nevertheless, that machine suffers from a drawback in that the machine imposes a predetermined fixed unit delay for each logic function simulated. This fixed unit delay is the same for both rising (low-to-high) and falling (high-to-low) signal transitions.
Accordingly, it is the primary object of the present invention to provide an improvement of the above-described logic simulation machine in which variable delays are permitted for the various logic functions being simulated.
Also, it is an object of the present invention to provide an improved logic simulation machine in which, in addition to the variable delays, the delays can be made different for rising and falling signal transitions.
Yet further, it is an object of the present invention to provide such an improved logic simulation machine in which the above-described major cycles are eliminated and the machine is operated on an event-driven basis. That is, a cycle of the machine is performed only when a signal transition occurs, rather than at fixed intervals of time.