Hardware emulators are programmable devices used in the verification of hardware designs. Hardware emulators may include hardware components capable of processor-based (e.g., hardware-based) emulation of logic systems, such as application specific integrated circuits (ASICs), central processing units (CPUs), graphics processing units (GPUs), and the like. By executing various forms of programmable logic, the hardware emulators may be programmed to mimic the functionality of nearly any prototype logic system design, such as an integrated circuit, an entire board of integrated circuits, or an entire system that is undergoing testing. This ability to mimic functionality allows logic system designers to prototype their logic system design using processor-based emulation before actually manufacturing the logic system, such as an ASIC product, thereby potentially saving millions of dollars by avoiding design failures.
A processor-based emulator comprises a processor that functions as a Boolean processor. The processor can compute Boolean functions of various input widths. These processor-based emulators sequentially evaluate combinatorial logic levels, starting at the inputs and proceeding to the outputs. Each pass through the entire set of logic levels is known as a cycle, and the evaluation of each individual logic level is known as an emulation step. The programs executed by the processor in a processor-based emulator may include instructions containing a sequence of operations. The processor is typically associated with an instruction memory that is read sequentially and provides instructions that are used to read bits out of a data array.
The processor is connected to the data array, which is a special memory. The data array has multiple read ports and supplies input data to the processor via each read port. The processor evaluates the data supplied from the data array in accordance with an instruction word supplied from an instruction memory. The bits that are read from the data array are fed to a lookup table (LUT) that is controlled by the instruction, and the result of the LUT function is then stored back into the data array. The data array may also contain the results of previous LUT evaluations. The data array further stores inputs that come from outside the processor (e.g., from other processors of the hardware emulator), and therefore the LUT has access not only to all previous results but also to values from outside the processor.
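The read, evaluate, and store-back loop described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the emulator's actual instruction format: the names `Instruction`, `run_cycle`, and `PROGRAM`, the layout of the data array, and the encoding of the LUT function as an integer truth table are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instruction:
    """Hypothetical instruction word: which bits to read, which function to apply."""
    read_addrs: Tuple[int, ...]  # data-array addresses, one per read port used
    truth_table: int             # LUT contents; bit i is the output for select value i
    write_addr: int              # data-array address that receives the result


def run_cycle(data_array: List[int], program: List[Instruction]) -> None:
    """Execute every instruction once, in order (one emulation cycle)."""
    for instr in program:  # each iteration models one emulation step
        # Read the operand bits from the data array.
        bits = [data_array[addr] for addr in instr.read_addrs]
        # Pack the bits into a select value; the first address is the LSB.
        select = sum(bit << position for position, bit in enumerate(bits))
        # Look up the result and store it back into the data array.
        data_array[instr.write_addr] = (instr.truth_table >> select) & 1


# Assumed layout: addresses 0..2 hold inputs a, b, c; 3 and 4 hold results.
# The program emulates y = (a AND b) XOR c in two emulation steps.
PROGRAM = [
    Instruction(read_addrs=(0, 1), truth_table=0b1000, write_addr=3),  # a AND b
    Instruction(read_addrs=(3, 2), truth_table=0b0110, write_addr=4),  # (a AND b) XOR c
]

data = [1, 1, 0, 0, 0]  # a=1, b=1, c=0
run_cycle(data, PROGRAM)
print(data[4])
```

The second instruction reads address 3, which the first instruction wrote, illustrating how the LUT has access to previous results as well as to externally supplied inputs.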
The processor-based emulators described above typically have a LUT implemented as a 16-way multiplexer, which is used to evaluate any Boolean function of 4 inputs (LUT4). The architecture of the processor-based emulators is built such that the LUT4 can perform only one evaluation per clock cycle.
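To see why a 16-way multiplexer suffices for any 4-input Boolean function, consider the following minimal sketch. The four inputs form a 4-bit select value that picks one of 16 stored bits. The function names `lut4` and `truth_table_of` and the bit ordering (input `a` as the least significant select bit) are assumptions made for illustration.

```python
from typing import Callable


def lut4(truth_table: int, a: int, b: int, c: int, d: int) -> int:
    """Model a LUT4 as a 16-way multiplexer over a 16-bit truth table."""
    select = a | (b << 1) | (c << 2) | (d << 3)  # 4 inputs -> select value 0..15
    return (truth_table >> select) & 1           # pick one of the 16 stored bits


def truth_table_of(fn: Callable[[int, int, int, int], int]) -> int:
    """Build the 16-bit table for an arbitrary 4-input Boolean function."""
    table = 0
    for select in range(16):
        a = select & 1
        b = (select >> 1) & 1
        c = (select >> 2) & 1
        d = (select >> 3) & 1
        if fn(a, b, c, d):
            table |= 1 << select
    return table


# Two example functions: 4-input AND and 4-input XOR (parity).
AND4 = truth_table_of(lambda a, b, c, d: a & b & c & d)
XOR4 = truth_table_of(lambda a, b, c, d: a ^ b ^ c ^ d)

print(hex(AND4), hex(XOR4))
print(lut4(XOR4, 1, 0, 1, 1))
```

Because there are 16 possible input combinations and each table bit is chosen independently, the 16-bit table can encode every one of the 2^16 possible 4-input functions.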
In some architectures, multiple processors may be combined to form a processor cluster. Typically, the processor cluster may contain 4 or 8 processors. Because 4 to 8 processors are clustered, the processor cluster is able to perform more than 1 LUT4 evaluation per clock cycle. With the processor cluster architecture presently available, a chain of 4 LUT4s may be achieved, which means that in one clock cycle, up to 4 LUT4 evaluations can be performed. Due to the time-multiplexed nature of the processors, the inputs required for a computation are quite slow to arrive. Also, a large number of multiplexers are positioned ahead of the computation logic in the architecture. Due to the use of a large number of multiplexers, the number of processors that can be chained together is limited. Since only a limited number of processors can be chained together, the logic that can be implemented using the chained processors is also limited. One type of logic that is difficult to implement using the existing architecture is arithmetic, such as addition and subtraction. These and other arithmetic operations require long chains of processors, which tend to be cascaded together because of the limit on how many processors can be chained together. As an example, in one clock cycle, a processor can perform an addition of two 2-bit values, whereas the processor may need multiple clock cycles to perform an addition of two 32-bit values.
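The cost of addition on such an architecture can be illustrated with a rough model. The sketch below assumes a ripple-carry adder in which each bit position uses two LUT4 evaluations (one for the sum bit, one for the carry), and it assumes a budget of 4 LUT4 evaluations per clock cycle, taken from the chain-of-4 figure above. The mapping, the constant `LUT4_PER_CYCLE`, and the resulting cycle counts are illustrative assumptions, not figures for any particular emulator.

```python
from typing import Tuple

LUT4_PER_CYCLE = 4   # assumed budget, based on the chain of 4 LUT4s in the text
SUM_TABLE = 0x96     # a XOR b XOR carry_in (fourth LUT input tied to 0)
CARRY_TABLE = 0xE8   # majority(a, b, carry_in) (fourth LUT input tied to 0)


def lut4(truth_table: int, a: int, b: int, c: int, d: int) -> int:
    """Model a LUT4 as a 16-way multiplexer over a 16-bit truth table."""
    select = a | (b << 1) | (c << 2) | (d << 3)
    return (truth_table >> select) & 1


def ripple_add(x: int, y: int, width: int) -> Tuple[int, int]:
    """Add two width-bit values using only LUT4 evaluations.

    Returns (sum modulo 2**width, estimated clock cycles under the model).
    """
    carry = 0
    result = 0
    evaluations = 0
    for bit in range(width):
        a = (x >> bit) & 1
        b = (y >> bit) & 1
        sum_bit = lut4(SUM_TABLE, a, b, carry, 0)
        carry = lut4(CARRY_TABLE, a, b, carry, 0)  # feeds the next bit position
        result |= sum_bit << bit
        evaluations += 2
    cycles = -(-evaluations // LUT4_PER_CYCLE)  # ceiling division
    return result, cycles


print(ripple_add(2, 1, width=2))
print(ripple_add(123456789, 987654321, width=32))
```

Under these assumptions a 2-bit addition consumes exactly one cycle's worth of LUT4 evaluations, while a 32-bit addition needs many cycles because each carry must ripple through the next bit's LUT before that bit can be resolved.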
Thus, there is a need in the art for a hardware emulator that is able to perform arithmetic operations, such as addition and subtraction, at a speed faster than presently available hardware emulators.