Emulation systems typically include one or more integrated circuit chips, each of which emulates a portion of a digital design. The integrated circuit chips may be field-programmable devices (FPDs) such as field-programmable gate arrays (FPGAs). Each FPD includes a set of reconfigurable logic blocks (RLBs) interconnected by a programmable routing resource matrix. The typical FPGA has up to a few tens of thousands of usable RLBs. Design state elements, such as logic gates, are mapped onto the RLBs such that the typical FPGA may emulate up to several hundred thousand design logic gates.
During emulation of a design in an FPD, it is desirable to obtain trace data of the states of the various design state elements and/or other design elements and/or design signals mapped onto the emulation FPD. Such trace data, also known as user visibility data, is made available to the user and is often used to debug a design. Unfortunately, as the number of state elements mapped into an FPD increases, the amount of trace data increases as well. For example, an FPGA emulating one hundred thousand state elements would generate up to one hundred thousand bits, or 0.1 Mb, of trace data per clock cycle. This trace data is further increased where emulation systems incorporate a number of parallel FPGAs. For instance, a system having ten parallel FPGAs would generate up to 1 Mb of trace data per clock cycle.
The amount of trace data to be dealt with is dramatically increased when one considers that emulation runs typically involve a plurality of clock cycles, such as hundreds of millions of clock cycles or more. For example, where an emulation is run over one billion clock cycles, the total amount of trace data generated during the emulation may be up to (1 billion)×(1 Mb)=1,000 terabits (Tb). Thus, the problem arises of how to store, transfer, and/or otherwise handle all of this trace data. Although the cost of memory has decreased over the years, it is nevertheless expensive. Large amounts of memory also take up valuable real estate and require additional power, both of which are usually of limited availability in an emulation system. It would therefore be desirable to limit the amount of memory in an emulation system.
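The volume figures above can be reproduced with a short calculation. The following Python sketch is illustrative only: the function and constant names are hypothetical, and it assumes the worst case described above, in which every emulated state element contributes one trace bit per clock cycle.

```python
# Illustrative sketch; all names are hypothetical and not part of any emulation system.
BITS_PER_MEGABIT = 10**6
BITS_PER_TERABIT = 10**12


def trace_bits_per_cycle(state_elements_per_fpga: int, fpga_count: int) -> int:
    """Upper bound on trace bits per clock cycle.

    Assumes one trace bit per emulated state element per cycle.
    """
    return state_elements_per_fpga * fpga_count


def total_trace_bits(bits_per_cycle: int, clock_cycles: int) -> int:
    """Upper bound on trace bits accumulated over a whole emulation run."""
    return bits_per_cycle * clock_cycles


# Figures from the text: ten parallel FPGAs, one hundred thousand
# state elements each, run over one billion clock cycles.
per_cycle = trace_bits_per_cycle(100_000, 10)
total = total_trace_bits(per_cycle, 1_000_000_000)

print(per_cycle // BITS_PER_MEGABIT, "Mb per clock cycle")
print(total // BITS_PER_TERABIT, "Tb over the run")
```

Integer arithmetic is used throughout so that the large products stay exact.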
Yet another complication arises when one considers the speed at which the emulation clock runs. Typical emulation systems may run a clock at 1 MHz or more. For example, where the clock in the above example is run at 1 MHz, the total bandwidth of trace data generated may be up to (1 Mb)×(1 MHz)=1 Tb per second. When an emulation system is run over multiple emulation clock cycles, the bandwidth of trace data often exceeds what state-of-the-art physical interfaces can support, given limits such as integrated circuit package pin counts, memory chip capacity, and network bandwidth.
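The bandwidth figure can be checked the same way. In the Python sketch below, the function names are hypothetical, and the 10 Gb/s link rate is an assumed value chosen purely for illustration; it does not come from the discussion above.

```python
# Illustrative sketch; names and the link rate are assumptions, not from the text.
BITS_PER_TERABIT = 10**12


def trace_bandwidth_bps(bits_per_cycle: int, clock_hz: int) -> int:
    """Upper bound on trace bandwidth in bits per second."""
    return bits_per_cycle * clock_hz


def links_needed(bandwidth_bps: int, link_bps: int) -> int:
    """Number of links of a given rate needed to carry the bandwidth (rounded up)."""
    return -(-bandwidth_bps // link_bps)  # ceiling division on integers


# Figures from the text: 1 Mb of trace data per cycle at a 1 MHz clock.
bandwidth = trace_bandwidth_bps(1_000_000, 1_000_000)

# ASSUMPTION: a hypothetical 10 Gb/s link, used only to give a sense of scale.
HYPOTHETICAL_LINK_BPS = 10 * 10**9

print(bandwidth // BITS_PER_TERABIT, "Tb per second")
print(links_needed(bandwidth, HYPOTHETICAL_LINK_BPS), "hypothetical 10 Gb/s links")
```

Under that assumed link rate, the worst-case trace stream would occupy on the order of one hundred such links in parallel, which illustrates why pin count and network bandwidth become limiting.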