The increasing complexity of system designs, the increased investment that this complexity requires, and shortened product cycles have presented significant challenges to post-silicon design verification of chipsets. This is especially true of high-end cache-coherent non-uniform memory access (“ccNUMA”) chipsets, where systems can be extremely large and complex. Processor post-silicon verification is typically focused on electrical verification at least as much as on functional verification because of the large amount of full-custom design. Chipsets present a different challenge because of the large number of cells they contain. Additionally, because of the sheer number of buses, internal bus arbitration, cache coherency control, queue arbitration, etc., in a large ccNUMA server, post-silicon functional verification of such a chipset consumes a greater share of resources relative to electrical verification than is typical for processors. Internal observability, while relatively simple in pre-silicon verification, poses a major obstacle to debug and functional test coverage in post-silicon verification.
Determining when system verification is complete is a second major obstacle to completing post-silicon verification in a time-effective manner. While pre-silicon simulation-based testing depends significantly on labor-intensive directed and pseudo-random testing, post-silicon testing has historically depended on observing system operations that imply correct behavior. In addition to external interface monitoring, enhanced internal observability via Real-Time-Observability (“RTO”) features, such as those described in the above-referenced related patent applications, facilitates monitoring of internal state that provides active confirmation of coverage of the design space.
As a result of this increased internal observability, more pre-silicon design data and metrics can be leveraged, improving schedule and reducing resource consumption. Verification is known to be complete when the criteria derived from pre-silicon data are observed in silicon.
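By way of illustration only, the completion criterion described above can be sketched as a comparison between a set of coverage events derived from pre-silicon data and the events actually observed in silicon. The sketch below is a minimal example under stated assumptions: the event names, the flat list format of the observation log, and the function names are hypothetical and are not taken from the related applications.

```python
from typing import Iterable, Set


def uncovered_events(required: Set[str], observed: Iterable[str]) -> Set[str]:
    """Return the required coverage events that have not been observed in silicon."""
    return set(required) - set(observed)


def verification_complete(required: Set[str], observed: Iterable[str]) -> bool:
    """Verification is treated as complete once every required event has been observed."""
    return not uncovered_events(required, observed)


# Hypothetical coverage criteria derived from pre-silicon data.
required = {"cache_line_invalidate", "queue_full_arbitration", "bus_retry"}

# Hypothetical events captured through internal observability features.
observed_log = ["bus_retry", "cache_line_invalidate", "bus_retry"]

missing = uncovered_events(required, observed_log)
print(sorted(missing))
print(verification_complete(required, observed_log))
```

In this sketch, repeated observations of the same event do not affect the outcome; only the presence or absence of each required event is considered. A practical criterion might instead require a minimum number of observations per event.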
Performing post-silicon design verification is an industry standard practice that helps expose bugs not usually found in pre-silicon verification. Typical post-silicon bugs include those that are manifested after long or at-speed operation of the system, bugs due to incorrect modeling of the hardware and firmware interfaces, bugs due to RTL errors that escaped pre-silicon detection, and bugs due to incorrect mapping of RTL to silicon (synthesis/physical bugs).
Accepted ways to exercise systems in order to expose post-silicon bugs include running operating systems and software applications targeted for the final system, creating specific directed software tests that stress different portions of the system, and running software tests that create random system operations. In the process of performing post-silicon verification of a large chipset, all of these standard exercise methods may be used to test the chipset in various functional areas, including: (1) ensuring that correct packet operations are observed on all major interfaces, including, for example, processor, intra-chip, and I/O interfaces; (2) ensuring that system performance and system latencies are measured in the actual systems, wherein both directed tests and benchmarking applications are used for these measurements; (3) ensuring that the system responds properly to errors, including fatal error conditions, injected into the system; (4) ensuring that extended system operation under directed and random exercises produces correct behavior; (5) ensuring that the operating system and applications perform correctly; and (6) ensuring that system configurations (e.g., processors, memory, I/O, platforms) perform correctly.
While internal signal observability features have been available in some FPGA architectures and ASICs, they have been of very limited scope. Typically, such limited internal observability features have not been used for functional test coverage. The primary objective of post-silicon verification is to test a chipset such that all of its internal functionality and product configurations are completely exercised and no faults are observed to occur, supporting correct operations for end users.