Not applicable.
1. Field of the Invention
The present invention relates generally to logic analyzers that are used to facilitate the design of digital logic devices. More particularly, the present invention relates to an on-chip logic analyzer capable of receiving program counter data, and of selecting some of that data for storage in an on-chip memory. Still more particularly, the invention relates to a loop compressor for an on-chip logic analyzer, which permits software loops to be detected so that the memory entries are not consumed with program counter data of the software loop.
2. Background of the Invention
The design and development of digital logic circuits has become increasingly complex, due in large measure to the ever-increasing functionality offered in such circuits. Integrated circuits are constantly surpassing milestones in performance, as more and more functionality is packaged into smaller sizes. This enhanced functionality requires that a greater number of transistors be included in an integrated circuit, which in turn requires more rigorous testing to ensure reliability once the device is released. Thus, integrated circuit designs are repeatedly tested and debugged during the development phase to minimize the number and severity of errors that may subsequently arise. In addition, chips may be tested to determine the performance characteristics of the device, including the speed or throughput of the chip, software running on the chip, or the aggregate performance of the system.
As integrated circuits become more complex, the length of the debug phase increases, requiring a greater lead-time before product release. In addition, as the complexity of integrated circuits increases, it becomes necessary to fabricate more prototype iterations of the silicon (or “spins” of silicon) in order to remove successive layers of bugs from the design, thereby increasing the engineering and material cost of the released product. It would be desirable to reduce these engineering and material costs and speed up the product cycle. Moreover, if the most relevant state data were available for analysis by the debugging team, the debugging phase for products could be reduced significantly, thereby minimizing cost, and enabling an earlier product launch.
One of the chief difficulties encountered during the debug phase of a product is identifying the source of an error, and obtaining relevant data regarding the conditions existing at the time of the error. This can be extremely difficult because the error may make it impossible to obtain state information from the integrated circuit. For example, in a processor, an error may cause the processor to quit executing, thus making it impossible to obtain the state data necessary to identify the source of the error. As a result, the debug process often unfortunately requires that the debug team infer the source of the error by looking at external transactions at the time of the error, instead of being able to look at the internal state data. If the internal state of the processor could be acquired and stored, these inferences would be replaced by solid data. By reducing the designer's uncertainty and increasing the available data, this would be beneficial in solving problems with the processor hardware or software.
In certain products under development, the number of transistors is exceedingly large and the dimensions are exceedingly small. In such products, the manual probing of internal terminals and traces is impractical and inaccurate. Consequently, the usual technique for testing the state of terminals and traces in highly complex chips is to route signals through the chip's external output terminals, to some external interface. This approach, however, suffers in several respects.
First, as noted above, the signals obtained from the external output terminals are removed from the signal states of the internal terminals and traces. Thus, this technique requires the debugging team to infer the state of the internal terminals and traces from signals appearing on an external bus. Second, routing the desired state to external terminals often requires more wiring, silicon, drivers, pads and power than is affordable. Attempts to do so can compromise the normal functioning of the chip, and costs escalate throughout the design, often impacting the micropackaging and system board as well as the die. Third, the internal clock of the chip often runs at a much higher rate than the external logic analyzers that receive and process the data can accommodate. As an example, processor designs currently under development operate at clock speeds up to and exceeding 2.0 GHz. The fastest commercial logic analyzers, despite their expense, are incapable of operating at GHz frequencies. Thus, either certain data must be ignored, or some other mechanism must be employed to capture the high-speed data being generated on the chip. The typical approach is to run the chip at a slower clock speed so the data can be captured by external test equipment. This solution, however, makes it more difficult to detect the bugs and errors that occur when the chip is running at full clock speeds. Some errors that occur at full clock speed will not be detected when the clock speed is reduced to accommodate the off-chip logic analyzers. Also, the processor increasingly connects to external components that have a minimum speed, below which they will not operate. These speeds require the processor to operate faster than the external logic analyzer can accommodate.
As an alternative to sending data off-chip, attempts have been made to capture certain state data on chip, thereby reducing the problems of interfacing slower speed test equipment with high-speed devices. In this approach, history buffers, and even on-chip logic analyzers (OCLAs), are provided to acquire and store event and/or time sequenced data on the chip itself. In the past, to the extent that designers sought to incorporate memory onto the chip for debug and test purposes, dedicated memory devices (usually RAM) were used. Thus, in prior art designs that attempted to capture debug and test information on-chip, a dedicated memory structure was incorporated into the chip design solely to store data for the debug and test modes. The problem with this approach, however, is that it requires the allocation of a significant amount of chip space to incorporate such dedicated memory devices, and these memory devices, while used extensively during the design and development phase of the chip, add little or nothing to the performance of the chip once it is released into production. Thus, the inclusion of dedicated memory space on the chip represents an opportunity cost, and means that functionality and/or performance is sacrificed to include this dedicated memory on the chip. Consequently, the inclusion of memory for debug purposes, while helpful in the debug and test phase, is generally viewed as undesirable because of the accompanying loss of performance and functionality. If a dedicated memory device is included on the chip, system designers normally require that such a memory be very small in size to minimize the cost increase, as well as the performance and functionality loss that accompany the inclusion of such a dedicated memory. As the size of the dedicated memory becomes smaller, so too does the prospect that the state information stored in the dedicated memory will be sufficient to assist in the debug process.
Indeed, in relative terms, even the largest dedicated on-chip memories typically are incapable of storing very much data.
In assignee's co-pending application entitled Method And Apparatus For Efficiently Implementing Trace And/Or Logic Analysis Mechanisms On A Processor Chip, U.S. Ser. No. 10/034,717, the teachings of which are incorporated herein, the on-chip cache memory is used to store data from the on-chip logic analyzer. The use of the on-chip cache memory as a storage device for the on-chip logic analyzer permits the storage of a relatively large amount of state data on the chip as compared to previous designs. While the use of the on-chip cache memory greatly expands the amount of state data that can be stored on-chip, the extent of data that can be stored is not limitless. Modern processors and other complex circuits often have pipelined operation, with multiple instructions being manipulated each cycle. For a processor operating at 2 GHz, the amount of data that can be stored in a typical cache memory represents only a few microseconds of data. Consequently, if the OCLA stores all incoming data in the cache, the cache would quickly overflow, and potentially relevant data would be lost.
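As a rough illustration of how short the capture window can be, the following sketch computes how long a cache lasts when data is stored on every cycle. The cache size, entry width, and number of entries per cycle are assumed values chosen only for illustration; they are not taken from the referenced application.

```python
def capture_window_us(cache_bytes, bytes_per_entry, entries_per_cycle, clock_hz):
    """Return how many microseconds of trace data fit in the cache."""
    bytes_per_second = bytes_per_entry * entries_per_cycle * clock_hz
    return cache_bytes / bytes_per_second * 1e6

# Assumed figures: 64 KiB cache, 8-byte entries, 2 entries per cycle, 2 GHz clock.
window = capture_window_us(64 * 1024, 8, 2, 2_000_000_000)
print(f"{window:.3f} us")
```

Under these assumptions the cache fills in roughly two microseconds, which is consistent with the order of magnitude described above.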
One of the key pieces of information used in analyzing a processor and/or the software executing on the processor is data reflecting the operation of the Program Counter (PC). The PC data provides the address of software instructions that have been fetched, executed or retired by the processor. By tracing the PC data, a list or trace can be developed of the software instruction addresses manipulated by the processor. The ability to reconstruct the software flow through a Program Counter (PC) trace is an essential tool for debugging and performance analysis of the processor and any software running on the processor. Even with the greatly expanded memory capacity available from using the on-chip cache memory, the storage of PC traces requires more memory than can be provided in a typical cache memory. Consequently, some mechanism must be developed to reduce the amount of data stored in the on-chip memory.
One of the key contributors to the memory consumption of PC traces is software loops. Software loops are fundamental constructs that are pervasively used in programming computers. A software loop is a sequence of instructions which are performed iteratively (possibly with some iteration-to-iteration variation) in the execution of a program. The instructions are generally compact. Such machine instructions are generated by programming constructs such as “do”, “for” and “while” in the C programming language. Equivalents exist in all procedural languages, and non-procedural languages generate these structures implicitly.
Unfortunately, while software loops consume a great amount of memory, they typically yield very little information. Once the debugger knows that a loop has been encountered, tracing additional iterations of the loop may provide little or no additional information. The problem is that tracing each iteration of the loop often displaces the trace of code that preceded the loop, so that the only PC data available to the debugger is successive iterations of the loop addresses.
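The displacement effect can be seen in a small simulation. The sketch below is illustrative only: the addresses, the loop length, and the eight-entry trace buffer are hypothetical, and the buffer is modeled as a simple circular store that overwrites its oldest entry.

```python
from collections import deque

def trace_unfiltered(pcs, capacity):
    """Store every PC in a fixed-size buffer; the oldest entries are overwritten."""
    buf = deque(maxlen=capacity)
    for pc in pcs:
        buf.append(pc)
    return list(buf)

# Hypothetical program: four setup instructions, then a 3-instruction loop run 100 times.
setup = [0x100, 0x104, 0x108, 0x10C]
loop = [0x200, 0x204, 0x208] * 100
stored = trace_unfiltered(setup + loop, capacity=8)
print([hex(pc) for pc in stored])
```

After the run, the buffer holds only loop addresses; the setup code that preceded the loop has been overwritten.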
It would be desirable if a system or technique were developed that would permit software loops to be detected and that would prevent multiple iterations of a software loop from being stored in memory as part of a PC trace. It would also be advantageous if the system or technique for detecting a software loop could be implemented in a small space, to permit inclusion on-chip as part of an on-chip logic analyzer. Despite the apparent advantages such a design would offer, to date no viable solution has appeared.
The problems noted above are solved in large part by an on-chip logic analyzer that includes loop compression logic to monitor the address of a program counter and to store only addresses that have not been recently issued. The loop compressor comprises a content addressable memory (CAM) that, when enabled, issues a hit/miss signal depending on whether the incoming instruction address is already present in the CAM. The hit/miss signal is used to signal the memory regarding whether the incoming instruction address should be stored. If the instruction address is already present in the CAM, the CAM signals a hit, and the memory does not store the instruction address. If the instruction address is not present in the CAM, the CAM signals a miss, enters the new address into the CAM, and the memory stores the instruction address, assuming any other OCLA conditions are satisfied.
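The hit/miss gating described above can be modeled in software as follows. This is a minimal behavioral sketch, not the hardware design: the function name, the list-based CAM, and the example addresses are assumptions made for illustration, and any other OCLA storage conditions are omitted.

```python
def compress_trace(pcs, cam_depth):
    """Store a PC only when it misses in a CAM of recently seen addresses."""
    cam = []      # assumed model: most recently entered address first
    stored = []   # what the trace memory would hold
    for pc in pcs:
        hit = pc in cam
        if not hit:
            cam.insert(0, pc)      # miss: enter the new address into the CAM
            del cam[cam_depth:]    # the oldest entry drops out beyond the depth
            stored.append(pc)      # miss: the trace memory stores the address
    return stored

# Hypothetical program: two setup instructions, a 3-instruction loop, one exit instruction.
setup = [0x100, 0x104]
loop = [0x200, 0x204, 0x208] * 50
tail = [0x300]
stored = compress_trace(setup + loop + tail, cam_depth=4)
print([hex(pc) for pc in stored])
```

In this run only the first iteration of the loop is stored, so the addresses before and after the loop remain visible in the trace.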
According to the preferred embodiment of the invention, a CAM is provided as part of an OCLA and is used to detect software loops and other software instructions that are of a recurring nature. The CAM preferably has a programmable depth, and thus can store a variable number of instructions. The depth of the CAM can be made very shallow to permit very fine analysis of the program counter trace, or can be made relatively deep (depending on the amount of space available to implement the CAM) to provide coarser control and ability to detect and filter software loops with many instructions.
According to another aspect of the present invention, a programmable mask may be used in conjunction with the CAM to select particular bits of the instruction address to examine. This provides greater power to the CAM, and enables the user to define boundaries to use for the CAM matching. By masking certain bits from the CAM comparison, instruction addresses can be grouped together for consideration, thus reducing the number of CAM entries that are necessary to cover a loop. Thus, for example, if the lower order bits were masked, an instruction address stored in the CAM would cause a hit signal to issue whenever another instruction address with matching higher order bits was presented to the CAM.
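A masked comparison of this kind might be expressed as shown below. The mask polarity (a 1 bit means the bit is compared, a 0 bit means it is ignored), the 16-bit address width, and the function name are assumptions for illustration; the source does not specify them.

```python
def masked_match(stored_addr, incoming_addr, mask):
    """Compare only the bits selected by mask (1 = compare, 0 = ignore)."""
    return (stored_addr & mask) == (incoming_addr & mask)

# Assumed 16-bit addresses; ignoring the low 4 bits groups each 16-byte block together.
MASK = 0xFFF0
same_block = masked_match(0x2004, 0x200C, MASK)   # higher order bits agree
next_block = masked_match(0x2004, 0x2010, MASK)   # higher order bits differ
print(same_block, next_block)
```

With the low-order bits ignored, a single CAM entry covers every address in its block, so a loop that fits within one block needs only one entry.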
The ability to program the CAM with a mask value and with a desired depth provides a great deal of flexibility to the user in filtering out software loops. To simplify the design, the CAM preferably uses a FIFO scheme to handle data organization. A new incoming instruction address that does not generate a hit is stored in the first entry in the CAM. As new entries are added to the CAM, each existing entry is displaced one position in the CAM, until ultimately it is dropped out of the CAM. The FIFO approach reduces the wiring requirements of the CAM since wires can be run to just a single entry from the incoming data bus, and each other entry spills to the adjacent entry.
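The FIFO organization and the effect of the programmable depth can be sketched as a shift register. The class below is a behavioral model under stated assumptions: the class and method names, the use of `None` for an empty slot, and the eight-entry maximum are hypothetical and chosen only for illustration.

```python
class FifoCam:
    """Shift-register model of a CAM with a programmable depth (assumed model)."""

    def __init__(self, max_entries, depth):
        if not 1 <= depth <= max_entries:
            raise ValueError("depth must be between 1 and max_entries")
        self.entries = [None] * max_entries  # None marks an empty slot
        self.depth = depth                   # only the first `depth` slots are active

    def lookup_and_update(self, addr):
        """Return True on a hit; on a miss, shift the entries and load addr into slot 0."""
        if addr in self.entries[:self.depth]:
            return True
        # Each entry spills into its neighbor; the last active entry is dropped.
        for i in range(self.depth - 1, 0, -1):
            self.entries[i] = self.entries[i - 1]
        self.entries[0] = addr
        return False

def misses(pcs, depth):
    """Return the addresses that miss, i.e. those the trace memory would store."""
    cam = FifoCam(max_entries=8, depth=depth)
    return [pc for pc in pcs if not cam.lookup_and_update(pc)]

# Hypothetical 3-instruction loop executed four times.
loop = [0x200, 0x204, 0x208] * 4
print(len(misses(loop, depth=4)), len(misses(loop, depth=2)))
```

In this model a depth of four filters every iteration after the first, while a depth of two is too shallow for a three-instruction loop: each address has already been shifted out by the time it recurs, so every access misses.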
These and other aspects of the present invention will become apparent upon reading the detailed description of the preferred embodiment and the appended claims.