Debugging of embedded solutions has always been a difficult job. As processors become faster and more complex, debugging and development with the current debug technology becomes more difficult. In order to address these complex issues, greater visibility into the program operation is needed. Three areas in which greater visibility is desired are program counter tracing, cycle accurate profiling, and load and store data logging. Access to this data may be available through a dedicated Debug Port. However, each of these problems demands a tremendous amount of information. Simply supplying a large number of high frequency pins to view all of this data is neither practical nor cost effective, so an encoding scheme is needed to compress the data. An encoding has been used that covers Program Counter (PC) tracing, cycle accurate timing of all instructions, and load and store data logging. All of this data can be transmitted across the same pins on the Debug Port.
The debug port is a tool that provides for the export of software or hardware generated trace information to an external recorder. The trace port utilizes a transmission format that addresses the requirements without noticeably compromising the format efficiency for any given implementation. The format primitives are viewed as a trace export instruction set. All processors use this instruction set to describe the system activity within a device. Each processor can describe the system activity in any manner that uses the instruction set and the rule set governing its use.
It is important to note that the external transmission rates and pins are fixed by the deployed receiver technology, and these rates will remain relatively constant over time. As CPU clock rates increase, there will therefore be increasing pressure to optimize the format for the most compressed representation of system activity, simply to maintain the status quo. Fortunately, the transmission format used provides an efficient means to represent the system activity. This compression efficiency comes at the expense of a larger on-chip hardware expenditure, but it gives the processors the capability to improve the efficiency of their export bandwidth as it is stressed by CPU clock rate increases. The steady march to faster CPU clock rates and denser manufacturing processes will necessitate taking advantage of all compression opportunities and the best available physical transmission technology.
The format is designed to provide designers the ability to:
Optimize bandwidth utilization (most real information sent in minimum bits/second)
Choose less efficient but more cost effective representations of system activity
Mix both of the above approaches (e.g. optimize PC trace transmission efficiency while implementing less efficient memory access export)
This gives different processors the ability to represent their system activity in forms most suitable to their architecture.
Tradeoffs have to be made since there are numerous cost/capability/bandwidth configuration requirements. Adjustments can be made to optimize and improve the format over time.
The transmission format remains constant over all processors while the nature of the physical transmission layer can be altered. These alterations can take three forms:
Transmission type (differential serial or conventional single ended I/O)
Number of pins allocated to the transmission
Frequency of the data transmission
This means that the format representing the system activity can be, and is, viewed as data by the physical mechanism that transmits it. The collection and formatting sections of the debug port should be implemented without regard to the physical transmission layer. This allows the physical layer to be optimized to the available pins and transmission bandwidth type without changing the underlying collection and formatting implementation. The receiver components are designed to be both physical layer and format independent. This allows the entire transmit portion to evolve over time.
A 10-bit encoding is used to represent the PC trace, data log, and timing information. The trace format width has been decoupled from the number of transmission pins, so this format can be used with any number of transmission pins. The PC trace, Memory Reference information, and the timing information are transmitted across the same pins.
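The decoupling of the 10-bit format width from the pin count can be illustrated with a minimal sketch. The function names, the MSB-first bit order, and the zero padding of the final transfer are assumptions made for illustration only; the text does not specify how packet bits are ordered or padded on the pins.

```python
PACKET_BITS = 10  # trace format width stated in the text

def packets_to_bits(packets):
    """Flatten 10-bit packets into one bit list, MSB first (assumed order)."""
    bits = []
    for packet in packets:
        if not 0 <= packet < (1 << PACKET_BITS):
            raise ValueError("packet does not fit in 10 bits")
        bits.extend((packet >> i) & 1 for i in range(PACKET_BITS - 1, -1, -1))
    return bits

def bits_to_pin_words(bits, num_pins):
    """Slice the bit list into one word per transfer, num_pins bits wide.

    The final word is zero-padded (an assumption for this sketch).
    """
    padded = bits + [0] * (-len(bits) % num_pins)
    return [padded[i:i + num_pins] for i in range(0, len(padded), num_pins)]

def pin_words_to_packets(words, num_packets):
    """Receiver side: rebuild the 10-bit packets, ignoring any padding."""
    bits = [bit for word in words for bit in word]
    packets = []
    for n in range(num_packets):
        value = 0
        for bit in bits[n * PACKET_BITS:(n + 1) * PACKET_BITS]:
            value = (value << 1) | bit
        packets.append(value)
    return packets

# The same packet sequence survives a round trip at several pin counts.
packets = [0b0000000010, 0b1100000101, 0b0100110011]
for num_pins in (1, 3, 4, 8, 16):
    words = bits_to_pin_words(packets_to_bits(packets), num_pins)
    assert pin_words_to_packets(words, len(packets)) == packets
```

Only the slicing step depends on the pin count; the packet sequence itself is unchanged, which is the property the format relies on.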
Packets can contain opcodes or data, or both. A code packet contains an opcode that indicates the type of information being sent. The opcode can be 2 to 10 bits long. The remainder of the code packet will hold data associated with that opcode.
In many cases, additional data needs to be associated with an opcode. This data is encoded in subsequent packets referred to as data packets. Data packets contain information that should be associated with the previous opcode.
A sequence of packets that begins with a code packet and includes all of the data packets that immediately follow it is referred to as a command. A command can have zero or more parameters. Each parameter is an independent piece of data associated with the opcode in the command. The number of parameters expected depends on the opcode. The first parameter of a command is simply encoded using data packets following a code packet. The first data packet of each subsequent parameter is marked with the 10 (Continue) opcode.
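The grouping of packets into commands and parameters can be sketched as follows. This is a hypothetical model, not the encoding itself: the Packet type, its field names, and the idea that an upstream stage has already classified each packet as code or data (and flagged data packets carrying the 10 marker) are all assumptions, since the text does not give the bit-level rule for telling data packets from code packets.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    kind: str                # "code" or "data" (classification assumed done upstream)
    value: int               # opcode bits for a code packet, payload for a data packet
    new_param: bool = False  # True if this data packet carries the 10 marker

def group_commands(packets):
    """Group a packet sequence into (opcode, parameters) commands.

    Each parameter is the list of data packet values that belong to it.
    """
    commands = []
    for pkt in packets:
        if pkt.kind == "code":
            # Every code packet starts a new command with zero parameters.
            commands.append((pkt.value, []))
            continue
        if not commands:
            raise ValueError("data packet before any code packet")
        params = commands[-1][1]
        # The first data packet opens the first parameter; a marked data
        # packet opens each subsequent parameter.
        if pkt.new_param or not params:
            params.append([])
        params[-1].append(pkt.value)
    return commands

stream = [
    Packet("code", 0x101),
    Packet("data", 1),
    Packet("data", 2),
    Packet("data", 3, new_param=True),
    Packet("code", 0x002),
]
commands = group_commands(stream)
```

Here the first command carries two parameters (one of two data packets, one of a single data packet) and the second carries none, so a decoder can key its interpretation on both the opcode and the parameter count.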
The interpretation of a command depends on two factors: the opcode of the command and the number of parameters included in the command. In other words, a code packet has one meaning if it is immediately followed by another code packet, but the same packet can take on an entirely different meaning if it is followed by data packets. Trace opcodes are shown in Table 1.
TABLE 1

Opcode         Meaning
000000 0000    No Information/End of Buffer
000000 0001    Start Repeat Single
000000 0010    PC Trace Gap
000000 0011    Register Repeat
000000 0100    NOP SP loop
000000 0101    SPLOOP marker
000000 0110    Timing Trace Gap
000000 0111    Command Escape
000000 1000    Exception Occurred
000000 1001    Exception Occurred with Repeat Single
000000 1010    Block Repeat 0
000000 1011    Block Repeat 0 with Repeat Single
000000 1100    Block Repeat 1
000000 1101    Block Repeat 1 with Repeat Single
000000 1110    Memory Reference Trace Gap
000000 1111    Periodic Data Sync Point
000001 0xxx    Timing Sync Point
000001 1xxx    Memory Reference Sync Point
000010 xxxx    PC Sync Point/First/Last/
000011 000x    PC Event Collision
000011 001x    Reserved
000011 01xx    Reserved
000011 1xxx    Reserved
00010x xxxx    Extended Timing Data
00011x xxxx    CPU and ASIC Data
0010xx xxxx    Reserved
001100 0000    Memory Reference Trace Gap (legacy)
001100 0001    Periodic Data Sync Point (legacy)
0011xx xxxx    Memory Reference Block
01xxxx xxxx    Relative Branch Command/Register Branch
10xxxx xxxx    Continue
11xxxx xxxx    Timing
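A decoder could classify a code packet by matching it against the bit patterns of Table 1, where x marks a don't-care bit. The sketch below is one possible way to do this and is not taken from the source: the names TABLE_1 and classify are hypothetical, and the first-match rule in table order is an assumption, chosen because the two legacy codes overlap the more general Memory Reference Block pattern and are listed before it.

```python
# (pattern, meaning) pairs in Table 1 order; 'x' is a don't-care bit.
TABLE_1 = [
    ("0000000000", "No Information/End of Buffer"),
    ("0000000001", "Start Repeat Single"),
    ("0000000010", "PC Trace Gap"),
    ("0000000011", "Register Repeat"),
    ("0000000100", "NOP SP loop"),
    ("0000000101", "SPLOOP marker"),
    ("0000000110", "Timing Trace Gap"),
    ("0000000111", "Command Escape"),
    ("0000001000", "Exception Occurred"),
    ("0000001001", "Exception Occurred with Repeat Single"),
    ("0000001010", "Block Repeat 0"),
    ("0000001011", "Block Repeat 0 with Repeat Single"),
    ("0000001100", "Block Repeat 1"),
    ("0000001101", "Block Repeat 1 with Repeat Single"),
    ("0000001110", "Memory Reference Trace Gap"),
    ("0000001111", "Periodic Data Sync Point"),
    ("0000010xxx", "Timing Sync Point"),
    ("0000011xxx", "Memory Reference Sync Point"),
    ("000010xxxx", "PC Sync Point/First/Last/"),
    ("000011000x", "PC Event Collision"),
    ("000011001x", "Reserved"),
    ("00001101xx", "Reserved"),
    ("0000111xxx", "Reserved"),
    ("00010xxxxx", "Extended Timing Data"),
    ("00011xxxxx", "CPU and ASIC Data"),
    ("0010xxxxxx", "Reserved"),
    ("0011000000", "Memory Reference Trace Gap (legacy)"),
    ("0011000001", "Periodic Data Sync Point (legacy)"),
    ("0011xxxxxx", "Memory Reference Block"),
    ("01xxxxxxxx", "Relative Branch Command/Register Branch"),
    ("10xxxxxxxx", "Continue"),
    ("11xxxxxxxx", "Timing"),
]

def classify(packet):
    """Return the Table 1 meaning of a 10-bit code packet (first match wins)."""
    if not 0 <= packet < 1024:
        raise ValueError("packet does not fit in 10 bits")
    bits = format(packet, "010b")
    for pattern, meaning in TABLE_1:
        if all(p in ("x", b) for p, b in zip(pattern, bits)):
            return meaning
    raise ValueError("no matching opcode")  # not reachable: the patterns cover all values

kind = classify(0b0000000110)  # the timing trace gap code
```

Because the opcode length varies from 2 to 10 bits, the bits matched by x in a short pattern are the ones left over for data carried in the same code packet.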
The timing trace gap code indicates that some timing trace information is missing at this point. The timing trace remains invalid until the Synchronization code is found in the trace stream. The timing trace gap code can be issued at any point.
It is permissible to have timing syncs included in a gap, thus introducing a discontinuity in the timing sync ID sequence.
Issuing a timing gap command causes a break in the PC decoding process until the next sync point.
The PC trace gap code indicates that some PC trace information is missing at this point. This could occur for a number of reasons, such as:
The trace queues in the target processor have overflowed before all of the data was transmitted.
A trace sync point was about to get an entire ID value (0-7) behind another sync point.
A trace stream was about to send data commands in an order that violated the predefined rules. This should be prevented by the encoding hardware.
The next PC trace information after a gap is a PC Synchronization code, and the PC trace remains invalid until that Synchronization code is found in the PC trace stream. The PC Trace Gap code can only be issued at the natural boundary between two packets or packet sequences.
It is permissible to have PC syncs included in a gap, thus introducing a discontinuity in the PC sync ID sequence.
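The gap and synchronization behavior described above can be sketched as a small validity tracker on the receiver side. The opcode values come from Table 1; everything else is an assumption for illustration: the function names, the input being a list of already-identified code packets, the streams starting out valid, and the reading that PC decoding broken by a timing gap resumes at the next PC sync point.

```python
PC_TRACE_GAP = 0b0000000010      # 000000 0010 in Table 1
TIMING_TRACE_GAP = 0b0000000110  # 000000 0110 in Table 1

def is_pc_sync(code):
    """True for 000010 xxxx (PC Sync Point)."""
    return code >> 4 == 0b000010

def is_timing_sync(code):
    """True for 000001 0xxx (Timing Sync Point)."""
    return code >> 3 == 0b0000010

def track_validity(code_packets, pc_valid=True, timing_valid=True):
    """Return (pc_valid, timing_valid) after each code packet.

    Initial validity is an assumption; a real decoder may need a sync first.
    """
    states = []
    for code in code_packets:
        if code == PC_TRACE_GAP:
            pc_valid = False
        elif code == TIMING_TRACE_GAP:
            timing_valid = False
            pc_valid = False  # a timing gap also breaks PC decoding
        elif is_pc_sync(code):
            pc_valid = True
        elif is_timing_sync(code):
            timing_valid = True
        states.append((pc_valid, timing_valid))
    return states

stream = [
    0b0100000000,      # branch command, both streams valid
    PC_TRACE_GAP,      # PC trace becomes invalid
    0b0100000000,      # still invalid: no PC sync seen yet
    0b0000100011,      # PC sync point restores the PC trace
    TIMING_TRACE_GAP,  # timing and PC decoding both break
    0b0000010001,      # timing sync point restores timing only
    0b0000100000,      # PC sync point restores PC decoding
]
states = track_validity(stream)
```

A decoder built this way would discard PC or timing information recorded while the corresponding flag is False rather than attempt to interpret it.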