1. Field
This disclosure relates generally to tracing processes, and more specifically, to tracing processes executing in a multi-threaded processor.
2. Related Art
Various processor designers have attempted to increase on-chip parallelism through superscalar techniques, which are directed to increasing instruction level parallelism (ILP), and multi-threading techniques, which are directed to exploiting thread level parallelism (TLP). A superscalar architecture attempts to execute more than one instruction at a time by fetching multiple instructions and dispatching them in parallel to multiple (sometimes identical) functional units of the processor. Superscalar processors differ from multi-core processors in that the functional units in a superscalar processor are not usually entire processors. A typical multi-threading operating system (OS) allows multiple processes, and the threads of those processes, to utilize a processor one at a time, usually providing exclusive ownership of the processor to a particular thread for a time slice. In many cases, a process executing on a processor may stall for a number of cycles while waiting for some external resource (for example, a load from a random access memory (RAM)), thus lowering processor efficiency. Simultaneous multi-threading (SMT) allows multiple threads to execute different instructions in the same clock cycle, using functional units that another executing thread or threads left unused. While the number of concurrent threads is determined by the chip designer, practical restrictions on chip complexity have usually limited most SMT implementations to two concurrent threads.
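The SMT behavior described above, in which functional units left unused by one thread are filled with instructions from another thread in the same clock cycle, can be illustrated with a minimal single-cycle sketch. It assumes a hypothetical `smt_issue` function, a fixed issue width, and a simple fixed priority by thread ID; it ignores functional-unit types, dependencies, and pipeline timing, and is not intended to model any particular processor.

```python
from typing import Dict, List, Tuple


def smt_issue(ready: Dict[int, List[str]], width: int) -> List[Tuple[int, str]]:
    """Issue up to `width` instructions in one cycle (hypothetical toy model).

    `ready` maps a thread ID to that thread's ready instructions for this cycle.
    Lower thread IDs get priority; any issue slots they leave unused are filled
    by instructions from the next thread, so one cycle can contain instructions
    from several threads.
    """
    issued: List[Tuple[int, str]] = []
    for tid in sorted(ready):
        for instr in ready[tid]:
            if len(issued) == width:
                return issued  # all issue slots for this cycle are occupied
            issued.append((tid, instr))
    return issued


if __name__ == "__main__":
    # Thread 0 can use only two of four slots; thread 1 fills the remaining two.
    print(smt_issue({0: ["add", "mul"], 1: ["ld", "sub", "xor"]}, width=4))
```

Each issued entry carries its thread ID, which makes visible that a single cycle mixes instructions from different threads.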
Interleaved multi-threading, or thread switch multi-threading (TMT), interleaves the issue of instructions from different threads. TMT can be further divided into fine-granularity TMT and coarse-granularity TMT, depending on how frequently the issuing thread changes. Fine-granularity TMT switches to issuing instructions from a different thread every cycle. Coarse-granularity TMT usually only switches to issuing instructions from another thread when the currently executing thread causes some long-latency event (e.g., a memory page fault). Chip-level multiprocessing (CMP) integrates two or more processors (e.g., superscalar processors) in one chip. In this case, each processor executes independently, and the processors may employ multi-threading techniques in a number of different combinations. For example, when the CMP includes two processors, the processors may be configured as TMT/SMT, TMT/TMT, or SMT/SMT. Symmetric multiprocessing (SMP) is a multi-processor computer architecture in which two or more identical processors are connected to a single shared main memory. SMP systems usually allow any processor to work on any task, no matter where the data for that task is located in memory. With proper operating system support, SMP systems can move tasks between processors to balance the workload. CMP is essentially SMP implemented in a single very large scale integration (VLSI) integrated circuit. Multiple processor cores (multi-core) typically share a common second-level or third-level cache. A goal of a CMP system is to allow greater utilization of thread-level parallelism (TLP), especially for applications that lack sufficient instruction-level parallelism (ILP) to efficiently utilize superscalar processors.
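The difference between fine-granularity and coarse-granularity TMT can be sketched as two issue policies over per-thread instruction streams. This is a toy model under stated assumptions: the `Thread` class, the `fine_grained` and `coarse_grained` functions, and the `"LOAD_MISS"` marker standing in for a long-latency event are all hypothetical, one instruction issues per step, and stall cycles are not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Thread:
    tid: int
    instrs: List[str]  # "LOAD_MISS" is a hypothetical marker for a long-latency event
    pc: int = 0

    def done(self) -> bool:
        return self.pc >= len(self.instrs)


def fine_grained(threads: List[Thread]) -> List[Tuple[int, str]]:
    """Switch to the next unfinished thread after every issued instruction."""
    trace: List[Tuple[int, str]] = []
    i = 0
    while not all(t.done() for t in threads):
        t = threads[i % len(threads)]
        i += 1
        if t.done():
            continue  # finished threads are skipped in the rotation
        trace.append((t.tid, t.instrs[t.pc]))
        t.pc += 1
    return trace


def coarse_grained(threads: List[Thread]) -> List[Tuple[int, str]]:
    """Keep issuing from one thread until it hits a long-latency event, then switch."""
    trace: List[Tuple[int, str]] = []
    cur = 0
    while not all(t.done() for t in threads):
        t = threads[cur]
        while not t.done():
            instr = t.instrs[t.pc]
            t.pc += 1
            trace.append((t.tid, instr))
            if instr == "LOAD_MISS":
                break  # long-latency event: hand the pipeline to another thread
        cur = (cur + 1) % len(threads)
    return trace


def make_threads() -> List[Thread]:
    return [Thread(0, ["A0", "LOAD_MISS", "A2"]), Thread(1, ["B0", "B1"])]


if __name__ == "__main__":
    print(fine_grained(make_threads()))
    print(coarse_grained(make_threads()))
```

With these inputs, the fine-grained policy alternates threads on every instruction (A0, B0, LOAD_MISS, B1, A2), while the coarse-grained policy runs thread 0 until its miss and then runs thread 1 (A0, LOAD_MISS, B0, B1, A2). In both cases, the issued instruction stream interleaves instructions from multiple threads, and only the thread ID attached to each entry distinguishes them.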
The Nexus 5001 Forum (formerly known as the global embedded processor debug interface standard consortium (GEPDISC)) was formed to develop an embedded debug interface standard (hereinafter, the "Nexus standard") for embedded control applications. The Nexus standard is particularly applicable to the development of automotive powertrains, data communication equipment, computer peripherals, wireless systems, and other control applications. Developers working with embedded processors usually need access to a basic set of development tool functions in order to accomplish their jobs. In general, development tools should minimally impact operation of the system under development. For run-control, a developer typically needs to query and modify processor state while the processor is halted, with access to all locations available in the supervisor map of the processor. Moreover, a developer also usually needs support for breakpoint/watchpoint features in debuggers, either as hardware or software breakpoints depending on the architecture. For logic analysis, a developer usually needs access to instruction trace information. A developer typically needs to be able to interrogate and correlate instruction flow to real-world interactions. A developer also usually needs to retrieve information on how data flows through the system and to understand which system resources are creating and accessing data. Finally, a developer usually needs to assess whether embedded software is meeting a required performance level.
The Nexus standard provides a specification and guidelines for implementing various messages, e.g., program trace messages (such as branch history messages and synchronization messages), data trace messages, and task/process identification messages (such as ownership trace messages), that may be utilized in debugging applications while minimally impacting operation of a system under development. As defined by the Nexus standard, a program trace message is a message that is provided in response to a change of program flow. According to the Nexus standard, a data trace message is a message that provides visibility of a target processor when a memory write/read reference is detected that matches debug logic data trace attributes. The Nexus standard also defines an ownership trace message (OTM) as a message that provides a macroscopic view of a processor that may be used for task flow reconstruction when debugging software that is written in a high-level language. While the Nexus standard provides a relatively good solution for source-level software debugging in low-end and mid-range processors, the Nexus standard is not currently applicable to high-end processors with multi-threading capability. That is, the Nexus standard does not provide a technique for differentiating between threads and, as such, cannot be utilized to debug processors employing multi-threading architectures.
What is needed are techniques for extending the Nexus standard to processors with multi-threading capability.