1. Technical Field
The present invention is directed to data processing systems. More specifically, the present invention is directed to a method, apparatus, and computer program product in a processor for concurrently sharing system memory among a tracing process and non-tracing processes using a programmable variable number of shared memory write buffers.
2. Description of Related Art
Making tradeoffs in the design of commercial server systems has never been simple. For large commercial systems, it may take years to grow the initial system architecture draft into the system that is ultimately shipped to the customer. During the design process, hardware technology improves, software technology evolves, and customer workloads mutate. Decisions need to be constantly evaluated and reevaluated. Solid decisions need solid base data. Servers in general and commercial servers in particular place a large demand on system and operator resources, so the opportunities to collect characterization data from them are limited.
Much of performance analysis is based on hardware-collected traces. Typically, traces provide data used to simulate system performance, to make hardware design tradeoffs, to tune software, and to characterize workloads. Hardware traces are largely independent of the operating system, application, and workload. This attribute makes these traces especially well suited for characterizing the On-Demand and Virtual-Server-Hosting environments now supported on the new servers.
A symmetric multiprocessing (SMP) data processing server has multiple processors with multiple cores that are symmetric, such that each processor has the same processing speed and latency. An SMP system may run multiple operating systems on different processors, which is a logically partitioned system, or multiple operating systems on the same processors one at a time, which is a virtual server hosting environment. Operating systems divide the work into tasks that are distributed evenly among the various cores by dispatching one or more software threads of work to each processor at a time.
A single-thread (ST) data processing system includes multiple cores, each of which can execute only one thread at a time.
A simultaneous multi-threading (SMT) data processing system includes multiple cores, each of which can concurrently execute more than one thread. An SMT system has the ability to favor one thread over another when both threads are running on the same processor.
As computer systems migrate toward sophisticated multi-stage pipelines and large SMP systems based on SMT, debugging, analyzing, and verifying the actual hardware becomes increasingly difficult during development, during test, and during normal operation. A hardware trace facility may be used to capture various hardware signatures within a processor as trace data for analysis. This trace data may be collected from events occurring on processor cores, busses (also called the fabric), caches, or other processing units included within the processor. The purpose of the hardware trace facility is to collect hardware traces from a trace source within the processor and then store the traces in a predefined memory location.
As used herein, the term “processor” means a central processing unit (CPU) on a single chip, e.g. a chip formed using a single piece of silicon. A processor includes one or more processor cores and other processing units such as a memory controller, cache controller, and the system memory that is coupled to the memory controller.
This captured trace data may be recorded in the hardware trace facility and/or within another memory. The term “in-memory tracing” means storing the trace data in part of the system memory that is included in the processor that is being traced.
Prior art approaches to in-memory tracing used a specialized data path between the trace facility and the memory controller. For example, FIG. 15 depicts a prior art approach to in-memory tracing in a processor 1500. A memory controller 1501 is coupled to a system memory 1502 through write buffers 1504. Other devices, such as a processor core (not shown), can communicate with memory controller 1501 through fabric bus controller/bus 1506.
A multiplexer 1508 selects either the signal from fabric bus controller/bus 1506 or the signal from trace facility 1510. When in a normal, non-tracing processing mode, multiplexer 1508 selects the signal from fabric bus controller/bus 1506. When in a trace mode, in which trace facility 1510 is collecting traces and needs to store them in system memory 1502, multiplexer 1508 selects the signal from trace facility 1510. Thus, as is clear from FIG. 15, in the prior art system a choice must be made between the data from the bus and the trace data. System memory 1502 cannot be used to store trace data while it is also being accessed over the bus to read or store other data; when in trace mode, system memory 1502 can be accessed only to store or read the trace data.
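For illustration only, the selection performed by multiplexer 1508 can be modeled as a two-input selector. The following Python sketch is an assumption-based behavioral model, not a hardware description; the names `Mode` and `mux_select` and the string payloads are hypothetical.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    NORMAL = "normal"  # non-tracing processing mode
    TRACE = "trace"    # trace facility owns the path to memory


def mux_select(mode: Mode, fabric_request: Optional[str],
               trace_record: Optional[str]) -> Optional[str]:
    """Hypothetical behavioral model of multiplexer 1508.

    Exactly one input is forwarded to memory controller 1501. In TRACE mode
    the fabric bus input is never forwarded, so non-tracing traffic cannot
    reach system memory 1502 until the mode changes.
    """
    if mode is Mode.TRACE:
        return trace_record
    return fabric_request


print(mux_select(Mode.TRACE, "core write", "trace entry"))   # trace entry
print(mux_select(Mode.NORMAL, "core write", "trace entry"))  # core write
```

Because the selector has no setting that forwards both inputs, concurrent sharing of system memory 1502 between trace data and other traffic is impossible by construction in this model.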
There are problems with the prior art method. When in a trace mode, memory controller 1501 is dedicated to trace facility 1510. While memory controller 1501 is dedicated to trace facility 1510, it is precluded from being used for any other purpose. This is a significant limitation, particularly in systems that have only one memory controller. In systems with only one memory controller, the system must be dedicated to the trace function and cannot perform any other work that would require the use of system memory 1502 when in trace mode.
In addition, the prior art system requires that in-memory tracing be completed using the system memory 1502 that is part of the processor 1500 that is being traced. The trace data captured by trace facility 1510 cannot be stored in any memory other than system memory 1502.
In addition to the limitations described above, the prior art requires that the system be booted to a trace mode, rather than to a normal mode, when tracing is desired. In the prior art systems, the memory for storing traces had to be allocated before the initial program load (IPL) was completed. FIG. 16 depicts a high level flow chart that illustrates booting a prior art system in a trace mode so that tracing can be performed and the trace data saved. The process starts as depicted by block 1600 and thereafter passes to block 1602 which illustrates cycling the machine's power off and then back on. Next, block 1604 depicts a determination of whether or not trace data is to be stored. If a determination is made that trace data is not to be stored, the process passes to block 1606 which illustrates executing a normal IPL process and completing the booting of the machine. Thereafter, block 1608 depicts executing normal processing. The process then terminates as illustrated by block 1610.
Referring again to block 1604, if a determination is made that trace data is to be stored, the process passes to block 1612 which depicts allocating memory for storing traces. The dedicated memory will be a fixed size throughout the trace process. The size of the dedicated memory will not be able to be changed without rebooting the system and executing another IPL process.
A memory controller is dedicated to the trace process as described above. Because the memory controller is dedicated to the trace process, the rest of the processor, other than the trace facility, loses the ability to write to the memory that is controlled by the dedicated memory controller.
Thereafter, block 1614 illustrates executing the IPL process for tracing. This is a different IPL process than the normal IPL process executed as depicted by block 1606. For example, during the trace IPL process, multiplexers are set for tracing. Next, block 1616 depicts capturing traces. Thereafter, block 1618 illustrates a determination of whether or not tracing is finished. If a determination is made that tracing is not finished, the process passes back to block 1616. Referring again to block 1618, if a determination is made that tracing has finished, the process passes to block 1620 which illustrates a determination of whether or not to start normal processing. If a determination is made not to start normal processing, the process passes back to block 1620. If a determination is made to start normal processing, the process passes back to block 1602.
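The boot flow of FIG. 16 can be traced, for illustration only, with a short Python sketch that records the blocks visited during one pass. The function name `prior_art_boot` and the use of iterators to supply the decisions at blocks 1604, 1618, and 1620 are assumptions made for this sketch; they do not represent any actual firmware.

```python
from typing import Iterator, List


def prior_art_boot(store_traces: bool, trace_done: Iterator[bool],
                   start_normal: Iterator[bool]) -> List[int]:
    """Return the FIG. 16 block numbers visited in one pass (hypothetical model).

    store_traces is the decision at block 1604; trace_done and start_normal
    yield the successive decisions at blocks 1618 and 1620. Each iterator must
    yield enough decisions to reach the end of the pass.
    """
    path = [1600, 1602, 1604]  # start, power cycle, store-traces decision
    if not store_traces:
        path += [1606, 1608, 1610]  # normal IPL, normal processing, end
        return path
    path += [1612, 1614]  # allocate fixed-size trace memory, trace IPL
    while True:
        path += [1616, 1618]  # capture traces, then ask whether tracing is done
        if next(trace_done):
            break
    path.append(1620)  # decide whether to start normal processing
    while not next(start_normal):
        path.append(1620)  # block 1620 loops back to itself
    path.append(1602)  # only a new power cycle and IPL return to normal mode
    return path
```

For example, `prior_art_boot(False, iter([]), iter([]))` returns `[1600, 1602, 1604, 1606, 1608, 1610]`. On the trace path the only exit leads back to block 1602, reflecting that a power cycle and a new IPL are needed both to resume normal processing and to change the size of the trace memory.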
Therefore, a need exists for a method, apparatus, and computer program product in a processor for concurrently sharing system memory among a tracing process and non-tracing processes using a programmable variable number of shared memory write buffers.