In recent years, the use of parallel processing apparatuses has become common. Specifically, a parallel processing apparatus includes a plurality of processing units, such as the CPU cores of a multi-core processor or the processors of a multiprocessor.
In such a parallel processing apparatus, it is sometimes necessary to perform tracing (i.e., to record the operations performed by the processing units) on each of the processing units, in order to conduct performance analysis, such as measurement of the processing loads thereon, and behavioral analysis, such as measurement of memory accesses and cache hit rates. In such cases, it is necessary to maintain temporal consistency among the results of the tracing performed on the processing units. In other words, it is sometimes necessary to obtain the tracing results for the processing units with reference to the same timeline.
In view of this necessity, Patent Literature 1 proposes a timer adjustment system. This system is used with a parallel processing apparatus (i.e., multiprocessor system) including a plurality of processing units each having a timer, and aims to prevent errors among the values obtained by the respective timers of the processing units. For this purpose, a generation unit for generating a time synchronization signal is provided in the system, and the generation unit generates and sends a time synchronization signal to each processing unit. Each processing unit adjusts the value obtained by its timer, based on the time synchronization signal.
FIG. 16 is a block diagram of the timer adjustment system disclosed in Patent Literature 1.
As shown in FIG. 16, the timer adjustment system disclosed in Patent Literature 1 includes a generation unit 1611, an output unit 1612, a distribution unit 1616, and an input unit 1613. The generation unit 1611 generates time synchronization signals. The output unit 1612 outputs the time synchronization signals transferred from the generation unit 1611 to all the processing units. The distribution unit 1616 distributes the time synchronization signals output from the output unit 1612 to each of the processing units. The input unit 1613 receives the time synchronization signals that have been output by the output unit 1612, distributed by the distribution unit 1616 and returned to the input unit 1613. The timer adjustment system shown in FIG. 16 is also provided with a measuring unit 1614 and a synchronization unit 1615. The measuring unit 1614 measures the time (hereinafter called “propagation time”) from when a time synchronization signal is output from the output unit 1612 to when the signal returns to the input unit 1613 via the parallel processing apparatus including the processing units. The synchronization unit 1615 corrects the errors of the values obtained by the timers, based on the propagation time measured by the measuring unit 1614.
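By way of illustration only, the roles of the measuring unit 1614 and the synchronization unit 1615 may be sketched as follows. The sketch is not taken from Patent Literature 1: the names `ProcessingUnit`, `measure_propagation_time` and `synchronize`, the nanosecond unit, and the assumption that the signal reaches each processing unit after half of the measured propagation time are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProcessingUnit:
    name: str
    timer: int  # timer value in nanoseconds (hypothetical unit)


def measure_propagation_time(sent_at_ns: int, returned_at_ns: int) -> int:
    """Time from output of the signal to its return at the input unit."""
    if returned_at_ns < sent_at_ns:
        raise ValueError("signal cannot return before it is sent")
    return returned_at_ns - sent_at_ns


def synchronize(units: List[ProcessingUnit], reference_ns: int,
                propagation_ns: int) -> None:
    """Set every timer to the reference time plus an estimated one-way delay.

    Assumption (not from the source): the one-way delay to each unit is
    half of the measured propagation time.
    """
    one_way_ns = propagation_ns // 2
    for unit in units:
        unit.timer = reference_ns + one_way_ns


units = [ProcessingUnit("unit0", 1_000), ProcessingUnit("unit1", 1_037)]
propagation = measure_propagation_time(sent_at_ns=5_000, returned_at_ns=5_040)
synchronize(units, reference_ns=5_000, propagation_ns=propagation)
print(propagation, [u.timer for u in units])
```

In this sketch the two timers, which initially differ by 37 ns, hold the same value after the correction.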
With use of the timer adjustment system shown in FIG. 16, if the operating frequencies of the processing units constituting the multiprocessor system are the same, it is possible to perform the tracing for the processing units with reference to the same timeline, by matching the values held by the timers of the processing units with each other at a predetermined time point.
With the timer adjustment system shown in FIG. 16, however, if the operating frequencies of the processing units in the parallel processing apparatus are not the same, it is impossible to perform the tracing for the processing units with reference to the same timeline. This is because it is impossible to match the values held by the timers of the processing units without consideration of the relationship between the operating frequency of each processing unit and the propagation time.
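The difficulty may be illustrated with a small numerical sketch. It assumes, hypothetically, that each timer advances by one count per cycle of the operating clock of its own processing unit; the function name `timer_value` and the frequencies are chosen for illustration only.

```python
def timer_value(start_value: int, freq_hz: int, elapsed_ns: int) -> int:
    """Timer value after elapsed_ns, counting one tick per clock cycle."""
    return start_value + freq_hz * elapsed_ns // 1_000_000_000


# Both timers are matched (set to 0) at the same time point.
value_a = timer_value(0, freq_hz=100_000_000, elapsed_ns=1_000)  # 100 MHz
value_b = timer_value(0, freq_hz=150_000_000, elapsed_ns=1_000)  # 150 MHz
print(value_a, value_b)  # 100 150
```

Under this assumption, timers matched at one time point already differ by 50 counts after 1 microsecond, so the timer values alone no longer indicate the same timeline.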
In view of this problem, a simulation apparatus for realizing tracing on a plurality of processors with reference to the same timeline has been proposed (Patent Literature 2). In this apparatus, the processors are each provided with a common signal, together with a clock signal whose operating frequency differs from processor to processor. The apparatus captures the common signal and the phase information pieces of the processors, which are exchanged among the processors, together with the execution results of the processors. The apparatus synchronizes the execution results based on the common signal and the phase information pieces.
Application of the technology disclosed in Patent Literature 2 realizes the tracing with reference to the same timeline, even if each processing unit operates on a clock signal with a different operating frequency. That is, each processing unit is provided with a counter that outputs a different value depending on the operation clock, and the output values from the counters are corrected by using the common signal so as to incorporate the pieces of phase information (i.e., time information) into the output values. In this manner, the technology realizes the tracing with reference to the same timeline, as indicated by the corrected counter values.
There is also a conventional method for obtaining a trace information piece for each of the processors (i.e., processing units) included in a computer system structured on a simulator apparatus (see Patent Literature 3).
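One conceivable form of such a correction is sketched below. It is an assumption made for illustration and is not the procedure of Patent Literature 2: each counter value is latched when the common signal arrives, and later counter values are converted into the time elapsed since the common signal by using the operating frequency of the corresponding processing unit. The function name `to_common_time_ns` and all numerical values are hypothetical.

```python
from fractions import Fraction


def to_common_time_ns(count: int, count_at_common_signal: int,
                      freq_hz: int) -> Fraction:
    """Nanoseconds elapsed since the common signal, for one counter value."""
    ticks = count - count_at_common_signal
    return Fraction(ticks * 1_000_000_000, freq_hz)


# Unit A runs at 100 MHz and held 40 when the common signal arrived.
time_a = to_common_time_ns(140, count_at_common_signal=40, freq_hz=100_000_000)
# Unit B runs at 150 MHz and held 10 when the common signal arrived.
time_b = to_common_time_ns(160, count_at_common_signal=10, freq_hz=150_000_000)
print(time_a, time_b)  # 1000 1000
```

Although the raw counter values 140 and 160 differ, both correspond to 1000 ns after the common signal in this sketch, so the two events can be placed at the same point on one timeline.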
According to this method, each processor performs processing in synchronization with the clock individually provided therein, and gives its operational information piece to the other processors in real time, and combines and holds its operational information piece and the operational information pieces of the other processors. Each processor compares its operational information piece contained in the trace information piece held therein with operational information pieces contained in the trace information pieces held in the other processors, and keeps the consistency among the trace information pieces held by the processors, in terms of time.
In a simulator apparatus using the trace information acquisition method, a plurality of processors, a trace editing unit 1720, and a display unit 1721 are structured on the simulator apparatus, as shown in FIG. 17. In the example shown in FIG. 17, there are two processors, namely processors 1701 and 1711. The trace editing unit 1720 edits trace information pieces 1709 and 1719 output from the processors 1701 and 1711, respectively. The display unit 1721 displays the trace information pieces edited by the trace editing unit 1720.
The processors 1701 and 1711 include instruction execution simulators 1704 and 1714, transmitters 1707 and 1717, receivers 1706 and 1716, and tracing units 1702 and 1712, respectively. The instruction execution simulators 1704 and 1714 execute programs 1705 and 1715 under test, respectively. The transmitters 1707 and 1717 transmit the operational information pieces, resulting from the execution of the programs 1705 and 1715 under test on the instruction execution simulators 1704 and 1714, to the processors 1711 and 1701, respectively, via a communication unit 1710. The receivers 1706 and 1716 receive the operational information pieces from the transmitters 1707 and 1717, respectively, via the communication unit 1710. The tracing units 1702 and 1712 trace the operational information pieces received by the receivers 1706 and 1716 by using clocks 1703 and 1713, respectively.
The trace editing unit 1720 shown in FIG. 17 compares the operational information pieces contained in the trace information pieces of the processors 1701 and 1711 with each other. Then, at the location where the operational information piece of the processor 1701, which is contained in the trace information piece held by the processor 1711, matches the operational information piece contained in the trace information piece held by the processor 1701 itself, the trace editing unit 1720 makes the trace information piece held in the processor 1701 and the trace information piece held in the processor 1711 consistent in terms of time.
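The matching performed by the trace editing unit 1720 may be sketched as follows. The sketch rests on simplifying assumptions that are not stated in Patent Literature 3: the clocks 1703 and 1713 differ only by a constant offset, and an operational information piece is recorded by the receiving processor without delay. The names `Record`, `find_offset` and `merge`, and the trace contents, are hypothetical.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    clock: int   # local clock value at which the piece was traced
    source: str  # processor that produced the operation
    op: str      # operational information piece


def find_offset(own: List[Record], other: List[Record], own_name: str) -> int:
    """Offset that maps clock values of `other` onto the timeline of `own`."""
    own_ops = {r.op: r.clock for r in own if r.source == own_name}
    for r in other:
        # A piece produced by own_name that appears in both traces
        # serves as the reference location.
        if r.source == own_name and r.op in own_ops:
            return own_ops[r.op] - r.clock
    raise LookupError("no matching operational information piece")


def merge(own: List[Record], other: List[Record], own_name: str) -> List[Record]:
    """Combine both traces on the timeline of `own`, sorted by clock value."""
    offset = find_offset(own, other, own_name)
    mine = [r for r in own if r.source == own_name]
    theirs = [r._replace(clock=r.clock + offset)
              for r in other if r.source != own_name]
    return sorted(mine + theirs)


trace_1701 = [Record(10, "P1701", "load A"), Record(20, "P1701", "store B")]
trace_1711 = [Record(100, "P1701", "store B"), Record(105, "P1711", "add C")]
offset = find_offset(trace_1701, trace_1711, "P1701")
merged = merge(trace_1701, trace_1711, "P1701")
print(offset, [(r.clock, r.op) for r in merged])
```

Here the piece "store B" appears at clock value 20 in one trace and at 100 in the other, which gives an offset of -80 and places "add C" at 25 on the timeline of the processor 1701. If the same piece occurred more than once, the reference location would be ambiguous.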
In some cases, the pieces of trace information contain identical or similar operational information pieces. If this is the case, it can be difficult to specify the operational information piece that is to be used as reference information. In view of this, the trace information acquisition method shown in FIG. 17 uses random number generators 1708 and 1718 each adding a random number to the operational information piece contained in the corresponding trace information piece. This structure distinguishes the identical or similar operational information pieces from each other.
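The effect of adding random numbers may be illustrated as follows. This is a minimal sketch under assumptions: the 32-bit width of the random number, the fixed seed, and the names `tag_pieces` and `match_positions` are hypothetical and are not taken from Patent Literature 3.

```python
import random
from typing import List, Tuple


def tag_pieces(ops: List[str], rng: random.Random) -> List[Tuple[str, int]]:
    """Attach a random number to every operational information piece."""
    return [(op, rng.getrandbits(32)) for op in ops]


def match_positions(pieces: list, reference) -> List[int]:
    """Positions in the trace at which `reference` could be matched."""
    return [i for i, piece in enumerate(pieces) if piece == reference]


ops = ["store B", "load A", "store B"]
untagged = match_positions(ops, "store B")        # two candidate positions
tagged_ops = tag_pieces(ops, random.Random(1708))
tagged = match_positions(tagged_ops, tagged_ops[2])
print(untagged, tagged)
```

Without the random numbers, the reference piece "store B" matches two positions. With them, the match is unique unless two random numbers happen to coincide, which becomes less likely as the random numbers are made wider.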