In order to improve the performance of computer code used in parallel processing computer systems, it is often necessary to determine and display certain data that are indicative of code execution. Generally, computer program execution monitoring systems, hereinafter referred to as monitoring systems, monitor computer programs as they are executed on the parallel nodes of a parallel processing computer. The monitoring system subsequently generates data indicative of the various functions performed by each node during program execution and stores this data in a mass storage memory device.
The monitoring system organizes the stored data into a historical file known as a trace file; the data within the trace file is known as trace data. Trace data are generally a time-ordered series of recorded indicia representing changes to the information and/or control state of the parallel processing computer. Therefore, trace files are used both for real time debugging of program and system elements as well as for studies
Each individual datum within the trace data is known as an event. Typically, an individual event contains a number of data fields. The type of event being recorded defines the number of fields in a given event and the information contained in each field. For example, a monitoring system produces a "send" event whenever one processor within the parallel processing computer passes information to another processor within the same computer. Another example is a "receive" event, produced whenever a processor in the parallel processing environment receives a message from another processor.
An event may contain several pieces of information, such as a time field indicating when the event was recorded by the monitoring system, an event processor field identifying the processor to which the message is directed, a start time field indicating the time at which the message was sent or received, and a stop time field indicating the completion time of the event.
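By way of illustration only, the event fields described above could be represented as a simple record. The following is a minimal sketch in Python; the names TraceEvent, recorded_at, processor, start_time, stop_time and time_ordered are hypothetical and are not taken from any particular monitoring system.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TraceEvent:
    """One event in a trace file (field names are illustrative assumptions)."""
    kind: str           # type of event, e.g. "send" or "receive"
    recorded_at: float  # time the monitoring system recorded the event
    processor: int      # processor to which the message is directed
    start_time: float   # time the message was sent or received
    stop_time: float    # completion time of the event

    def duration(self) -> float:
        """Elapsed time between the start and the completion of the event."""
        return self.stop_time - self.start_time


def time_ordered(events: Iterable[TraceEvent]) -> List[TraceEvent]:
    """Return the events as a time-ordered series, as trace data generally is."""
    return sorted(events, key=lambda e: e.recorded_at)
```

A "send" event recorded at time 2.0 and a "receive" event recorded at time 1.0 would, for example, be returned by time_ordered with the "receive" event first.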
Those skilled in the art appreciate that the production and use of trace data is well known in the art and does not require further discussion since it is not per se necessary to understand the workings of the present invention.
As explained earlier, trace files are used either to study the system or as a debugging tool. Typically, a parallel processing computer in cooperation with a monitoring system displays trace data as a single, real time display while the computer executes a parallel program. The data is then stored on a direct access storage device (DASD) or another such storage device for future use and analysis. Therefore, when the trace data is later sorted, errors and anomalies that occurred during program execution can be corrected.
Similarly, by analyzing the stored trace data, a study of processor utilization can be made in order to optimize such use. One such data processing and display method is disclosed in U.S. Pat. No. 5,168,554 entitled "Converting Trace Data From Processor Executing In Parallel Into Graphical Form", issued to Charles A. Luke on Dec. 1, 1992 (hereinafter referred to as the Luke patent). Specifically, the Luke patent discloses a method of creating a "time process diagram" that depicts processor utilization during execution of a parallel program. The method includes searching previously recorded trace data for specific types of events, especially those events that indicate processor utilization, and generating a table of those events. The events in the table are arranged in a time sequential manner. From the table, the method disclosed in the Luke patent produces either a diagram or display showing the total number of processors operating during a particular time interval, or a diagram showing the specific event activities that occurred during that time interval. The user can scroll forward and backward within either diagram to display a time interval different from the one previously displayed. From these displays, a programmer can alter the parallel program so that it executes optimally on a particular parallel processing computer.
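The general idea of searching trace data for selected event types, arranging them in time sequence, and counting the processors operating during an interval can be sketched as follows. This is an illustrative sketch only and is not the method claimed in the Luke patent; the names Event, build_table and active_processors, and the half-open interval convention, are assumptions made for the example.

```python
from typing import Iterable, List, NamedTuple, Set


class Event(NamedTuple):
    """Simplified trace event (hypothetical layout for illustration)."""
    kind: str
    processor: int
    start: float
    stop: float


def build_table(trace: Iterable[Event], kinds: Set[str]) -> List[Event]:
    """Select events of the requested types and arrange them in time sequence."""
    selected = [e for e in trace if e.kind in kinds]
    return sorted(selected, key=lambda e: e.start)


def active_processors(table: Iterable[Event], t0: float, t1: float) -> int:
    """Count distinct processors with an event overlapping the interval [t0, t1)."""
    return len({e.processor for e in table if e.start < t1 and e.stop > t0})
```

Scrolling forward or backward in such a display then amounts to calling active_processors again with a shifted interval over the same table.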
Inasmuch as conventional display apparatus known in the art typically displays trace data from many processors concurrently in a single display format, a programmer faces the tedious, burdensome and often confusing task of simultaneously visualizing relative performance data for more than a small number of processors and comparing the trace data therefrom. Traditionally, trace data displays portray the data in a textual format only, or in a mixed textual and graphical format. However, even in instances such as the Luke patent, where non-textual data displays are available, no display provides the detailed presentation of each processor's functioning that real time debugging would require. For example, since the main concern in Luke is optimizing system utilization, no emphasis is placed on detailed visual displays of the workings of a single processor. Moreover, the art presently does not provide methods of generating multiple, simultaneous displays of trace data in various display formats. In addition, the present art offers no method or apparatus that can show why a particular processor took more than the average time for a particular task, or that can flag possible ongoing problems. Many display devices do not even calculate what the average time for processing a task of a certain nature should be; more importantly, these devices do not clearly show occurrences such as interrupts as part of their utilization diagrams.
Consequently, in an application program that extends over a relatively large number of processors, reviewing and analyzing the trace data can simply be too daunting to be practically accomplished by even an experienced programmer. Trace files generated by parallel processes contain information about dozens or even hundreds of processors. A visualization tool is therefore needed that can handle any reasonable number of processors in such a way that the projected images are neither overwritten nor scaled down to so small a size that they are uninterpretable by the user. The displays must also allow the user to make quantitative comparisons between the data of different processors, and must draw attention to statistically anomalous processor activity. Thus a need exists in the art for improved systems and methods that afford a user the ability to quickly access, easily review and understand trace file data, as well as to determine and highlight any problems with each individual processor.
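One simple way to draw attention to statistically anomalous processor activity, offered here only as an illustrative sketch, is to compare each processor's time on a task against the average across all processors and to flag those that deviate by more than a chosen number of standard deviations. The function name flag_anomalous, the input mapping and the default threshold of 2.0 are assumptions for this example and are not drawn from any existing display device.

```python
import statistics
from typing import Dict, List


def flag_anomalous(task_time: Dict[int, float], threshold: float = 2.0) -> List[int]:
    """Return the processors whose task time is unusually far from the average.

    task_time maps a processor number to the time it spent on a task.
    A processor is flagged when its time differs from the mean by more than
    `threshold` population standard deviations (an assumed, adjustable rule).
    """
    values = list(task_time.values())
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        # All processors took the same time, so nothing stands out.
        return []
    return sorted(p for p, t in task_time.items()
                  if abs(t - mean) > threshold * spread)
```

With five processors each taking 10 time units and a sixth taking 30, only the sixth processor would be flagged under the default threshold.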
Visual presentation devices in existence today do not provide a developer with a means of easily identifying glitches or bottlenecks in the application. For example, application developers are forced to make inferences concerning the relationships between the various types of data presented in order to understand how to improve execution of the application program. Similarly, current displays do not distinguish system activity caused by the application under inspection from system activity unrelated to the application. The user needs to be able to determine where the program is being interrupted from doing work, and when the program could be doing useful work instead of waiting. Furthermore, the application developer typically must accept the content and presentation of the data as provided by the program visualizer, rather than being able to specify the types of data that are displayed and the relationships between those types of data.
The teachings of the present invention build on the basic workings and concepts originally presented in U.S. Ser. No. 011,436 (attorney docket KI9-92055), filed on Jan. 29, 1993, now abandoned. However, the present invention presents novel ideas and other improvements over the concepts previously presented in that abandoned application. In addition, this application is being filed at the same time as another application, attorney docket KI9-94-004, pertaining to related subject matter.