As the capabilities of computer processors and data storage devices have expanded in recent years, the data sets which are being processed have also increased dramatically in size. Despite rapid improvements, the performance and capacity of a system are often limited by processor speed.
One solution to such limitations has been to divide processing tasks into pieces and have multiple processors operate simultaneously upon those pieces. This approach reduces the burden on each processor and allows tasks which are not interdependent to be performed at the same time, or in parallel. A system configured in this way, with multiple processors operating in parallel, is called a parallel processing system. For purposes of this discussion, parallel processing systems include any configuration of computer systems using multiple central processing units (CPUs), either local (e.g., multiprocessor systems such as SMP or MPP computers), or locally distributed (e.g., multiple processors coupled via LANs or WANs), or any combination thereof.
Parallel processing allows an application to perform the same overall task more quickly. Alternatively, an application on a parallel processing system can process more data in the same amount of time than on a single-processor system. Given the improvements in performance and capacity that parallel processing allows, there is a need to evaluate those characteristics for an application running on a parallel processing system, whether that system is already in place or only proposed, and to evaluate how variations in the amount of data affect the performance of an application running on a particular parallel processing system.
In order to help analyze the performance of an application on a system, a model is used which represents the components of the application and the system and their interaction with supplied data sets. Because of the generally linear nature of data flow through a computer system, graphs have been used to describe these systems. The vertices in a graph represent either data files or processes, and the links or "edges" in the graph indicate that data produced in one stage of processing is used in another.
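As an illustration only, the graph model described above might be captured in a small data structure like the following. The names `Kind`, `Graph`, `add_vertex`, `add_edge`, and `topological_order` are hypothetical and do not come from the source; the topological ordering (Kahn's algorithm) is simply one way to reflect that each stage's output is consumed by a later stage.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    """A vertex is either a data file or a process."""
    DATA = "data"
    PROCESS = "process"


@dataclass
class Graph:
    """Hypothetical dataflow graph: an edge (src, dst) means data produced
    by src is used by dst."""
    kinds: dict[str, Kind] = field(default_factory=dict)
    edges: list[tuple[str, str]] = field(default_factory=list)

    def add_vertex(self, name: str, kind: Kind) -> None:
        self.kinds[name] = kind

    def add_edge(self, src: str, dst: str) -> None:
        if src not in self.kinds or dst not in self.kinds:
            raise KeyError(f"unknown vertex in edge {src!r} -> {dst!r}")
        self.edges.append((src, dst))

    def topological_order(self) -> list[str]:
        """Return vertices so that every producer precedes its consumers
        (Kahn's algorithm); raise ValueError if the graph has a cycle."""
        indegree = {v: 0 for v in self.kinds}
        for _, dst in self.edges:
            indegree[dst] += 1
        ready = deque(v for v, d in indegree.items() if d == 0)
        order: list[str] = []
        while ready:
            v = ready.popleft()
            order.append(v)
            for src, dst in self.edges:
                if src == v:
                    indegree[dst] -= 1
                    if indegree[dst] == 0:
                        ready.append(dst)
        if len(order) != len(self.kinds):
            raise ValueError("graph contains a cycle")
        return order


if __name__ == "__main__":
    g = Graph()
    g.add_vertex("input.dat", Kind.DATA)
    g.add_vertex("transform", Kind.PROCESS)
    g.add_vertex("sort", Kind.PROCESS)
    g.add_vertex("output.dat", Kind.DATA)
    g.add_edge("input.dat", "transform")
    g.add_edge("transform", "sort")
    g.add_edge("sort", "output.dat")
    print(g.topological_order())  # ['input.dat', 'transform', 'sort', 'output.dat']
```

Keeping data files and processes as distinct vertex kinds mirrors the description above; a fuller model would attach sizes and resource costs to each vertex.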
The same type of graphic representation may be used to describe applications on parallel processing systems. Again, the graphs will be composed of data, processes, and edges or links. In this case, the representation captures not only the flow of data between processing steps, but also the flow of data from one processing node to another. Furthermore, by replicating elements of the graph (e.g., files, processes, edges), it is possible to represent the parallelism in a system.
FIG. 1 shows an example of a graph describing an application on a parallel processing system with three processors. The application performs essentially two tasks: a transformation and a sort. Initially, the data is divided into three sections, represented by a vertex INPUT PARTITION. Each section is sent to one of three different processors, as represented by three links 10, 11, and 12. Each of the three processors performs one or more tasks on a corresponding data section. This allocation of tasks is represented by three sets of vertices, TRANSFORM 1 and SORT 1, TRANSFORM 2 and SORT 2, and TRANSFORM 3 and SORT 3. That is, the first processor performs a transformation and then sorts its section of data (TRANSFORM 1 and SORT 1) and so on. At the end, the data is aggregated together and output as a unified whole, represented by a vertex OUTPUT AGGREGATION.
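The replicated structure of FIG. 1 can be sketched by copying a serial chain of stages once per processor. The function `replicate_pipeline` and its parameters are hypothetical, chosen only to show how replicating vertices and edges expresses parallelism; the vertex names follow FIG. 1.

```python
from __future__ import annotations


def replicate_pipeline(stages: list[str], partitions: int,
                       source: str = "INPUT PARTITION",
                       sink: str = "OUTPUT AGGREGATION") -> list[tuple[str, str]]:
    """Return the edges of a graph in which a serial chain of stages is
    replicated once per processor, fed by one partitioning vertex and
    drained into one aggregation vertex (the shape of FIG. 1)."""
    if partitions < 1 or not stages:
        raise ValueError("need at least one stage and one partition")
    edges: list[tuple[str, str]] = []
    for p in range(1, partitions + 1):
        # Name each replica after its processor, e.g. "TRANSFORM 1".
        chain = [f"{stage} {p}" for stage in stages]
        # Partitioning edge (links 10, 11 and 12 in FIG. 1 when partitions == 3).
        edges.append((source, chain[0]))
        # Stage-to-stage flow on a single processor.
        edges.extend(zip(chain, chain[1:]))
        # Each processor's result flows into the aggregation vertex.
        edges.append((chain[-1], sink))
    return edges


if __name__ == "__main__":
    for src, dst in replicate_pipeline(["TRANSFORM", "SORT"], 3):
        print(f"{src} -> {dst}")
```

Changing `partitions` alters the degree of parallelism without changing the logical application, which is what makes replication a convenient way to represent alternative system configurations.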
It is very difficult, however, to predict and model the performance of applications on parallel processing systems. As with single processor systems, these predictions depend on the amount of data which must be processed and the resources required for that processing. However, in addition to information about CPU processing speeds and requirements, data set sizes, memory utilization, and disk usage, information about effective communication rates between processors and network performance becomes necessary. With multiple processors acting in parallel, possibly at different speeds and on different amounts of data, and interconnected by channels or links having different rates, the computations can become quite complex.
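To make the interaction of these factors concrete, here is a deliberately simplified estimate for one parallel phase. It assumes (this is not taken from the source) that each processor's time is its CPU work plus the time to move its data over the link feeding it, and that the phase finishes when the slowest processor finishes. `Partition`, `partition_time`, and `phase_time` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Partition:
    """Hypothetical per-processor inputs for one parallel phase."""
    records: int                   # records assigned to this processor
    bytes_per_record: int          # average record size in bytes
    cpu_seconds_per_record: float  # CPU cost per record on this processor
    link_bytes_per_second: float   # effective rate of the link feeding it


def partition_time(p: Partition) -> float:
    """Assumed model: compute time plus time to transfer the partition."""
    compute = p.records * p.cpu_seconds_per_record
    transfer = p.records * p.bytes_per_record / p.link_bytes_per_second
    return compute + transfer


def phase_time(partitions: list[Partition]) -> float:
    """Partitions run in parallel, so the slowest one bounds the phase."""
    return max(partition_time(p) for p in partitions)


if __name__ == "__main__":
    parts = [
        Partition(1_000_000, 100, 1e-6, 100e6),
        Partition(1_000_000, 100, 1e-6, 50e6),   # same work, slower link
        Partition(500_000, 100, 2e-6, 100e6),    # less data, slower CPU
    ]
    print([partition_time(p) for p in parts], phase_time(parts))
```

In this example the second partition, whose link is half as fast, takes about 3 seconds and bounds the whole phase even though the other two finish sooner. A realistic model would also account for memory, disk, and contention on shared network links.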
For example, many parallel system configurations are purchased based on the size of the database to be processed. An arbitrary rule of thumb is then typically applied to compute the number of processors required from a ratio of processors to gigabytes of disk storage. This kind of oversimplification often results in systems whose processing power or networking bandwidth is wildly out of balance with what the actual computation requires.
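The arithmetic below illustrates why a storage-based ratio can mislead. The ratio of 0.25 processors per gigabyte, the one-hour target, and the CPU-second figures are invented for illustration, and `processors_by_demand` assumes perfect parallel speedup; none of these values or names comes from the source.

```python
import math


def processors_by_rule(database_gb: float, processors_per_gb: float) -> int:
    """Rule-of-thumb sizing: a fixed ratio of processors to gigabytes of disk."""
    return math.ceil(database_gb * processors_per_gb)


def processors_by_demand(total_cpu_seconds: float, target_seconds: float) -> int:
    """Sizing from the computation itself: enough processors to finish the
    CPU work within the target time, assuming perfect parallel speedup."""
    return math.ceil(total_cpu_seconds / target_seconds)


if __name__ == "__main__":
    # Two hypothetical workloads over the same 100 GB database.
    print(processors_by_rule(100, 0.25))          # same answer for both
    print(processors_by_demand(3_600, 3_600))     # light scan
    print(processors_by_demand(360_000, 3_600))   # heavy transformation
```

The rule assigns 25 processors to both workloads, while the demand-based estimate suggests 1 for the light scan and 100 for the heavy transformation: one system is oversized and the other is starved.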
Accordingly, the inventor has determined that it would be desirable to be able to analyze the performance of an application executing on a parallel processing system. It would also be desirable to be able to estimate such performance based on assumed data set sizes and variations of the architecture of a parallel processing system. The present invention provides such abilities.