A parallel processing environment includes a plurality of processors that cooperate through hardware and software mechanisms to distribute processing and memory load (load balance) amongst the processors of that environment. Such an architecture permits operations to complete more rapidly and more efficiently.
A variety of problems can arise that adversely impact the efficiency of the parallel processing environment. One such problem occurs when one processor is more heavily loaded than another processor, or than the remaining processors. Conventionally, the difficulty has been to determine reliably when this type of load imbalance actually occurs and whether it represents a true problem situation. This is so because parallel processing environments are dynamic: conditions change rapidly and frequently.
One solution has been to find the lowest loaded processor and the highest loaded processor; if the difference in load exceeds some comparison load value or percentage, the load balancing problem is considered to be present. Yet, in this case, it may simply be that, of 100 available processors within the parallel processing environment, one processor has little or no work to do, such that there is really not a load balancing problem.
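The lowest-versus-highest comparison described above can be illustrated with a minimal sketch. The function name `max_min_skew_detected`, the 25 percent threshold, and the load figures are hypothetical values chosen only for illustration; they are not taken from any particular system.

```python
def max_min_skew_detected(loads, threshold_pct):
    """Flag an imbalance when (highest - lowest) exceeds threshold_pct of highest.

    loads: per-processor load values (hypothetical units, e.g. CPU seconds).
    """
    highest = max(loads)
    lowest = min(loads)
    if highest == 0:
        return False  # no work anywhere, so nothing to compare
    return (highest - lowest) / highest * 100.0 > threshold_pct


# Assumed scenario: 99 processors evenly loaded and one nearly idle.
one_idle = [50.0] * 99 + [1.0]
# Assumed scenario: all 100 processors evenly loaded.
balanced = [50.0] * 100

one_idle_flagged = max_min_skew_detected(one_idle, threshold_pct=25.0)
balanced_flagged = max_min_skew_detected(balanced, threshold_pct=25.0)
```

Under these assumed figures, the single nearly idle processor is enough to trip the check even though the other 99 processors share the work evenly, which is the false-positive case noted above.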
Another solution attempts to detect load balancing issues using statistics accumulated from the start of an operation. Consequently, if a true problem occurs in the middle of the operation, its detection may be delayed, or it may go undetected, because the accumulated statistics reduce the apparent magnitude of the imbalance. In other words, the result of load balancing detection is heavily influenced by whether skew calculations use accumulated statistics from the start of an operation or snapshot statistics collected periodically during the operation.
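The dilution effect of accumulated statistics can be seen in a small worked sketch. The skew measure used here (the percentage by which the busiest processor exceeds the mean) and the interval figures are assumptions made for illustration; other skew definitions would show the same qualitative effect.

```python
def skew_pct(loads):
    """Percent by which the busiest processor exceeds the mean load."""
    mean = sum(loads) / len(loads)
    if mean == 0:
        return 0.0
    return (max(loads) - mean) / mean * 100.0


# Hypothetical per-interval CPU seconds for 4 processors:
# nine balanced intervals followed by one heavily skewed interval.
intervals = [[10.0, 10.0, 10.0, 10.0]] * 9 + [[34.0, 2.0, 2.0, 2.0]]

# Accumulated view: per-processor totals since the start of the operation.
accumulated = [sum(per_processor) for per_processor in zip(*intervals)]
# Snapshot view: only the most recent interval.
snapshot = intervals[-1]

accumulated_skew = skew_pct(accumulated)
snapshot_skew = skew_pct(snapshot)
```

With these assumed numbers, the accumulated totals are 124, 92, 92, and 92, giving a skew of roughly 24 percent, whereas the latest snapshot alone gives a skew of roughly 240 percent. A detection threshold of, say, 50 percent would miss the imbalance under the accumulated view and catch it under the snapshot view.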
In still another solution, the attempt to detect the load balancing problem uses a wall clock. That is, the time for which the problem must be present before it is considered a true problem is based on elapsed time. However, the effectiveness of using a wall clock may be reduced on a busy multi-user system, where an operation might not have a chance to run again during the elapsed time.
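One possible reading of the wall-clock limitation is sketched below, using simulated timestamps rather than a real clock. The function `flagged_by_wall_clock`, the polling schedule, and the skew figures are all hypothetical; the sketch assumes a detector that polls the most recently reported skew value and flags a problem once skew has stayed above a threshold for a minimum elapsed time.

```python
def flagged_by_wall_clock(poll_times, reports, min_elapsed_s, threshold_pct):
    """Return True if skew appears to persist for min_elapsed_s of wall time.

    poll_times: wall-clock seconds at which the detector checks.
    reports: (time_s, skew_pct) pairs in time order, emitted only when
             the operation actually gets to run.
    """
    skew_since = None
    for now in poll_times:
        # Most recent report at or before this poll (may be stale).
        latest = None
        for time_s, skew in reports:
            if time_s <= now:
                latest = skew
        if latest is not None and latest > threshold_pct:
            if skew_since is None:
                skew_since = now
            if now - skew_since >= min_elapsed_s:
                return True
        else:
            skew_since = None
    return False


polls = [0, 10, 20, 30]
# Assumed case: the operation ran once, then was not scheduled again.
starved = [(0, 80.0)]
# Assumed case: the operation kept running and the skew cleared.
active = [(0, 80.0), (10, 5.0), (20, 4.0), (30, 6.0)]

starved_flagged = flagged_by_wall_clock(polls, starved, 30, 50.0)
active_flagged = flagged_by_wall_clock(polls, active, 30, 50.0)
```

In the starved case, the elapsed-time requirement is satisfied by a single stale observation, because the operation never ran again to report a newer value; the wall clock advanced but no new evidence about the imbalance was gathered.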
Thus, it can be seen that improved techniques for load balance detection within a parallel processing environment are desirable.