A massively parallel processing system (“MPP system”) is a computer system with many independent arithmetic units or entire microprocessors running in parallel. The system may incorporate hundreds or thousands of central processing units (“CPUs”) working together.
A query processed on the system may be divided among multiple CPUs for processing. In almost all cases, the processing times of the individual CPUs working on the query will not be exactly equal. Skew occurs when one processor performs more work than the others.
The greater the difference between the work performed by the least-loaded and most-loaded processors, the greater the degree of skew. Excessive skew can significantly reduce system efficiency: while one CPU spends a disproportionate amount of time on a query, other queries waiting to be processed back up.
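The text does not define a specific skew measure. As an illustrative sketch only, the Python below computes two common ways to quantify skew from per-processor CPU times: the spread between the most- and least-loaded processors, and the ratio of the busiest processor's time to the average. The function name `skew_metrics` and its input (a list of per-processor CPU seconds for one query) are hypothetical assumptions, not a log format or method described here.

```python
def skew_metrics(cpu_seconds: list[float]) -> dict[str, float]:
    """Summarize how unevenly one query's work was spread across processors.

    cpu_seconds: hypothetical per-processor CPU times (non-negative) for a
    single query; the actual log format is not specified in the text.
    """
    if not cpu_seconds:
        raise ValueError("need at least one processor")
    total = sum(cpu_seconds)
    avg = total / len(cpu_seconds)
    busiest = max(cpu_seconds)
    idlest = min(cpu_seconds)
    return {
        "total": total,
        "average": avg,
        # Gap between the most- and least-loaded processors.
        "spread": busiest - idlest,
        # 1.0 means perfectly even; larger values mean more skew.
        # If no processor did any work, treat the load as trivially even.
        "max_to_avg": busiest / avg if avg > 0 else 1.0,
    }


even = skew_metrics([10.0, 10.0, 10.0, 10.0])
skewed = skew_metrics([4.0, 4.0, 4.0, 28.0])
print(even["spread"], even["max_to_avg"])      # 0.0 1.0
print(skewed["spread"], skewed["max_to_avg"])  # 24.0 2.8
```

In the skewed case the query takes as long as its busiest processor (28 s), even though the average processor finished its share in 10 s; that idle capacity on the other processors is the inefficiency described above.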
Conventionally, acceptable skew is determined by hand based on analyst experience.
For a given query, processing-history information, such as the total CPU time expended and the average CPU time per processor, is stored in computer logs. Conventionally, analysts review this information manually to determine whether the query's skew is acceptable and, if not, whether the query is a candidate for tuning. This approach is time-consuming and requires analysts experienced in assessing query data.
It would be desirable, therefore, to provide apparatus and methods for electronically identifying unacceptably skewed queries processed in an MPP system.