Distributed systems are scalable systems used in a variety of settings, including environments that require high work throughput or continuous, or nearly continuous, availability of the system.
A distributed system capable of sharing resources is referred to as a cluster. Within a cluster, there is a need to keep track of information regarding jobs in a queue and job scheduling processes. This information typically comes from different commands, at different levels of detail. Further, this information is highly dynamic and volatile.
Various discrete job monitoring commands exist today, such as the llq and llstatus commands of the IBM LoadLeveler product for AIX, described in “IBM LoadLeveler for AIX, Using and Administering”, Version 2, Release 1, publication no. SA22-7311-00 (October 1998), and in “IBM Parallel System Support Programs for AIX: Command And Technical Reference”, Version 3, Release 1, publication no. SA22-7351-00 (October 1998), both of which are hereby incorporated herein by reference in their entirety.
In view of the discrete nature of these existing monitoring commands, enhanced job monitoring techniques are desired to facilitate, for example, the administration of cluster systems.