A server farm can be defined generally as a group of networked servers or, alternatively, a networked multi-processor computing environment in which work is distributed among multiple processors. A server farm provides for more efficient processing by distributing the workload among individual components or processors of the farm, and expedites execution of computing processes by utilizing the power of multiple processors. The networked servers that constitute a server farm are typically housed in a single location; however, they can be geographically dispersed, as in grid computing, which can be thought of as distributed and large-scale cluster computing and as a form of network-distributed parallel processing. Grid computing can be confined to a network of computer workstations within a company, or it can be a public collaboration, sometimes referred to as a form of peer-to-peer computing.
Often, a server farm environment includes many different classes of resources, machine types and architectures, operating systems, storage facilities and specialized hardware. Server farms are typically coupled with a layer of load-balancing software to perform numerous tasks, such as tracking processing demand, selecting machines on which to run a given task or process, and prioritizing and scheduling tasks for execution. Other terms used for load-balancing include load sharing and distributed resource management (DRM). In general, DRM applications are used to manage the resources associated with a server farm. One example of a commercially available distributed resource management application is Platform LSF 5 available from Platform Computing Inc.
Combining the processing power of servers into a single computing entity has been relatively common for years in the areas of research and academia. However, companies are increasingly utilizing server farms to efficiently perform the vast amount of task and service computing that they encounter in their respective businesses. For example, development of large-scale software platforms can benefit from use of networked multi-processor computing for repetitive processes associated with compiling, releasing and testing of software code.
Prior approaches to using a server farm for compute-intensive software development tasks operate by executing many small programs, or scripts, to perform numerous functions, including the following: (1) establish run-time environments for executable task commands; (2) execute task commands to perform actual work, such as compile, release and test; (3) coordinate the execution and interdependencies of various task commands (e.g., high-level processes to coordinate low-level processes); and (4) generate reports regarding the execution of the task commands. Jobs typically implement the myriad of interwoven tasks/processes that perform the work. Often, developers within a working group of a company might create scripts for performing desired functions which are specifically tailored to group-specific operations, goals, computing platforms, etc. In practice, the processes that the developers use to complete their work functions are often not written down and much of the operational set-up involves manual processes.
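The four script functions enumerated above can be illustrated with a minimal sketch. It is not taken from any particular prior system; the function name `run_tasks`, the `TASK_NAME` environment key, and the representation of a task command as a callable returning an exit status are all assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_tasks(tasks, deps, base_env):
    """Run task commands in dependency order and return (order, report).

    tasks:    hypothetical mapping of task name -> callable(env) returning
              an exit status (0 means success)
    deps:     mapping of task name -> set of prerequisite task names
    base_env: settings shared by every task's run-time environment
    """
    # (3) Coordinate interdependencies: prerequisites always come first.
    graph = {name: set(deps.get(name, ())) for name in tasks}
    order = list(TopologicalSorter(graph).static_order())

    report = {}
    for name in order:
        # Do not run a task whose prerequisite failed or was skipped.
        if any(report.get(d) != "ok" for d in graph[name]):
            report[name] = "skipped"
            continue
        # (1) Establish a run-time environment for this task command.
        env = dict(base_env, TASK_NAME=name)
        # (2) Execute the task command to perform the actual work.
        status = tasks[name](env)
        report[name] = "ok" if status == 0 else "failed"
    # (4) The report records the outcome of every task command.
    return order, report
```

For example, with hypothetical tasks "compile", "release" and "test" chained in that order, a failing "release" causes "test" to be skipped rather than run. In the prior approaches described above, this kind of logic is typically scattered across many group-specific scripts rather than gathered in one place.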
Additionally, different working groups within a single company, and even within a single company location, often procure, maintain and administer their respective computing environments and platforms separately and independently from other groups within the company. In such a scenario, a machine going off-line can require modifying many scripts that were tailored to that machine, platform, or environment. This manner of operating not only wastes resources, such as unused processor capacity, but also makes managing the large number of scripts, and the computing resources on which the scripts run, a non-trivial, highly complex effort.
Based on the foregoing, it is clearly desirable to provide a mechanism for managing the parallel execution of processes, including interdependent processes, in a networked multi-processor computing environment. Furthermore, it is clearly desirable to provide a mechanism for managing runtime execution environments for processes executing in a multi-processor computing environment. More specific needs associated with the foregoing include formalizing processes for executing work across multiple processors such that the type of platform on which the work runs is transparent to a user, and providing a common control and management layer on which users can define and run their work.
Computing platforms typically generate log files detailing various runtime and termination statistics associated with the execution of a command, task, job, process, or the like. Historically, a single log file (e.g., a “flat file”) is generated for a grouping of executable tasks, jobs, etc., that are run together as a unit of work on a computer or a networked multi-processor computing environment. In the context of complex, interdependent software development tasks running together as a unit of work, the number of different tasks that perform the work can be enormous. Since runtime and termination statistics are usually generated for each executable unit of work, a corresponding log file can likewise be enormous. Consequently, such a log file is difficult to analyze and to glean information from. Finding the information of interest to a user in a very large log file (possibly thousands of lines of text) typically requires manual parsing, filtering, or the like. Such a manual process is not an efficient use of time and resources.
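The parsing and filtering burden described above can be made concrete with a short sketch. The log line layout assumed here, a bracketed task name followed by a message, is hypothetical, as are the names `LINE` and `group_by_task`; real flat files vary by platform and may carry no task marker at all.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) layout of one flat-file line: "[task-name] message"
LINE = re.compile(r"^\[(?P<task>[^\]]+)\]\s+(?P<message>.*)$")


def group_by_task(flat_log_text):
    """Split an interleaved flat log into per-task message lists."""
    grouped = defaultdict(list)
    for line in flat_log_text.splitlines():
        match = LINE.match(line)
        if match:  # lines that do not fit the assumed layout are ignored
            grouped[match.group("task")].append(match.group("message"))
    return dict(grouped)
```

Even this small filter has to be written and maintained by whoever reads the log, and it works only for as long as the line layout stays stable.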
Hence, based on the foregoing, there is a clear need for a mechanism for providing log information related to processing in a more orderly and useful manner than in prior approaches. A more specific need exists for a logging mechanism that overcomes the shortcomings of prior approaches by facilitating rapid location of information of interest.