Applications such as e-commerce or social media applications create unprecedented amounts of data describing activities performed by their users. Such activities may include the purchase of specific products, an expression of sympathy for someone or something, or an update of social relationships. For large applications, the amount of usage activity data created is measured in terabytes per day. Valuable information can be extracted from this activity data by relatively simple analysis processes. As an example, a web-based video-on-demand company may analyze the user activity data to gain information about the popularity of different videos and movies and may optimize its offerings based on this information.
Massive parallel data processing approaches may be used to subdivide large amounts of data into smaller partitions, perform the analysis tasks on all of those partitions in parallel to obtain partial results, and then combine those partial results into a global result.
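The partition, process and combine approach described above can be illustrated with a minimal sketch. It is not taken from any particular framework: the function names (partition, count_views, combine, analyze), the record layout of (user, video) pairs and the use of a local thread pool are all assumptions made for illustration. A real massive parallel processing framework would distribute the partitions across many processes and machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_partitions):
    """Split the records into num_partitions roughly equal slices."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_views(records):
    """Analysis task for one partition: count views per video."""
    return Counter(video_id for _user, video_id in records)


def combine(partial_results):
    """Merge the per-partition counts into one global count."""
    total = Counter()
    for partial in partial_results:
        total.update(partial)
    return total


def analyze(records, num_partitions=4):
    """Partition the data, analyze the partitions in parallel, combine."""
    partitions = partition(records, num_partitions)
    # Assumption: a local thread pool stands in for a cluster of workers.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_results = list(pool.map(count_views, partitions))
    return combine(partial_results)


if __name__ == "__main__":
    # Hypothetical activity data: (user, video) pairs.
    activity = [
        ("u1", "video_a"),
        ("u2", "video_b"),
        ("u3", "video_a"),
        ("u1", "video_c"),
        ("u4", "video_a"),
    ]
    print(analyze(activity).most_common(1))  # [('video_a', 3)]
```

Because counting is associative and commutative, the global result does not depend on how the records are assigned to partitions, which is what makes this kind of analysis suitable for parallel execution.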
Software components such as the Hadoop™ framework developed by the Apache Software Foundation provide processing infrastructure to efficiently manage and execute such massive parallel analysis tasks. However, they lack monitoring facilities sufficient to track the resource utilization of individual jobs.
Traditional, process-based resource utilization monitoring systems provide measurements that allow an operator to evaluate, for example, the CPU and memory utilization of specific processes. However, they do not provide the job-specific context information needed to identify the specific job executions responsible for that resource utilization. Knowledge about the process resources used by different jobs is an essential precondition for job optimization. Only with this knowledge is it possible, for example, to optimize those jobs that consume the most resources.
Transaction tracing and monitoring systems provide performance measurement data on code level that allows the identification of code segments causing performance problems that occurred during the execution of specific, individual transactions. Although this information is of high value for identifying performance issues in conventional applications, it is too fine-grained to be used as a starting point for the analysis of a performance or resource utilization problem in a massive parallel processing environment.
Consequently, a solution is desired that allows easy identification and diagnosis of performance and resource utilization problems in massive parallel job processing environments.
This section provides background information related to the present disclosure which is not necessarily prior art.