Distributed applications have become increasingly popular in recent years, particularly following the widespread adoption of the Internet. In a distributed application, client computers exploit services offered by server computers across a network. The distributed application can be mirrored on multiple servers, which are grouped into a cluster. The cluster acts as a single logical entity, wherein each request received from the outside is automatically forwarded to the server in the cluster that is best suited to handle it. Clustering techniques provide high availability and parallel processing; moreover, load-balancing algorithms can be used in an attempt to optimize the distribution of the workload across the servers.
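The forwarding behavior of a cluster can be illustrated with a minimal sketch. The text does not name a particular load-balancing algorithm; the example below assumes a simple least-loaded policy, and the names `Server`, `Cluster`, `active_requests` and `dispatch` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    active_requests: int = 0  # hypothetical load metric, not taken from the source


class Cluster:
    """Acts as a single logical entity in front of several servers."""

    def __init__(self, servers):
        self.servers = list(servers)

    def dispatch(self, request):
        # Assumed policy: forward to the server with the fewest active requests.
        # min() returns the first minimum, so ties are broken by list order.
        target = min(self.servers, key=lambda s: s.active_requests)
        target.active_requests += 1
        return target.name


cluster = Cluster([Server("a", 2), Server("b", 0), Server("c", 1)])
first = cluster.dispatch("request-1")
second = cluster.dispatch("request-2")
third = cluster.dispatch("request-3")
```

A real cluster would typically also account for server health and capacity, and would decrement the load counter when a request completes; both are omitted here for brevity.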
Tools for monitoring the performance of distributed applications play a key role in their management. In particular, a system administrator can receive immediate notification when a client experiences a problem (so that appropriate steps can be taken to remedy the situation); alternatively, the collected information can be logged and accurate counts tracked over time. For example, the information provided by the monitoring tools is essential for service level agreements or for threshold and/or availability control; moreover, the same information is very useful for measuring the workload for capacity planning and charge-back accounting.
However, the solutions known in the art require the monitoring tool to be installed and started individually on each server on which the distributed application runs. This is a problem in highly dynamic environments, wherein the arrangement of the distributed application changes continually (so that very frequent interventions are necessary to keep the monitoring tool aligned with the configuration of the distributed application).
The problem is exacerbated when the configuration of the distributed application changes at run-time; for example, in a cluster the distributed application can be started and stopped on specific servers, according to the current workload of the cluster. In this condition, it is impossible to establish a priori on which servers the distributed application runs.
Therefore, the monitoring tool must always be running on all the servers of the cluster. However, the monitoring tool consumes processing resources, and can therefore adversely affect the performance of any other application running on servers where the monitoring tool would not be necessary. As a consequence, the monitoring tool can be detrimental to the overall performance of the cluster, resulting in application delays and system overload.
Moreover, all the resource management infrastructures that can be used to control the above-described system (including the distributed application and the monitoring tool) are based on an enforcement model. In this model, the configuration of the system is entirely controlled by an authority residing at a central site. The authority defines a desired configuration of each entity included in the system. For this purpose, the authority accesses a central repository storing the (alleged) current configuration of each entity, and determines the management actions required to bring the entity to the desired configuration starting from its current configuration. The management actions are then enforced remotely by the authority on the entity (which is totally passive).
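The enforcement model described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: configurations are modeled as plain dictionaries mapping a component name to a version, and the function names `plan_actions` and `enforce` as well as the action verbs (`install`, `upgrade`, `remove`) are hypothetical, not taken from any specific infrastructure.

```python
def plan_actions(recorded: dict, desired: dict) -> list:
    """Authority side: derive management actions from the configuration
    recorded in the central repository and the desired configuration."""
    actions = []
    # Components that are desired but not recorded must be installed.
    for item in sorted(desired.keys() - recorded.keys()):
        actions.append(("install", item, desired[item]))
    # Components recorded at a different version must be upgraded.
    for item in sorted(recorded.keys() & desired.keys()):
        if recorded[item] != desired[item]:
            actions.append(("upgrade", item, desired[item]))
    # Components that are recorded but no longer desired must be removed.
    for item in sorted(recorded.keys() - desired.keys()):
        actions.append(("remove", item, recorded[item]))
    return actions


def enforce(entity_state: dict, actions: list) -> None:
    """The authority applies the actions; the entity is totally passive."""
    for verb, item, version in actions:
        if verb == "remove":
            entity_state.pop(item, None)
        else:
            entity_state[item] = version


# Hypothetical data: what the central repository believes about one server.
recorded = {"app": "1.0", "old_tool": "0.9"}
desired = {"app": "1.1", "monitor": "1.0"}

actions = plan_actions(recorded, desired)
entity_state = dict(recorded)  # assumes the repository matches reality
enforce(entity_state, actions)
```

Note that the plan is computed from the recorded configuration only; if the entity has changed without the repository being updated, the computed actions may not match its actual state.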
A drawback of the resource management infrastructures known in the art is the lack of any kind of cooperation between the authority and the entities. This lack of cooperation may lead to inconsistencies when the entities are upgraded outside the control of the authority. Moreover, in the solutions currently employed, the management of entities that are temporarily unavailable or off-line is quite difficult to implement. The known resource management infrastructures also require the authority to maintain information about the location of all the entities; at the same time, the authority must handle the communication with every entity directly. The above-mentioned drawbacks considerably increase the difficulty of correctly defining a solution to be deployed in the system for implementing the monitoring of the distributed application.