1. Technical Field
The present invention relates generally to an improved distributed data processing system, and in particular, to a method and apparatus for monitoring entities in a distributed data processing system. Still more particularly, the present invention provides a method and apparatus for identifying and monitoring entities providing services in a network data processing system.
2. Description of Related Art
Modern computing technology has resulted in immensely complicated and ever-changing environments. One such environment is the Internet, which is also referred to as an “internetwork”. The Internet is a set of computer networks, possibly dissimilar, joined together by means of gateways that handle data transfer and the conversion of messages from a protocol of the sending network to a protocol used by the receiving network. When capitalized, the term “Internet” refers to the collection of networks and gateways that use the TCP/IP suite of protocols. Currently, the most commonly employed method of transferring data over the Internet is the World Wide Web environment, also called simply “the Web”. Other Internet resources exist for transferring information, such as File Transfer Protocol (FTP) and Gopher, but have not achieved the popularity of the Web. In the Web environment, servers and clients effect data transactions using the Hypertext Transfer Protocol (HTTP), a known protocol for handling the transfer of various data files (e.g., text, still graphic images, audio, motion video, etc.). The information in various data files is formatted for presentation to a user by a standard page description language. The Internet also is widely used to transfer applications to users using browsers. Often, users of software packages may search for and obtain updates to those software packages through the Internet.
Other types of complex network data processing systems include those created for facilitating work in large corporations. In many cases, these networks may span regions in various locations worldwide. These complex networks also may use the Internet as part of a virtual private network for conducting business. These networks are further complicated by the need to manage and update software used within the network. Often, different network data processing systems interact to facilitate different transactions. These transactions may include, for example, purchasing and delivery of supplies, parts, and services. The transactions may occur within a single business or between different businesses.
Such environments are made up of many loosely-connected software components. These software components are also referred to as “entities”. In a modern complex network data processing system, innumerable situations exist in which a need arises to test or monitor the operation of another entity, such as a particular running process or a particular service. Currently, a human operator must test and monitor the proper functioning of entities, such as important system services, to detect and correct faults and failures in these entities. In many cases, a service may depend on other services for its correct functioning. In this case, it is important to determine whether those other services are functioning correctly, in order to take steps or produce alerts when the services are not functioning correctly. For example, a purchasing entity used for ordering supplies may infrequently require a selected component from a particular provider. Although this component is needed infrequently, it is essential to be able to obtain the component quickly when the need arises. If the provider changes its inventory and no longer offers the component, or if the order entity used at the provider to generate the order is unavailable, it is crucial for the purchasing entity to be able to locate another service. Currently, a human operator is required to identify a process to test the order entity to determine whether the order entity is functioning correctly. In this example, the order entity is functioning correctly if the order entity offers the selected component as being available in inventory. After identifying this process, the human operator must monitor the order entity.
Currently, the testing and monitoring of computing entities is performed primarily on an ad hoc basis. A human operator needing to monitor a particular service will write a monitoring program for that service or manually search for such a program that someone else has written to perform monitoring. The monitoring program will be deployed and configured manually, and the human operator will manually inspect its output. In some cases the human operator may wrap the monitoring program in a shell that will automatically take some action, such as restarting the service, when a problem is detected.
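By way of illustration only, the kind of ad hoc wrapper described above might be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular existing tool: the names `run_check` and `monitor_once`, the check command, and the restart action are all hypothetical, and the monitoring program is assumed to signal a healthy service by exiting with status 0.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_check(check_cmd: Sequence[str], timeout_s: float = 10.0) -> bool:
    """Run a hand-written monitoring program (hypothetical).

    Assumption: the program exits with status 0 when the monitored
    service is healthy and with a non-zero status otherwise.
    """
    try:
        result = subprocess.run(check_cmd, timeout=timeout_s, capture_output=True)
    except (subprocess.TimeoutExpired, OSError):
        # A check that hangs or cannot be started is treated as a failure.
        return False
    return result.returncode == 0


def monitor_once(check_cmd: Sequence[str],
                 restart_action: Callable[[], None]) -> bool:
    """Run the check once and invoke the restart action if it fails."""
    healthy = run_check(check_cmd)
    if not healthy:
        restart_action()
    return healthy


if __name__ == "__main__":
    # Stand-in check that always fails, so the restart action is triggered.
    failing_check = [sys.executable, "-c", "raise SystemExit(1)"]
    monitor_once(failing_check, lambda: print("restarting service"))
```

In such an arrangement, the check command, the restart action, and the schedule on which `monitor_once` is called would each be selected and configured manually by the human operator for each monitored service.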
Existing maintenance and administration tools, such as the IBM Tivoli Enterprise Console, include features such as administration consoles that display the monitoring status and test results from a number of different entities, including detected faults and generated alerts, and allow administrators to specify actions that should be taken automatically when certain alerts occur. IBM Tivoli Enterprise Console is available from International Business Machines Corporation. Standards, such as the Simple Network Management Protocol (SNMP), specify well-documented ways of communicating alerts and other system events between entities. Some modern computing systems, both in hardware and in software, are designed with testability in mind, and in some cases either the original manufacturer or one or more third parties provide specific testing tools or algorithms for testing specific products.
Even with these types of maintenance and administration tools, a human operator is required to identify the entities to be monitored and the methods to be used to monitor those entities. Such a process is time-consuming and often may require extensive research to identify how a service is to be monitored. Therefore, it would be advantageous to have an improved method, apparatus, and computer instructions for identifying and monitoring entities providing services.