1. Field of the Invention
The present invention relates in general to network monitoring, reporting, and asset tracking software systems, and more particularly to a network monitoring system that implements a plurality of self-monitoring relays to provide a reliable store-and-forward mechanism.
2. Background
The need for effective and cost-efficient monitoring and control of servers, their clients, and computer network components, i.e., systems management, continues to grow at a rapid pace in all areas of commerce. Companies adopt system management solutions for many reasons, including reducing customer and service downtime to improve customer service and staff and customer productivity, reducing computer and network costs, and reducing operating expenditures (including reducing support and maintenance staff needs). A recent computer industry study found that the average cost per hour of system downtime for companies was $90,000, with each company experiencing 9 or more hours of mission-critical system downtime per year. For these and other reasons, the market for system monitoring and management tools has grown dramatically, and with this increased demand has come pressure for more effective and user-friendly tools and features.
There are a number of problems and limitations associated with existing system monitoring and management tools. Generally, these tools require that software and agents be resident on the monitored systems and network devices to collect configuration and operating data and to control communications among the monitored devices, control and monitoring consoles, and a central, remote service provider. Data collected on the monitored systems is displayed on the monitoring console or client node, and the tools provide alerts via visual displays, emails, and page messages when an operating problem is detected. While providing useful information to a client operator (e.g., self-monitoring by client personnel), these tools often require a relatively large amount of system memory and operating time (e.g., 1 to 2 percent of system or device processing time).
Additionally, many management systems are not readily scalable to allow the addition of large numbers of client or monitored systems. In many monitored networks or systems, intermediate or forwarding relays are provided between a monitoring service provider system and the monitored systems for transferring messages and data between the server and monitored systems. Presently, the forwarding relays are configured with memory and software to support a relatively small number of monitored systems, i.e., the ratio of monitored systems to relays is kept relatively small. With this arrangement, it is difficult to later add new monitored systems without modifying the hardware and/or software of the relays or without adding additional relays.
Further, the volume of data and messages sent between monitored systems and the service provider server can vary significantly over time, leading to congestion within the network and the delay or loss of important monitoring and control information. The number and size of the messages transferred between the monitored systems and the service provider can be quite large, because these messages carry the collected data displayed on the monitoring console or client node and the alerts delivered via visual displays, emails, and page messages when an operating problem is detected. Data sent from a monitored system to the service provider needs to be transferred in a reliable, secure, and efficient manner.
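One common way to make forwarding reliable is store-and-forward with acknowledgments: a relay keeps its own copy of each message and discards it only after the next hop confirms receipt, so a congested or failed link delays data rather than losing it. The following is a minimal in-memory sketch under that assumption; the class `StoreAndForwardRelay` and the `send` callback are hypothetical names for illustration, not part of the described system.

```python
from typing import Callable, Dict, List


class StoreAndForwardRelay:
    """Hypothetical relay that keeps each message until the next hop acknowledges it."""

    def __init__(self, send: Callable[[int, bytes], bool]) -> None:
        # `send` is an assumed transport callback that returns True when the
        # upstream receiver acknowledges the message and False otherwise.
        self._send = send
        self._pending: Dict[int, bytes] = {}  # insertion-ordered (Python 3.7+)
        self._next_id = 0

    def store(self, payload: bytes) -> int:
        """Accept a message from a monitored system and hold it in memory."""
        msg_id = self._next_id
        self._next_id += 1
        self._pending[msg_id] = payload
        return msg_id

    def forward(self) -> List[int]:
        """Attempt every pending message in arrival order; return the IDs acknowledged.

        A failed attempt leaves that message stored for the next pass.
        """
        delivered = []
        for msg_id, payload in list(self._pending.items()):
            if self._send(msg_id, payload):
                # Discard the stored copy only after an acknowledgment.
                del self._pending[msg_id]
                delivered.append(msg_id)
        return delivered

    def pending_count(self) -> int:
        return len(self._pending)
```

A production relay would also persist pending messages to disk so they survive a relay restart; this sketch keeps them in memory only.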
A significant amount of effort has been spent to provide useful communication controls or protocols for managing the communication over public networks, such as the TCP/IP suite for the Internet, and these networks are typically used to link the service provider system and the customer environment or network. However, communication protocols for managing data transfers within a monitored customer environment have not been successfully developed or implemented in a computer system to meet the communication needs of both the customer and the service provider.
A communication protocol is a set of rules that governs the interaction of concurrent processes in distributed and linked systems. A communication protocol includes rules, formats, and procedures implemented between two communicating devices for initiation and termination of data exchanges, synchronization of senders and receivers, detection and correction of transmission errors, and formatting and encoding of data. Many communication protocols provide a virtual, full-duplex communication channel between peer protocol layers in linked devices. For example, the Open Systems Interconnection (OSI) reference model of the International Organization for Standardization (ISO) defines a seven-layer protocol stack or hierarchy including, from lowest to highest layer: a physical layer, a data link layer, a network layer, a transport layer, a session layer, a presentation layer, and an application layer. Each layer in the stack defines a distinct service and implements a different protocol, with higher layers building on or using the services provided by the lower layers. For example, the physical layer may implement a byte-stream protocol that includes the functions or services required to transmit bits over a physical connection and defines whether the connection is copper wire, coaxial cable, optical fiber, and the like. The data link layer uses the services of the physical layer by implementing a link-level protocol to create a reliable link, adding services such as error handling and flow control. Higher layers such as the network layer (which may implement the well-known IP protocol) and the transport layer (which may implement the well-known TCP protocol) build on these two lower layers, and the remaining higher layers in turn build on these layers.
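The layering described above is usually realized by encapsulation: on the way down the stack each layer wraps the data handed to it with its own header, and on the way up each peer layer checks and strips that header before passing the rest upward. The toy sketch below illustrates only that ordering; the text tags stand in for real binary headers (a real physical layer does not add a header in this sense), and `encapsulate`/`decapsulate` are hypothetical names.

```python
from typing import List

# Layers listed from lowest to highest, as in the seven-layer model above.
# Each layer's "header" here is just its name plus a separator (a toy assumption).
LAYERS: List[str] = ["physical", "data_link", "network", "transport",
                     "session", "presentation", "application"]


def encapsulate(payload: bytes, layers: List[str] = LAYERS) -> bytes:
    """Walk down the stack: the application header goes on first, the physical header last."""
    frame = payload
    for name in reversed(layers):
        frame = name.encode("ascii") + b"|" + frame
    return frame


def decapsulate(frame: bytes, layers: List[str] = LAYERS) -> bytes:
    """Walk up the stack: each peer layer verifies and strips its own header."""
    for name in layers:
        header = name.encode("ascii") + b"|"
        if not frame.startswith(header):
            raise ValueError(f"{name} layer: unexpected header")
        frame = frame[len(header):]
    return frame
```

Because the lowest layer adds its header last, it is the outermost one on the wire and the first one removed by the receiver.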
A protocol designer may provide a new protocol for any of these layers by building on existing or known protocols, such as byte-stream protocols and the TCP/IP suite of network and transport protocols. In the customer environment of a monitoring service, there remains a need for a protocol, such as a session layer protocol, that coordinates and enhances communications between monitored devices, pipeline or network relays, and Internet interfaces or relays. Preferably, such a protocol defines communications between entities within a monitored customer environment in a space-efficient manner, transferring monitoring service data, commands, and messages with low space or byte overhead. The protocol also preferably provides time-efficient control with low time overhead and optional priority-based transfer of messages.
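To make "low byte overhead" and "priority-based transfer" concrete, the sketch below packs a fixed 7-byte binary header (version, priority, message type, payload length) in front of each payload and queues outgoing frames so that lower priority numbers are sent first, first-in first-out within a priority. The header layout, the field widths, and the names `pack_frame`, `unpack_frame`, and `PriorityOutbox` are all assumptions for illustration, not the protocol of the invention.

```python
import heapq
import itertools
import struct
from typing import List, Tuple

# Hypothetical header: version (1 byte), priority (1 byte), message type (1 byte),
# payload length (4 bytes). "!" means network byte order with no padding, so the
# header is exactly 7 bytes.
HEADER = struct.Struct("!BBBI")


def pack_frame(priority: int, msg_type: int, payload: bytes, version: int = 1) -> bytes:
    return HEADER.pack(version, priority, msg_type, len(payload)) + payload


def unpack_frame(frame: bytes) -> Tuple[int, int, int, bytes]:
    version, priority, msg_type, length = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return version, priority, msg_type, payload


class PriorityOutbox:
    """Release lower priority numbers first; FIFO among frames of equal priority."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, bytes]] = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def put(self, frame: bytes) -> None:
        _, priority, _, _ = HEADER.unpack_from(frame)
        heapq.heappush(self._heap, (priority, next(self._seq), frame))

    def drain(self) -> List[bytes]:
        return [heapq.heappop(self._heap)[2] for _ in range(len(self._heap))]
```

With this layout an urgent alert queued behind routine heartbeats still leaves the relay first, and each message costs only 7 bytes of framing.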
Another concern is the health of the various components of the monitoring system. This issue affects the scalability of the monitoring system, because each additional monitored system imposes an additional incremental load on the monitoring system. As the monitoring system "scales up," it is desirable to monitor the status of the monitoring system itself to ensure that it is not overloaded by the number of systems it is monitoring.
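One simple way a relay could monitor its own load is to compare its outbound queue depth with the queue's capacity and report a coarse status that the service provider can track as monitored systems are added. The sketch below assumes that metric; `RelayHealth` and the 75% and 90% thresholds are hypothetical choices, not values taken from the source.

```python
from dataclasses import dataclass


@dataclass
class RelayHealth:
    """Hypothetical self-reported health snapshot for one relay."""
    queue_depth: int
    queue_capacity: int
    monitored_systems: int

    def __post_init__(self) -> None:
        if self.queue_capacity <= 0:
            raise ValueError("queue_capacity must be positive")

    @property
    def utilization(self) -> float:
        return self.queue_depth / self.queue_capacity

    def status(self, warn_at: float = 0.75, critical_at: float = 0.9) -> str:
        # Illustrative thresholds (assumptions): fraction of queue capacity in use.
        u = self.utilization
        if u >= critical_at:
            return "CRITICAL"
        if u >= warn_at:
            return "WARNING"
        return "OK"
```

A relay reporting WARNING as its `monitored_systems` count grows is a signal to rebalance monitored systems or add relays before messages are delayed or dropped.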