The invention disclosed herein relates generally to network monitoring systems. More particularly, the present invention relates to improved methods and systems for efficiently storing event data in a database and distributing the event data to different users, where the event data relates to events occurring on a computer network.
Maintaining the proper operation of services provided over a network is usually an important but difficult task. Service administrators are often called upon to react to a service failure by identifying the problem that caused the failure and then taking steps to correct the problem. The expense of service downtime, the limited supply of network engineers, and the competitive nature of today's marketplace have forced service providers to rely more and more heavily on software tools to keep their networks operating at peak efficiency and to deliver contracted service levels to an expanding customer base. Accordingly, it has become vital that these software tools be able to manage and monitor a network as efficiently as possible.
A number of tools are available to assist administrators in completing these tasks. One example is the NETCOOL® suite of applications available from Micromuse Inc. of San Francisco, Calif., which allows network administrators to monitor activity on networks such as wired and wireless voice communication networks, intranets, wide area networks, or the Internet. The NETCOOL® suite includes probes and monitors which log and collect network event data, including network occurrences such as alerts, alarms, or other faults, and store the event data in a database on a server. The system then reports the event data to network administrators in graphical and text-based formats in accordance with particular requests made by the administrators. Administrators are thus able to observe desired network events on a real-time basis and respond to them more quickly. The NETCOOL® software allows administrators to request event data summarized according to a desired metric or formula, and further allows administrators to select filters in order to custom-design their own service views and service reports.
In a demanding environment, there are many tens or even hundreds of clients viewing essentially the same filtered or summarized event data. Moreover, in a large network there are thousands of devices being monitored at a number of geographically distant locations, and events occur on these devices with great frequency. As a result, the databases that store event data have become very large and are constantly being updated. Newly incoming event data is delayed before being stored or processed at the database, e.g., during a period when the databases are locked. Even if such delays last only a fraction of a second or a few seconds, they may impair the ability of the administrator clients to receive event data in a timely manner. These and related issues are exacerbated as the size of a network increases, thus limiting the scalability of the network management system.
Accordingly, there is a need for improvements in how such network event databases are updated with events and how they are managed to provide greater scalability and efficiency. Furthermore, there is a need for improved techniques for efficiently coordinating the processing of event data obtained from both local and remote networks.