Computer systems used for business operations generate log messages that audit user access, service errors, and other critical information about the operation of those systems. Many such computer systems run on the MICROSOFT WINDOWS operating system, which collects log message information into the Windows Event Log on the local file system of each computer system. (MICROSOFT and WINDOWS are trademarks of Redmond, Wash. based Microsoft Corporation.) Log messages in the WINDOWS Event Log are stored in a binary format that is not documented in the public programming interface, and they must first be translated into text through public programming interfaces before they can be analyzed or archived. The message translation programming interface requires that the Dynamic Link Libraries (DLLs) containing the translation information, which are supplied with the application that generated the messages, be accessible on the computing system performing the translation.
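The dependency on the generating application's DLLs can be illustrated with a simplified model. In the minimal Python sketch below, the per-application message table (event ID to message template) stands in for the translation information held in a message DLL, and an event record carries only its source, event ID, and insertion strings. WINDOWS message templates do use numbered insertion placeholders such as %1 and %2, but the names `EventRecord`, `MessageTable`, and `translate`, and the dictionary representation of a DLL, are hypothetical and chosen for illustration only; this is not the WINDOWS programming interface itself.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical model of a message table: event ID -> message template.
# In practice this information lives inside the application's message DLL.
MessageTable = Dict[int, str]

# Matches numbered insertion placeholders such as %1, %2, ... %12.
_PLACEHOLDER = re.compile(r"%(\d+)")


@dataclass
class EventRecord:
    source: str                                       # name of the generating application
    event_id: int                                     # selects a template in the message table
    inserts: List[str] = field(default_factory=list)  # insertion strings stored with the event


def translate(record: EventRecord, dlls: Dict[str, MessageTable]) -> Optional[str]:
    """Render an event as text, or return None if its message DLL is unavailable."""
    table = dlls.get(record.source)
    if table is None or record.event_id not in table:
        return None  # without the generating application's DLL the event stays untranslated
    template = table[record.event_id]

    def substitute(match: "re.Match[str]") -> str:
        index = int(match.group(1)) - 1  # %1 refers to the first insertion string
        if 0 <= index < len(record.inserts):
            return record.inserts[index]
        return match.group(0)  # leave placeholders without a matching insert untouched

    return _PLACEHOLDER.sub(substitute, template)


dlls = {"ExampleApp": {1001: "User %1 logged on from %2."}}
print(translate(EventRecord("ExampleApp", 1001, ["alice", "10.0.0.5"]), dlls))
# -> User alice logged on from 10.0.0.5.
print(translate(EventRecord("OtherApp", 7, ["x"]), dlls))
# -> None
```

The second call shows why the collecting system needs the DLLs: the binary event record alone carries the insertion strings but not the surrounding message text.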
To facilitate analysis and archiving of log messages from individual MICROSOFT WINDOWS operating system based computer systems, both the WINDOWS Event Log and the application message translation DLLs must be transferred to the collecting system, where message translation is performed. While the log messages held in the WINDOWS Event Log are accessible via MICROSOFT Remote Procedure Call (MSRPC), remotely accessing the application DLLs consumes significant processing time and network bandwidth for each target computing system, because the DLLs must be copied to the collecting system.
As a result, current log message collection systems face significant barriers to scaling remote collection beyond a small number of log message hosts, which limits the effectiveness of log message analysis and archiving for MICROSOFT WINDOWS computing systems in large-scale computing environments. Current approaches for remote log message collection from MICROSOFT WINDOWS operating system based computer systems are available as standalone packages or as part of larger system management software products. However, many of these approaches do not scale well as the number of target computing systems grows, and problems appear even with fewer than 100 log message hosts. A leading standalone package is Lasso, an open source package made available by LogLogic, Inc. However, testing has shown that the Lasso package takes several hours to start collection from only 50 MICROSOFT WINDOWS operating system based computer systems. Testing has also revealed that much of this time is spent collecting the entire set of application DLLs available on each host system before any log message collection can begin. On a network with limited bandwidth or unreliable connectivity between the collecting and target computing systems, the time needed to load the DLLs may be even longer.
A leading system management product supporting log message collection and analysis is the MICROSOFT MOM System Management suite. The MICROSOFT MOM System Management suite includes the ability to remotely collect log messages from multiple host MICROSOFT WINDOWS computing systems. MICROSOFT has documented that this feature works only for a very limited number of target computing systems, for much the same reason as the Lasso package. The MICROSOFT MOM System Management suite may be effective only up to approximately 50 log message hosts.
A common approach involves initializing a cache of known DLLs at startup, either automatically or on demand, and then continuing remote event collection and translating the collected log messages using the set of cached DLLs. In general, after startup, a log collection system may begin by selecting a host. Once the host has been selected, the log collection system loads a DLL for that host and continues loading DLLs until all the DLLs for that host have been loaded. There may be thousands of DLLs associated with each host. The log collection system may continue selecting hosts and loading the DLLs associated with each host until every host has been selected and all of its DLLs have been loaded. Only once this portion of the process is complete does the log collection system begin fetching log messages from each of the hosts and translating the log messages received. Once all the log messages have been received and translated, the log collection system may wait for the next poll interval.
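The eager, cache-first flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of any particular product: the callables `load_dll` (copy one DLL from a host), `fetch_logs` (retrieve a host's log messages), and `translate` are hypothetical stand-ins for remote operations, the cache is keyed by DLL name for simplicity, and `sleep` is injectable so the poll loop can be exercised without real delays. The point the sketch makes visible is ordering: no log message is fetched until every DLL on every host has been loaded, and the cache is never refreshed afterwards.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple

Event = dict  # hypothetical opaque event record


def initialize_cache(
    hosts: Dict[str, List[str]],             # host name -> DLL names on that host (assumed)
    load_dll: Callable[[str, str], bytes],   # hypothetical remote copy of one DLL
) -> Dict[str, bytes]:
    """Select each host in turn and load every one of its DLLs before collecting anything."""
    cache: Dict[str, bytes] = {}
    for host, dll_names in hosts.items():
        for name in dll_names:
            cache[name] = load_dll(host, name)
    return cache


def poll_once(
    hosts: Dict[str, List[str]],
    cache: Dict[str, bytes],
    fetch_logs: Callable[[str], Iterable[Event]],
    translate: Callable[[Event, Dict[str, bytes]], str],
) -> List[Tuple[str, str]]:
    """Fetch and translate log messages from every host using the existing cache."""
    results: List[Tuple[str, str]] = []
    for host in hosts:
        for event in fetch_logs(host):
            results.append((host, translate(event, cache)))
    return results


def run(
    hosts: Dict[str, List[str]],
    load_dll: Callable[[str, str], bytes],
    fetch_logs: Callable[[str], Iterable[Event]],
    translate: Callable[[Event, Dict[str, bytes]], str],
    poll_interval: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[str, str]]:
    cache = initialize_cache(hosts, load_dll)  # built once at startup; never updated later
    collected: List[Tuple[str, str]] = []
    for i in range(rounds):
        collected.extend(poll_once(hosts, cache, fetch_logs, translate))
        if i < rounds - 1:
            sleep(poll_interval)  # wait for the next poll interval
    return collected
```

With thousands of DLLs per host, the cost of `initialize_cache` grows with the total number of DLLs across all hosts, and that entire cost is paid before the first call to `poll_once`.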
This approach results in several undesirable behaviors. First, all known DLLs are maintained in the cache of known DLLs, consuming significant storage on the log message collection system and ultimately limiting the number of remote WINDOWS hosts that can be supported. Second, after initialization the DLL cache is not updated, which over time results in events that cannot be translated; this becomes a significant problem when new applications are installed on the target WINDOWS hosts. Third, collection startup in prior art approaches is delayed while the DLL cache is initialized and/or verified. With a large number of WINDOWS hosts, or a slow and/or unreliable network, this causes a significant startup delay. Finally, initialization of the DLL cache in prior art approaches becomes an expensive network operation on slow and/or unreliable network links, in many cases causing significant issues for other network traffic using those links.