Networks for linking computer and communication systems have become important to many modern enterprises and service providers. Networks have become widespread and standards and protocols have been developed and widely adopted for their management. Careful management promotes networking efficiency, reliability, and manageability. Large modern networks link many network devices, such as routers and switches, and have complex topologies and other operational features to deliver complex services. To be successful, management approaches and related systems must deal with the size and complexity of networks.
Among the protocols developed for managing networks are the Simple Network Management Protocol (SNMP) and the System Log (Syslog) protocol. The Syslog protocol provides a transport mechanism for promulgating event log messages across networks, including Internet Protocol (IP) based networks. Syslog servers receive such event log messages in a Syslog based and managed network and become Syslog event collectors. Syslog events are typically sent on many different occasions, including at the start or termination of a process, when unexpected changes occur, or when certain communication related events occur.
Such processes include for instance actions of a network device, the function of an operating system (OS) therein, the function of an application therein, and the like. Logs generated by the OSs and network devices provide a record of activity of these devices, which can be useful for statistical and other informational purposes, as well as for management, e.g., backup and recovery, such as from device crashes, failures, network outages, etc. Such logs are typically written by a network device's OS or another control program, according to the purpose intended by the device vendor or OS developer.
Logs can record a variety of data relating to a network device. Such data can include, for example, error and status messages, certain network transaction details, incoming dialogs, and start/stop activities of certain routines. With respect to system logs as relate to the Syslog protocol, their content format may be standardized under the Universal Logging Protocol (ULP).
Syslog events relate to such logs. Error logs, for instance, can record the date and time of occurrence for an error event affecting a server of the Syslog based network. The corresponding event would identify the server, describe a particular component locus for the error, classify the type of error, provide a code for the severity of the error, for instance as it relates to continuing component device and/or network operation and reliability, and include a message body descriptive of the error. Syslog events comprise a few well defined fields with the purpose of event processing, and a message typically describing Syslog reports used for network management. Messages are usually in a plain English, human readable form.
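The field layout described above can be illustrated with a minimal Python sketch. It assumes the conventional BSD-style layout (a priority value in angle brackets, a timestamp, a host name, and a free-form message), in which the priority value equals the facility multiplied by eight plus the severity. The names SyslogEvent and parse_event, and the sample message, are hypothetical and serve illustration only; real devices emit many variations that this sketch does not handle.

```python
import re
from dataclasses import dataclass

# Severity names in ascending numeric order (0 is the most severe).
SEVERITIES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]

# Assumed layout: <PRI>Mmm dd hh:mm:ss HOST MESSAGE
_PATTERN = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<message>.*)$"
)


@dataclass(frozen=True)
class SyslogEvent:
    """Hypothetical container for the well defined fields plus the message."""
    facility: int
    severity: int
    timestamp: str
    host: str
    message: str

    @property
    def severity_name(self) -> str:
        return SEVERITIES[self.severity]


def parse_event(line: str) -> SyslogEvent:
    """Split one raw line into its fields, leaving the message text untouched."""
    match = _PATTERN.match(line)
    if match is None:
        raise ValueError("line does not follow the assumed layout")
    pri = int(match.group("pri"))
    if pri > 191:  # 23 facilities * 8 + 7 is the largest valid priority
        raise ValueError("priority value out of range")
    facility, severity = divmod(pri, 8)
    return SyslogEvent(
        facility=facility,
        severity=severity,
        timestamp=match.group("timestamp"),
        host=match.group("host"),
        message=match.group("message"),
    )


# Illustrative input only; the host and message text are made up.
event = parse_event(
    "<187>Oct 11 22:14:15 router1 Interface Gi0/1 changed state to down"
)
print(event.host, event.severity_name, event.message)
```

The message body is kept verbatim as a single string, since it is the plain English portion that downstream processing can most easily damage.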
Such messages reflect the interest of network management and operating entities in obtaining accurate and timely information relating to the status of network devices. These management entities base their management functions, such as network administration, at least in part on such Syslog events. Some such network administration functions, for instance billing, can demand a high degree of precision and consistency, including differential processing of Syslog events according to their relevance.
However, inconsistencies have occurred in billing and other network administrative and other management functions based on such Syslog events. Such inconsistencies can become problematic for some network providers and other network management and operating entities. For instance, inconsistencies can occur in a network provider's billing system related to Syslog reports, and such inconsistencies can result in billing disputes, which may be particularly likely to arise following outages and other periods characterized by suboptimal network performance and/or availability. Two factors to which such Syslog event based inconsistencies can be attributed include delay in processing Syslog events and information loss therefrom.
Given the size and complexity of modern networks, as well as the large amount of traffic and various modes of operation thereon, network devices such as routers, servers and switches can have significantly high activity levels. Syslog based networks may generate an extraordinarily large volume of logs and related Syslog events.
Some delays in processing Syslog events have been attributed to time used to perform event processing through an Internet Operational Support System (OSS). The time for performing event processing through an OSS can vary and may become significant for business purposes.
For instance, without a specific instruction or other impetus for the OSS to expedite a particular Syslog report, for instance one having a heightened criticality relative to other processing tasks, the OSS may delay processing that Syslog event report in favor of those other processing tasks (e.g., based on other priorities, perhaps not as significant).
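The ordering effect described above can be sketched in Python. The sketch assumes a hypothetical queue, ReportQueue, in which a report is handled ahead of routine work only when it carries an explicit expedite flag; absent that flag, reports are handled strictly in arrival order. It is not drawn from any particular OSS product.

```python
import heapq
from itertools import count


class ReportQueue:
    """Hypothetical work queue: expedited reports first, otherwise first in, first out."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker that preserves arrival order

    def put(self, report, expedite=False):
        # Rank 0 sorts ahead of rank 1, so flagged reports jump the queue.
        rank = 0 if expedite else 1
        heapq.heappush(self._heap, (rank, next(self._arrival), report))

    def get(self):
        # Return only the report, discarding the ordering keys.
        return heapq.heappop(self._heap)[2]


queue = ReportQueue()
queue.put("routine statistics report")
queue.put("link down on core router", expedite=True)
queue.put("routine configuration audit")

first = queue.get()
print(first)
```

Without the expedite flag on the second report, it would wait behind the routine report that arrived before it, which is the kind of delay the passage describes.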
Some information loss from Syslog reports has been attributed to excessive processing of Syslog event messages, including the overwriting of data thereon. Overwriting data on an original Syslog report typically occurs in processing related to handling that report, such as to classify an event to which the Syslog report relates (e.g., to mark the report for special subsequent handling). Some such overwriting of original Syslog report data can be attributed to efforts at efficient management processing taken to cope with the veritable flood of Syslog events that can characterize a modern network, for instance, a large, complex and heavily trafficked one.
Efforts at efficient management processing taken to cope with this flood have included aggressive filtering, aggregation, abstraction, and data distillation (which are typically achieved by discarding information). However, inadvertent loss of important information has occurred from Syslog events through such aggressive processing. For instance, where an OSS systems integrator or application developer aggressively processes a Syslog event, it may do so with imperfect (e.g., acontextual) realization of a particular significance of the carried message, or perhaps of a certain piece of information therein.
Without a specific instruction or other impetus for the OSS to process Syslog events' messages without corrupting data therein, problems can arise. One example relates to an original Syslog event, which provided an accurate description of the status of a particular network device. This exemplary Syslog event was then processed by a user's OSS components for special handling (e.g., to avoid its early clearing, so as to preserve the report for an audit). In so handling the original Syslog event, the accurate descriptive data relating to the status of that particular network device was overwritten, leading to erroneous status indication for the network device. The inconsistencies could have led to a billing dispute.
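The overwriting problem in this example can be contrasted with a handling step that leaves the original data intact. The following Python sketch assumes hypothetical types, OriginalEvent and HandledEvent, and a hypothetical marker name, retain_for_audit. Special handling is recorded in a separate annotation mapping, and the original fields are read-only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OriginalEvent:
    """Data as reported by the device; frozen so that handling cannot overwrite it."""
    host: str
    severity: int
    message: str


@dataclass
class HandledEvent:
    """Wrapper that carries handling marks alongside the untouched original."""
    original: OriginalEvent
    annotations: dict = field(default_factory=dict)

    def annotate(self, key, value):
        self.annotations[key] = value


def mark_for_audit(event: HandledEvent) -> HandledEvent:
    # The special handling is stored next to the original data, not in place of it.
    event.annotate("retain_for_audit", True)
    return event


original = OriginalEvent("router1", 3, "Interface Gi0/1 changed state to down")
handled = mark_for_audit(HandledEvent(original))
print(handled.original.message)
print(handled.annotations)
```

Because OriginalEvent is frozen, an attempt to assign to one of its fields raises an error rather than silently replacing the device's own status description.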
Issues contributing to such management problems in processing Syslog events include the partial correlation between Syslog events and SNMP informs/traps performed at a component level of a network management system (NMS), and processing tasks such as parsing, correlation, and the like, with which the Syslog events are processed. Considering the first issue, the definition of SNMP notifications, such as traps and informs, follows the Structure of Management Information (SMI). A network device may send management related information, including SNMP informs or Syslog events, through a Syslog interface and/or an SNMP interface. Through each interface, these data respectively report the same behavior, for instance, the same fault causes, the same performance violations, etc. Related conversion and filtering processes can inadvertently lose information conveyed by Syslog events.
With multiple interfaces operating, it can be important to combine and correlate events reporting status that come from either or both interfaces. One conventional solution is to identify the set of Syslog events corresponding to a formally identified SNMP notification. However, such operations typically drop some information reported by Syslog events, which corresponds to the data loss discussed above. Additionally, the time required to execute these conventional processing and other operations can produce reporting delays and can delay feedback to the devices, which is important for their management.
Further, as an extremely large number of Syslog events may typically be issued in some networks, differentiating among them can become important. Such differentiating between the myriad Syslog events allows for applying appropriate correlation mechanisms, saving management resources (e.g., processing, memory, power use, etc.) needed to process them, and timely reporting for a particularly identified cause (e.g., of the event).
Conventionally, such differentiation is performed by filtering Syslog events based on well defined fields or on local semantics of parsed plain English text. However, filters for achieving this differentiation are typically defined according to a management application target. Thus, a particular guideline coming from the device that originated the Syslog event, for instance one significant to the device or its network by its design or character, can be obfuscated, lost, or potentially neglected.
Due in part to their great number, Syslog events are typically not passed on directly to users. Instead, processing with various network management applications is generally required. It is an automated application that ultimately decides which information is passed on to users and which is not. In some conventional applications, the application not only decides which information is passed on to users, but even which information should be logged and preserved, as much information will simply be thrown away and not be kept at all.
Such automation can inadvertently eliminate and thus lose information from a Syslog event. For instance, some conventional applications tend to discard information that they do not understand or that has not been explicitly declared important to the application. Thus, while conventional applications may take proper action on information that they know to be important, they are aggressive, i.e., not conservative, with information that they do not understand. Such applications typically err on the side of throwing out too much rather than too little information.
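The difference between aggressive and conservative handling of information that an application does not understand can be sketched in Python. The sketch assumes hypothetical event tags and a hypothetical triage function. The aggressive variant discards anything not explicitly declared important, whereas the conservative variant shown here sets unrecognized events aside instead of dropping them.

```python
from typing import NamedTuple


class Event(NamedTuple):
    tag: str       # hypothetical short identifier parsed from the event
    message: str   # plain English body, kept verbatim


# Both sets are illustrative assumptions, not taken from any real product.
FORWARD_TAGS = {"LINK_DOWN"}    # explicitly declared important
SUPPRESS_TAGS = {"HEARTBEAT"}   # explicitly declared as noise


def triage(events):
    """Sort events into forwarded, suppressed, and retained groups."""
    forwarded, suppressed, retained = [], [], []
    for event in events:
        if event.tag in FORWARD_TAGS:
            forwarded.append(event)
        elif event.tag in SUPPRESS_TAGS:
            suppressed.append(event)
        else:
            # Not understood by this application: keep it rather than discard it.
            retained.append(event)
    return forwarded, suppressed, retained


events = [
    Event("LINK_DOWN", "Interface Gi0/1 changed state to down"),
    Event("HEARTBEAT", "keepalive"),
    Event("FAN_TRANSIENT", "fan speed briefly out of range"),
]
forwarded, suppressed, retained = triage(events)
print(len(forwarded), len(suppressed), len(retained))
```

An aggressive application would merge the final branch into the suppressed group, which is how an unrecognized but significant event can be lost.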
Syslog notifications typically allow specification of event significance, e.g., the severity of an underlying actual event from the perspective of network operations, management, etc. However, for conventional network management applications, severity does not provide a sole, suitable criterion as to whether a Syslog event should be reported. For instance, a particular message may be redundant and/or subjected to filtering activities.
Delay conventionally expected or imposed in considering a Syslog event for processing can exceed a particular time period significant in guaranteeing the accuracy of the reported cause/state, within an interval needed for appropriate feedback or other corresponding action. The ease with which Syslog events conventionally carry free form content can complicate their correlation with SNMP notifications. This can cause significant information conveyed by a Syslog event, intended for instance for action by a human operator, to be lost when Syslog aggregation, Syslog to SNMP notification translations, Syslog to SNMP correlation, or other processing functions are performed.
Thus, while a dense and rich amount of significant information may originally be captured by a Syslog event, this information can be lost, destroyed, or delayed from timely reporting, due to aggregation, translation, correlation, and/or other management processes. This can be particularly troublesome when information relating to temporary/transient faults is to be captured in a Syslog event. Where the original information in such a Syslog event is lost, the actual device/network-related event manifestation is no longer visible, e.g., no longer occurring, on account of its transience or temporary nature; yet its effect on network management may have been significant. Thus, the loss of its Syslog data can extinguish its records, which can lead to management related issues, such as administrative disputes, financial consequences, and the like.
Thus, delay in processing Syslog events and information loss therefrom can pose significant and problematic challenges in efficiently and reliably managing a network. Concomitant failure to provide timely response to critical Syslog reports can be inefficient and can have significant consequences for network reliability, such as where the reports relate to device inoperability, link availability, and/or network outages. Data loss from Syslog events can lead to inconsistencies, which can be problematic for network administration and other network management functions.
There are cases where Syslog events must be processed and conveyed with particular care in order to avoid delays in reporting significant underlying actual network events and to help keep management delays reasonable, for instance, to allow for timely feedback actions. Conventionally, Syslog events are not processed by network management entities based on the relevance of the event itself and on particular processing needs.
Syslog events are used to convey information relating to a wide variety of network occurrences. The size and complexity of networks can lead to potentially overwhelming volumes of Syslog events. Conventional management applications attempt to handle this volume by filtering, correlation, preprocessing, and/or other processes. However, important Syslog events can be inadvertently and unrecoverably lost in these processes. Sometimes, such Syslog events are so lost because process developers are unaware of the significance of certain messages relating to the Syslog events.