Computer systems and devices make extensive use of logs to collect information regarding computer system operation. Log information can be used for a variety of purposes including accounting, troubleshooting, and various types of monitoring including security-related monitoring. For example, security information and event management (SIEM) systems are known that receive logs generated by devices such as servers, network devices, etc., and use the information in the logs to assess system operation from a security perspective.
As will be appreciated, because a large number of log messages may need to be handled, many systems are configured to automatically sort log messages into categories so that the events that generated the messages can be brought to the attention of an appropriate administrator in a timely fashion. One traditional approach to processing and categorizing log messages uses customized parsers that are aware of the format and structure of the log messages generated by each device or by each process run by the device. There are, however, multiple problems with such an approach. First, it requires exact knowledge of the specification of each log message from each vendor. If the specification of a log message changes, the parser that processes those messages must also change. Second, the approach is expensive and not scalable, as the number of parsers (or the complexity of a single parser that accomplishes the job) grows linearly with the number of devices that produce the log messages. In addition, some vendors might not publish log message specifications, in which case a customized parser must be built from observed messages, with no guarantee that log messages will be processed and categorized correctly.
An alternative way of processing and categorizing log messages into categories is to derive a set of rules and regular expressions that match log messages to categories based on what specific rules are triggered or expressions satisfied. The problem with this approach is that the system itself will become exceedingly complex when it has to satisfy a large but realistic set of categories. Such a complex system will be difficult to maintain, extend, and adapt to new categories and to log messages that do not conform to the existing patterns and rules.
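By way of illustration only, the rule-based approach described above may be sketched as follows. The category names, the patterns, and the sample messages are hypothetical and are not taken from any particular vendor or product; the sketch assumes a simple first-match policy over an ordered list of rules.

```python
import re

# Hypothetical rule table: each entry pairs a category with a regular expression.
# A realistic deployment would need many more entries, which is the
# maintenance burden noted in the text.
RULES = [
    ("authentication_failure",
     re.compile(r"failed password|authentication failure", re.IGNORECASE)),
    ("authentication_success",
     re.compile(r"accepted (password|publickey)", re.IGNORECASE)),
    ("network_link",
     re.compile(r"link (up|down)", re.IGNORECASE)),
]


def categorize(message: str) -> str:
    """Return the category of the first rule whose pattern matches the message."""
    for category, pattern in RULES:
        if pattern.search(message):
            return category
    # Messages that conform to no existing pattern fall through unclassified.
    return "uncategorized"
```

In this sketch, a message such as "Login rejected for user admin" matches none of the three patterns and is left uncategorized until a further rule is written for it, which illustrates how the rule set tends to grow as new message formats and categories are encountered.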
There is, therefore, a need to address at least some of the above-identified problems.