The present invention relates to management and maintenance of a computer network and, more particularly, to a method for processing information as logged from disparate devices attached to the network. Specifically, the method includes determining parsing rules for a large number of logs by reducing a set of log samples to a small set of unique patterns of the log samples.
Modern computer networks have a multi-layered security architecture including many security devices which ensure that servers, hosts, and applications running on the network are protected from harmful activity. The devices all generate voluminous logs that are difficult and time consuming to interpret. In order to have practical value from the logs, enterprises need to manage the deluge of data logged by these devices. Tracking network and security activity trends over time by manually scanning log files is difficult and time consuming.
Check Point Eventia Suite™ provides a security information and event management solution for enterprises looking to efficiently manage large volumes of data logged from disparate sources. Eventia Suite™ automates and centralizes security log data analysis and provides previously defined or custom reports.
Syslog is a standard for forwarding log messages in an IP network. The term “syslog” is often used for both the actual syslog protocol and the application or library that sends syslog messages.
The syslog protocol is a client-server type protocol: the syslog sender sends a small textual message (less than 1024 bytes) to the syslog receiver. The receiver is commonly called “syslogd”, “syslog daemon” or “syslog server”. Syslog messages can be sent via UDP and/or TCP. Syslog is typically used for computer system management and security auditing. Syslog is supported by a wide variety of devices and receivers across multiple platforms. Because of this, syslog can be used to integrate log data from many different types of systems into a central repository. {from http://en.wikipedia.org/wiki/Syslog}
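By way of illustration only, the sender side of such an exchange may be sketched as follows. The sketch assumes a simplified BSD-style payload consisting of a priority value in angle brackets, a tag, and free text. The timestamp and host name fields of a full syslog header are omitted. The function names, the sample tag, and the sample text are hypothetical and are not taken from any particular product.

```python
import socket

MAX_LEN = 1024  # size limit quoted in the description above


def build_syslog_message(facility: int, severity: int, tag: str, text: str) -> bytes:
    """Build a simplified syslog payload of the form "<PRI>tag: text"."""
    # The priority value encodes both facility and severity in one number.
    pri = facility * 8 + severity
    payload = f"<{pri}>{tag}: {text}".encode("ascii", errors="replace")
    if len(payload) >= MAX_LEN:
        raise ValueError("syslog message too long")
    return payload


def send_udp(payload: bytes, host: str, port: int = 514) -> None:
    """Send one payload to a syslog receiver over UDP (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# Facility 4 (security/authorization) and severity 6 (informational).
msg = build_syslog_message(4, 6, "sshd", "Accepted password for alice")
```

The function send_udp is shown only to indicate where the transport step would occur. A receiver address would have to be supplied by the caller.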
In computing, a regular expression is a string that is used to describe or match text, according to certain syntax rules. The term “regular expression” as used herein is a string that is used to describe or match the alphanumeric and/or symbolic text of the log according to certain syntax rules. As an example, the string [0-9]+ is a regular expression representing one or more digits. Regular expressions are used by many text editors, utilities, and programming languages to search and manipulate text based on text patterns. As an example of regular expression syntax, the regular expression \bex can be used to search for all instances of the string “ex” that occur at word boundaries (signified by the \b). Thus in the string, “Texts for experts,” \bex matches the “ex” in “experts,” but not in “Texts” because the “ex” occurs inside the word there and not immediately after a word boundary. {from http://en.wikipedia.org/wiki/Regular_expressions}
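The two expressions discussed above may be exercised, for example, with the regular expression module of the Python standard library. This is a minimal sketch. The first sample string is hypothetical, and the second is the string used in the example above.

```python
import re

# One or more consecutive digits.
digits = re.compile(r"[0-9]+")

# The letters "ex" immediately after a word boundary.
word_start_ex = re.compile(r"\bex")

# Every run of digits in a hypothetical log fragment.
digit_runs = digits.findall("port 443 denied 12 times")

# Character offsets at which "ex" starts a word.
ex_offsets = [m.start() for m in word_start_ex.finditer("Texts for experts")]

print(digit_runs)   # ['443', '12']
print(ex_offsets)   # [10], the "ex" in "experts" only
```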
In the prior art, composing syslog parsing rules requires highly specialized knowledge of the rule syntax and of regular expressions, the formulas that describe patterns in the syslog data. For example, a date may be described by the following pattern, in which the braces enclose a list of alternative month names: {Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec} [0-3][0-9], [0-9][0-9][0-9][0-9]
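The date pattern above is written in an informal notation. One possible rendering in a conventional regular expression syntax, here that of the Python standard library, is sketched below. Treating the brace list as an alternation of month names is an assumption, and the sample strings are hypothetical.

```python
import re

# The month list from the pattern above, expressed as an alternation.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Month name, space, two-digit day, comma, space, four-digit year.
DATE_RE = re.compile(r"(?:" + MONTHS + r") [0-3][0-9], [0-9]{4}")

found = DATE_RE.search("Logged on Feb 07, 2007 at noon")
print(found.group(0))  # Feb 07, 2007

# A string with an unknown month name does not match.
print(DATE_RE.fullmatch("Foo 07, 2007"))  # None
```

Like the original pattern, this expression constrains only the shape of the date. A day value such as 39 would still be accepted.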
In the prior art, Eventia Analyzer™ is manually configured to read outputs from third party products as data in the form of Syslogs, SNMP traps, Windows Events, World Wide Web Consortium (W3C) logs and Netflows. In the prior art, the most common input of Eventia Analyzer™ is in the form of regular expressions composed by the system administrator, integrator, or other value provider to match logs of the third party product. The data from the syslogs are then parsed using the regular expressions and placed into data fields of a Check Point log, which may be viewed using a Check Point product, SmartView Tracker™.
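The general idea of parsing a log line with a hand-written regular expression and placing the captured values into named data fields may be sketched as follows. This is an illustration only and does not represent the configuration format or the internals of any Check Point product. The log line, the field names, and the function parse_line are hypothetical.

```python
import re
from typing import Dict, Optional

# A hand-written rule for one hypothetical firewall log format.
# Each named group corresponds to one output data field.
LINE_RE = re.compile(
    r"(?P<action>accept|drop) "
    r"src=(?P<src>[0-9.]+) "
    r"dst=(?P<dst>[0-9.]+) "
    r"dport=(?P<dport>[0-9]+)"
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the captured fields, or None if the rule does not match."""
    match = LINE_RE.search(line)
    return match.groupdict() if match else None


sample = "Feb 07 12:00:01 fw1 kernel: drop src=10.0.0.5 dst=192.168.1.9 dport=22"
fields = parse_line(sample)
print(fields)
```

A log line from a different device would generally require a different expression. Writing one expression per format by hand is the manual effort described above.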
There is thus a need for, and it would be highly advantageous to have, a method for determining parsing rules for a large number of logs by reducing a set of log samples to a small set of unique patterns of the log samples, a method which does not require the system administrator to understand all the logs output from all the devices of the network and to compose regular expressions which match all the logs.
U.S. patent application publication 20070198565 discloses a user interface (UI) by which a user can design a regular expression. The graphical interactive mechanism enables the user to develop regular expressions without an understanding of the intricacies of the regular expression syntax. The UI can provide an interactive mechanism by which a user can graphically annotate (e.g., color, highlight) a regular expression, thus mapping the expression to a particular tabulated output. The UI can provide a particular kind of dialog layout with several controls and dynamically linked views, e.g., a data view, a regular expression view and a column view, which can facilitate definition of the regular expression as well as creation of mappings to output columns (e.g., annotations). U.S. patent application publication 20070198565 is included herein by reference for all purposes as if entirely set forth herein.