Computer systems used for business and other purposes generate messages that report user access, service errors, and other information about the operation of the systems. These messages are recorded in a log managed by the computing system and are therefore called log messages. Traditionally, log messages are recorded in files on the local file system or, in the case of Syslog-enabled systems, can be directed to external storage systems. In some scenarios, computing systems based on Microsoft Windows record log messages to the local file system via the Windows Event Log.
Recent industry and government regulations such as the Payment Card Industry Data Security Standard (PCI DSS), the Sarbanes-Oxley Act (SOX), the Health Insurance Portability and Accountability Act (HIPAA), and the Gramm-Leach-Bliley Act (GLBA) require that log data be collected, regularly reviewed, and securely archived. To meet the requirements of these regulations, log message files must be archived for up to seven (7) years. For large organizations or organizations with specialized operations, the volume of log messages generated may require storage capacity approaching petabytes (PB) of data. Providing secure and reliable storage for the required length of time has generally demanded significant capital investment, staffing expense, and operational complexity.
Complications also arise when attempts are made to review the volumes of log messages generated. Hardware and software vendors, developers, owners, etc. encode information in their log messages in varying ways. Thus, from the perspective of systems that receive these varying log messages, the messages are freeform with little, if any, formatting in common. Complicating the situation further, several types of log messages (even from the same vendor) can convey the same or similar information while varying widely in format. Because of the freeform nature of log messages, obtaining meaningful information from the data encoded in the multitude of log messages from even one computer system can require manual review of hundreds, thousands, or more disparate log messages. Manually reviewing such massive quantities of information entails correspondingly massive quantities of labor, time, and effort. Manually correlating data, manually detecting meaningful patterns, manually recognizing incidents, and the like with such massive numbers of log messages require skills, talents, and endurance not readily available to most business organizations.
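The format problem described above can be illustrated with a minimal sketch. The two log lines, the `authsvc` and `gateway` source names, and the `normalize` helper below are all hypothetical and invented for illustration; they are not taken from any real product. The sketch assumes two messages that report the same failed login in different layouts, and shows that each layout needs its own hand-written pattern before the messages can be compared.

```python
import re

# Hypothetical log lines: both report a failed login for user "alice",
# but in two invented formats (not taken from any real product).
LINES = [
    "2024-01-15 10:32:07 authsvc: authentication failure for user alice from 10.0.0.5",
    "Jan 15 10:32:09 gateway LOGIN_FAILED user=alice src=10.0.0.5",
]

# One hand-written pattern per format; every additional format needs another.
PATTERNS = [
    re.compile(r"authentication failure for user (?P<user>\S+) from (?P<src>\S+)"),
    re.compile(r"LOGIN_FAILED user=(?P<user>\S+) src=(?P<src>\S+)"),
]


def normalize(line):
    """Return a common record for a failed-login line, or None if no pattern matches."""
    for pattern in PATTERNS:
        match = pattern.search(line)
        if match:
            return {
                "event": "login_failure",
                "user": match.group("user"),
                "source": match.group("src"),
            }
    return None


records = [normalize(line) for line in LINES]
```

Under these assumptions, both lines reduce to the same record, while any message in a format without a matching pattern is returned as `None` and would still require manual review.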