Computer systems are not immune to problems or troubles. Such troubles have various causes, including, for example, hardware faults, local network faults, Internet connectivity faults, software bugs, and data corruption.
To enable analysis of the causes of problems or troubles that have occurred, log messages (for example, system logs, operating system logs, or application logs) are generated at various levels, such as the operating system, middleware, and application programs.
In general, log messages have the following properties:
a message is output in accordance with a format defined in advance inside software or the like;
one message is a sequence of signs including characters;
a message is not necessarily readable by human beings, but needs to be decomposable into meaningful units; and
a readable character string is delimited by predetermined characters, such as blank characters (which may be one-byte or two-byte spaces) or signs (for example, special signs).
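The last property above, delimiting a readable character string by blank characters and signs, can be illustrated with a minimal Python sketch. The delimiter set, the function name split_log_message, and the sample message are all assumptions made for illustration; they are not taken from any particular logging system.

```python
import re
from typing import List

# Delimiters assumed for illustration: whitespace (including the
# full-width, two-byte space U+3000) and a small hypothetical set of signs.
DELIMITERS = re.compile(r"[\s\u3000\[\]()=,;\"]+")


def split_log_message(message: str) -> List[str]:
    """Split one log message into non-empty units at the delimiters."""
    return [token for token in DELIMITERS.split(message) if token]


# Made-up sample message, used only to exercise the function.
sample = "Jan 12 03:14:07 host1 kernel: [sda] SCSI error: return code = 0x08000002"
tokens = split_log_message(sample)
print(tokens)
```

Which signs count as delimiters is a design choice: here the colon is deliberately left out so that a time such as 03:14:07 stays in one piece, at the cost of leaving a trailing colon on units such as "kernel:".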
When trouble occurs in the system, a large number of log messages having the above properties are generated. To understand the situation from these log messages and to solve the problem promptly, the cause must be identified rapidly.
However, although log messages are mechanically generated, most of them are not structured data. Therefore, a large amount of manpower and high cost are required in advance to make the log messages processable by machine.
Log messages are written with human readability in mind. Natural language analytical approaches, such as text mining, are known techniques for recognizing meaning in a generated character string, and such approaches have therefore been applied to log messages. However, log messages do not necessarily conform to a natural language sentence structure, and one message tends to be shorter than a normal sentence. Therefore, simply applying a natural language analytical approach to log messages is not sufficient, and a different approach is necessary.
Furthermore, it is said that more than half of the work performed by data scientists consists of data integration, data cleansing, and data conversion.
Patent Literature 1 (Japanese Patent Application Publication No. 2005-266919) describes the following features. A log message is, as illustrated in FIG. 4, a record of the use status of a system and of data communication, and includes the dates and times at which an operation and data transmission/reception were performed, the contents of the performed operation, the contents of the transmitted/received data, and the like. Log messages are often difficult for users to decipher, and it is often difficult to determine the condition under which a message was generated and the measures to be taken (paragraph 0039). It may be difficult to find an important message appropriately, since a system log or the like covering a long operation time may have thousands of rows (paragraph 0040). The contents of logs can be easily determined by using different colors, in such a manner that messages that may be ignored are represented in light blue and important messages, such as a shortage of the log recording area (file system full), a temperature abnormality, and a SCSI error, are represented in orange; a system log and the like are displayed in HTML format, and the corresponding measures are hyperlinked (paragraph 0041).
Patent Literature 2 (Japanese Patent Application Publication No. 10-293704) describes a feature of creating and accumulating normalized log data, in which values corresponding to predefined data items are extracted from log data in a log file to be monitored and are arranged by data item (claim 1).
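The general idea of normalized log data, that is, extracting the values of predefined data items from each log line and arranging them, can be sketched in Python as follows. This is only an illustrative sketch and not the method of Patent Literature 2: the line format, the data item names (timestamp, host, severity, text), the function names, and the sample lines are all hypothetical.

```python
import re
from typing import Dict, Iterable, List, Optional

# Hypothetical predefined data items and a hypothetical line format:
# "<date> <time> <host> <SEVERITY> <free text>".
DATA_ITEMS = ("timestamp", "host", "severity", "text")
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<severity>[A-Z]+) "
    r"(?P<text>.*)$"
)


def normalize_line(line: str) -> Optional[Dict[str, str]]:
    """Return the values of the predefined data items, or None if no match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    return {item: match.group(item) for item in DATA_ITEMS}


def normalize_log(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Accumulate normalized records, skipping lines that do not match."""
    records = []
    for line in lines:
        record = normalize_line(line)
        if record is not None:
            records.append(record)
    return records


# Made-up log lines, used only to exercise the functions.
raw_log = [
    "2024-01-12 03:14:07 host1 ERROR file system full on /var",
    "unparseable line",
    "2024-01-12 03:15:10 host2 WARN temperature abnormality detected",
]
normalized = normalize_log(raw_log)
for record in normalized:
    print(record)
```

A sketch of this kind depends on the line format being known in advance; lines that do not match the assumed pattern are simply skipped here.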