Data centers can contain thousands of servers (both physical and virtual machines), with each server running one or more software applications. The servers and software applications generate streams of log records that indicate their operational states and progress. For example, a software application may output log records that sequentially list the actions it has performed and/or report application state information at various checkpoints or upon the occurrence of defined events (e.g., faults).
These log records are stored and searched by system operators for various purposes, e.g., to detect anomalies, troubleshoot problems, mine information, and check the health of the servers.
In existing systems, the log records are stored in a log record repository, which may be a full-text index (FTI). An FTI allows complex text queries to be performed on the log records, but its storage requirements are proportional to the amount of content in each of the log records. In large data centers, log records can be generated at rates on the order of millions per second. At these rates, storing the log records efficiently (in terms of both space and time) while still allowing efficient searches can be a significant challenge.
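By way of illustration only, the relationship between log record content and FTI storage can be sketched with a toy inverted index. This is a minimal sketch under stated assumptions, not a description of any particular existing system: the class name ToyFullTextIndex, the whitespace tokenizer, and the sample log records are all hypothetical. It shows that every distinct token in a record adds a posting, so the index grows with the amount of content in each record.

```python
from collections import defaultdict


def tokenize(record: str) -> list[str]:
    # Assumption: lowercase whitespace tokenization; real FTIs use richer analyzers.
    return record.lower().split()


class ToyFullTextIndex:
    """Hypothetical in-memory inverted index mapping tokens to record IDs."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.records: list[str] = []

    def add(self, record: str) -> int:
        # Store the record and add one posting per distinct token it contains.
        record_id = len(self.records)
        self.records.append(record)
        for token in tokenize(record):
            self.postings[token].add(record_id)
        return record_id

    def search(self, *terms: str) -> list[str]:
        # Return the records that contain every query term (AND semantics).
        if not terms:
            return []
        id_sets = [self.postings.get(term.lower(), set()) for term in terms]
        matching = set.intersection(*id_sets)
        return [self.records[i] for i in sorted(matching)]

    def posting_count(self) -> int:
        # Total (token, record) pairs: a rough proxy for index storage.
        return sum(len(ids) for ids in self.postings.values())


index = ToyFullTextIndex()
index.add("ERROR disk full on server42")          # hypothetical log record
index.add("INFO backup completed on server42")    # hypothetical log record
print(index.search("error", "server42"))
print(index.posting_count())
```

In this sketch each added record contributes one posting per distinct token, which is why the storage cost tracks record content rather than record count alone.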