Data centers can contain thousands of servers (both physical and virtual machines), with each server running one or more software applications. The servers and software applications generate log stream records to indicate their current states and operations. For example, a software application may output log records that sequentially list the actions it has performed and/or list application state information at various checkpoints or upon the occurrence of defined events (e.g., faults).
These log records are stored and searched by system operators for various purposes, e.g., to detect anomalies, troubleshoot problems, mine information, and check the health of the servers. In large data centers, log records can be generated on the order of millions per second.
In existing processes, the log records are stored in a full-text index (FTI), which allows complex text queries to be performed on the log records. Operators typically perform iterative full-text queries on the FTI. The storage requirements of an FTI are proportional to the number of terms in the log records. At generation rates on the order of millions of records per second, storing the log records efficiently (in terms of both space and time), while also allowing for efficient searches, can be a significant challenge.
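To illustrate why the storage of a full-text index grows with the number of terms, the following is a minimal sketch of a toy inverted index in Python. It is not the index used by any particular product; the function names build_index and search, the whitespace tokenization, and the sample log records are all assumptions made for illustration only.

```python
from collections import defaultdict

def build_index(records):
    """Map each term to the set of record ids that contain it."""
    index = defaultdict(set)
    for record_id, text in enumerate(records):
        # Assumption: terms are lowercase, whitespace-separated tokens.
        for term in text.lower().split():
            index[term].add(record_id)
    return index

def search(index, query):
    """Return ids of records containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical log records, invented for this example.
records = [
    "server01 disk fault detected",
    "server02 service started",
    "server01 service restarted after fault",
]
index = build_index(records)

# One posting is stored per (term, record) pair, so index size
# tracks the number of terms in the indexed records.
postings = sum(len(ids) for ids in index.values())
matches = search(index, "server01 fault")
```

In this sketch every term of every record adds an entry to the index, which is why the index size tracks the total term count. An operator's iterative querying would correspond to repeated calls to search with progressively refined term lists.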