Data centers can contain thousands of servers (both physical and virtual machines), with each server running one or more software applications. The servers and software applications generate log stream records to indicate their current states and operations. For example, software applications may output log records that sequentially list the actions they have performed and/or list application state information at various checkpoints or when triggered by defined events (e.g., fault occurrences).
These log records are stored and searched by system operators for various purposes, e.g., to detect anomalies, troubleshoot problems, mine information, and check the health of the servers. In large data centers, log records can be generated at rates on the order of millions per second. In existing processes, the log records are stored in a full-text index (FTI). An FTI allows complex text queries to be performed on the log records.
Operators typically perform iterative full-text queries on the FTI of log records. When an operator's query returns an excessive number of content rows of log records, the operator discards the results (wasting the system resources consumed in running the query) and reissues the query with additional restrictions. Full-text queries of the FTI of log records consume significant processing resources and time, and the burden on the associated computer system increases with the number of search iterations.
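The iterative query pattern described above can be illustrated with a minimal sketch. It assumes a toy in-memory inverted index standing in for an FTI. The function names (`build_index`, `search`, `iterative_search`), the `max_rows` threshold, and the sample log records are all hypothetical and chosen only for illustration; a production FTI would be far more elaborate.

```python
from collections import defaultdict


def build_index(records):
    """Map each lowercase token to the set of record ids containing it."""
    index = defaultdict(set)
    for rec_id, text in enumerate(records):
        for token in text.lower().split():
            index[token].add(rec_id)
    return index


def search(index, terms):
    """Return the ids of records that contain every term (an AND query)."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def iterative_search(index, term_sequence, max_rows):
    """Add one restriction per round and rerun the whole query each time.

    Results of earlier rounds are discarded, mirroring the operator
    behavior described in the text.
    """
    terms = []
    iterations = 0
    hits = set()
    for term in term_sequence:
        terms.append(term)
        iterations += 1
        hits = search(index, terms)  # full query rerun, nothing reused
        if len(hits) <= max_rows:
            break
    return hits, iterations


# Hypothetical sample log records.
records = [
    "ERROR disk full on server01",
    "ERROR timeout on server02",
    "INFO backup done on server01",
    "ERROR disk failure on server02",
]
index = build_index(records)
hits, iterations = iterative_search(
    index, ["error", "disk", "server02"], max_rows=1
)
```

In this sketch every round reruns the whole query, so the total work grows with the number of iterations, which is the burden the passage above describes.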