Modern data centers and other computing environments can comprise anywhere from a few host computer systems to thousands of systems configured to process data, service requests from remote clients, and perform numerous other computational tasks. During operation, various components within these computing environments often generate significant volumes of machine-generated data (“machine data”). In general, machine data can include performance data, diagnostic information and/or any of various other types of data indicative of performance or operation of equipment in a computing system. Such data can be analyzed to diagnose equipment performance problems, monitor user interactions, and to derive other insights.
A number of tools are available to analyze machine-generated data. In order to reduce the volume of the potentially vast amount of machine data that may be generated, many of these tools typically pre-process the data based on anticipated data-analysis needs. For example, pre-specified data items may be extracted from the machine data and stored in a database to facilitate efficient retrieval and analysis of those data items at search time. However, the rest of the machine data typically is not saved and is discarded during pre-processing. As storage capacity becomes progressively cheaper and more plentiful, there are fewer incentives to discard these portions of machine data and many reasons to retain more of the data.
This plentiful storage capacity is presently making it feasible to store massive quantities of minimally processed machine data for later retrieval and analysis. In general, storing minimally processed machine data and performing analysis operations at search time can provide greater flexibility because it enables an analyst to search all of the machine data, instead of searching only a pre-specified set of data items. This may, for example, enable an analyst to investigate different aspects of the machine data that previously were unavailable for analysis. However, analyzing and searching massive quantities of machine data presents a number of challenges.