The rapid increase in the production and collection of machine-generated data has created relatively large data sets that are difficult to query. The machine data can include sequences of time-stamped records that may occur in one or more streams, which are usually continuous. Further, machine data often represents some type of activity made up of discrete events.
Searching machine data requires different ways to express searches. Query engines today allow users to search by the most frequently occurring terms or keywords within the data and generally have little notion of event-based searching. Given the large volume and repetitive characteristics of machine data, users often need to start by narrowing the set of potential search results using event-based search mechanisms and then, through examination of the results, choose one or more keywords to add to their search parameters. Time frames and event-based metadata such as frequency, distribution, and likelihood of occurrence are especially important when searching machine data, but are difficult to obtain with current query engine approaches.
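By way of illustration only, the narrowing workflow described above (restrict by time frame, examine the results, then add a keyword) might be sketched as follows. The sample records, the function names `in_time_frame`, `term_frequencies`, and `with_keyword`, and the whitespace tokenization are all assumptions made for this sketch and are not taken from any particular query engine.

```python
from collections import Counter
from datetime import datetime

# Hypothetical time-stamped machine data; the records are invented for this sketch.
EVENTS = [
    (datetime(2024, 1, 1, 10, 0, 0), "login ok user=alice"),
    (datetime(2024, 1, 1, 10, 0, 5), "login failed user=bob"),
    (datetime(2024, 1, 1, 10, 0, 9), "login failed user=bob"),
    (datetime(2024, 1, 1, 11, 30, 0), "disk full host=web1"),
]


def in_time_frame(events, start, end):
    """Keep events whose timestamp falls in the half-open interval [start, end)."""
    return [(ts, text) for ts, text in events if start <= ts < end]


def term_frequencies(events):
    """Count how many events contain each whitespace-separated term."""
    counts = Counter()
    for _, text in events:
        # set() so a term repeated within one event is counted once
        counts.update(set(text.split()))
    return counts


def with_keyword(events, keyword):
    """Keep events that contain the keyword as a whole term."""
    return [(ts, text) for ts, text in events if keyword in text.split()]


# Step 1: narrow by time frame (an event-based criterion).
window = in_time_frame(
    EVENTS, datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 1)
)

# Step 2: examine the narrowed results, here via per-term event counts.
freqs = term_frequencies(window)

# Step 3: add a keyword chosen from that examination.
matches = with_keyword(window, "failed")
```

In this sketch every step scans the records directly, which is workable for a handful of events but is the kind of operation that becomes costly as the data set grows.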
Also, users often generate arbitrary queries to produce statistics and metrics about selected data fields that may be included in the data. Indexing may enable raw data records to be identified quickly, but operations that examine or scan the individual data records may become prohibitively expensive as the size of the data set grows. Further, the arbitrary queries generated by users can intentionally or inadvertently overload the query systems with high levels of concurrent searches. Thus, systems that can query relatively large sets of data are the subject of considerable innovation.