The rapid increase in the production and collection of machine-generated data has created relatively large data sets that are difficult to search. Machine data can include sequences of time-stamped records that may occur in one or more, usually continuous, streams. Further, machine data often represents some type of activity made up of discrete events.
Searching machine data requires different ways to express searches. Search engines today allow users to search by the most frequently occurring terms or keywords within the data, but they generally have little notion of event-based searching. Because machine data is voluminous and typically repetitive, users often need to start by narrowing the set of potential search results with event-based search mechanisms. They can then examine the results and choose one or more keywords to add to their search parameters. Timeframes and event-based metadata, such as frequency, distribution, and likelihood of occurrence, are especially important when searching machine data, but they are difficult to support with current search engine approaches.
Users also often generate arbitrary queries to produce statistics and metrics about selected data fields that may be included in the data. Indexing may enable raw data records to be identified quickly, but operations that examine or scan the individual data records may become prohibitively expensive as the size of the data set grows. Thus, systems that can search relatively large sets of data are the subject of considerable innovation.