The rapid increase in the production and collection of machine-generated data has created large data sets that are difficult to search and/or otherwise analyze. The machine data can include sequences of time-stamped records that may occur in one or more usually continuous streams. Further, machine data often represents activity made up of discrete records or events.
Often, search engines may receive raw data from various data sources, including machine data. In some cases, search engines may be configured to transform raw data in various ways prior to storing it. At least one of the transformations may include extracting field values from the raw data. Sometimes the raw data may be unstructured, which may make it difficult for systems to efficiently determine what the raw data includes and/or how to generate a field value extraction rule. This may be especially true where the data sets are extremely large, such as on the order of terabytes or greater. Analyzing such large unstructured data sets well enough to perform various actions on the data may be difficult and time-consuming. For example, determining correct and effective extraction rules, modification rules, or the like for such large data sets may be difficult and time-consuming. Improper and/or ineffective rules may extract improper values from the raw data and/or omit significant values. Thus, it is with respect to these considerations and others that the present invention has been made.