The rapid increase in the production and collection of machine-generated data has created large datasets that are difficult to search and/or otherwise analyze. The machine data can include sequences of records that may occur in one or more streams, which are usually continuous. Further, machine data often represents activity made up of discrete events.
Often, search engines may receive data, including machine data, from various data sources. In some cases, search engines may be configured to transform the received data in various ways prior to storing it. At least one of the transformations may include extracting field values from the received data. Sometimes the received data may be unstructured, which may make it difficult for systems to efficiently analyze the received data to determine what data may be of interest and/or how to generate a field value extraction rule. This may be especially true where the datasets are extremely large, such as on the order of terabytes or greater. Such large datasets may make it difficult and time-consuming to analyze the data so as to be able to perform various actions on the data. For example, determining extraction rules, modification rules, or the like that are correct and effective for such large datasets may be difficult and time-consuming. Improper and/or ineffective rules may result in improper values being extracted from the received data and/or in significant values being omitted. Thus, it is with respect to these considerations and others that the present invention has been made.