Data analysis or visualization tools and systems typically allow a user to enter a data set by, for example, uploading a file to the system or, in some cases, manually entering data points or data values. The data analysis tool must parse the data set before the data can be analyzed, where parsing includes analyzing and interpreting strings of symbols in the data according to certain rules.
Common parsing systems create meaningful combinations of symbols, or tokens, from strings of symbols in the data set, check for allowable combinations of symbols and/or tokens, and detect the meaning of the allowed symbols and/or tokens. Often, the rules used to accomplish these parsing activities are data format-specific (i.e., the parsing rules vary greatly for different data formats). For example, appropriate rules for parsing a spreadsheet data set may differ greatly from the appropriate rules for parsing a scripting language file. As such, many data analysis systems are only capable of operating on a limited number of data set formats. For example, a spreadsheet software application, such as Microsoft Excel®, only accepts pre-defined spreadsheet data formats such as Excel Binary File Format (XLS), comma-separated values (CSV), and OpenDocument Spreadsheet (ODS). If a data set in an unknown format is input to the spreadsheet software application, the application will parse the data set incorrectly, assuming the software even allows such input.
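The three parsing activities described above (forming tokens, checking for allowable combinations, and detecting meaning) can be illustrated with a minimal sketch. The format handled here, a single line of comma-separated numbers and words, and every name in the code (TOKEN_RE, tokenize, check, interpret, parse_line) are assumptions made for illustration only; they are not taken from any particular data analysis system.

```python
import re

# Hypothetical token pattern for a tiny comma-separated format:
# numbers, words, and the comma separator. Not drawn from any real tool.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<number>-?\d+(?:\.\d+)?)|(?P<word>[A-Za-z_]\w*)|(?P<comma>,))"
)


def tokenize(line):
    """Group raw symbols into (kind, text) tokens; reject unknown symbols."""
    line = line.rstrip()
    tokens = []
    pos = 0
    while pos < len(line):
        match = TOKEN_RE.match(line, pos)
        if match is None:
            raise ValueError(f"unexpected symbol at position {pos}: {line[pos]!r}")
        kind = match.lastgroup
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens


def check(tokens):
    """Allow only the combination: value (comma value)*."""
    if len(tokens) % 2 == 0:
        raise ValueError("expected values separated by single commas")
    for i, (kind, _) in enumerate(tokens):
        expect_value = i % 2 == 0
        if expect_value == (kind == "comma"):
            raise ValueError(f"token {i} ({kind}) is not allowed here")


def interpret(tokens):
    """Detect the meaning of each allowed token: numbers become floats."""
    values = []
    for kind, text in tokens:
        if kind == "number":
            values.append(float(text))
        elif kind == "word":
            values.append(text)
    return values


def parse_line(line):
    tokens = tokenize(line)
    check(tokens)
    return interpret(tokens)
```

Under these assumptions, parse_line("width, 3.5, 42") yields a word followed by two numeric values, whereas an input such as "a;b" is rejected during tokenizing because the semicolon is not a symbol of the format. The rules are tied to this one format, which reflects the format-specific limitation noted above.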
Other parsing systems, such as those used in search engines, do not necessarily restrict parsing to certain data set formats. Rather, many search engines parse search terms having an arbitrary format and are then able to retrieve relevant information related to the search terms using reference data indexing techniques. For example, a search engine user may input “United States of America” into the search engine. The search engine then references an indexed list of previously parsed documents, websites, etc. containing the terms “United,” “States,” and “America” or combinations of those terms, and the search engine uses the index to appropriately match the phrase “United States of America” with information contained in the index. Although useful for searching reference data, parsing systems that make use of reference data indexing are limited by the amount of reference data available.
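The index-based matching described above can be sketched as a small inverted index. This is only an illustration under stated assumptions: the three sample documents, the scoring by count of shared terms, and the function names (terms, build_index, search) are hypothetical and do not describe how any actual search engine works.

```python
import re
from collections import defaultdict


def terms(text):
    """Split free-form text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in terms(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Rank documents by how many distinct query terms they contain."""
    scores = defaultdict(int)
    for term in set(terms(query)):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical reference data, invented for this example.
documents = {
    "a": "The United States of America is a federal republic.",
    "b": "United Airlines flies to many states.",
    "c": "Parsing rules vary by data format.",
}
index = build_index(documents)
```

With this sample data, search(index, "United States of America") ranks document "a" ahead of "b", while a query whose terms appear in none of the indexed documents returns an empty list. The second case shows the limitation stated above: the index can only match input against reference data it already contains.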