The field of the disclosure relates generally to data analysis, and more specifically, to processing unstructured data and/or partially structured data to generate structured data for processing by an application. As used herein, unstructured data refers to data that is free-form and that varies based upon the syntax/language of the person that generated the data.
In data analysis systems, data such as unstructured text and/or partially structured text, or other data types (for example, alphanumeric strings and non-alphanumeric data such as images, metadata, and the like), often needs to be processed and/or organized into a more structured form before being added into the system. However, it may be difficult and time consuming to identify, parse, and extract relevant information from the unstructured text and/or partially structured data. When generic parsers and/or extractors are used to identify this information, data may be ignored, misidentified, and/or inappropriately deconstructed. To correct these errors, application-specific code is often written to properly identify the information. However, writing and implementing this specialized code may be time consuming, and the resulting code may only be applicable to a particular situation. Further, periodically updating the source of the unstructured text and/or partially structured data exacerbates these issues, as it introduces new situations that may require further specialized code. Moreover, the specialized code can generally be written and updated only by experienced personnel.
Natural language methods may also be implemented to process and/or organize the unstructured data and/or partially structured data. However, depending on the source of the unstructured data and/or partially structured data, natural language methods may not be effective in organizing the data. Further, natural language methods may require an ontology expert and a data mining expert for proper programming and updating. Finally, artificial intelligence tools such as rule-based systems, neural networks, and/or Bayesian networks may be used to process and/or organize the unstructured data and/or partially structured data. However, these systems also require experienced personnel for implementation and/or updating.