Embodiments described herein relate generally to information or data analysis, discovery, classification and retrieval, and more particularly to methods and apparatus for implementing data harmonization by concept-based analysis of structured data and/or unstructured data stored in multiple databases for relating previously unrelated data.
Organizations often utilize sophisticated computer systems and a multitude of databases spread across multiple physical locations to inform and automate portions of the decision-making process. Many such systems and databases organize relevant data into a structured format, making it accessible by a broad array of query, analysis, and reporting applications. Often, however, much of the information relevant to these decisions is stored in a variety of unstructured formats, such as handwritten notes, word processor documents, e-mails, saved web pages, printed forms, photographic prints, and/or the like.
Structured data generally refers to data existing in an organized form, such as a relational database, that can be accessed and analyzed by conventional techniques (e.g., Structured Query Language (SQL)). By contrast, unstructured data can refer to data in a free-form format (e.g., handwritten notes, word processor documents, e-mails, saved web pages, printed forms, photographic prints, or a collection of these formats) that does not necessarily share a common organization. Unstructured information often remains hidden and unused by an organization, primarily because its unstructured nature makes it difficult to access the right information at the right time, or to integrate, analyze, or compare multiple items of information. Concept-based analysis can relate disparate unstructured information so that structuring of the data can be avoided altogether. It should be noted that structuring previously unstructured data (for example, naturally occurring, human-friendly text) is complex and information technology (IT) intensive, and typically loses the original meaning and context. Concept-based analysis can provide ways for users to relate data directly, so that complex technical conversions and complex programming languages such as SQL can be avoided. The user can find value directly in unstructured data without the need for conventional tools (such as SQL or other information query and/or analysis tools) and can analyze a corpus of unstructured data for hidden trends and patterns. In many instances, data (structured data and/or unstructured data) associated with an event or a task can be stored across multiple databases that are logically separate.
Hence, a need exists for a system and method for implementing data harmonization that can programmatically organize, analyze, and relate structured data and/or unstructured data stored in multiple separate databases. A further need exists for a system and method for concept-based classifying, gathering, categorizing, and analyzing of structured data and/or unstructured data stored in multiple databases, for tracking trends and exceptions that can be used to make determinations based on the data.