The present invention embodiments relate to information systems, and more specifically, to supplementing structured information about entities within a master data management (MDM) system based on references to those entities within unstructured data.
Master data management (MDM) systems integrate data from multiple structured data sources and build a consolidated view of business entities, such as customers and products. A primary function of an MDM system is to identify multiple records that refer to the same real-world entity. This process, referred to as entity resolution, determines that two records refer to the same entity even though the two records may not match perfectly. For example, two records that refer to the same person may contain slightly different spellings of the person's name.
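By way of illustration only, the kind of imperfect match that entity resolution tolerates can be sketched as follows. This is a minimal sketch and not the matching logic of any particular MDM system: the record fields (name, postal_code), the function names, the 0.85 threshold, and the rule that postal codes must agree exactly are all assumptions made for the example. Name similarity is computed with the Python standard library's difflib.SequenceMatcher.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1], ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def same_entity(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Decide whether two records plausibly refer to the same real-world entity.

    Hypothetical rule for this sketch: postal codes must be equal, and the
    names must be at least `threshold` similar. Real systems typically
    combine many more attributes and weights.
    """
    if rec_a.get("postal_code") != rec_b.get("postal_code"):
        return False
    return name_similarity(rec_a["name"], rec_b["name"]) >= threshold


if __name__ == "__main__":
    crm_record = {"name": "Jon Smith", "postal_code": "10001"}
    billing_record = {"name": "John Smith", "postal_code": "10001"}
    # The names differ by one letter, yet the records are treated as a match.
    print(same_entity(crm_record, billing_record))
```

In this sketch, the two example records are resolved to the same entity despite the different spelling of the first name, whereas records with clearly different names or different postal codes are kept apart.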
Current MDM systems are not equipped to integrate information from unstructured data sources, such as news reports, e-mails, call-center transcripts, and chat logs. However, those unstructured data sources may contain valuable information about the same entities known to the MDM system from the structured data sources. Integrating information from unstructured data sources into an MDM system is challenging for two reasons: textual references to existing MDM entities are often incomplete and imprecise, and the additional entity information extracted from text should not compromise the trustworthiness of the MDM data.