The main objective of virtually every business organization is to be profitable. In a knowledge-based economy, one of the major success factors is competent information management. Enormous amounts of data are created daily, and the ability to work efficiently with information is key for a company to grow and to emerge in a position of strength. One aspect of information management concerns process execution and the reduction of operational costs. Usually, without an elaborate information management strategy, the quality of the data generated in a company decreases. Hence, the adopted business processes and initiatives are negatively affected. Incorrect or out-of-date information about customers, partners, and products results in lost time, damaged credibility with customers, frustration in the supply chain, etc. Another aspect aims at ensuring trust in the data generated and stored across the landscape of a business organization. Confidence in the available information enables the stakeholders in an organization to work efficiently and accurately. Generally, there are many sources of information that create data redundancy and duplication, e.g., daily data entries by different stakeholders, data migrations, legacy system data, data acquired as a result of mergers and acquisitions, etc. Therefore, businesses need to follow stringent rules for data consolidation and data cleaning.
There are various software products and tools available for data management that are developed to help in understanding the complex and multidimensional relationships in enterprise data. Such products provide efficient handling of customer and business data elements among different applications, including business intelligence (BI), enterprise resource planning (ERP) systems, middleware applications, etc. One of the most important functions of data management is the ability to detect, match, and consolidate duplicate data, leveraging multiple data sources for analytical or operational needs. Therefore, the availability of efficient data matching algorithms is essential for quality information management. However, in many cases data management products fail to identify data redundancy or inconsistency. For example, when comparing string data elements, the actual characters of the data elements are simply matched. The data elements may contain similar, even duplicate, information presented or described with different sets of characters. Thus, such similar data elements remain undetected.
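The limitation described above can be illustrated with a minimal sketch. The snippet below contrasts exact character-by-character comparison with a similarity-based check built on Python's standard-library `difflib.SequenceMatcher`; the function name `is_likely_duplicate` and the similarity threshold of 0.85 are illustrative assumptions, not part of any particular data management product.

```python
import difflib

def is_likely_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Flag two string data elements as probable duplicates when their
    normalized similarity ratio meets an (assumed) threshold."""
    # Light normalization: ignore surrounding whitespace and letter case.
    a_norm = a.strip().lower()
    b_norm = b.strip().lower()
    # ratio() returns 2*M / T, where M is the number of matching
    # characters and T is the total length of both strings.
    return difflib.SequenceMatcher(None, a_norm, b_norm).ratio() >= threshold

# Exact character matching misses the duplicate pair:
print("Jon Smith Inc." == "JON SMITH INC")               # False
# A similarity-based comparison detects it:
print(is_likely_duplicate("Jon Smith Inc.", "JON SMITH INC"))  # True
```

In practice, production matching engines use more elaborate techniques (phonetic encodings, edit distances, token-based measures), but the sketch captures the core idea: two records describing the same entity with different character sets should still be matched.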