Database application domains such as, but not limited to, data integration, e-business, data warehousing and semantic query processing involve schema matching. Schema matching is performed to identify similarities between the schemas of source and target databases. The matches are typically of two types, namely simple and complex schema matches, wherein a simple match represents a 1:1 mapping between schema elements of the source and target schemas and a complex match represents a 1:N, M:1 or M:N mapping between the corresponding schema elements.
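By way of illustration only, the distinction between simple and complex matches may be sketched as a small data structure. The class name SchemaMatch, its fields and the element names used below (cust_name, full_name and so on) are hypothetical and are assumed purely for the purpose of the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SchemaMatch:
    """One correspondence between source and target schema elements (hypothetical type)."""
    source: Tuple[str, ...]  # element names taken from the source schema
    target: Tuple[str, ...]  # element names taken from the target schema

    @property
    def is_simple(self) -> bool:
        # A simple match pairs exactly one source element with one target element.
        return len(self.source) == 1 and len(self.target) == 1

    @property
    def cardinality(self) -> str:
        # Label the mapping as 1:1, 1:N, M:1 or M:N from the number of elements on each side.
        left = "1" if len(self.source) == 1 else "M"
        right = "1" if len(self.target) == 1 else "N"
        return f"{left}:{right}"


# Illustrative, made-up element names.
simple = SchemaMatch(("cust_name",), ("customer_name",))
merged = SchemaMatch(("first_name", "last_name"), ("full_name",))
split = SchemaMatch(("address",), ("street", "city"))

for match in (simple, merged, split):
    print(match.cardinality, "simple" if match.is_simple else "complex")
```

In this sketch, a match is complex whenever more than one element appears on either side of the mapping.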
Schema matching is generally performed manually, wherein a user such as a Subject Matter Expert (SME) identifies the similarity between schemas through a Graphical User Interface (GUI). Schema matching is also performed by automating one or more of its steps. Automation includes collection of exhaustive explicit information such as, but not limited to, domain knowledge and constraint rules. The collected information is first used to narrow down the search for matches between schemas and is then used to identify complex schema matches. Schema matching also includes instance data comparison based on predefined inference rules, which is effective in identifying simple schema matches. In addition, contextual information is used for schema matching, wherein a set of logical conditions is used to identify complex schema matches.
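By way of illustration only, instance data comparison for simple matches may be sketched as follows. The sketch assumes a single stand-in rule, namely that two columns match when the Jaccard overlap of their distinct values meets a threshold; this rule, the function names, the threshold value and the sample data are all assumptions made for the example and are not the predefined inference rules of any particular system.

```python
def jaccard(a, b):
    """Jaccard overlap of the distinct values in two columns (0.0 when both are empty)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def simple_matches(source, target, threshold=0.5):
    """Propose 1:1 matches by pairing each source column with its best-overlapping target column.

    `source` and `target` map column names to lists of instance values.
    The threshold of 0.5 is an assumed, illustrative value.
    """
    matches = []
    for s_name, s_values in source.items():
        best_name, best_score = None, 0.0
        for t_name, t_values in target.items():
            score = jaccard(s_values, t_values)
            if score > best_score:
                best_name, best_score = t_name, score
        # Keep the best candidate only if the overlap is strong enough.
        if best_name is not None and best_score >= threshold:
            matches.append((s_name, best_name, best_score))
    return matches


# Made-up instance data for two small schemas.
source_columns = {"cust_id": [1, 2, 3, 4], "city": ["Pune", "Delhi", "Mumbai"]}
target_columns = {"customer_no": [2, 3, 4, 5], "town": ["Delhi", "Mumbai", "Pune"]}

print(simple_matches(source_columns, target_columns))
```

A value-overlap rule of this kind compares one column against one column, so it can only propose 1:1 matches; it has no means of discovering that, for example, two source columns together correspond to one target column.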
With continued growth in database application domains, a considerable increase in the size and complexity of schemas has been observed. This increase makes manual identification error-prone, effort-intensive and time-consuming. Further, there is a lack of exhaustive explicit information in the case of heterogeneous and distributed systems such as, but not limited to, e-commerce systems, Business-to-Business (B2B) exchanges and online cataloguing systems, which limits the application of automated schema matching. Furthermore, instance data comparison based on predefined inference rules is ineffective in identifying complex schema matches. Although complex schema matches can be identified using contextual information and instance data, the number of matches identified is limited by the set of logical conditions and the predefined inference rules, thereby limiting the scope of schema matching.
Consequently, there is a need for a system and a method for efficiently matching source and target schemas. Also, the method should enable the identification of complex schema matches without manual intervention.