The term data integration refers to the problem of combining data residing in heterogeneous sources in order to provide a unified view of the data. Currently, it relates to a wide range of technologies, from extract, transform and load (ETL) to enterprise application integration (EAI) to enterprise information integration (EII) and various change propagation technologies. There has been extensive theoretical research on data integration systems, exploring various mapping systems and languages, together with their complexity results and limitations.
Several commercial integration systems that exist under the brand name of “enterprise information integration” usually support GAV (global-as-view) mappings. Most of these, however, are two-layered relational systems, in which a global relational schema is mapped to data-source-specific schemas. In a real-life enterprise system, it is common for data to exist in a hierarchy of contexts, for instance at the enterprise level, division level, department level, function level and so on, each with its own local context and context-specific assumptions. A two-layered relational system is not adequate to model this complexity. Therefore, there is a need in the art for richer conceptual models that allow data entities to be modeled at multiple levels of abstraction and that capture the relationships existing between them. There is also a need for a means to map these context-specific conceptual models in a hierarchical manner, for instance from function to department, from department to division, and from division to the enterprise level.
Further, in the present scenario, an aggregated view of the data is achieved by creating, storing and maintaining data in warehouses at each of the intermediate levels within a hierarchical system. This involves a large amount of time, effort and computational resources at each of these enterprise levels. Moreover, maintaining consistency of the data at each of these enterprise levels by way of data synchronization is an added responsibility and burden for the existing systems.
For instance, U.S. Pat. No. 7,367,018 discloses a computer method and apparatus for managing process and plant engineering data for chemical or other engineering processes across applications. The method and apparatus include a respective class view for each of multiple software applications, a composite class view, a conceptual data model and a resulting consolidated multi-tier data model. The multi-tier data model enables sharing of engineering and other data from the multiple software applications with other process and plant engineering applications and programs. An amalgamator synthesizes the class views, composite views and conceptual data model into the multi-tier data model. In forming the multi-tier data model, there is a one-to-one mapping between an attribute in the class view and the composite class view, and a one-to-one mapping between an attribute in the composite class view and a data path in the conceptual data model to the corresponding software applications from which the attribute originated.
The above prior art uses a one-to-one mapping approach between the attributes to provide an aggregated view. However, it does not mention any technique for resolving the issues that arise in establishing the mapping rules required to handle complex views associated with data existing in a hierarchical structure.
In another instance, U.S. Pat. No. 7,596,559 provides a system and method for data integration in which multiple extensible markup language (XML) source schemas are queried through a common XML target schema. A query rewriter is adapted to reformulate the target query in terms of the source schemas based on the mappings, and to integrate the data based on the set of constraints. The query rewriter is adapted to rewrite the target query into a set of source queries comprising the source schemas. A processor evaluates a union of the set of source queries. This prior art is, however, suitable only for mapping between data models across a single layer and is not suited for mapping between complex data models across hierarchical levels.
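The single-layer rewriting described above can be illustrated with a minimal Python sketch. It is not the method of the cited patent: it works on in-memory records rather than XML, ignores constraints, and every name in it (`MAPPINGS`, `SOURCES`, `rewrite`, `evaluate_union`, the field names) is a hypothetical chosen for illustration. It shows only the general idea of reformulating a target query as one query per source and evaluating the union of the results.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical mappings: for each source, target field -> source field.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "crm": {"name": "customer_name", "city": "town"},
    "billing": {"name": "full_name", "city": "city"},
}

# Hypothetical source data, one list of records per source.
SOURCES: Dict[str, List[Dict[str, str]]] = {
    "crm": [{"customer_name": "Ada", "town": "London"}],
    "billing": [
        {"full_name": "Ada", "city": "London"},
        {"full_name": "Bob", "city": "Paris"},
    ],
}

SourceQuery = Tuple[str, List[str]]


def rewrite(target_fields: List[str]) -> List[SourceQuery]:
    """Rewrite a target query (a projection on target fields) into one
    source query per source that can answer every requested field."""
    return [
        (source, [mapping[field] for field in target_fields])
        for source, mapping in MAPPINGS.items()
        if all(field in mapping for field in target_fields)
    ]


def evaluate_union(source_queries: List[SourceQuery]) -> Set[Tuple[str, ...]]:
    """Evaluate each source query and return the union of the results."""
    result: Set[Tuple[str, ...]] = set()
    for source, fields in source_queries:
        for row in SOURCES[source]:
            result.add(tuple(row[field] for field in fields))
    return result


answers = evaluate_union(rewrite(["name", "city"]))
```

Because there is exactly one mapping layer between the target schema and the sources, a sketch of this kind has no way to express a query posed at one level of a hierarchy and answered several levels below it.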
U.S. Patent Application No. 20080243765 discloses a method for generating nested mapping specifications and transformation queries based thereon. Basic mappings are generated based on source and target schemas and correspondences between elements of the schemas. A directed acyclic graph (DAG) is constructed whose edges represent ways in which each basic mapping is nestable under any of the other basic mappings. Root mappings of the DAG are identified. Trees of mappings are automatically extracted from the DAG, where each tree of mappings is rooted at a root mapping and expresses a nested mapping specification. This invention provides only a semi-automated discovery of correspondences between different schemas, which requires manual input before the correspondences are finally refined into mappings. Further, although the invention describes an automated means to discover mappings between source and target schemas, and a means to use these mappings for query transformation, the method is limited to a two-layered mapping system and does not address the complexities associated with transforming queries across multiple hierarchical layers of models and mappings.
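The root identification and tree extraction steps described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure of the cited application: the mapping names and the `NESTABLE_UNDER` graph are hypothetical, an edge is assumed to run from a mapping to each mapping nestable under it, and the sketch simply expands everything reachable from a root, whereas the application may select among alternative nestings.

```python
from typing import Dict, List

# Hypothetical DAG of basic mappings. An edge parent -> child means
# that the child mapping is nestable under the parent mapping.
NESTABLE_UNDER: Dict[str, List[str]] = {
    "m_dept": ["m_emp", "m_proj"],
    "m_emp": ["m_skill"],
    "m_proj": [],
    "m_skill": [],
    "m_vendor": [],
}


def root_mappings(dag: Dict[str, List[str]]) -> List[str]:
    """Return the mappings that are not nestable under any other mapping."""
    children = {child for kids in dag.values() for child in kids}
    return [mapping for mapping in dag if mapping not in children]


def extract_tree(dag: Dict[str, List[str]], root: str) -> Dict[str, dict]:
    """Expand the mappings nested under `root` into a nested dictionary.
    Recursion terminates because the graph is acyclic."""
    return {child: extract_tree(dag, child) for child in dag[root]}


# One tree of mappings per root, each standing for a nested mapping specification.
trees = {root: extract_tree(NESTABLE_UNDER, root) for root in root_mappings(NESTABLE_UNDER)}
```

Each resulting tree nests mappings between a single source schema and a single target schema, which is the two-layered setting noted above.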
Therefore, the existing solutions generally do not support modeling data within real-time enterprise applications, where the modeling complexities are associated with data existing in a hierarchy. In a dynamic environment, views are often complex and considerable effort is required to design them. Hence, owing to the drawbacks of the conventional approaches, there remains a need for a novel system that can provide a unified view, along with convenience during data integration, through a unique modeling and query rewriting approach.