The present invention relates to data processing and, more particularly, to a method and system for automatically generating a semantic mapping for a relational database.
Data integration has long been one of the more important tasks in enterprise data management, and its most prevalent form is relational data integration. Classical relational data integration requires the design of a global data schema to which the relational database (RDB) of each locality can be mapped. In most cases, however, such a global data schema can hardly be designed in advance, particularly when local relational databases are dynamically added or removed. The data management community has therefore gradually come to favor schema-less data integration methods, among which linked data is highly valued. Linked data adopts the RDF (Resource Description Framework) data model and identifies each data entity by a URI (Uniform Resource Identifier), so that both instance data and ontology data can be published. The published data can then be retrieved over HTTP (HyperText Transfer Protocol), together with linkage and contextual information that facilitates understanding by both humans and machines.
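By way of illustration only, the following minimal sketch shows how a data entity identified by a URI may be expressed as RDF statements and written out in a simplified N-Triples form. The base URI `http://example.org/`, the `Employee` class, the `name` property, and the helper `to_ntriple` are hypothetical names chosen for the example and do not come from any particular tool; the escaping handles only backslashes and double quotes.

```python
# Standard RDF vocabulary term for "is an instance of".
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical base URI for the published data (an assumption for this sketch).
BASE = "http://example.org/"


def to_ntriple(subject, predicate, obj, literal=False):
    """Serialize one (subject, predicate, object) statement as an N-Triples line.

    Simplified: only backslashes and double quotes are escaped in literals.
    """
    if literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


# One data entity, identified by its own URI.
employee = BASE + "resource/employee/42"

lines = [
    # Ontology-level link: the entity is an instance of a class.
    to_ntriple(employee, RDF_TYPE, BASE + "ontology/Employee"),
    # Instance-level data: a property with a literal value.
    to_ntriple(employee, BASE + "ontology/name", "Alice", literal=True),
]

for line in lines:
    print(line)
```

Because every subject, predicate, and class is itself a URI, a consumer can in principle look each of them up over HTTP to obtain further statements about it.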
Relational data can be published as linked data through semantic mappings. Well-known semantic mapping tools include D2RQ (http://www4.wiwiss.fu-berlin.de/bizer/d2rq/), SquirrelRDF (http://jena.sourceforge.net/SquirrelRDF), and OpenLink Virtuoso (http://virtuoso.openlinksw.com/).
Relational data has a schema, which is composed of tables, which are in turn composed of columns. Similarly, linked data has an ontology, which is composed of classes and properties. Generally, there are two ways to publish relational data as linked data. The first is the default way, in which the ontology of the generated linked data is composed of newly defined class names (i.e., the table names in the relational database) and newly defined property names (i.e., the column names in the relational database). The second is to define a semantic mapping, such that the table names and column names in the relational database are mapped to class names and property names that have already been defined in an ontology of the linked data. Linked data published in the default way tends to be trivial and to carry little meaning, whereas relational data published through a well-defined semantic mapping is more useful and meaningful. However, when hundreds or even thousands of relational databases need to be published as linked data, manually defining the D2RQ mappings is undoubtedly time-consuming and laborious.
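The difference between the two ways of publishing can be sketched as follows. This is a simplified illustration under stated assumptions, not the mapping language of D2RQ or of any other tool: the function `publish`, the base URI `http://example.org/`, the table `staff`, and its columns are hypothetical. The FOAF terms `Person`, `name`, and `age` stand in for class and property names that have already been defined in an existing ontology.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/"          # hypothetical base URI
FOAF = "http://xmlns.com/foaf/0.1/"   # existing, widely used vocabulary


def publish(table, key, rows, mapping=None):
    """Return (subject, predicate, object) triples for the rows of one table.

    mapping is an optional dict of the form
    {"class": class_uri, "properties": {column_name: property_uri}}.
    Without it, class and property URIs are newly coined from the
    table and column names (the default way).
    """
    columns = [col for col in rows[0] if col != key] if rows else []
    if mapping is None:
        # Default way: new class named after the table,
        # new properties named after the columns.
        class_uri = f"{BASE}vocab/{table}"
        properties = {col: f"{BASE}vocab/{table}_{col}" for col in columns}
    else:
        # Semantic mapping: reuse terms already defined in an ontology.
        class_uri = mapping["class"]
        properties = mapping["properties"]

    triples = []
    for row in rows:
        subject = f"{BASE}{table}/{row[key]}"
        triples.append((subject, RDF_TYPE, class_uri))
        for col in columns:
            triples.append((subject, properties[col], row[col]))
    return triples


# Hypothetical table "staff" with primary key "id".
rows = [{"id": 1, "full_name": "Alice", "age": 30}]

# Hypothetical hand-written semantic mapping onto FOAF.
staff_mapping = {
    "class": FOAF + "Person",
    "properties": {"full_name": FOAF + "name", "age": FOAF + "age"},
}

default_triples = publish("staff", "id", rows)
mapped_triples = publish("staff", "id", rows, mapping=staff_mapping)

for triple in default_triples + mapped_triples:
    print(triple)
```

In the default output the predicates are URIs such as `http://example.org/vocab/staff_full_name`, which no other data source shares, whereas the mapped output reuses `http://xmlns.com/foaf/0.1/name`. A dictionary like `staff_mapping` is what would have to be written by hand for every table of every database when the mappings are defined manually.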