This invention relates to the field of transforming data from one format to another. In particular, the invention is directed to automated systems and methods for transforming a relational database into a hierarchical database whereby information in relational database tables is transformed into hierarchical objects.
A relational database management system (RDBMS) is used to store information in the form of tables and fields. A hierarchical database management system (HDBMS) is used to store information in the form of related data objects. Both relational and hierarchical databases contain a structure according to which data is organized, known as a schema. The differences between the schemas of the separate database management systems can present difficulties when moving data between databases of different types, particularly when moving data between relational and hierarchical databases.
Part of the problem with moving data between one system and another is the way in which they store data and maintain relationships between data. In a Relational Database, data is stored in a series of tables. Each table represents a logical grouping of information. Individual tables usually do not contain sub-groupings, so the tables may be linked together, if at all, by primary keys. A primary key can be used to link relational data that is stored in separate tables within the database but is somehow related. The primary key may be stored in each separate related table. In a Hierarchical Database, information is stored in data objects that may contain one or more sub-element objects of particular kinds of data. Thus, a method for moving data between differing database systems is needed, which has the capacity to xe2x80x9ctransformxe2x80x9d the actual data from one format to another.
The present invention solves the aforementioned problem and allows data to be transformed from a relational database to a hierarchical database. The invention creates via a map structure a union between the structure of the relational database and the hierarchical database. The present invention allows data transformation objects to be created that relate to the tables in a relational database and to the schema of the corresponding hierarchical database. The present invention allows for the manipulation of particular data values in the transformation process to improve data integrity and quality using data scrubbing algorithms that may be specified by the user or program defaults. Due to the need to move data from one database type to another, a method has been invented for successfully extracting information from a RDBMS to a HDBMS, based on the schema information from both databases. In particular, the method can transform relational database data from many database tables into a single hierarchical data object, as needed.
The present invention comprises a computer-implemented method for transforming data in a relational database to a hierarchical database. It comprises creating an import map that maps each relational database field to a hierarchical field in the hierarchical database using a relational database schema and a hierarchical database schema, using the import map to import data from the relational database; and transforming the relational data into hierarchical documents. The method further comprises creating a hierarchical database schema that corresponds to the relational database schema. The hierarchical documents may be stored in computer memory or on disk.
Creating a hierarchical document schema comprises the following steps: step 1: determining a relationship between a first table in the relational database and a second table in the relational database using a primary key; step 2: forming a compound object in a hierarchical document that is associated with the first table; step 3: forming an object selected from the group consisting of a compound object and a simple object that is associated with the second table; and step 4: repeating steps 1 through 3 until all tables within the relational database are associated with an object in the hierarchical document.
The method of creating the import map and transforming the relational data comprises creating a hierarchical database schema, comprising at least one compound and at least one simple object. For each compound object, an SQL statement is defined, which expresses a 1 to n relationship of the compound object to its parent object and which expresses source fields available for child objects of the compound object. Each simple object is related to at least one source field name in its parent compound object. The source field names are extracted from the relational database for all compound objects using the SQL statement. The simple object accesses the source field names of its parent compound object to determine the source field names the simple object can map to. The import map is traversed and each time an SQL statement is encountered, the SQL statement is executed which results in a dataset. Each dataset is traversed and the relational data for each simple object based on the dataset is found. The creating a hierarchical database schema comprising at least one compound and at least one simple object is based on a relational database key structure where the relational database key structure has at least one primary key and at least one foreign key. A data scrubbing algorithm may be applied to the relational data and storing the scrubbed data in the hierarchical document.
The present invention comprises computer-readable media having computer-executable instructions for performing the methods as above.
The present invention comprises computer-readable media in the form of a data structure containing a collection of hierarchical objects in an import map. The import map comprises at least one compound object. The compound object comprises an indicator of a relational database and a field name of relational data associated with the compound object, a structured query language (SQL) statement, and a source field name extracted from the relational database using the SQL statement. The simple object comprises a field name of relational data associated with the simple object and the relational data extracted from the relational database and stored in the simple object.