The present invention relates generally to the field of database management systems and more particularly to formatting data from uncertain sources to join with modeled data.
Database systems rely on various models to define and structure data. A common database model is Structured Query Language or “SQL.” SQL processes different types of data using standard methods, for example, Extract Transform Loads or “ETL.” In ETL systems, data is extracted from homogeneous or heterogeneous data sources, transformed for storing in the proper format or structure, for querying and analysis purposes, and loaded into the final target. Usually all the three phases execute in parallel due to delays that may be caused by the data extraction. In ETL, while the data is being pulled, another transformation process can execute, process, and prepare received data for loading. When data is loaded into a target the data initiates with or without the completion of the previous phases.
ETL systems commonly integrate data from multiple applications, typically developed and supported by different vendors or hosted on separate computer devices. The disparate systems containing the original data are frequently managed and operated by different users.
Modeling or other annotation of data sources are also used. A common operation for data manipulation is a JOIN operation or “JOIN.” A JOIN is a command used in database queries to combine records from two or more tables within a database or received data and data stored in the database. A new combined table is generated for further analysis or stored in memory. A JOIN is a means for combining fields from two tables by using values common to each. For example, the following tables represents employee's names and their department identification, along with department identifications and department names.
Employee tableLast NameDepartment IDRafferty31Jones33Heisenberg33Robinson34Smith34
Department tableDepartment IDDepartment Name31Sales33Engineering34Clerical35MarketingPerforming a JOIN on the table above will create a table that combines fields from the two source tables by using values common to each, which can be represented by the table below:
EmployeeEmployeeDepartmentDepartment(Last Name)(Department ID)(Department Name)(Department ID)Robinson34Clerical34Jones33Engineering33Smith34Clerical34Heisenberg33Engineering33Rafferty31Sales31