Enterprise database systems store vast amounts of data received from one or more different sources. This data is typically stored in records of relational database tables. As such, the data may be retrieved, modified, supplemented and otherwise manipulated using a relational query language such as Structured Query Language (SQL).
One common operation is to join two or more tables based on data which is common to the tables to be joined. FIGS. 1A and 1B illustrate tables 10 and 12, respectively, for the purpose of explaining a join operation. More specifically, it may be desired to join tables 10 and 12 based on the column “DUNS” in order to generate a table including data from both tables. Since every DUNS value of table 12 also exists in table 10, all records of table 12 are enriched by the join as shown in table 14 of FIG. 1C.
FIG. 2A illustrates table 20, which is identical to table 12 but does not include the column DUNS. In such a case, the customer data is the join key and the join occurs on the data of column Name of table 10 and the data of column Customer of table 20. Due to differing punctuation, abbreviations and capitalizations, only the “McDonalds” value of the Customer column of table 20 matches an identical value of the Name column of table 10. As a result, the generated table 25 of FIG. 2B only enriches the “McDonalds” customer record. The other records of table 25 include no values in the Status and Subsid columns.
Even if the data were stored in tables according to standardized punctuation, capitalization and abbreviations, differing table schemas may prevent conventional joins. For example, one table may store a person's name in a single column (e.g., Name) while another table may store name data in multiple columns (e.g., First, Middle, Last). In another example, one table may store a person's address in two columns (e.g., Address and CityStateZip) while another table may store address data in seven columns (e.g., HouseNumber, Street, UnitNumber, City, Region, Postcode, Country). Efficient systems are desired to execute joins in the above-described and other scenarios.