In large data sets with multiple tables of information, an operation called a “join” is commonly performed to generate reports in response to queries.
For example, a table of data about people may include an entry (or row) for each person (such as each employee of a company). Each entry includes data in fields (or columns), where that data represents the person. For example, the table can have an identifier field (or column) for each entry, which stores a value which should be unique for each person. Similarly, a table of data about locations may include an entry (or row) for each location (such as each office for a company). Each entry includes data in fields (or columns), where that data represents the location, such as address data. The table also can have an identifier field (or column) for each entry which stores a value which should be unique for each location. Another table may include associations between people and locations. Each entry in this table provides at least the identifier of the person and the identifier of the location to which that person is assigned.
Without joining tables, generating a report listing employees and their addresses would involve accessing each of these three tables to obtain all of the information for the report. Joining involves combining the data from among the tables into another data set that can be processed as a combined table. For example, a possible result of joining the three tables above would be a single table with an entry for each person, including their names, identifiers, office identifiers and office addresses. How the tables are combined can be described as an inner join or outer (left or right) join.
Joining database tables is generally easy if the database tables are all designed by the same person or team of people, and the designs are coordinated. Joining tables also is generally easy if different tables still have the same field names and data types for fields that store the same data, such as the identifiers in the example above. Joining tables becomes more complex when the tables arise from separate and distinct databases with different table structures without any design coordination, often called “silos”. Joining tables also is more complex if the data sets are arbitrary and generated from unstructured data.