One of the most difficult and complex tasks in a data processing environment involves the data integration process of accurately matching, linking, and/or clustering records from multiple data sources that refer to a person, a business, a hierarchical structure, or other entity. The task of data integration often becomes more difficult as the amount of collected data grows. This issue, also known as the “Big Data” problem, tends to limit the capability of organizations to process and use their data effectively, and it makes the record linkage process even more challenging.
Certain forms of data can be used to represent a hierarchy. A hierarchy is a general term that can be used to describe an arrangement of entities at various levels within a given structure. A hierarchy may be utilized to describe many types of phenomena, organizations, structures, processes, etc. For example, a business may be represented by an organization chart in which the various levels of the business may be defined by functions, seniority, locations, direct reports, etc. A chief executive officer, for example, may report to a board of directors at the top of a hierarchy, and managers may report to the chief executive officer, and so forth. Thus, for a given level, there may be related entities above, below, or at the same level. Entities in the hierarchy may be linked vertically and/or horizontally. Certain links between the entities may be direct, indirect, or non-existent.
In hierarchical structures, it is often the relationships and connections between the various entities in a hierarchy that allow one to understand the structure and make determinations about how a particular entity fits into the structure. For example, critical information may be missing with regard to an entire branch of a hierarchy if a single parent/child relationship in the hierarchy is missing or unknown.
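The effect of a single missing parent/child relationship can be illustrated with a minimal sketch. The entity names and the `reachable` helper below are hypothetical and serve only as an illustration: the hierarchy is stored as a mapping from each parent to its children, and the set of entities reachable from the top is compared with and without one link.

```python
from collections import deque

def reachable(children, root):
    """Return the set of entities reachable from root via parent -> child links."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

# Hypothetical organization chart: parent -> list of children.
full = {
    "board": ["ceo"],
    "ceo": ["manager_a", "manager_b"],
    "manager_b": ["staff_1", "staff_2"],
}

# The same data with the single ceo -> manager_b relationship unknown.
partial = {
    "board": ["ceo"],
    "ceo": ["manager_a"],
    "manager_b": ["staff_1", "staff_2"],
}

# Entities that can no longer be placed in the structure.
lost = reachable(full, "board") - reachable(partial, "board")
```

In this sketch, `lost` contains `manager_b` together with both of its staff entries, so one unknown link hides the entire branch beneath it.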
The data in a hierarchy can be organized according to various structures. For example, a simple tree structure may include parent/child relationships in which each parent can have many children but each child has only one parent. More complex hierarchies may allow parents to have multiple children and children to have connections with multiple parents. Even more complex structures may allow for direct or indirect connections between entities on the same or different levels. Yet other data structures may exist in which it is desired to determine relationships among the data even though no implicit hierarchy exists within the data.
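The difference between the simple tree structure and a structure that permits multiple parents may be sketched as follows. The representation, the entity names, and the helper functions `ancestors_tree` and `ancestors_multi` are assumptions chosen for illustration, not a prescribed implementation.

```python
# Simple tree: each child maps to exactly one parent.
tree_parent = {
    "ceo": "board",
    "manager_a": "ceo",
    "manager_b": "ceo",
}

def ancestors_tree(parent_of, entity):
    """Walk the single chain of parents from an entity up to the top."""
    chain = []
    while entity in parent_of:
        entity = parent_of[entity]
        chain.append(entity)
    return chain

# More complex structure: each child maps to a set of parents.
multi_parents = {
    "ceo": {"board"},
    "manager_a": {"ceo"},
    "manager_b": {"ceo"},
    "project_x": {"manager_a", "manager_b"},  # child with two parents
}

def ancestors_multi(parents_of, entity):
    """Collect every entity above the given one, following all parent links."""
    seen = set()
    stack = [entity]
    while stack:
        current = stack.pop()
        for parent in parents_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

tree_result = ancestors_tree(tree_parent, "manager_a")
multi_result = ancestors_multi(multi_parents, "project_x")
```

In the tree, an entity has a single ordered chain of ancestors, so a list suffices. Once a child may have several parents, the upward paths branch, so the sketch returns an unordered set and tracks visited entities to avoid repeating shared ancestors.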