One of the most difficult and complex tasks in a data processing environment is the data integration process of accurately matching, linking, and/or clustering records from multiple data sources that refer to the same person, business, hierarchical structure, or other entity.
Certain forms of data may be used to represent a hierarchy. A hierarchy is a general term that can be used to describe an arrangement of entities at various levels within a given structure. A hierarchy may be utilized to describe many types of phenomena, organizations, structures, processes, etc. For example, a business may be represented by an organization chart in which the various levels of the business may be defined by functions, seniority, locations, direct reports, etc.
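By way of illustration only, the organization chart example may be sketched as a simple tree in which each entity's level is its depth below the top of the structure. The names Node and levels, and the sample titles, are hypothetical and are assumed here solely for the sketch; they do not describe any particular implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One entity in the hierarchy (hypothetical structure)."""
    name: str
    children: List["Node"] = field(default_factory=list)


def levels(root: Node) -> Dict[str, int]:
    """Map each entity name to its depth below the root (root is level 0)."""
    result: Dict[str, int] = {}
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        result[node.name] = depth
        for child in node.children:
            stack.append((child, depth + 1))
    return result


# Sample organization chart defined by direct reports (illustrative data).
engineer = Node("Engineer")
ceo = Node("CEO", [Node("VP Sales"), Node("VP Engineering", [engineer])])
org_levels = levels(ceo)
```

In this sketch the levels are defined by direct reports; the same structure could equally be arranged by function, seniority, or location.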
External linking, which is sometimes referred to as “entity resolution,” may be utilized for resolving entities within a hierarchical structure. External linking may involve a process of linking information from an external file to a previously linked base file (or authority file) in order to assign entity identifiers to the external data. In certain embodiments, an external linking process may act upon a file created by an internal linking process. For example, and according to certain example implementations, an internal linking process may be utilized as an initial process to characterize or group data when data relationships are not known beforehand. In an example implementation, an external linking process may be utilized after at least some data relationships have been established by the internal linking process.
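As a minimal sketch of the external linking step described above, the following assumes that the base file already carries entity identifiers (for example, from a prior internal linking pass) and that an exact match on a normalized name and city is sufficient to link a record. That matching rule, the field names, and the functions normalize, build_index, and external_link are assumptions made for illustration; a practical system would likely use more tolerant matching.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class BaseRecord:
    """A record in the previously linked base (authority) file."""
    entity_id: int
    name: str
    city: str


def normalize(value: str) -> str:
    """Lowercase the value and collapse runs of whitespace."""
    return " ".join(value.lower().split())


def build_index(base_file: List[BaseRecord]) -> Dict[Tuple[str, str], int]:
    """Index the base file by its normalized (name, city) key."""
    return {(normalize(r.name), normalize(r.city)): r.entity_id for r in base_file}


def external_link(
    external_file: List[dict], base_file: List[BaseRecord]
) -> List[Tuple[dict, Optional[int]]]:
    """Pair each external row with a base entity identifier, or None if unmatched."""
    index = build_index(base_file)
    return [
        (row, index.get((normalize(row["name"]), normalize(row["city"]))))
        for row in external_file
    ]


# Illustrative data only.
base_file = [
    BaseRecord(1, "Acme Corp", "Dayton"),
    BaseRecord(2, "Jane Doe", "Miami"),
]
external_file = [
    {"name": "  ACME   corp", "city": "dayton"},
    {"name": "John Roe", "city": "Austin"},
]
linked = external_link(external_file, base_file)
```

Here the first external row is assigned the identifier of the matching base record, while the second row has no counterpart in the base file and is left without an identifier.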