This disclosure relates to systems and techniques for the creation and use of a thesaurus for identifiers of complex data assemblies.
A data assembly is a collection of associated information that is treated as an entity in data processing activities. Data assemblies include complex data structures, as well as abstractions such as data objects that can include information drawn from one or more complex data structures. Hereinafter, for the sake of convenience, the term “complex data structures” includes such abstractions even though the abstractions themselves need not be a single complex data structure.
Complex data structure types, also referred to as “composite data types” and “data types,” are assemblies of simple data types. Simple data types, also referred to as “primitive” and “elementary” data types, cannot be broken down into smaller component data types. In general, simple data types are the basic data types that are predefined in a language for authoring machine-readable instructions. Simple data types include, e.g., character, numeric, string, and Boolean data types. Simple types do not have element content and do not carry attributes.
In addition to simple data types, complex data structure types can also include other complex data structures in the assembly. In general, complex data structure types are defined by a user to fit the operational context of a particular set of machine-readable instructions. Example complex data structure types include data objects, records, arrays, tables, and the like. A user can define a complex data structure type by assembling a set of elements, fields, and/or attributes into a reusable data structure. Each element, field, or attribute has its own type, which can itself be a complex data structure type; as a result, hierarchical and recursive complex data structure types can be formed.
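As an illustration only, the composition described above can be sketched in Python, assuming dataclasses stand in for user-defined complex data structure types. The type names `Address`, `Customer`, and `Category`, the helper `depth`, and the sample values are hypothetical. Built-in types such as `str`, `float`, and `bool` play the role of simple data types; `Customer` nests another complex type; and `Category` is recursive, so its instances form a hierarchy.

```python
from dataclasses import dataclass, field
from typing import List

# Built-in str, float, and bool serve as the simple (primitive) types.

@dataclass
class Address:
    # Hypothetical complex type assembled only from simple types.
    street: str
    city: str
    postal_code: str

@dataclass
class Customer:
    # Hypothetical complex type that nests another complex type (Address)
    # alongside simple-typed fields.
    name: str
    credit_limit: float
    active: bool
    address: Address

@dataclass
class Category:
    # Hypothetical recursive complex type: each Category may contain
    # child Categories, so instances form a hierarchy.
    label: str
    children: List["Category"] = field(default_factory=list)

def depth(cat: Category) -> int:
    """Return the number of levels in a Category hierarchy."""
    return 1 + max((depth(child) for child in cat.children), default=0)

# Example usage with invented values
customer = Customer("ACME Corp", 5000.0, True, Address("1 Main St", "Springfield", "12345"))
catalog = Category("Products", [Category("Hardware", [Category("Fasteners")]), Category("Software")])
print(depth(catalog))  # 3
```

A leaf `Category` with no children has depth 1, and each level of nesting adds one.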
A data structure identifier, or a “key,” is information that identifies a complex data structure for data processing activities performed in accordance with a set of machine-readable instructions. The identification is generally unambiguous, i.e., each identifier or key generally refers to a single complex data structure to the exclusion of all other data structures.
A data structure identifier can include, e.g., a name or a value that identifies the data structure within an identification scheme, a scheme identifier that identifies a frame of reference in which it is possible to identify a data structure, and an agency identifier that identifies the entity that defines the identification scheme and issues names for data structures within the identification scheme.
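A minimal sketch, assuming a Python dataclass as the representation, of an identifier made up of the three components listed above. The class name `DataStructureIdentifier` and the scheme and agency strings are hypothetical placeholders, not codes issued by any real agency. Because the dataclass is frozen, instances are hashable and compare on all three fields, so the same value issued under two different schemes yields two distinct identifiers.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DataStructureIdentifier:
    # Name or value that identifies the data structure within a scheme.
    value: str
    # Frame of reference in which the value identifies a data structure.
    scheme_id: Optional[str] = None
    # Entity that defines the scheme and issues values within it.
    agency_id: Optional[str] = None

# Hypothetical identifiers: same value, different identification schemes.
as_material = DataStructureIdentifier("4711", "MATERIAL_NUMBER", "AGENCY_A")
as_product = DataStructureIdentifier("4711", "PRODUCT_NUMBER", "AGENCY_A")

print(as_material == as_product)       # False: the schemes differ
print(len({as_material, as_product}))  # 2
```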
Different applications, different modules, different data processing systems, different data processing system landscapes, and different public identification scheme entities (such as Dun & Bradstreet, which issues DUNS numbers, and GS1, which issues GTINs) can use different identification schemes, so that even the same single data structure can be referred to by different identifiers.
Moreover, even a single application, module, data processing system, data processing system landscape, and/or public identification scheme entity can use multiple complex data structures of the same semantic type to refer to the same real-world item. Semantic type is a descriptive attribute of information that identifies the behavior (i.e., the semantics) of that information. The semantic type of information can identify the usage of, and rules for, that information in a set of data processing instructions. Two or more objects (or other complex data structures) of the same semantic type can be used to refer to the same single real-world entity in one or more sets of data processing activities. For example, a data processing module can include a “product object” instance that includes attributes and values that characterize an instance of a real-world item as a product. The same data processing module can include a “material object” that has the same attributes and values and characterizes the same real-world item, but as a material. Moreover, a second data processing module can include a “design object” that has the same attributes and values and characterizes the same real-world item, but as a design. Even though such objects may refer to the same single real-world entity and share the same semantic type, the various objects may be referred to using different identifiers.
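The product, material, and design example above can be sketched as follows, assuming a single hypothetical `DataObject` class; the helper `same_content`, the semantic type label `TradeItem`, the keys, and the attribute values are invented for illustration. All three objects share a semantic type and identical attribute values, yet no two carry the same key, which is the situation that key mapping addresses.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class DataObject:
    object_type: str     # role the object plays, e.g. "product" or "design"
    semantic_type: str   # hypothetical label for the shared behavior
    key: str             # identifier used by the owning module
    attributes: Dict[str, str] = field(default_factory=dict)

def same_content(a: DataObject, b: DataObject) -> bool:
    """True if two objects share a semantic type and attribute values."""
    return a.semantic_type == b.semantic_type and a.attributes == b.attributes

attrs = {"name": "M6 hex bolt", "finish": "zinc plated"}
product = DataObject("product", "TradeItem", "P-100", dict(attrs))
material = DataObject("material", "TradeItem", "MAT-7", dict(attrs))
design = DataObject("design", "TradeItem", "D-42", dict(attrs))

print(same_content(product, material) and same_content(material, design))  # True
print(len({product.key, material.key, design.key}))                          # 3
```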
When information regarding a data structure or structures is exchanged, a process called key mapping can be used to translate the different identifiers. In general, key mapping involves accessing a key mapping database in which keys used by a first set of processing activities are associated with keys used by a second set of processing activities. When information regarding one or more complex data structures is exchanged, one of the sets of processing activities can access the key mapping database to translate the key used by the source (first) set of processing activities into the corresponding key used by the second set of processing activities.
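A minimal in-memory sketch of the key mapping lookup described above follows, assuming a Python dictionary stands in for the key mapping database. The class `KeyMappingStore`, its methods `associate` and `translate`, and the system names `CRM` and `ERP` are hypothetical. Storing each association in both directions is a choice made for this sketch, so that either set of processing activities can act as the source.

```python
from typing import Dict, Optional, Tuple

class KeyMappingStore:
    """Hypothetical in-memory stand-in for a key mapping database."""

    def __init__(self) -> None:
        # (source system, source key, target system) -> target key
        self._mappings: Dict[Tuple[str, str, str], str] = {}

    def associate(self, system_a: str, key_a: str, system_b: str, key_b: str) -> None:
        """Record that key_a in system_a and key_b in system_b name the same data structure."""
        self._mappings[(system_a, key_a, system_b)] = key_b
        self._mappings[(system_b, key_b, system_a)] = key_a

    def translate(self, key: str, source: str, target: str) -> Optional[str]:
        """Return the target system's key for a source key, or None if unmapped."""
        return self._mappings.get((source, key, target))

store = KeyMappingStore()
store.associate("CRM", "C-0815", "ERP", "100234")
print(store.translate("C-0815", "CRM", "ERP"))  # 100234
print(store.translate("100234", "ERP", "CRM"))  # C-0815
print(store.translate("C-9999", "CRM", "ERP"))  # None
```

Returning `None` for an unmapped key leaves it to the caller to decide whether a missing association is an error or a cue to create a new mapping.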