The demand for data continues to change dramatically, including the demand for data access and management and for the efficient management of different data types of entities across various domains. International Business Machines (IBM) has introduced a number of products to meet these demands; IBM InfoSphere Master Data Management (MDM) Server for Product Information Management (PIM) and the IBM InfoSphere MDM Server for customer data integration (CDI) are two such products. Each of these data types, PIM and CDI, also sets forth additional challenges in regard to persistence and manipulation.
For example, CDI data types are often data having canonical representations whose attributes are either flat or represent aggregation “has a” relations to other entities whose existence does not depend on their parents. By further example, PIM data may have complex attributes that are hierarchically oriented, sparsely populated, and multi-valued, and that represent composition “has a” relations to entities that do not exist outside the context of their parent entities. A PIM schema (i.e., instance schema) is typically highly variable, known only at runtime, and dynamic, as it may change over its lifetime.
In practice, each data type has typically been better suited to a particular environment. For instance, for data of a CDI type, a prescriptive type of model is traditionally well-suited, as fixed relations are typically set forth (i.e., named columns, named fields, etc.) which remain suited over time to the implementation of CDI types of data. In such a model, data extensions may be handled in a variety of ways, such as by adding new tables or columns, or by adding attributes using a vertical representation supporting chunks of 10 extension attributes per database, for example. By further example, for data of a PIM type, a flexible model is traditionally better suited: one that does not provide named columns or fields but supports attributes which may be defined throughout the lifetime of the data, and which enables product categories to evolve and change as needed.
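By way of illustration only, the contrast between a prescriptive model and a flexible model may be sketched as follows. The names used here (`PartyRecord`, `AttributeRow`, `attributes_of`) and the sample attributes are hypothetical and are not taken from any IBM product; the sketch merely assumes a fixed-field record for CDI-type data and a list of attribute-value rows for PIM-type data.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class PartyRecord:
    """Prescriptive (CDI-style) model: every field is a named, fixed column."""
    party_id: int
    name: str
    birth_date: Optional[str] = None


@dataclass
class AttributeRow:
    """Flexible (PIM-style) model: one row per (entity, attribute) pair."""
    entity_id: int
    attribute: str
    value: Any


def attributes_of(rows: list, entity_id: int) -> dict:
    """Collect only the attributes actually defined for one entity."""
    return {r.attribute: r.value for r in rows if r.entity_id == entity_id}


# Prescriptive: the set of fields is fixed when the class is written.
party = PartyRecord(party_id=1, name="A. Customer")

# Flexible: each product carries only the attributes relevant to it.
rows = [
    AttributeRow(100, "screen_size_in", 42),
    AttributeRow(100, "hdmi_ports", 3),
    AttributeRow(200, "sleeve_length", "long"),
]
# A new attribute can be introduced later without changing any structure.
rows.append(AttributeRow(200, "fabric", "cotton"))

tv = attributes_of(rows, 100)
shirt = attributes_of(rows, 200)
```

In this sketch, adding a field to `PartyRecord` requires changing the class definition, whereas a new product attribute is simply one more `AttributeRow`.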
However, the persistence and manipulation of these types of data, more so of the PIM type, as well as of other data types, is becoming more complex, particularly at the service layer (e.g., in highly structured programming languages such as Java) and in systems where the data models are unable to handle the complexity (i.e., the volume of information per instance, such as the amount and variety of that information). Further, the marketplace is seeking integration of PIM-data type functionality with CDI-data type techniques in the near future.
Managing entities that have typically prescriptive bounds is of interest in today's marketplace. In this manner, the management of common attributes (i.e., attributes that are common to all instances of a given entity type or the attributes that are completely determined by the entity's type) is sought. Common attributes are typically made up of platform attributes and, optionally, deployment attributes to supplement the out-of-the-box platform attributes in system offerings. Common attributes may also be further segmented and categorized in certain situations.
FIG. 1A sets forth an exemplary depiction of an entity type 198 that supports only common attributes 197 representing a basic system level entity 199. Depicted in FIG. 1A are platform attributes 196 and deployment attributes 195, each with its respective type definition (i.e., Spec), 194 and 193. As used herein, the term “Spec” is intended to mean a definition of platform and extension attributes which are predefined or known in advance, for example. An example of this is the contact or party entity, which may be provided in a predefined manner with a fixed set of attributes (i.e., platform attributes), to which extension or deployment attributes may be added to enhance the entity.
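By way of illustration only, an entity type supporting only common attributes may be sketched as follows. The classes `Spec` and `EntityType`, the method names, and the sample attributes are hypothetical and are chosen solely for this sketch; the figure itself does not prescribe any particular implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """Type definition: maps each attribute name to its expected type."""
    attributes: dict


@dataclass
class EntityType:
    name: str
    platform_spec: Spec  # out-of-the-box platform attributes
    # Optional deployment attributes that supplement the platform attributes.
    deployment_spec: Spec = field(default_factory=lambda: Spec({}))

    def common_attributes(self) -> dict:
        """Common attributes = platform attributes plus deployment attributes."""
        merged = dict(self.platform_spec.attributes)
        merged.update(self.deployment_spec.attributes)
        return merged

    def validate(self, instance: dict) -> bool:
        """Every instance must carry exactly the common attributes."""
        common = self.common_attributes()
        return set(instance) == set(common) and all(
            isinstance(instance[key], expected) for key, expected in common.items()
        )


party_type = EntityType(
    name="Party",
    platform_spec=Spec({"party_id": int, "name": str}),
    deployment_spec=Spec({"loyalty_tier": str}),
)

complete = party_type.validate(
    {"party_id": 1, "name": "A. Customer", "loyalty_tier": "gold"}
)
incomplete = party_type.validate({"party_id": 1, "name": "A. Customer"})
```

Because the attribute set is completely determined by the entity's type, validation in this sketch needs nothing beyond the two Specs.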
However, managing entities that typically have less prescriptive bounds to the set of data they capture, i.e., facet attributes (attributes that apply to a subset of the instances of a given entity type), in the relational database and the supporting application code presents technical challenges.
For instance, at the service layer, where service calls (i.e., service requests) may be made without knowledge of the type of data to be encountered, conventional approaches prove insufficient. By example, in conventional approaches, data may be encountered having attributes that were known, and often attributes that were not known, at the time of the data model build, resulting in faceted attributes. Faceted attributes, as used herein, are understood to be properties or characteristics of the attributes of the objects being represented which, depending on the context, may or may not be relevant. Faceted attributes are further understood as being attributes whose applicability varies between any two instances of a given entity type, where any given instance could have its own unique set of facet attributes, although any given set of facet attributes often applies to more than one entity. Conventionally, then, service requests for data whose representation changes dynamically are problematic in a structured environment, as the structure first needs to be created for the service call. Similarly, conventional approaches also typically require the service layer to have knowledge of the facet attributes, whether relevant or not, since all facet information must typically be read and written whenever any of the attributes are accessed in these approaches. The persistence and manipulation of such data in such systems is also a challenge.
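By way of illustration only, the notion of facet attributes may be sketched as follows. The facet names, product identifiers, and helper functions are hypothetical; the sketch assumes only that each instance carries its common attributes plus zero or more named sets of facet attributes.

```python
# Hypothetical facet definitions: each facet names a set of attributes.
FACET_SPECS = {
    "television": {"screen_size_in", "hdmi_ports"},
    "battery_powered": {"battery_type"},
}

# Three instances of one entity type; each carries a different mix of facets.
products = {
    "tv-1": {
        "common": {"sku": "tv-1"},
        "facets": {"television": {"screen_size_in": 42, "hdmi_ports": 3}},
    },
    "radio-1": {
        "common": {"sku": "radio-1"},
        "facets": {"battery_powered": {"battery_type": "AA"}},
    },
    "portable-tv-1": {
        "common": {"sku": "portable-tv-1"},
        "facets": {
            "television": {"screen_size_in": 7, "hdmi_ports": 1},
            "battery_powered": {"battery_type": "Li-ion"},
        },
    },
}


def applicable_attributes(product: dict) -> set:
    """Facet attribute names that apply to this one instance."""
    names = set()
    for facet in product["facets"]:
        names |= FACET_SPECS[facet]
    return names


def instances_with_facet(all_products: dict, facet: str) -> list:
    """Identifiers of the instances to which a given facet applies."""
    return sorted(pid for pid, p in all_products.items() if facet in p["facets"])


tv_attrs = applicable_attributes(products["tv-1"])
radio_attrs = applicable_attributes(products["radio-1"])
tv_instances = instances_with_facet(products, "television")
```

Here the applicable attributes differ between two instances of the same entity type, while a single facet such as `television` applies to more than one instance.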
Unfortunately, conventional approaches typically require cumbersome extensions which result in the addition of many database structures and additional application code (object representation) to execute business logic against the new data. These approaches also decrease performance, particularly when the data is sparsely populated, as the in-memory object representation can become extremely large: all possible attribute definitions and values always exist, whether or not they are used in a particular entity instance, and have a minimum value (null) stored. Additionally, these approaches are limited because they require that the objects be coupled to the underlying database structures that have been defined to store the attributes, and they are not easily extensible as the facet attributes of the application change. Another traditional attempt to overcome these issues includes providing extensions in the data model to account for new structure, resulting in extension “blow outs,” particularly in heavily structured environments, as such data models will extract all data attributes when only a select few fields may be needed for the query.
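By way of illustration only, the cost of storing a null for every possible attribute may be sketched as follows. The attribute universe of 1,000 names and the two populated values are arbitrary assumptions chosen for this sketch and are not measurements of any system.

```python
# Hypothetical universe of every attribute any instance might carry.
ALL_ATTRIBUTES = [f"attr_{i}" for i in range(1000)]


def wide_row(values: dict) -> dict:
    """Conventional form: a slot exists for every attribute, null if unused."""
    return {name: values.get(name) for name in ALL_ATTRIBUTES}


def sparse_row(values: dict) -> dict:
    """Sparse form: only the attributes actually populated are stored."""
    return dict(values)


# One sparsely populated instance: 2 of 1,000 attributes are set.
values = {"attr_3": "red", "attr_750": 12}

wide = wide_row(values)
sparse = sparse_row(values)
null_slots = sum(1 for v in wide.values() if v is None)
```

In this sketch the wide form holds 1,000 slots of which 998 are null, whereas the sparse form holds only the 2 populated attributes.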
These conventional approaches attempt to account for the structure by using proprietary object representations that attempt to handle all attributes in a dynamic fashion, where typically nothing is prescribed per entity type. The resolution of all attributes is done dynamically, such that even common attributes must first be resolved for each instance of a particular entity. Additionally, the extensions resulting from these approaches require in-depth knowledge of a non-standard proprietary object representation to support the introduction of new attributes. Further, all facet information stored in these approaches must be read and written whenever any of the attributes are accessed, and specialized data structures are required to read and manipulate the data.
Each of these approaches is problematic: none overcomes the issues set forth, and none addresses the demand for systems capable of overarching data management, regardless of type. Additionally, these traditional approaches can result in decreased system performance, as unnecessary processing effort is spent on querying, extension assignments, and data extraction. Each of the approaches is further limited in that it requires application access to the objects to be tightly coupled with the representation, with the result that the choice of storage for facet information cannot be changed easily.
As a result of these limitations and the implementation of traditional (i.e., conventional) approaches, the database structure housing the prescribed and dynamically defined attributes is typically coupled with an application. Therefore, it becomes difficult, if not impossible, to replace a database structure without affecting processing of the prescribed and dynamically defined attributes.
Since most applications now prefer to use open systems, and users also desire new capabilities such as XML, there exists a need for a solution providing facet attributes in an application object model with independent application storage, allowing the substitution of various storage representations that best suit a given access pattern with minimal impact to the application model. Additionally, there exists a need for a solution for decoupling application and persistence representations for facet attributes in a data management system. The present invention addresses such needs.