Information Integration
All computer-based applications embed a model of the real world, in which the concepts of the users are modelled as a system of data (the static model) together with computational behaviours which enable new facts to be deduced (the dynamic model). The behaviours allowed on the information are restricted by its real-world meaning. For example, it is valid to add the costs of the line items of an invoice to calculate the total cost, whereas it is not valid to add the ages of a set of people to calculate “the age of a department”.
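As an illustration of how real-world meaning restricts behaviour, the sketch below gives costs an addition operation and deliberately gives ages none. It assumes that costs are held as an integer number of pence. The class and function names (`Cost`, `Age`, `invoice_total`) are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    """A monetary amount (assumed to be in pence); adding costs is meaningful."""
    pence: int

    def __add__(self, other: "Cost") -> "Cost":
        # Only another Cost may be added; anything else is rejected
        if not isinstance(other, Cost):
            return NotImplemented
        return Cost(self.pence + other.pence)


@dataclass(frozen=True)
class Age:
    """A person's age in years; no addition is defined, so summing ages fails."""
    years: int


def invoice_total(line_items: list[Cost]) -> Cost:
    # Valid behaviour: the total cost of an invoice is the sum of its line items
    total = Cost(0)
    for item in line_items:
        total = total + item
    return total
```

Here `invoice_total([Cost(1250), Cost(300)])` gives `Cost(pence=1550)`. By contrast, `Age(30) + Age(40)` raises `TypeError`, because the data model offers no behaviour for “the age of a department”.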
Information integration is concerned with transferring information using the data model (the static model), so that the receiving system can apply the correct computational behaviours (the correct dynamic model). This means that the facts deduced are correctly understood by the users. For example, when a 3-D CAD model is transferred from a designer to a manufacturer, the manufacturer can make the part required.
Information integration is different from display integration, in which end users use their own applications to view another party's information. Display integration does not allow the receiving applications to deduce new facts, only to display existing ones. The first generations of the World Wide Web allowed people to display information from anywhere in the world, but the web browser could not calculate anything from that information.
This invention enables information integration, rather than display integration.
Information Integration Standards
Information integration between organizations in computer-based systems has historically been based on a stack of information standards, including the following:
- A data encoding standard, such as ASCII or Unicode, which defines the alphabet used when transferring data.
- A syntactic standard, which identifies the “words” of the exchange and their syntax, such as STEP Part 21 (ISO 10303-21) for file exchange or XML for transfer via a web server.
- A semantic standard, which defines what each data element means, such as the STEP application protocols (ISO 10303-201 to 299) or an XML schema. For example, for an address entity it identifies the country field as holding the part of the address that states the name of the country.
- Reference data standards, such as the standard lists of abbreviations for countries, whose elements may form part of the content of an exchange.
In general, the standards are independent of one another, so one standard in the stack can be replaced with an equivalent without replacing the standards above and below it.
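The following sketch illustrates the independence of the layers. A single address record is carried in two different syntaxes. XML is taken from the text; JSON is an assumed stand-in for an alternative syntactic standard. Both use UTF-8 as the data encoding. The semantic layer (the meaning of the `country` field) and the reference data layer (a small, hypothetical country-code list) do not change when the syntax is swapped.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical reference data: abbreviations for countries
COUNTRY_CODES = {"GB": "United Kingdom", "FR": "France"}


# Syntactic layer, option 1: JSON over a UTF-8 encoding
def to_json(record: dict) -> bytes:
    return json.dumps(record).encode("utf-8")


def from_json(data: bytes) -> dict:
    return json.loads(data.decode("utf-8"))


# Syntactic layer, option 2: XML over a UTF-8 encoding
def to_xml(record: dict) -> bytes:
    root = ET.Element("address")
    for field, value in record.items():
        ET.SubElement(root, field).text = value
    return ET.tostring(root, encoding="utf-8")


def from_xml(data: bytes) -> dict:
    root = ET.fromstring(data)
    return {child.tag: child.text for child in root}


# Semantic layer: the "country" field holds a reference-data code
def country_name(record: dict) -> str:
    return COUNTRY_CODES[record["country"]]
```

For the record `{"street": "1 High Street", "country": "GB"}`, both round trips return the original record, and `country_name` resolves it to "United Kingdom" whichever syntax was used.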
Standards developed by computer-oriented groups (e.g. computer manufacturers) focus on syntactic standards, since these enable any user to transfer information between software systems. However, in order to work together, end users need semantic standards and reference data, because they need to understand what the data means. By analogy, a telephone company may be proud that its system allows a user to ring someone in China, whereas the user also needs to be able to understand the person at the other end.
This invention enables the integration at the level of semantic standards, and does not rely on the use of any particular syntactic or data encoding standard.
Hierarchical Systems of Reference Data
The simplest form of hierarchy is the tree, which is a form of directed graph. In mathematical theory, a graph is a set of nodes, with lines connecting the nodes. In a directed graph, each connecting line has an associated direction, from one node to another. In a tree, there is a single node, the root node, which has no line going into it, although it may have one or more going out. All other nodes have exactly one line going in, and zero or more going out. The nodes which have no lines going out are called leaf nodes. A consequence of these constraints is that the tree appears to branch out from the root, and there is a path from the root to every leaf. By convention, the root is shown at the top of the diagram. In the usual naming convention, the node at the start of a line is called the parent and the node at the end is called the child. In a tree, every node except the root has exactly one parent, and may have multiple children. The root is an ancestor of every other node.
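The tree constraints just described can be checked mechanically. In the minimal sketch below, a directed graph is represented as a list of nodes plus a list of hypothetical `(parent, child)` pairs. The function tests for a single root, exactly one parent for every other node, and a path from the root to every node. A second helper lists the leaf nodes.

```python
from collections import defaultdict


def is_tree(nodes, edges):
    """Return True if the directed graph given by (parent, child) edges is a tree."""
    parents = defaultdict(list)
    children = defaultdict(list)
    for parent, child in edges:
        parents[child].append(parent)
        children[parent].append(child)
    # Exactly one node (the root) has no line going into it
    roots = [n for n in nodes if not parents[n]]
    if len(roots) != 1:
        return False
    # Every other node has exactly one parent
    if any(len(parents[n]) != 1 for n in nodes if n != roots[0]):
        return False
    # Every node must be reachable from the root (this also rules out detached loops)
    seen, stack = set(), [roots[0]]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children[node])
    return seen == set(nodes)


def leaves(nodes, edges):
    """Leaf nodes are those with no lines going out."""
    has_child = {parent for parent, _ in edges}
    return [n for n in nodes if n not in has_child]
```

With nodes `["vehicle", "car", "truck", "mini"]` and edges vehicle→car, vehicle→truck and car→mini, `is_tree` returns `True` and `leaves` returns `["truck", "mini"]`. Adding a second edge into "mini" makes `is_tree` return `False`.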
A hierarchy can be viewed as the merger of multiple trees. It therefore can have multiple roots, and a node can have multiple parents. However, the rules of a hierarchy forbid looping back, so that no node can be a parent of a node above it in the hierarchy (if it could, it could potentially be its own ancestor).
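The rule that no node may be its own ancestor is the same as requiring the graph to be acyclic. Multiple roots and multiple parents remain allowed. One standard way to check this is Kahn's topological-sort algorithm, sketched below using the same assumed `(parent, child)` edge representation. The node names are illustrative.

```python
from collections import defaultdict, deque


def is_hierarchy(nodes, edges):
    """Return True if no node is its own ancestor (the graph is acyclic).

    Multiple roots and multiple parents are allowed.
    """
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    # Start from every root (every node with no parent)
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Nodes on a loop never reach indegree 0, so they are never visited
    return visited == len(nodes)
```

For example, "car" and "bus" may each have two parents, "vehicle" and "passenger carrier", and this is a valid hierarchy. Adding a line from "car" back up to "vehicle" creates a loop, and `is_hierarchy` then returns `False`.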
In a reference data hierarchy, the nodes carry terms, and the connecting lines indicate a subclass relationship, so that the child node is always a subclass of its parent. That is, any valid deduction (or computational behaviour) applicable to the parent is also applicable to the child. For example, if a car is defined as a “self propelled land vehicle capable of carrying passengers”, and it is asserted that a mini is a type of car, then it can be deduced that a mini is a “self propelled land vehicle capable of carrying passengers”. Hence, to understand a node, a system does not need prior knowledge of the node's existence; it only needs to be given the name of the subclass and its parentage, and it can then use the subclass effectively. This is also called inheritance, in which the child node inherits the properties of its parent nodes.
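A minimal sketch of inheritance in a reference data hierarchy follows. It assumes that the receiving system already knows "car" and its ancestors, each with a small, hypothetical set of defining properties. It is then told only the new term "mini" and its parentage. It attaches the term and deduces the properties that "mini" inherits.

```python
# Hypothetical reference data already held by the receiving system
PARENTS = {
    "vehicle": [],
    "land vehicle": ["vehicle"],
    "car": ["land vehicle"],
}
DEFINITIONS = {
    "vehicle": {"transports things"},
    "land vehicle": {"travels on land"},
    "car": {"self propelled", "carries passengers"},
}


def add_subclass(parents, term, its_parents):
    """Attach a new term, given only its name and its parentage."""
    for p in its_parents:
        if p not in parents:
            raise KeyError(f"unknown parent term: {p}")
    parents[term] = list(its_parents)


def ancestors(term, parents):
    """All terms above `term`; a node may have several parents."""
    result, stack = set(), list(parents[term])
    while stack:
        p = stack.pop()
        if p not in result:
            result.add(p)
            stack.extend(parents[p])
    return result


def inherited_properties(term, parents, definitions):
    """A term's own properties plus everything inherited from its ancestors."""
    props = set(definitions.get(term, set()))
    for a in ancestors(term, parents):
        props |= definitions.get(a, set())
    return props
```

After `add_subclass(receiver, "mini", ["car"])`, where `receiver` is a copy of `PARENTS`, the receiver can deduce that a mini is self propelled, carries passengers and travels on land, even though "mini" was unknown to it beforehand.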
In the field of artificial intelligence, this concept is formalized as an ontology. Although this technique and the supporting tools provide a useful basis for implementation, it is to be appreciated that this formalism is not an intrinsic part of the invention.
Historical Experience with Information Integration
Information integration is generally a two-step process. The first step is to agree on a common semantic information model. The second step is to implement the model.
Historically, in the first step, the entire semantic model (the data model and the reference data) has had to be completely agreed for meaningful integration to take place. Anything outside this agreement has had to be ignored by participating systems. The second step then typically takes several months while the required software is implemented and tested, and complex exchanges can typically require more than a year of testing before they can be used in production.
For the purposes of this specification, the term “integration” refers to implementation through a data exchange, transaction or data sharing mechanism. In a data exchange mechanism, data integration occurs through the transfer of a complete package of information, for example the sending of an electronic maintenance manual. In a transaction mechanism, data integration occurs through the transfer of a coherent subset of the information, for example updating a maintenance manual by sending the new estimated man-hours for a task. In data sharing, multiple applications read and update a single source of data, such as a database. The invention is applicable to all such methods of implementation.
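The three mechanisms can be contrasted in a few lines, using the maintenance manual example. The representation below is an assumption made only for illustration: a manual is a dictionary of tasks, and the function and class names are hypothetical. An exchange hands over an independent copy of the whole package. A transaction sends only the changed fields of one task. Data sharing routes every application through a single store.

```python
import copy


def exchange(package: dict) -> dict:
    """Data exchange: the receiver gets a complete, independent copy of the package."""
    return copy.deepcopy(package)


def apply_transaction(local: dict, task: str, changes: dict) -> None:
    """Transaction: only a coherent subset (the changed fields of one task) is sent."""
    local[task].update(changes)


class SharedStore:
    """Data sharing: all applications read and update one source of data."""

    def __init__(self) -> None:
        self._data: dict = {}

    def read(self, key):
        return self._data.get(key)

    def update(self, key, value) -> None:
        self._data[key] = value
```

For example, after `received = exchange(manual)`, a transaction that changes the estimated man-hours of one task in `received` leaves the sender's `manual` untouched. With a `SharedStore`, an update made by one application is immediately visible to every other application that reads the same key.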
The latest generation of standards uses new technologies to allow the reference data to be structured hierarchically. In the first step of integration, the whole semantic data model and the upper levels of the reference data hierarchy must be agreed in advance. The details of the reference data subclasses can be deferred, allowing implementation to start before the reference data is complete. However, the reference data must be complete before the final stages of software implementation and testing can go ahead.
In the invention, only the framework of the semantic data model and the upper levels of the reference data hierarchy must be agreed in advance. The invention enables the extension of the semantic data model and the reference data hierarchy and the addition of new computational behaviours after software implementation has been completed, because the software reconfigures itself to deal with the extended functionality.
Historically, once the semantic data model and reference data have been agreed and the software implemented, any change to the semantic data model requires every implementation to be updated. In practice, this reimplementation phase can take several months. Consequently, unless all the parties are running exactly the same integration interface with the same version of the semantic standard, it typically takes several months or more to create a new data integration environment, and several months of preparation to update it.
In the latest generation of standards, the implementation of the semantic data model is independent of that of the reference data model. This means that in step two, although the initial implementation takes as long as before, changes to the reference data library can be made rapidly (within hours), provided the semantic data model is unchanged. Further, it has been recognized that, provided only one party extends the reference data, subtyping the reference data makes it possible to extend the scope of the data exchange process automatically.
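One simple way to picture how subtyping by a single party can extend the scope of an exchange automatically is as follows. When the receiver meets an unfamiliar term, it walks up the sender's parentage until it reaches a term it already understands, and it treats the data at that level. The sketch assumes single-parent reference data for simplicity. It is an illustration only, not the mechanism of any particular standard or of the invention, and all names are hypothetical.

```python
def interpret(term, sender_parents, known_terms):
    """Return the most specific term in the sender's parentage that the receiver knows.

    sender_parents maps each of the sender's terms to its single parent
    (None at the root); known_terms is the receiver's reference data.
    """
    seen = set()
    current = term
    while current is not None and current not in seen:
        if current in known_terms:
            return current
        seen.add(current)  # guard against a malformed, looping hierarchy
        current = sender_parents.get(current)
    raise LookupError(f"no known ancestor for {term!r}")
```

If the sender has added "mini" beneath "car", a receiver that knows only "vehicle", "land vehicle" and "car" interprets a mini as a car and can still apply every behaviour defined for cars. Once the receiver adds "mini" to its own reference data, the same call returns "mini".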
This invention takes the developments described above further in two respects. First, the implementation step allows limited automatic extension of the semantic data model, the behaviours and the reference data. Secondly, the invention allows automatic negotiation of the scope of the exchange where changes are made by more than one party. One particular benefit of the invention over known applications is that the set-up time for the integration environment may be reduced from months to minutes. Another benefit is that the integration environment can be updated incrementally, with parties changing at different times.
Data Models v. Instances
A data model is a description of the types of data a system may hold. For example, a date may have a three-character field for the month. An instance of date may have the value ‘May’ in the month field. A system will generally hold many instances of date.
In general, the data model is expressed as part of the software in a system, whereas the instances are the values held by the system in its data files or database.
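The distinction can be seen in a short sketch. The class definition plays the part of the data model, written as part of the software. The objects created from it are instances, the values the system holds. The day and year fields and the list of three-character month abbreviations are assumptions added for illustration; the text only specifies a three-character month field.

```python
from dataclasses import dataclass

# Assumed set of valid three-character month values
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


@dataclass(frozen=True)
class Date:
    """Data model: describes the types of data any date may hold."""
    day: int
    month: str  # three-character month field, e.g. "May"
    year: int

    def __post_init__(self) -> None:
        # The model constrains what values an instance may take
        if self.month not in MONTHS:
            raise ValueError(f"invalid month: {self.month!r}")


# Instances: the values a system actually holds; there are generally many
dates = [Date(1, "May", 2003), Date(14, "Jun", 2003)]
```

Here `dates[0].month` is "May". An attempt to create `Date(1, "Maybe", 2003)` is rejected by the model with a `ValueError`.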
Readers unfamiliar with data modelling should keep this distinction in mind, particularly because, although all systems operate on instances, operations are conventionally described in terms of the data model.