A metadata repository is a database of data about data (metadata). One purpose of a metadata repository is to provide a consistent and reliable means of access to data. A metadata repository can be stored in a physical location or may be a virtual database, in which metadata is drawn from separate sources. Metadata may include, for example, information about how to access specific data, or more detail about data.
The set of items stored within a metadata repository typically changes over time: items can be deleted, added, and modified. Such changes can cause one or more of the following problems: 1) it may be impossible to review older items for auditing, which is essential for anti-fraud investigations, Sarbanes-Oxley compliance, and the like; 2) it may be impossible to review the timeline of a given item as the item is added, changed, and deleted, which is often essential for time-series analytics; 3) if quality problems emerge, it may be impossible to roll back with confidence to a known good state; and 4) if two or more users (User A and User B) change replicated copies of an item in parallel, it may be impossible to merge their changes reliably, because merging algorithms generally compare the state of the item in the copy of User A, the state of the item in the copy of User B, and a common historical baseline of the item.
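The role of the common historical baseline in problem 4 can be illustrated with a minimal three-way merge sketch. It assumes, purely for illustration, that an item is a flat dictionary of fields; the function name `three_way_merge` and the sample field names are hypothetical and are not drawn from any particular repository product.

```python
from typing import Any, Dict, List, Tuple

_MISSING = object()  # sentinel meaning "field is absent from this copy"


def three_way_merge(
    base: Dict[str, Any], copy_a: Dict[str, Any], copy_b: Dict[str, Any]
) -> Tuple[Dict[str, Any], List[str]]:
    """Merge two edited copies of an item against their common baseline.

    Returns the merged item and the list of fields changed differently
    by both users (conflicts). Hypothetical helper for illustration only.
    """
    merged: Dict[str, Any] = {}
    conflicts: List[str] = []
    for key in sorted(set(base) | set(copy_a) | set(copy_b)):
        b = base.get(key, _MISSING)
        a = copy_a.get(key, _MISSING)
        c = copy_b.get(key, _MISSING)
        if a == c:
            value = a  # both copies agree (including both deleting the field)
        elif a == b:
            value = c  # only User B changed the field
        elif c == b:
            value = a  # only User A changed the field
        else:
            conflicts.append(key)  # both changed it, in different ways
            value = b  # keep the baseline value pending manual resolution
        if value is not _MISSING:
            merged[key] = value
    return merged, conflicts


# Hypothetical element declaration edited in parallel by two users.
base = {"name": "street", "type": "xs:string", "minOccurs": "1"}
user_a = {"name": "streetName", "type": "xs:string", "minOccurs": "1"}
user_b = {"name": "street", "type": "xs:string", "minOccurs": "0"}

merged, conflicts = three_way_merge(base, user_a, user_b)
```

Without `base`, the merge could not tell whether `name` was changed by User A or reverted by User B, so every differing field would have to be treated as a conflict.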
Conventional metadata repositories typically store large chunks of structured metadata as opaque strings, blobs (binary large objects), or clobs (character large objects). For example, the repository might store an entire XML Schema Definition (XSD) or Web Service Description Language (WSDL) file as a string, an entire ERWin file (.ER1 file) as a blob, or an entire Java file as a string. Storing time-safe histories of such units requires replicating large blobs or clobs, which incurs excessive storage and processing overhead. Moreover, it may be impossible to truly track changes over time at a fine-grained level, e.g., to determine which XSD complexType underwent a name change in a given XSD schema.
Another technique used in conventional metadata repositories (and common to some Source Code Control Systems) is to track deltas, i.e., the units of text that have changed between versions. However, the deltas lack context within the metadata structure. For example, a delta might specify that an element declaration was added at line 55 of an XSD, yet the delta cannot indicate that the element was moved from complexType “Person” to complexType “Address”. Yet another technique used in conventional metadata repositories is to log each action that changed an item, for example, “element declaration ‘street’ was deleted in CustomerFormat.xsd”. But when the metadata is analyzed, users are typically interested in the state of the metadata, not in the step-by-step chain of actions. The purpose of this patent is to solve these problems.