In the vernacular of database technology, a “dataset” is a collection of related data or information and their relationships, that are organized and treated as a unit. One illustrative dataset is the data collected by a given sensor or collection of sensors. Another illustrative dataset is the collection of database entities (e.g., objects in an object-oriented database) related to a given task. In the context of configuration management, a dataset may be described as a collection of data, and their relationships, that together represent information from a given source. In this environment, a dataset could be the collection of configuration items, and their relationships, obtained from a given network discovery source.
One of ordinary skill in the art of database and/or configuration management will recognize that, for various reasons, it is sometimes useful to have a second dataset based on a given (first) dataset, wherein one or more characteristics of the second dataset are changed with respect to the first dataset without affecting the first dataset. In the prior art, second datasets are provided by either copying or versioning an original dataset. Copying is expensive both in terms of the time it takes to copy all instances of a dataset and in terms of the storage required to retain the duplicate information (especially for large datasets). Copying also has the drawback of loosing the connection between the instances in the two datasets so that the two environments (i.e., the first and second datasets) can start diverging almost immediately—especially when the operational environment is dynamic. It will also be recognized that copying suffers from a scalability problem. Versioning creates copies of data instances (e.g., entities or objects) as they are changed, establishing a version for each new copy. Different datasets can then be created post hoc by gathering together those configuration items with specific version tags or based on a time. A significant drawback to versioning is its lack of flexibility—it is difficult to have multiple parallel copies of a common dataset, each with its own (typically small) perturbations.
Thus, it would be beneficial to provide a mechanism whereby a second or overlay dataset could be specified that is a “duplicate” of a first dataset except for one or more specified changes that avoids or mitigates the noted drawbacks to prior art duplication techniques (e.g., copying and versioning).