The present invention relates to digital preservation, and more specifically, to preservation-aware fixity computations in a digital preservation system.
Long-term digital preservation (“preservation” for short) is the ability to sustain the understandability and usability of digital objects in the distant future regardless of changes in technologies and in the “designated communities” that use these digital objects (that is, the data consumers). Digital objects include, but are not limited to, text documents, data files, audio/visual files and other types of information stores. The core standard for digital preservation systems is the Open Archival Information System (OAIS), which is an International Organization for Standardization (ISO) standard. OAIS specifies the terms, concepts and reference models for a system dedicated to preserving digital assets for a designated community.
One of the main concepts in OAIS is the Archival Information Package (AIP), which is the basic object stored in a preservation system. FIG. 1 shows an example representation of an AIP 100 according to the OAIS standard. The AIP 100 includes a content information compartment 102 and one or more preservation description information (PDI) compartments 104. For clarity, only one PDI compartment 104 is shown in FIG. 1.
More specifically, the content information compartment 102 includes content information in the form of a content data object 106. The content data object 106 is the raw data that is the focus of the preservation. The content information compartment 102 also includes representation information 108 (RepInfo), which is needed to render the content data object 106 intelligible to its designated community. This may include information regarding the hardware and software environment needed to view the content data object 106.
The PDI compartment 104 includes additional metadata focused on describing the past and present states of the content information 102, ensuring it is uniquely identifiable and that it has not been altered in an undocumented manner. In particular, the PDI compartment 104 includes a reference field 110 that contains identifiers for the content information. At least one of these identifiers should be globally unique and persistent.
The PDI compartment 104 also includes a provenance field 112 that documents the history and the origin of the content information and any changes that may have taken place since it was originated. Provenance information also documents who has had custody of the content information since it was originated. The PDI compartment 104 further includes a context field 114 that documents the reasons for the creation of the content information and its relationships to its environment, and a fixity field 116 that demonstrates that the particular content information has not been altered in an undocumented manner. The term “fixity” may also be referred to as an integrity check. In addition, the PDI compartment 104 includes a PDI representation field 118, which may include information regarding the hardware and software environment needed to view the information stored in the PDI compartment 104.
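By way of illustration only, the AIP layout described above may be sketched as a small data structure together with a simple integrity check. The class and function names below (ContentInformation, PDI, AIP, compute_fixity, verify_fixity) are hypothetical, as is the choice to store the fixity field as a mapping from a hash algorithm name to a hex digest and the default of SHA-256; OAIS itself does not prescribe any of these.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ContentInformation:
    """Content information compartment (102)."""
    content_data_object: bytes   # raw data being preserved (106)
    representation_info: str     # RepInfo needed to interpret the data (108)


@dataclass
class PDI:
    """Preservation description information compartment (104)."""
    reference: List[str]         # identifiers for the content information (110)
    provenance: List[str]        # history, origin and custody records (112)
    context: str                 # reasons for creation, relationships (114)
    fixity: Dict[str, str]       # assumed layout: algorithm name -> hex digest (116)
    pdi_representation: str      # how to interpret the PDI itself (118)


@dataclass
class AIP:
    """Archival Information Package (100): content plus one or more PDIs."""
    content_information: ContentInformation
    pdi: List[PDI]


def compute_fixity(data: bytes, algorithm: str = "sha256") -> Dict[str, str]:
    """Return a fixity record for the data using the named hashlib algorithm."""
    return {algorithm: hashlib.new(algorithm, data).hexdigest()}


def verify_fixity(aip: AIP) -> bool:
    """Recompute every recorded digest and compare it with the stored value."""
    data = aip.content_information.content_data_object
    return all(
        hashlib.new(algorithm, data).hexdigest() == digest
        for pdi in aip.pdi
        for algorithm, digest in pdi.fixity.items()
    )
```

In this sketch, verify_fixity returns True only while the content data object is byte-for-byte identical to the data over which the recorded digests were computed; any undocumented alteration of the content causes the comparison to fail.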
Most prior art addresses preservation only and does not deal with fixity computations. Other related work discusses fixity but is not preservation-aware. If a fixity computation is not preservation-aware, it may become obsolete as time passes.