There are many applications in which a large amount of content is stored in a repository and the stored data is accessed through a network such as the internet.
A data repository may take the form of a conventional database that stores content in records having a number of fields. In conventional databases, some of the fields are indexed so that data in the indexed fields is stored in a separate index. The separate index may be searched for specific search terms to identify records including those search terms.
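A minimal sketch of this kind of field indexing follows. It assumes an in-memory store and simple whitespace tokenization, both illustrative choices; the class name `IndexedRecordStore` is hypothetical. Only the fields named at construction are copied into a separate index that maps a (field, term) pair to record identifiers, so only those fields are searchable.

```python
from collections import defaultdict


class IndexedRecordStore:
    """Hypothetical toy store: records are dicts of fields, and only the
    fields listed in `indexed_fields` are copied into a separate index."""

    def __init__(self, indexed_fields):
        self.indexed_fields = set(indexed_fields)
        self.records = {}              # record id -> full record
        self.index = defaultdict(set)  # (field, term) -> set of record ids

    def insert(self, record_id, record):
        self.records[record_id] = record
        # Iterating a dict yields its keys, so this keeps indexed fields present in the record.
        for field in self.indexed_fields.intersection(record):
            # Assumed tokenization: lower-case, split on whitespace.
            for term in str(record[field]).lower().split():
                self.index[(field, term)].add(record_id)

    def search(self, field, term):
        """Return sorted ids of records whose indexed `field` contains `term`."""
        return sorted(self.index.get((field, term.lower()), set()))


store = IndexedRecordStore(["title"])
store.insert(1, {"title": "Annual Report", "body": "summary"})
store.insert(2, {"title": "Review notes", "body": "annual figures"})
print(store.search("title", "annual"))  # [1]
print(store.search("body", "annual"))   # [] because "body" is not indexed
```

The search consults only the separate index, never the records themselves, which is why a term appearing only in an unindexed field is not found.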
The use of digital signatures to indicate that a data element originates from a genuine source is well known. A digital signature of a message or data is a data object that depends both on a secret key known only to the signer and on the content of the message or data being signed.
There are many known digital signature algorithms. A digital signature scheme may be a scheme with appendix, in which the original message or data is an input to the signature verification. Alternatively, it may be a scheme with message recovery, in which the original message or data is recovered during signature verification and can then be compared with the unsigned message. In either case, the signer's public key is used to verify the signature and thereby the authenticity of the message.
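The difference between the two kinds of scheme can be illustrated with textbook RSA. The sketch below assumes tiny, insecure toy parameters (p = 61, q = 53), no padding, and a SHA-256 hash reduced modulo n. Real schemes use large keys and standardized padding, and a practical message-recovery scheme also adds redundancy so that forgeries can be detected. All function names here are hypothetical.

```python
import hashlib

# Toy textbook-RSA key from the primes 61 and 53. NOT secure; illustration only.
N = 61 * 53   # public modulus (3233)
E = 17        # public exponent
D = 2753      # private exponent: (E * D) % ((61 - 1) * (53 - 1)) == 1


def _digest_int(message: bytes) -> int:
    """Hash the message and reduce it into [0, N) (assumed toy encoding)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


# --- Scheme with appendix: the message is an input to verification ---
def sign_appendix(message: bytes) -> int:
    return pow(_digest_int(message), D, N)


def verify_appendix(message: bytes, signature: int) -> bool:
    return pow(signature, E, N) == _digest_int(message)


# --- Scheme with message recovery: verification outputs the message ---
def sign_recovery(m: int) -> int:
    if not 0 <= m < N:
        raise ValueError("toy scheme only signs integers smaller than N")
    return pow(m, D, N)


def recover(signature: int) -> int:
    return pow(signature, E, N)


sig = sign_appendix(b"record 42")
print(verify_appendix(b"record 42", sig))  # True
print(recover(sign_recovery(65)) == 65)    # True: recovered value matches the unsigned message
```

With appendix, the verifier must already hold the message; with message recovery, `recover` yields the message itself, which is then compared with the unsigned copy. In both cases only the public values `N` and `E` are needed to verify.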
The use of digital signatures for the content of data repositories is known. One common approach to digitally signing electronic content in document storage and archival systems uses the entire content of the stored object to calculate the digital signature. This approach works well for data objects whose content has no internal structure and which are updated only by deleting or replacing the whole object.
Storage of application data in a so-called “semi-structured” format has become common in archival storage devices, and this type of data does not meet the conditions outlined above: it has internal structure, and it may be updated in part rather than only by deleting or replacing the whole object.
So-called “semi-structured” data has a structure that is irregular and has no fixed format. The data can evolve quickly, and the boundary between the structure and the data it holds is blurred. The lack of a fixed schema and of fixed information about the data structures makes such data difficult to handle using conventional database technology.
Technologies are being developed to allow structure to be extracted from the data objects, with query execution techniques able to exploit this extracted structural information.
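As one simple illustration of extracting structure from semi-structured data (not a description of any particular technology the text refers to), the sketch below collects the distinct root-to-element tag paths of each XML document and unions them into a summary that a query processor could consult. The function names are hypothetical; the sketch uses only Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def extract_paths(xml_text: str) -> set:
    """Hypothetical helper: every distinct root-to-element tag path in one document."""
    paths = set()

    def walk(element, prefix):
        path = f"{prefix}/{element.tag}"
        paths.add(path)
        for child in element:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def structural_summary(documents):
    """Hypothetical helper: union of all document paths, i.e. a structure inferred from the data."""
    summary = set()
    for doc in documents:
        summary |= extract_paths(doc)
    return summary


docs = [
    "<doc><title>A</title></doc>",
    "<doc><title>B</title><note>reviewed</note></doc>",
]
print(sorted(structural_summary(docs)))  # ['/doc', '/doc/note', '/doc/title']
```

No schema is declared up front: the summary grows as documents with new elements (here `note`) are stored.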
This type of data structure allows applications to define complex data objects using semi-structured content. A data object often consists of content placed in it when it is first stored, together with annotations added later during a business process lifecycle. For example, a document publication and review process must be able to define and store the initial document and then add annotations or notes during review, so that both the data and the structure of the object change over time.
A semi-structured data store may, for example, be an Extensible Markup Language (XML) store. The stored data has a finer-grained structure than the whole object, so updates may be made that involve only specific properties or sub-elements of the archived object.
A digital signature computed over the entire object therefore prevents a third party from updating the object without invalidating the signature. However, there may be instances in which it is desirable for a third party to update sub-elements of a document without needing to apply a digital signature.
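The problem can be seen with a small XML example. The sketch assumes that a whole-object signature is computed over a SHA-256 digest of the serialized document, and it uses plain `ElementTree` serialization as a stand-in; real XML signatures canonicalize the document first. The element names and the `whole_object_digest` helper are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def whole_object_digest(root: ET.Element) -> str:
    """Hypothetical helper: digest over the entire serialized object,
    the value a whole-object signature would be computed over."""
    return hashlib.sha256(ET.tostring(root)).hexdigest()


# Initial store: the document content as first archived and signed.
doc = ET.fromstring("<document><body>Draft text</body></document>")
signed_digest = whole_object_digest(doc)  # value the original signer would sign

# Later in the lifecycle, a reviewer (a third party) adds an annotation.
note = ET.SubElement(doc, "annotation", reviewer="R1")
note.text = "Please clarify section 2."

# The original body is untouched, but the whole-object digest has changed,
# so a signature over `signed_digest` no longer matches the stored object.
print(whole_object_digest(doc) == signed_digest)  # False
```

Even though the reviewer changed only one sub-element, verifying the original whole-object signature now fails.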