The security of digital data has been a major concern since such data came into widespread use over the Internet. Because digital data allow an unlimited number of copies of an “original” to be made without any quality loss, and can also be easily distributed and forged, problems of copyright protection and tamper detection arise, creating a pressing need for digital data protection schemes.
A number of technologies have been developed to provide data protection, including cryptography and steganography. Cryptography protects data by encrypting them so that the encrypted data are meaningless to an attacker. However, once the encrypted data are decrypted, the data are in the clear and are no longer under protection. Steganography, on the other hand, conceals the very existence of the data by hiding them in cover data. Its weakness is that the hidden data cannot be extracted if the stego data undergo distortion.
An emerging technology, digital watermarking, complements cryptography and steganography by embedding an invisible signal directly into the data, thus providing a promising way to protect digital data from illicit copying and manipulation. After embedding, the watermark and the data are inseparable. Digital watermarking has a wide range of applications, including copyright protection, authentication, fingerprinting, copy control, and broadcast monitoring. Different applications require a watermark to satisfy different properties. The tamper detection problem is of particular interest. For this kind of application, a digital watermark should have properties such as invisibility, fragility, and high detection reliability.
Digital watermarks may be classified into two categories based on their application: fragile watermarks for tamper detection and robust watermarks for ownership verification. In the last few years, research on fragile watermarking for multimedia data, such as images, audio, and video, has been extensively conducted. Recently, some researchers began to realize the importance of watermarking databases and proposed robust watermarking schemes designed to protect the copyright of a database relation. Though it may be important to verify the source or owner of a database relation, in some cases it may also be critical to ensure the authenticity of database relations. This is of increasing interest in the many applications where database relations are publicly available on the Internet. For example, in database outsourcing, database owners who do not have sufficient resources to maintain the applications store their databases on servers provided by external application service providers so that the owners can focus on their own core tasks. The application service providers may provide data processing services to clients on behalf of the owners. Since service providers may not be trusted, the database owners may need to take responsibility for ensuring the integrity of outsourced databases. Similar applications include edge computing and data dissemination.
Unfortunately, despite the importance of tamper detection for database relations, this problem has not been adequately addressed. Although some schemes based on digital signatures have been proposed to address this problem, they can only detect, but not localize, the modifications. Thus, as with fragile watermarking for multimedia data, it is desirable to have a fragile watermarking scheme for database relations, so that any modifications made to a database relation can be not only detected but also localized. This is especially useful for a very large database relation, where the rest of the relation can still be trusted and used after the tampered tuples are detected and localized.
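The limitation of a relation-wide signature can be illustrated with a minimal sketch. It uses an HMAC over the whole relation as a stand-in for a digital signature; the key, the sample tuples, and the helper names `relation_tag` and `verify` are hypothetical and are not taken from any of the schemes referred to above. Verification yields a single yes/no answer, so one altered tuple makes the entire relation suspect.

```python
import hashlib
import hmac

KEY = b"owner-secret-key"  # hypothetical key, for illustration only


def relation_tag(tuples, key=KEY):
    """Compute one keyed digest over every tuple of the relation."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for row in sorted(tuples):  # sort so that tuple order does not matter
        mac.update(repr(row).encode("utf-8"))
    return mac.hexdigest()


def verify(tuples, tag, key=KEY):
    """Return True if the relation still matches the tag, else False."""
    return hmac.compare_digest(relation_tag(tuples, key), tag)


# Made-up sample relation: (id, name, date of birth).
original = [
    (1, "Alice", "1970-01-01"),
    (2, "Bob", "1982-05-17"),
    (3, "Carol", "1990-11-30"),
]
tag = relation_tag(original)

tampered = list(original)
tampered[1] = (2, "Bob", "1982-05-18")  # a single attribute value altered

intact_ok = verify(original, tag)
tampered_ok = verify(tampered, tag)
print(intact_ok, tampered_ok)
```

The failed check reveals that something changed but not which tuple, which is the gap a localizing scheme is meant to close.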
Embedding watermarks in database relations is a challenging problem because there is little redundancy present in a database relation. One important property of digital watermarks is invisibility. Usually, in a watermarking scheme, a watermark is embedded by “slightly” modifying the cover data. To ensure invisibility, the modifications are limited to some acceptable level. This requires that the cover data can tolerate these modifications. In the context of multimedia data, this requirement is not a problem. Since multimedia data are highly correlated, they contain a great deal of redundant information. Although compression techniques can remove some of the redundant information, no current compression technique removes all of it. This leaves room for watermark embedding. A watermark can be embedded as a part of the redundant information without affecting the quality of the multimedia data. Furthermore, some properties of the human visual (or auditory) system can be incorporated into the watermark embedding so that the strength of the embedded watermark can be adjusted adaptively. All of these make it easy to ensure invisibility for multimedia watermarking. In contrast, database relations contain a large number of independent tuples. A tuple can be added, deleted, or modified without affecting other tuples. All tuples and all attributes are equally important. There is little redundancy present in the tuples. Thus, it is a challenge to embed an invisible watermark in a database relation.
Current robust database watermarking schemes assume that all watermarked attributes are numeric and can tolerate small distortions. Although this assumption is reasonable for some kinds of database relations, such as weather and measurement databases, in real life we also need to deal with database relations that contain categorical attributes such as social security number, name, and date of birth. Such attributes cannot tolerate any modification, because even a small change yields a different value.
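The difference between the two kinds of attributes can be seen in a small sketch. It uses a generic least-significant-bit embedding, chosen here only for illustration and not taken from any of the schemes discussed; the function `embed_lsb` and the sample values are hypothetical. Writing one watermark bit into a temperature reading stored in hundredths of a degree changes it by at most 0.01 degree, whereas the same operation on a social security number produces a different identifier.

```python
def embed_lsb(value: int, bit: int) -> int:
    """Overwrite the least significant bit of an integer with a watermark bit."""
    return (value & ~1) | (bit & 1)


# Numeric attribute: a temperature stored in hundredths of a degree (23.46).
temperature = 2346
marked_temperature = embed_lsb(temperature, 1)
distortion = abs(marked_temperature - temperature) / 100  # in degrees

# Categorical attribute: an identifier stored as digits (made-up value).
ssn = 123456788
marked_ssn = embed_lsb(ssn, 1)

print(marked_temperature, distortion)  # the reading moves by one hundredth
print(marked_ssn != ssn)               # the identifier is no longer the same
```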
To the best of our knowledge, the only effort on watermarking categorical data is by Sion (see R. Sion, “Proving ownership over categorical data,” in Proceedings of ICDE 2004, 2004). In his scheme, although only a small number of tuples are selected for watermark embedding, the categorical values of the selected tuples are modified to embed a watermark. Such modifications may be too significant to be tolerable. Moreover, the Sion scheme is a robust watermarking scheme designed to protect the copyright of a database relation. A fragile watermarking scheme for categorical data has yet to be devised.
What is needed is a fragile watermarking scheme for detecting and localizing malicious alterations made to a database relation with categorical attributes without introducing distortions to cover data.