This application is directed to watermarking relational databases, such as numeric and categorical relational databases. It may also be applicable to watermarking digital data in general.
The primary purpose of digital watermarking is to protect certain content from unauthorized duplication and distribution by enabling provable ownership over the content.
Digital watermarking has traditionally relied upon the availability of a large noise domain within which the object can be altered while retaining its essential properties. For example, the least significant bits of image pixels can be arbitrarily altered with little impact upon the visual quality of the image as perceived by a human. In fact, much of the “bandwidth” for inserting watermarks, such as in the least significant bits, is due to the inability of the human sensory system to detect certain changes. However, while a considerable amount of research effort has been invested in the problem of watermarking multimedia data such as images, video, and audio, only limited research into watermarking numeric and categorical relational data has been done.
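By way of illustration only, the least-significant-bit approach described above for images can be sketched as follows. The function names `embed_lsb` and `extract_lsb`, the sample pixel values, and the watermark bits are hypothetical and chosen for this example; the sketch assumes 8-bit grayscale pixels held in a plain list and does not represent any particular prior-art system.

```python
def embed_lsb(pixels, bits):
    """Overwrite the least significant bit of the first len(bits) pixels."""
    if len(bits) > len(pixels):
        raise ValueError("not enough pixels to carry the watermark")
    marked = list(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the watermark bit.
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_lsb(pixels, n_bits):
    """Read back the least significant bit of the first n_bits pixels."""
    return [p & 1 for p in pixels[:n_bits]]


# Hypothetical 8-bit grayscale pixel values and watermark bits.
pixels = [200, 13, 77, 254, 0, 129, 64, 31]
watermark = [1, 0, 1, 1, 0, 0, 1, 0]

marked = embed_lsb(pixels, watermark)
recovered = extract_lsb(marked, len(watermark))

# Each pixel changes by at most one intensity level out of 256.
max_change = max(abs(a - b) for a, b in zip(pixels, marked))
```

In this sketch no pixel moves by more than one intensity level, which is the kind of change a human viewer is unlikely to perceive. The same one-unit change applied to a value in a relational database may not be negligible.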
Protecting rights over outsourced digital content is of ever-increasing interest, especially in areas where sensitive, valuable data is to be sold or made directly accessible. For example, in data mining applications a set of data is usually produced and collected by a data collector and subsequently sold in pieces to parties specialized in mining that data. Other applications may include online interactions in which data is made available for direct, interactive use. Given the nature of most of the data, it is difficult to associate the rights of the originator with it. Enforcement by legal means is usually ineffective in preventing theft of copyrighted works, unless augmented by a digital counterpart, such as watermarking.
Because a watermark modifies the item being watermarked, a watermark cannot be inserted into an object that cannot be modified. Thus, it is desirable to limit the change to levels that are acceptable with respect to the intended use of the data. However, one can always identify some use of the data that is affected by even a minor change to any portion of the data. Because of their nature, databases present unique challenges in limiting the change caused by a watermark to acceptable levels. One cannot rely upon “small” alterations to the data in the embedding process, as any alteration is necessarily significant. Hence, the discrete characteristics of the data require fundamentally new bandwidth channels and associated encoding algorithms.
Moreover, in order to be effective, the watermarking technique must be able to survive a wide variety of attacks, for example subset selection, subset addition, subset alteration, and subset resorting. In subset selection, the attacker may randomly select and use a subset of the original data set that might still provide value for its intended purpose. In subset addition, the attacker adds a set of numbers to the original set, where the added numbers are not intended to significantly alter the useful properties of the resulting set relative to the initial set. In subset alteration, a subset of the items in the original data set is altered such that there is still value associated with the resulting set. An example of subset alteration is a linear transformation applied uniformly to all of the items. Such a transformation preserves many data-mining related properties of the data while actually altering it considerably. In subset resorting, the subsets are rearranged to attack a watermark that depends upon a predefined ordering.
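The four attacks described above can be illustrated on a small numeric data set with the following minimal sketch. All function names, parameters, and sample values are hypothetical and introduced only for this example; in particular, drawing the added values uniformly from the range of the original data is merely one assumed way of adding numbers without grossly changing the set.

```python
import random


def subset_selection(data, fraction, rng):
    """Keep a random fraction of the original items."""
    k = max(1, int(len(data) * fraction))
    return rng.sample(data, k)


def subset_addition(data, n_new, rng):
    """Append new values drawn from the range of the original data (assumed strategy)."""
    lo, hi = min(data), max(data)
    return list(data) + [rng.uniform(lo, hi) for _ in range(n_new)]


def linear_alteration(data, scale, offset):
    """Apply the same linear transformation to every item."""
    return [scale * x + offset for x in data]


def subset_resorting(data, rng):
    """Return the same items in a random order."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    return shuffled


def rank_order(values):
    """Indices of the items sorted by value, used to compare orderings."""
    return sorted(range(len(values)), key=lambda i: values[i])


rng = random.Random(0)                      # fixed seed for repeatability
data = [float(x) for x in range(1, 21)]     # hypothetical numeric attribute

selected = subset_selection(data, 0.5, rng)
added = subset_addition(data, 5, rng)
altered = linear_alteration(data, 2.0, 3.0)
resorted = subset_resorting(data, rng)
```

In this sketch the linear alteration changes every value, yet with a positive scale factor the rank order of the items is unchanged, which is one example of a property that survives the transformation. A watermark that depends on exact values or on item position would not survive these operations without further measures.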
In addition to the attacks identified above, possible attacks on categorical relational data may also include horizontal data partitioning, vertical data partitioning, and attribute remapping. In a horizontal data partitioning attack, an attacker can randomly select and use a subset of the tuples of the original data set that might still provide value for its intended purpose. In a vertical data partitioning attack, an attacker selects and uses a valuable subset of the attributes. Finally, data semantics permitting, remapping of the relation attributes may represent a powerful attack if the attacker finds at least a partial value-preserving mapping from the original attribute data domain to a new domain.
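The attacks on categorical data described above can likewise be sketched on a toy relation. The relation, its attribute names, the mapping, and the function names below are all hypothetical and serve only as an illustration; the remapping shown assumes a one-to-one mapping over the full attribute domain, which is a simpler case than the partial mapping contemplated above.

```python
import random
from collections import Counter


def horizontal_partition(rows, fraction, rng):
    """Keep a random fraction of the tuples (rows)."""
    k = max(1, int(len(rows) * fraction))
    return rng.sample(rows, k)


def vertical_partition(rows, attributes):
    """Keep only the chosen attributes (columns) of every tuple."""
    return [{a: row[a] for a in attributes} for row in rows]


def remap_attribute(rows, attribute, mapping):
    """Replace each value of one attribute using a one-to-one mapping."""
    return [{**row, attribute: mapping[row[attribute]]} for row in rows]


# Hypothetical relation with a key and two categorical attributes.
rows = [
    {"id": 1, "city": "Boston", "product": "tea"},
    {"id": 2, "city": "Austin", "product": "coffee"},
    {"id": 3, "city": "Boston", "product": "coffee"},
    {"id": 4, "city": "Denver", "product": "tea"},
    {"id": 5, "city": "Austin", "product": "tea"},
    {"id": 6, "city": "Boston", "product": "tea"},
]

rng = random.Random(0)  # fixed seed for repeatability

subset = horizontal_partition(rows, 0.5, rng)
projected = vertical_partition(rows, ["id", "product"])

# Hypothetical one-to-one mapping from city names to opaque codes.
city_codes = {"Boston": "C1", "Austin": "C2", "Denver": "C3"}
remapped = remap_attribute(rows, "city", city_codes)

# Frequency profile of the attribute before and after remapping.
before = sorted(Counter(r["city"] for r in rows).values())
after = sorted(Counter(r["city"] for r in remapped).values())
```

In this sketch the remapped relation contains none of the original city values, yet the frequency profile of the attribute is identical before and after. This suggests why remapping can retain value for the attacker while disturbing a mark that is tied to the specific attribute values.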
Accordingly, there is a need for protecting relational data, including both numeric and categorical relational databases, via resilient watermarking.