Digital watermarking of content is very well known. The content may comprise any type of information, and may include one or more of audio data, image data, video data, textual data, multimedia data, a web page, software products, security keys, experimental data or any other kind of data. There are many methods for performing digital watermarking of content but, in general, they all involve adding a watermark to an item of content. This involves embedding, or adding, watermark symbols (or a watermark codeword or payload data) into the original item of content to form a watermarked item of content. The watermarked item of content can then be distributed to one or more users (or recipients or receivers).
The method used for adding a watermark codeword to an item of content depends on the intended purpose of the watermark codeword. Some watermarking techniques are designed to be “robust”, in the sense that the embedded watermark codeword can be successfully decoded even if the watermarked item of content has undergone subsequent processing (be that malicious or otherwise). Some watermarking techniques are designed to be “fragile”, in the sense that the embedded watermark codeword cannot be successfully decoded if the watermarked item of content has undergone subsequent processing or modification. Some watermarking techniques are designed such that the difference between the original item of content and the watermarked item of content is substantially imperceptible to a human user (e.g. the original item of content and the watermarked item of content are visually and/or audibly indistinguishable to a human user). Other criteria for how a watermark is added to an item of content exist.
Digital forensic watermarking is increasingly being used to trace users who have “leaked” their content in an unauthorized manner (such as an unauthorized online distribution or publication of content). For this type of watermarking process, watermark codewords specific to each legitimate/authorized receiver are used. Each of the receivers receives a copy of the original item of content with their respective watermark codeword embedded therein. Then, if an unauthorized copy of the item of content is located, the watermark codeword can be decoded from that item of content and the receiver that corresponds to the decoded watermark codeword can be identified as the source of the leak.
However, even if we assume that the watermarking scheme itself is secure (i.e. the method by which the watermark codewords are embedded in the item of content and subsequently decoded is secure), there is still a powerful attack available against any digital forensic watermarking scheme: the so-called “collusion attack”. In this type of attack, a number of users, each of whom has his own watermarked version of the item of content, form a coalition. As the watermarked versions are individually watermarked, and thus different, the coalition can spot the differences that arise from the individual watermarks in their collection of watermarked items of content. Thus, the coalition can create a forged copy of the item of content by combining bits and pieces from the various watermarked versions that they have access to. A good example of this would be by averaging these versions, or by interleaving pieces from the different versions.
Watermarking schemes alone cannot counter a collusion attack. Instead, the best way to withstand collusion attacks is by carefully selecting the sequences of watermark symbols that are used to form the watermark codewords that are then actually embedded and distributed. Such constructions are known in the literature as “traitor tracing schemes” or “fingerprinting schemes”, and the watermark codewords are known as “fingerprint-codes” or sometimes simply “fingerprints”. An important feature of such a scheme is the length of its fingerprint-codes, contrasted against the number of colluding users it can catch.
Various classes of traitor tracing schemes exist in the literature. One classification of traitor tracing schemes distinguishes between so-called “static” traitor tracing schemes and so-called “dynamic” traitor tracing schemes. For static traitor tracing schemes, it is assumed that the initial distributor of the watermarked items of content generates a single fingerprint-code for each receiver and distributes these to the receivers (as a watermark embedded within the item of content). Then, when the unauthorized copy (the “forgery”) is found, a decoding/tracing algorithm is executed on that forgery to determine which receivers colluded to produce the forgery. This then ends the process. Static traitor tracing schemes are suitable for a single/one-off distribution of items of content (e.g. a single movie) to multiple receivers. In contrast, in a dynamic traitor tracing scheme, the distributor generates a fingerprint-code for each active/connected receiver and distributes the fingerprint-codes to these receivers (as a watermark embedded within an item of content). Then, when an unauthorized copy (the “forgery”) is found, a decoding/tracing algorithm is executed on that forgery to try to identify one or more of the colluding receivers—if a member of the coalition is detected, then that receiver is deactivated/disconnected (in the sense that the receiver will receive no further watermarked items of content). Then, further fingerprint-codes are distributed to the remaining active/connected receivers (as a new watermark embedded within a new/subsequent item of content). The process continues in this way until all colluding receivers have been identified and disconnected. This may be viewed as operating over a series of rounds/stages, or at a series of time points, whereby at each stage the distributor will have more information on which to base his detection of colluding receivers and possibly eliminate one or more of those colluding receivers from subsequent rounds. This is suitable for scenarios in which a series of items of content are to be distributed to the population of receivers.
Another classification of traitor tracing schemes distinguishes between so-called “probabilistic” traitor tracing schemes and so-called “deterministic” traitor tracing schemes. A traitor tracing scheme is deterministic if, when the associated tracing algorithm identifies a receiver as being part of the coalition of receivers, then there is absolute certainty that that receiver's watermarked item of content was used to form the forgery. In contrast, a traitor tracing scheme is probabilistic if, when the associated tracing algorithm identifies a receiver as being part of the coalition of receivers, there is a non-zero probability (a so-called false positive probability) that that receiver's watermarked item of content was not actually used to form the forgery, i.e. that the receiver was not part of the coalition. A deterministic traitor tracing scheme will therefore never accuse any innocent receivers of helping to generate the forgery; a probabilistic traitor tracing scheme may accuse an innocent receiver of helping to generate the forgery, but this would happen with a small false positive probability.
A problem with deterministic traitor tracing schemes is that the size of the alphabet that is required is large—i.e. when generating the fingerprint-code for a receiver, each symbol in the fingerprint-code must be selectable from an alphabet made up of a large number of symbols. In general, watermarking schemes are more robust against forgery (e.g. by averaging different versions) if the alphabet size is small. It would therefore be desirable to have a fingerprinting scheme that makes use of a small (preferably binary) alphabet.
Current probabilistic static traitor tracing schemes can operate with a binary alphabet. However, current static traitor tracing schemes are only guaranteed to identify one of the colluding users, but not necessarily more or all of them. A solution is to iterate the scheme to identify all the colluding users, but this requires lengthy fingerprint-codes. It would be desirable to have a fingerprinting scheme that is guaranteed (at least with a certain probability) to identify all of the users who form the coalition generating forgeries and that has short fingerprint-codes.
Furthermore, current static traitor tracing schemes assume that the number of receivers who form the coalition is known beforehand (or at least an upper bound can be placed on this number). Then, when using static traitor tracing codes, this means that (in retrospect) often an unnecessarily long codeword has been used. For example, if only two people actually colluded, but one tailored the codeword to catch ten colluders, then the codewords and tracing time could be about 25 times longer than was actually necessary in that case.
Thus, current traitor tracing schemes are unable to trace any number of colluding receivers (i.e. a number that is unspecified in advance), with a small (i.e. practical) number of required watermarking symbols (i.e. a small alphabet), in a relatively short time with relatively short codewords, whilst ensuring that all the colluding receivers can be identified.