The widespread transition of data from analog format to digital format has exacerbated problems relating to unauthorized copying and redistribution of protected content. Flawless copies of content can be easily produced and distributed via the Internet. This piracy is a major concern and expense for content providers.
Further, a new type of home consumer device for digital content management has been enabled by the advent of inexpensive, large-capacity hard disks. A movie rental box receives digital movies from some inexpensive source of data, usually a broadcast source (whether terrestrial or satellite-based). The movies do not have to be delivered in real time. Instead, they are stored on the hard disk, so that at any moment the hard disk contains, for example, the hundred hottest movies in the rental market. The consumer can simply select a particular movie and hit “play” to begin viewing a movie. The movie rental box periodically calls a clearing center and reports the consumer's content usage for billing purposes; the box may also acquire new decryption keys during this call.
The advantages the box provides to the consumer are obvious: he or she no longer has to go to the video rental store and, perhaps more importantly, does not have to return a rental tape or DVD. The consumer value proposition of movie rental boxes is so compelling that it is estimated that there will be 20 million such boxes in the United States within five years.
Content providers need to know what security problems are associated with these boxes, i.e. how can a user get a movie without paying for it? The simple attack of merely disconnecting the box so that it cannot call the clearing center can achieve only a short-lived advantage because the clearing center can simply refuse to provide new decryption keys to such a box. Likewise, the periodic “calling home” makes detection of clone boxes relatively easy. The most serious attack is likely to be the so-called “anonymous” attack, wherein a user or a group of users purchase rental movies from legitimate movie rental boxes that have been instrumented so that the protected content and/or the decryption keys can be captured and redistributed, often over the Internet. This Napster-style attack with movies instead of music is the most urgent concern of the movie studios that are investigating content protection technology.
One solution to the problem is to differently watermark and differently encrypt each movie for each authorized movie rental box, so that if a movie were pirated the watermarking and encryption information would uniquely identify the compromised box. Alas, this solution is not feasible because of the excessive computing effort and transmission bandwidth required to prepare and transmit individualized movies. The distribution system is economical only if the movies can be distributed over broadcast channels, i.e. where every box gets substantially the same data at the same time.
To solve the broadcast problem, the approach known in the art as “tracing traitors” is used. In this approach, an original version of each movie file has been augmented before being broadcast. Specifically, the file that is actually broadcast has had at least one critical file segment replaced by a set of segment variations. Each file segment variation is differently encrypted and preferably also differently watermarked prior to encryption, although the entire file may be watermarked as well. All the variations in one segment are identical for viewing purposes. A receiver is given the cryptographic key to decrypt only one of the variations in each segment. If the receiver is compromised and is used to illegally rebroadcast either the keys or the segments themselves, it is possible to deduce which receiver or receivers have been compromised.
The traitor tracing approach has not been widely used in practice to date, because previously known methods required unreasonable amounts of bandwidth in the broadcast, due to the number of segments or variations required. This limitation is addressed by the invention “Method for Tracing Traitors and Preventing Piracy of Digital Content in a Broadcast Encryption Medium”, U.S. Ser. No. 10/315,395, filed on Dec. 9, 2002 and published Jun. 10, 2004 as U.S. Patent Application Publication 2004/0111611A1. This invention, referred to hereafter as the '395 invention, substantially reduces the bandwidth required. FIGS. 1 through 7 and related description are taken directly from the '395 invention specification. The present invention is perhaps better understood in view of the '395 invention, but is not limited to use with that invention.
Referring now to FIG. 1, a prior art diagram of an original file 100 is shown. Files may comprise any kind of digital data sequence, including but not limited to text, audio, images, video, music, movies, multimedia presentations, operating systems, software applications, and cryptographic keys. In broad terms, file 100 includes a beginning 102 and an end 104 and a span of data. Files 100 may be of any size and may be distributed by any means, including but not limited to computer networks, satellite networks, cable networks, television transmissions, and various physical storage media (e.g. CD-ROMs, DVDs, tapes, etc.) as are known in the art. Files 100 may be broadcast in groups in a substantially continuous sequence, for example, when a movie rental box's stored content of say 255 movies is updated, perhaps on a monthly basis. In the movie rental box scenario, among others, files are usually not encrypted and otherwise processed on the fly, but are processed ahead of time.
The '395 invention is not limited to the movie rental box implementation, but instead can be applied to any digital content subject to one-to-many distribution. For example, operators of a web server (generally referred to as a digital rights manager) that sells copyrighted content such as music or other material stored in a subscription database may not want to encrypt or otherwise process files on the fly because of the computational expense involved. Similarly, such a server cannot feasibly tailor each file individually, nor can it store a complete copy of every file it transmits.
Another application of the '395 invention is to prerecorded optical discs (DVDs). In fact, the '395 invention has been adopted by the Advanced Access Content System, the content protection system for the new generation of high-definition (blue laser) DVDs. In this case, AACS anticipates 1 billion devices being manufactured incorporating this invention over the life of the technology.
Referring now to FIG. 2, a prior art diagram of critical file segments 202, 204, and 206 in an original file is shown. For clarity, only three critical file segments are shown; the preferred number is approximately 15. Not all data in a file 100 needs to be protected to the maximum possible level of security; bandwidth can be conserved by selectively applying different levels of security to the most valuable portions of a file 100. For example, in terms of the movie rental box scenario, each movie may have scenes that are each absolutely essential for the movie to be acceptable to any audience. All critical file segments in a file must therefore be properly processed for the file to be commercially desirable. The '395 invention preferably selects five-second scenes in a typical movie as critical file segments, but critical file segments of varying length are also encompassed by the '395 invention. The critical file segments are not necessarily equally distributed throughout a given file; in fact, the critical file segments are preferably specially selected based on the contents of the file, possibly by human editors. In the case of executable software files, automated tools may identify critical file segments according to a measured execution frequency.
Referring now to FIGS. 3A, 3B and 3C, prior art diagrams of file segment variations 302-324 that replace critical file segments 202-206 are shown. For clarity, only four file segment variations are shown for each critical file segment; the preferred number is approximately 16. Each file segment variation is simply a copy of the particular corresponding critical file segment that has been differently watermarked and differently encrypted. Each entire file is also typically watermarked and encrypted in a broadcast encryption system. Each file segment variation is identified by a text designation in this application (e.g. A, B, C . . . etc.) for clarity, but in practice binary numbers are generally employed for this purpose.
The number of critical file segments and the number of file segment variations preferably employed depend on the properties of the file and its audience. For movies, one could select a single critical file segment and have several hundred file segment variations; however, attackers might simply choose to omit that single critical file segment in a pirated copy of the file, in hopes that viewers would not find such a glitch to be overly annoying. A pirated movie with, say, 15 missing critical 5-second scenes is probably going to be too annoying to any viewer for it to be of any commercial value. Thus, either the illegally broadcast movies are substantially disrupted or the attackers must incorporate some of their file segment variations, which will facilitate traitor tracing.
While the number of critical file segments and the number of file segment variations may be kept constant for each file, modifying either number according to an estimated piracy likelihood for a given file is also within the scope of the '395 invention. The number of file segments and the number of file segment variations will determine the amount of bandwidth overhead (or, alternately, the increased size of the broadcast version of the file). In a typical movie, use of 15 critical file segments each having 16 file segment variations each of 5 seconds' duration adds roughly 10% to the file size.
Referring now to FIG. 4, a prior art diagram of an augmented file 400 including file segment variations 302-324 is shown. The augmented file 400 is the version of the original file 100 that will actually be broadcast. Each intended receiver of the broadcast of a group of files requires augmentation selection information to choose a particular combination of file segment variations for each particular file. In terms of the movie rental box scenario, each movie rental box must know, for each movie, which set of variations to plug into the spaces where critical scenes existed in the original movie. The particular arrangement of unmodified file content and file segment variations within the augmented file 400 shown is not critical but is merely intuitive.
The augmentations employed by the '395 invention facilitate traitor tracing in a commercially viable (i.e. low bandwidth overhead) manner. If a pirated version of a file is found, say on the Internet, the identity of the particular movie rental box (or boxes) that were used to create the pirated version is of keen interest to the broadcaster and/or content creator (e.g. copyright owners). The broadcaster and/or content creator may institute legal proceedings against the culprit, and would certainly want to refuse to send new decryption keys to the compromised boxes to prevent future thievery. If different boxes are assigned different combinations of file segment variations to use, an analysis of a pirated file can help determine which boxes were used as part of an anonymous attack.
In the event that all of the file segment variations in a redistributed version of a file match the combination of file segment variations assigned to only a single movie rental box, prior art systems would normally identify that box as being the source of the redistributed file. However, attackers are becoming increasingly sophisticated and may choose to employ a number of boxes to produce a pirated version of a file via collusion, wherein each box contributes some information or content used to produce the illicit copy after enough such information or content has been accumulated. From the attackers' point of view, the ideal situation is if they redistribute movies including variations such that an innocent third party appears to be the culprit. Such redistribution may not occur right away, but may follow a so-called “delayed attack”. This complicates the task of traitor tracing, and emphasizes the need to prevent all attacks as much as possible for every broadcast. In the '395 invention, the watermarks in the file segment variations are used to determine which variations have been rebroadcast.
Therefore, the '395 invention performs two complementary tasks: choosing which file segment variation to employ at each critical file segment of each file for each receiver box, and upon observing a redistributed file or decryption keys, identifying (and preferably subsequently disabling) traitors with the assistance of variation assignment information. The '395 invention can detect a larger number of colluding attackers for a given bandwidth than any known solution. It is literally an order of magnitude better than some naive schemes that have been suggested.
Referring now to FIG. 5, a prior art flowchart of the method of assigning super codes is shown. The super codes serve both as augmentation selection information to enable proper processing of files, and as traitor tracing information. The super codes preferably comprise an inner code and an outer code that operate in a nested manner. In step 502 a maximally different inner code codeword is created for each critical file segment variation in each file, as described in more detail below. An inner code codeword describes which combinations of file segment variations should be selected by a particular receiver. Note that at this point the exact location of each critical file segment in each file and its contents may not have been determined, though codewords are selected. Then, in step 504 each file in a group of files is assigned a file identifier according to a maximally different outer code codeword, also to be described below. An outer code describes which inner code codeword is pertinent to a given receiver in each file. The inner code and the outer code are selected (by error correcting codes, preferably Reed-Solomon codes) to each be maximally different, to reduce the likelihood of a group of receivers having identical augmentation selection information. Each group of files typically has a different super code.
The assignment of inner codes may vary randomly so that the pattern of file segment variations employed is not repeated from one broadcast to the next. Similarly, the assignment of outer codes in each broadcast may also vary randomly so that the pattern of files identified in a group is not repeated from one broadcast to the next. Alternately, the assignment of inner code and outer code may be varied according to the need to identify suspected traitor receivers as certainly as possible. Further, while the number of critical file segments and file segment variations may be kept constant for simplicity, the number of critical file segments and the number of file segment variations may be varied according to an estimate of how likely it is that a given file will be pirated.
Referring now to FIG. 6, a prior art flowchart of the method of preparing files for transmission is shown. For each file, at least one critical file segment (as shown in FIG. 2) is selected in step 600. For each critical file segment, at least one file segment variation (as shown in FIGS. 3A, 3B, and 3C) is created in step 602 to replace each critical file segment, forming an augmented file (as shown in FIG. 4). In step 604, the group of augmented files is broadcast. Finally in step 606, each group of files is assigned to one receiver box via a super code and a new set of decryption keys provided to the authorized receivers. The super code determines the assignment of decryption keys to each receiver, i.e. each receiver acquires decryption keys only for the particular file segment variations that will be used by that receiver.
The '395 invention treats the assignment of variations as a coding problem, instead of merely randomly choosing the variations for each box. In other words, when assigning error correcting codes one wants each codeword to be maximally different from every other codeword. Unfortunately, some error correcting codes are impractical because they require many more variations than are allowed by the real-world available bandwidth constraints. The '395 invention avoids the bandwidth problem by having a small number of variations at any single point and by nesting two small codes to form an overall or super code. Combinations of file segment variations in each file are assigned according to an inner code. In terms of the text labels describing the file segment variations 302-324 in this application, the inner code that describes which file segment variations to select in augmented file 400 might be <AFL> for example, indicating that file segment variation 302 should be selected instead of file segment variations 304, 306, and 308, that file segment variation 312 should be selected instead of file segment variations 310, 314, and 316, and that file segment variation 324 should be selected instead of file segment variations 318, 320, and 322.
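The selection described by an inner code codeword such as <AFL> can be illustrated with a minimal sketch. The sketch assumes that the twelve variations 302-324 of FIGS. 3A-3C carry the labels A through L in figure order, four per critical file segment; the function and constant names are hypothetical and are not taken from the '395 specification.

```python
from string import ascii_uppercase

# Reference numerals from FIGS. 3A-3C: 302, 304, ..., 324 (twelve variations).
VARIATION_NUMERALS = list(range(302, 326, 2))
VARIATIONS_PER_SEGMENT = 4  # as drawn in the figures; the preferred number is about 16

# Assumed labeling: A -> 302, B -> 304, ..., L -> 324.
LABEL_TO_NUMERAL = dict(zip(ascii_uppercase, VARIATION_NUMERALS))


def select_variations(inner_codeword: str) -> list[int]:
    """Return the variation numeral chosen at each critical file segment."""
    chosen = []
    for segment_index, label in enumerate(inner_codeword):
        numeral = LABEL_TO_NUMERAL[label]
        # Each label must belong to the group of variations for its own segment.
        position = VARIATION_NUMERALS.index(numeral)
        if position // VARIATIONS_PER_SEGMENT != segment_index:
            raise ValueError(f"{label} is not a variation of segment {segment_index}")
        chosen.append(numeral)
    return chosen


print(select_variations("AFL"))  # [302, 312, 324]
```

A receiver holding this codeword would need decryption keys only for the three listed variations.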
A file identifier that describes which combination corresponds to which file in a group of files is assigned according to an outer code. For example, the inner code <AFL> might apply to file number 123. The '395 invention preferably employs Reed-Solomon codes, but all coding methods are within the scope of the '395 invention.
For example, using a Reed-Solomon inner code for 15 critical file segments each having 16 file segment variations, there are 256 different codewords assigned to boxes. Because of the properties of the code, that means that if one picks any two boxes, the boxes will either have an identical assignment of file segment variations, or at least 14 out of the 15 points will have different variations.
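The distance property stated above can be checked by brute force. The following is a sketch only: it assumes a Reed-Solomon code with k=2 built by evaluating polynomials of degree at most one at the 15 nonzero elements of GF(16), with the field represented modulo x^4 + x + 1. The '395 specification does not prescribe this particular construction, and the helper names are hypothetical.

```python
from itertools import combinations, product

Q = 16                   # variations per critical file segment (field size)
EVAL_POINTS = range(1, Q)  # the 15 nonzero field elements, one per segment


def gf16_mul(a: int, b: int) -> int:
    """Multiply two GF(16) elements, reducing modulo x^4 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0b10011
    return result


def encode(a0: int, a1: int) -> tuple:
    """Codeword for message (a0, a1): values of a0 + a1*x at each point."""
    return tuple(a0 ^ gf16_mul(a1, x) for x in EVAL_POINTS)


def hamming(u: tuple, v: tuple) -> int:
    """Number of positions at which two codewords differ."""
    return sum(s != t for s, t in zip(u, v))


codewords = [encode(a0, a1) for a0, a1 in product(range(Q), repeat=2)]
min_distance = min(hamming(u, v) for u, v in combinations(codewords, 2))
print(len(codewords), min_distance)  # 256 14
```

Two distinct polynomials of degree at most one agree at no more than one point, which is why any two distinct codewords differ in at least 14 of the 15 positions.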
Using a Reed-Solomon outer code for a group of 255 files, for example, there are 256 different codewords assigned to file identifiers. Thus, if there are 16 million boxes, each assigned a unique super code, any two boxes will have the same inner code assignment in at most two files. They will therefore differ in the inner code assignment in at least 253 files, and in each of those files they will differ at no fewer than 14 points, so the difference between the two boxes spans at least 253×14 or 3542 variations.
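The figures in this example follow from the standard Reed-Solomon relations: a code of length n and dimension k over q symbols has q^k codewords and minimum distance n - k + 1. The sketch below reproduces the arithmetic. The outer-code dimension k=3 is an assumption inferred from the 16 million boxes (256^3 = 16,777,216); it is not stated explicitly in the text, and the function name is hypothetical.

```python
def reed_solomon_parameters(n: int, k: int, q: int) -> dict:
    """Codeword count and distance figures for an [n, k] Reed-Solomon code over q symbols."""
    return {
        "codewords": q ** k,        # number of distinct codewords
        "min_distance": n - k + 1,  # positions in which any two codewords must differ
        "max_agreement": k - 1,     # positions in which two codewords may coincide
    }


inner = reed_solomon_parameters(n=15, k=2, q=16)
outer = reed_solomon_parameters(n=255, k=3, q=256)  # k=3 is an assumption

min_differing_variations = outer["min_distance"] * inner["min_distance"]

print(inner["codewords"])                              # 256
print(outer["codewords"])                              # 16777216
print(outer["max_agreement"], outer["min_distance"])   # 2 253
print(min_differing_variations)                        # 3542
```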
Referring now to FIG. 7, a prior art flowchart of the method of identifying and disabling traitor receivers is shown. First, in step 700, a redistributed or pirate version of a file is examined to determine its augmentations, which include the particular file segment variations it contains. Next, in step 702, a comparison is made between the augmentations and the super codes previously assigned to authorized receivers, to determine which receiver (or receivers) are most likely compromised. Finally, in step 704, when a traitor is traced, the dynamic subset of users authorized to receive the broadcast is changed by simply dropping the traced traitor from it; legal action may also be instituted at this point.
The examination includes calculating, preferably for each box, the number of file segment variations that a box matches with each observed illicit file. The examination can reveal a single movie rental box having every assigned file segment variation that was used in the pirated movie and reveal that the watermarks used also match, for a deterministic identification of the traitor. While prior art systems try to determine the traitor as quickly as possible by analyzing a single file, in the case where attackers are colluding this approach doesn't adequately distinguish the culprits. Instead, with the super code design of the '395 invention there may be thousands of boxes that will have exactly the same variations for a given file (as determined by the inner code), but these boxes will be distinguished in subsequent movies via the outer code. Since an attack is only economically hurtful if the attackers rebroadcast many movies, the approach is exactly right. A single group of movies, corresponding to a single super code, can be sufficient to identify a group of colluding traitors.
Further, the comparison may include a count of the number of watermarked file segment variations in the pirated file corresponding to each box among a number of boxes collectively compromised by colluding attackers. A ranked list of boxes can be generated according to the number of each box's file segment variations used in the pirated file. The box that has the most matchings with the redistributed movie is incriminated, and will not be given any new decryption keys. In other words, a list of suspected traitors can be generated according to the number of file segment variations from each that are used in an illicit copy. Thus, even when the suspected traitor numbers become too big for a deterministic identification, the '395 invention can probabilistically identify and disable the compromised box without harming innocent users. The present invention is a substantial improvement on this probabilistic identification.
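The ranking described above can be sketched as follows. The box identifiers, the assignment strings, and the function name are hypothetical illustrations; each string lists the variation assigned to a box at each of eight critical file segments, and the observed string is what a tracer might recover from a pirated copy assembled by two colluding boxes.

```python
def rank_boxes(assignments: dict, observed: str) -> list:
    """Rank boxes by how many observed variations match their assignment."""
    scores = {
        box: sum(a == o for a, o in zip(assigned, observed))
        for box, assigned in assignments.items()
    }
    # Highest score first; ties are broken by box identifier for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical assignments of variations (A-D) at eight critical segments.
assignments = {
    "box-1": "ABCDABCD",
    "box-2": "BADCBADC",
    "box-3": "CDABCDAB",
    "box-4": "DCBADCBA",
}

# Pirated copy mixing variations taken from box-1 and box-2.
observed = "ABDCBACD"

ranking = rank_boxes(assignments, observed)
print(ranking)  # [('box-1', 4), ('box-2', 4), ('box-3', 0), ('box-4', 0)]
```

In this toy case the two colluders share the top score; with realistic numbers of boxes, innocent boxes can also score highly by chance, which is the weakness discussed later in this section.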
It is also within the scope of the '395 invention to exclude more than one box for each super code sequence. In other words, the broadcaster can exclude the top two boxes, or the top three boxes, etc. This defeats the attack sooner, but at the cost of increasing the chance of falsely incriminating an innocent device along the way. Of course, there might be non-technical ways to help tell the difference between innocent and colluding boxes. For example, if a consumer calls to complain that her box no longer works, and is willing to have a service man come to her house to fix it, she is likely to be innocent.
The method is repeated for the next super code group of files, e.g. the next group of 255 movies. Eventually the attack will stop because all compromised boxes will have been excluded.
The attackers should not be able to calculate the actual assignments for any boxes but their own; if they could, that might help them incriminate an innocent box. Therefore, an additional feature of the '395 invention is to randomly permute code assignments at each code position (each critical file segment in the movie), and in each movie itself. For example, if a Reed-Solomon code would suggest that a given box should get variation #1 at a certain point in a certain movie, the assignment of variation number to the actual broadcast order will have been permuted, so that variation #1 is rarely the first variation broadcast.
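One way to picture this permutation step is the following sketch, which draws an independent random ordering of the variations at each critical file segment of a movie. The function names are hypothetical, and the seeded generator is used only to keep the illustration reproducible; a deployed system would presumably rely on a cryptographically secure source such as Python's `secrets.SystemRandom`.

```python
import random


def make_broadcast_orders(num_positions: int, num_variations: int, seed: int) -> list:
    """Draw one independent permutation of the variations per code position."""
    rng = random.Random(seed)  # illustrative only; not cryptographically secure
    orders = []
    for _ in range(num_positions):
        order = list(range(num_variations))
        rng.shuffle(order)
        orders.append(order)
    return orders


def broadcast_slot(orders: list, position: int, code_symbol: int) -> int:
    """Map the symbol dictated by the code to its slot in the broadcast order."""
    return orders[position][code_symbol]


# 15 critical file segments with 16 variations each, as in the preferred example.
orders = make_broadcast_orders(num_positions=15, num_variations=16, seed=2002)

# The slot actually used for code symbol 1 at each of the first three positions.
print([broadcast_slot(orders, position, 1) for position in range(3)])
```

Because each position uses its own permutation, knowing where one box's variation appears in the broadcast reveals nothing about the code symbol assigned to any other box.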
If there are a large number of colluding boxes (e.g. dozens), it may be difficult to condemn any single box after the first 255 movies have been broadcast. It is a simple matter to continue the process with the next group of movies. However, it is probably a bad idea to make exactly the same assignment of boxes to codes in the new group of movies, because then the same innocent box will have a high overlap with the traitors. It is an additional feature of the '395 invention to change the assignment of the super code to boxes after each super code sequence. All such new assignments are within the scope of the '395 invention, including random assignments and code assignments that are calculated to correspond to particular boxes to more effectively identify suspected traitors.
The best super code is generated when the inner code has k=2. This well-known parameter of error correcting codes determines the number of codewords; if q is the number of variations at each point, the number of codewords is q^k. All values of the k parameter are nonetheless within the scope of the '395 invention.
The various traitor tracing schemes envisioned in the past have all focused on evaluating the likelihood that particular individual receivers are traitors, typically by computing a score based on the number of file variations that are in common with recovered pirated content files. Numerous simulations have revealed that this prior art “high score” method has flaws: completely innocent receivers often obtain high scores due to chance alone.
It is not hard to see why this is the case. Take, for example, the case of twenty pirate files recovered from a coalition of four receivers, with each file having one of 256 variations. The highest scored receiver must have a score of at least “5”, meaning it has at least 5 variations in common with the recovered sequence of files. The problem is, assuming there are 1 billion receivers in the world, on average 15 innocent receivers would score “5” or better on a completely random sequence of movies. Thus, in this attack, the prior art “high score” method would almost never reliably incriminate even one of the actual guilty receivers. This method would also never reveal the actual number of receivers involved in the attack.
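The scale of this problem can be estimated with a simple binomial model, sketched below. The model assumes that an innocent receiver matches each recovered file independently with probability 1/256; this independence assumption and the function name are illustrative and are not taken from the prior art being described.

```python
from math import comb


def tail_probability(num_files: int, num_variations: int, threshold: int) -> float:
    """Probability that a random receiver matches at least `threshold` files."""
    p = 1 / num_variations
    return sum(
        comb(num_files, s) * p**s * (1 - p) ** (num_files - s)
        for s in range(threshold, num_files + 1)
    )


num_files = 20
coalition_size = 4
num_receivers = 1_000_000_000

# Pigeonhole bound: some coalition member contributed at least this many files.
min_top_score = -(-num_files // coalition_size)  # ceiling division

expected_innocent = num_receivers * tail_probability(num_files, 256, min_top_score)
print(min_top_score)             # 5
print(round(expected_innocent))  # expected number of innocent receivers scoring 5 or more
```

Under this model the expected count comes out in the low teens, the same order of magnitude as the figure quoted above, so a guilty receiver with a score of 5 is indistinguishable from more than a dozen innocent ones.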
An improved method to reliably detect guilty receivers is therefore needed. In addition, since in real-world scenarios the actual number of attackers is rarely known to the licensing agency, a method that could deduce the actual number of receivers involved is highly desirable.