Modern Personal Computers (PCs) have evolved from strictly computing devices into communication and entertainment devices. As a result, users expect to be able to view prerecorded video entertainment, including feature length movies, on their PC. In addition, the increased performance of processors makes it look advantageous to use software on the PCs processor to, for example, decode and play DVD movies. However, the owners of entertainment intellectual property (e.g., copyrights in movies) rightly are concerned about unauthorized use and copying of their property.
Unfortunately, the very nature of software decode in an open system such as a PC means that content cannot be effectively protected in a conventional open system that decodes the content. At some point during the decode process, both the keys and the decrypted content (e.g., video content) are available within the registers and/or memory of the open system and unauthorized copies of the keys or content can made.
In accordance with the present invention, this problem is avoided by creating a “closed” subsystem within a PC (or other open system) which can protect both content (e.g., video and/or audio data) and keys. The expression “closed subsystem” herein denotes a subsystem that does not provide users a way to add hardware or software thereto or remove hardware or software therefrom. Similarly, the expression “closed system” herein denotes a system that does not provide users a way to add hardware or software thereto or remove hardware or software therefrom. In typical embodiments of the invention, a closed subsystem within a PC employs key data to encrypt content, and is designed to prevent the key data and the unencrypted content (the content in its pre-encrypted state) from being disclosed outside of the closed subsystem.
Standard DVDs provide protection of their content from unauthorized use via the Content Scrambling System (“CSS”), an encryption method. It is anticipated that high definition video will be disseminated as high definition DVDs (HD-DVDs), and that an encryption method similar to CSS will be used to protect the content of HD-DVDs.
The trend in the industry for sending high definition video to display devices is to deliver the data in digital form over serial links.
Various serial links for transmitting encrypted or non-encrypted data are well known. One conventional serial link, used primarily in consumer electronics (e.g., for high-speed transmission of video data from a set-top box to a television set) or for high-speed transmission of video data from a host processor (e.g., a personal computer) to a monitor, is known as a transition minimized differential signaling interface (“TMDS” link). The characteristics of a TMDS link include the following:                1. video data are encoded and then transmitted as encoded words (each 8-bit word of digital video data is converted to an encoded 10-bit word before transmission);        a. the encoding determines a set of “in-band” words and a set of “out-of-band” words (the encoder can generate only “in-band” words in response to video data, although it can generate “out-of-band” words in response to control or sync signals. Each in-band word is an encoded word resulting from encoding of one input video data word. All words transmitted over the link that are not in-band words are “out-of-band” words);        b. the encoding of video data is performed such that the in-band words are transition minimized (a sequence of in-band words has a reduced or minimized number of transitions);        c. the encoding of video data is performed such that the in-band words are DC balanced (the encoding prevents each transmitted voltage waveform that is employed to transmit a sequence of in-band words from deviating by more than a predetermined threshold value from a reference potential. Specifically, the tenth bit of each “in-band” word indicates whether eight of the other nine bits thereof have been inverted during the encoding process to correct for an imbalance between running counts of ones and zeroes in the stream of previously encoded data bits);        2. the encoded video data and a video clock signal are transmitted as differential signals (the video clock and encoded video data are transmitted as differential signals over conductor pairs);        3. three conductor pairs are employed to transmit the encoded video, and a fourth conductor pair is employed to transmit the video clock signal; and        4. signal transmission occurs in one direction, from a transmitter (typically associated with a desktop or portable computer, or other host) to a receiver (typically an element of a monitor or other display device).        
A use of the TMDS serial link is the “Digital Visual Interface” interface (“DVI” link) adopted by the Digital Display Working Group. A DVI link can be implemented to include two TMDS links (which share a common conductor pair for transmitting a video clock signal) or one TMDS link, as well as additional control lines between the transmitter and receiver. A DVI link includes a transmitter, a receiver, and the following conductors between the transmitter and receiver: four conductor pairs (Channel 0, Channel 1, and Channel 2 for video data, and Channel C for a video clock signal), Display Data Channel (“DDC”) lines for bidirectional communication between the transmitter and a monitor associated with the receiver in accordance with the conventional Display Data Channel standard (the Video Electronics Standard Association's “Display Data Channel Standard,” Version 2, Rev. 0, dated Apr. 9, 1996), a Hot Plug Detect (HPD) line (on which the monitor transmits a signal that enables a processor associated with the transmitter to identify the monitor's presence), Analog lines (for transmitting analog video to the receiver), and Power lines (for providing DC power to the receiver and a monitor associated with the receiver). The Display Data Channel standard specifies a protocol for bidirectional communication between a transmitter and a monitor associated with a receiver, including transmission by the monitor of an Extended Display Identification (“EDID”) message that specifies various characteristics of the monitor, and transmission by the transmitter of control signals for the monitor.
Another serial link is the “High Definition Multimedia Interface” interface (sometimes referred to as an “HDMI” link or interface) developed Silicon Image, Inc., Matsushita Electric, Royal Philips Electronics, Sony Corporation, Thomson Multimedia, Toshiba Corporation, and Hitachi.
It has been proposed to use the cryptographic protocol known as the “High-bandwidth Digital Content Protection” (“HDCP”) protocol to encrypt digital video to be transmitted over a DVI or HDMI link and to decrypt the data at the DVI (or HDMI) receiver. The HDCP protocol is described in the document “High-bandwidth Digital Content Protection System,” Revision 1.0, dated Feb. 17, 2000, by Intel Corporation, and the document “High-bandwidth Digital Content Protection System Revision 1.0 Erratum,” dated Mar. 19, 2001, by Intel Corporation. The full text of both of these documents is incorporated herein by reference.
A DVI-compliant (or HDMI-compliant) transmitter implementing the HDCP protocol asserts a stream of pseudo-randomly generated 24-bit words, known as cout[23:0], during each active period (i.e. when DE is high). In a DVI-compliant system, each active period is an active video period. In an HDMI-compliant system, each active period is a period in which video, audio, or other data are transmitted. Each 24-bit word of the cout data is “Exclusive Ored” (in logic circuitry in the transmitter) with a 24-bit word of RGB video data input to the transmitter, in order to encrypt the video data. The encrypted data are then encoded (according to the TMDS standard) for transmission. The same sequence of cout words is also generated in the receiver. After the encoded and encrypted data received at the receiver undergo TMDS decoding, the cout data are processed together with the decoded video in logic circuitry in order to decrypt the decoded data and recover the original input video data.
Before the transmitter begins to transmit HDCP encrypted, encoded video data, the transmitter and receiver communicate bidirectionally with each other to execute an authentication protocol (to verify that the receiver is authorized to receive protected content, and to establish shared secret values for use in encryption of input data and decryption of transmitted encrypted data). More specifically, each of the transmitter and the receiver is preprogrammed (e.g., at the factory) with a 40-bit word known as a key selection vector, and an array of forty 56-bit private keys. To initiate the first part of an authentication exchange between the transmitter and receiver, the transmitter asserts its key selection vector (known as “AKSV”), and a pseudo-randomly generated session value (“An”) to the receiver. In response, the receiver sends its key selection vector (known as “BKSV”) and a repeater bit (indicating whether the receiver is a repeater) to the transmitter, and the receiver also implements a predetermined algorithm using “AKSV” and the receiver's array of forty private keys to calculate a secret value (“Km”). In response to the value “BKSV” from the receiver, the transmitter implements the same algorithm using the value “BKSV” and the transmitter's array of forty private keys to calculate the same secret value (“Km”) as does the receiver.
Each of the transmitter and the receiver then uses the shared value “Km,” the session value “An,” and the repeater bit to calculate a shared secret value (the session key “Ks”), a value (“R0”) for use in determining whether the authentication is successful, and a value (“M0”) for use during a second part of the authentication exchange. The second part of the authentication exchange is performed only if the repeater bit indicates that the receiver is a repeater, to determine whether the status of one or more downstream devices coupled to the repeater requires revocation of the receiver's authentication.
After the first part of the authentication exchange, and (if the second part of the authentication exchange is performed) if the receiver's key selection vector is not revoked as a result of the second part of the authentication exchange, each of the transmitter and the receiver generates a 56-bit frame key Ki (for initiating the encryption or decrypting a frame of video data), an initialization value Mi, and a value Ri used for link integrity verification. The Ki, Mi, and Ri values are generated in response to a control signal (identified as “ctl3” in FIG. 1), which is received at the appropriate circuitry in the transmitter, and is also sent by the transmitter to the receiver, during each vertical blanking period, when DE is low. As shown in the timing diagram of FIG. 1, the control signal “ctl3” is a single high-going pulse. In response to the Ki, Mi, and Ri values, each of the transmitter and receiver generates a sequence of pseudo-randomly generated 24-bit words cout[23:0]. Each 24-bit word of the cout data generated by the transmitter is “Exclusive Ored” (in logic circuitry in the transmitter) with a 24-bit word of a frame of video data (to encrypt the video data). Each 24-bit word of the cout data generated by the receiver is “Exclusive Ored” (in logic circuitry in the receiver) with a 24-bit word of the first received frame of encrypted video data (to decrypt this encrypted video data). The 24-bit words cout[23:0] generated by the transmitter are content encryption keys (for encrypting a line of input video data), and the 24-bit words cout[23:0] generated by the receiver are content decryption keys (for decrypting a received and decoded line of encrypted video data).
During each horizontal blanking interval (in response to each falling edge of the data enable signal DE) following assertion of the control signal ctl3, the transmitter performs a rekeying operation and the receiver performs the same rekeying operation to change (in a predetermined manner) the cout data words to be asserted during the next active video period. This continues until the next vertical blanking period, when the control signal ctl3 is again asserted to cause each of the transmitter and the receiver to calculate a new set of Ki and Mi values (with the index “i” being incremented in response to each assertion of the control signal ctl3). The Ri value is updated once every 128 frames. Actual encryption of input video data or decryption of received, decoded video data (or encryption of input video, audio, or other data, or decryption of received, decoded video, audio, or other data, in the case of an HDMI-compliant system) is performed, using the cout data words generated in response to the latest set of Ks, Ki and Mi values, only when DE is high (not during vertical or horizontal blanking intervals).
Each of the transmitter and receiver includes an HDCP cipher circuit (sometimes referred to herein as an “HDCP cipher”) of the type shown in FIG. 2. The HDCP cipher includes linear feedback shift register (LFSR) module 80, block module 81 coupled to the output of LFSR module 80, and output module 82 coupled to an output of block module 81. LFSR module 80 is employed to re-key block module 81 in response to each assertion of an enable signal (the signal “ReKey” shown in FIG. 2), using the session key (Ks) and the current frame key (Ki). Block module 81 generates (and provides to module 80) the key Ks at the start of a session and generates (and applies to module 80) a new value of key Ki at the start of each frame of video data (in response to a rising edge of the control signal “ctl3,” which occurs in the first vertical blanking interval of a frame). The signal “ReKey” is asserted to the FIG. 2 circuit at each falling edge of the DE signal (i.e., at the start of each vertical and each horizontal blanking interval), and at the end of a brief initialization period (during which module 81 generates an updated value of the frame key Ki) after each rising edge of signal “ctl3.”
Module 80 consists of four linear feedback shift registers (having different lengths) and combining circuitry coupled to the shift registers and configured to assert a single output bit per clock interval to block module 81 during each of a fixed number of clock cycles (e.g., 56 cycles) commencing on each assertion of the signal “ReKey” when DE is low (i.e., in the horizontal blanking interval of each line of video data). This output bit stream is employed by block module 81 to re-key itself just prior to the start of transmission or reception of each line of video data.
Block module 81 comprises two halves, “Round Function K” and “Round Function B,” as shown in FIG. 3. Round Function K includes 28-bit registers Kx, Ky, and Kz, seven S-Boxes (each a 4 input bit by 4 output bit S-Box including a look-up table) collectively labeled “S-Box K” in FIG. 3, and linear transformation unit K, connected as shown. Round Function B includes 28-bit registers Bx, By, and Bz, seven S-Boxes (each a 4 input bit by 4 output bit S-Box including a look-up table) collectively labeled “S-Box B” in FIG. 3, and linear transformation unit B, connected as shown. Round Function K and Round Function B are similar in design, but Round Function K performs one round of a block cipher per clock cycle to assert a different pair of 28-bit round keys (Ky and Kz) each clock cycle in response to the output of LFSR module 80, and Round Function B performs one round of a block cipher per clock cycle, in response to each 28-bit round key Ky from Round Function K and the output of LFSR module 80, to assert a different pair of 28-bit round keys (By and Bz) each clock cycle. The transmitter generates value An at the start of the authentication protocol and the receiver responds to it during the authentication procedure. The value An is used to randomize the session key. Block module 81 operates in response to the authentication value (An), and the initialization value (Mi) which is updated by output module 82 at the start of each frame (at each rising edge of the control signal “ctl3”).
Each of linear transformation units K and B outputs 56 bits per clock cycle. These output bits are the combined outputs of eight diffusion networks in each transformation unit. Each diffusion network of linear transformation unit K produces seven output bits in response to seven of the current output bits of registers Ky and Kz. Each of four of the diffusion networks of linear transformation unit B produces seven output bits in response to seven of the current output bits of registers By, Bz, and Ky, and each of the four other diffusion networks of linear transformation unit B produces seven output bits in response to seven of the current output bits of registers By and Bz.
In Round Function K, one bit of register Ky takes its input from the bit stream asserted by module 80 when the ReKey signal is asserted. In Round Function B, one bit of register By takes its input from the bit stream asserted by module 80 when the ReKey signal is asserted.
Output module 82 performs a compression operation on the 28-bit keys (By, Bz, Ky and Kz) asserted to it (a total of 112 bits) by module 81 during each clock cycle, to generate one 24-bit block of pseudo-random bits cout[23:0] per clock cycle. Each of the 24 output bits of module 82 consists of the exclusive OR (“XOR”) of nine terms as follows: (B0*K0)+(B1*K1)+(B2*K2)+(B3*K3)+(B4*K4)+(B5*K5)+(B6*K6)+(B7)+(K7), where “*” denotes a logical AND operation and “+” denotes a logical XOR operation.
In the transmitter, logic circuitry 83 (shown in FIG. 2) receives each 24-bit word of cout data and each input 24-bit RGB video data word, and performs a bitwise XOR operation thereon in order to encrypt the video data, thereby generating a word of the “data_encrypted” data indicated in FIG. 2. Typically, the encrypted data subsequently undergoes TMDS encoding before it is transmitted to a receiver. In the receiver, logic circuitry 83 (shown in FIG. 2) receives each 24-bit block of cout data and each recovered 24-bit RGB video data word (after the recovered data has undergone TMDS decoding), and performs a bitwise XOR operation thereon in order to decrypt the recovered video data.
Throughout the specification the expression “TMDS-like link” will be used to denote a serial link capable of transmitting encoded data (e.g., encoded digital video data), and optionally also a clock for the encoded data, from a transmitter to a receiver, and optionally also capable of transmitting (bidirectionally or unidirectionally) one or more additional signals (e.g., encoded digital audio data or other encoded data) between the transmitter and receiver, that is or includes either a TMDS link or a link having some but not all of the characteristics of a TMDS link. Examples of TMDS-like links include links that differ from TMDS links only by encoding data as N-bit code words (where N 10, and thus are not 10-bit TMDS code words) and links that differ from TMDS links only by transmitting encoded video over more than three or less than three conductor pairs. Some TMDS-like links encode input video data (and other data) to be transmitted into encoded words comprising more bits than the incoming data using a coding algorithm other than the specific algorithm used in a TMDS link, and transmit the encoded video data as in-band characters and the other encoded data as out-of-band characters (HDMI-compliant systems encode audio data for transmission according to an encoding scheme that differs from the encoding scheme employed for video data). The characters need not be classified as in-band or out-of-band characters based according to whether they satisfy transition minimization and DC balance criteria. Rather, other classification criteria could be used. An example of an encoding algorithm, other than that used in a TMDS link but which could be used in a TMDS-like link, is IBM 8b10b coding. The classification (between in-band and out-of-band characters) need not be based on just a high or low number of transitions. For example, the number of transitions of each of the in-band and out-of-band characters could (in some embodiments) be in a single range (e.g., a middle range defined by a minimum and a maximum number of transitions).
The term “transmitter” is used herein in a broad sense to denote any unit capable of transmitting data over a link and optionally also encoding and/or encrypting the data to be transmitted. The term “receiver” is used herein in a broad sense to denote any unit capable of receiving data that has been transmitted over a link (and optionally also decoding and/or decrypting the received data). Unless otherwise specified, a link can but need not be a TMDS-like link or other serial link. The term transmitter can denote a transceiver that performs the functions of a receiver as well as the functions of a transmitter.
The expression “content key” herein denotes data that can be used by a cryptographic device to encrypt content (e.g., video, audio, or other content), or to denote data that can be used by a cryptographic device to decrypt encrypted content.
The term “key” is used herein to denote a content key, or data that can be used by a cryptographic device to generate or otherwise obtain (in accordance with a content protection protocol) a content key. The expressions “key” and “key data” are used interchangeably herein.
The term “stream” of data as used herein denotes that all the data are of the same type and are transmitted from a source to a destination device. All or some of the data of a “stream” of data together may constitute a single logical entity (e.g., a movie or song, or portion thereof).
The term “channel,” as used herein, refers to that portion of a link that is employed to transmit data (e.g., a particular conductor or conductor pair between the transmitter and receiver over which the data are transmitted, and specific circuitry within the transmitter and/or receiver used for transmitting and/or recovery of the data) and to the technique employed to transmit the data over the link.
The term “HDCP protocol” is used herein in a broad sense to denote both the conventional HDCP protocol and modified HDCP protocols that closely resemble the conventional HDCP protocol but differ therefrom in one or more respects. Some but not all embodiments of the invention implement an HDCP protocol. The conventional HDCP protocol encrypts (or decrypts) data during active video periods but not during blanking intervals between active video periods. An example of a modified HDCP protocol is a content protection protocol that differs from the conventional HDCP protocol only to the extent needed to accomplish decryption of data transmitted between active video periods (as well as decryption of video data transmitted during active video periods) or to accomplish encryption of data to be transmitted between active video periods (as well as encryption of video data to be transmitted during active video periods).
A example of an HDCP protocol that is a modified version of the conventional HDCP protocol is an “upstream” variation on the conventional HDCP protocol (to be referred to as an “upstream” protocol). A version of the upstream protocol is described in the Upstream Link for High-bandwidth Digital Content Protection, Revision 1.00, by Intel Corporation, Jan. 26, 2001 (referred to hereinafter as the “Upstream Specification”). In the upstream protocol, the “transmitter” is a processor programmed with software for implementing the upstream protocol to communicate with a graphics controller (with the graphics controller functioning as a “receiver”). Such a processor can send video data to the graphics controller after executing an authentication exchange in accordance with the “upstream” protocol. The processor and graphics controller can be elements of a personal computer configured to send encrypted video data from the graphics controller to a display device. The graphics controller and display device can be configured to execute another encryption protocol (e.g., the above-mentioned conventional HDCP protocol, which can be referred to in this context as the “downstream” HDCP protocol) to allow the graphics controller (this time functioning as a “transmitter”) to encrypt video data and send the encrypted video to the display device, and to allow the display device (functioning as a “receiver”) to decrypt the encrypted video.
However, in contrast to the present invention, the upstream protocol would not provide adequate protection to raw content that is present in a processor of a personal computer (or other open system), where the processor is programmed with software for implementing the upstream protocol (with the processor functioning as a “transmitter”) to communicate with (and send the raw content to) a graphics controller functioning as a “receiver,” to allow the graphics controller (this time functioning as a “transmitter”) to encrypt the raw content and transmit the resulting encrypted content (in accordance with the “downstream” HDCP protocol) to a device (e.g., a display device) external to the open system.
There are a number of structural flaws in the upstream protocol, and a personal computer (or other open system) that implements the upstream protocol would be subject to at least one attack in which the attacker could access the raw content present within the open system. An example of such an attack is a “man-in-the-middle” attack, in which the upstream authentication requests (from the graphics controller) are intercepted and the corresponding responses (to the graphics controller) are forged. A personal computer that implements the upstream protocol is easily attacked for one fundamental reason: at least two of the system elements (the application and the video driver) are in software. They can be debugged, de-compiled, altered, and copied, with any resulting “hack” potentially distributed quickly and easily across the Internet.
Thus, the upstream protocol is fundamentally flawed and will allow people of ordinary skills (and with no special hardware or tools) to bypass the intended HDCP protections. Furthermore, this can happen on a large scale, and can not readily be detected or counteracted.
The following is a list of the flaws of the upstream protocol (as implemented in a personal computer) that permit attacks of the type mentioned above:                key length is too short. Upstream authentication uses a key length of 56 bits, which is too short to be considered secure;        drivers can be hacked. If a driver is compromised, then the software application must assume that all of its communication is compromised;        there is no connection between downstream and upstream authentication. The software application can not validate that a proper “downstream” authentication has occurred;        there is no upstream revocation. No revocation mechanism is defined for the upstream link; and        the software and driver are not secure. The application software and driver can be studied, altered, copied, distributed, and otherwise assaulted.        