The most common use of DVD technology is to record and play so-called “DVD-Video”, which is a particular data format used to represent movies and other audio/video content. However, newer DVD formats, such as “DVD-Audio”, DVD-VR, and rewritable DVD-Video discs, focus on additional functions. The “DVD-Audio” format records audio content at a higher fidelity than the “DVD-Video” format, although both formats can include both audio and video elements.
Multimedia content such as that found on DVDs can be played back or rendered using a variety of hardware. Such hardware is frequently controlled by software, which coordinates the various functions needed to turn digital data into perceptible audio and visual content.
Although dedicated-function devices are often used for rendering DVD and other multimedia content, personal computers are also being used as multimedia presentation devices. In practice, the internal designs of various types of playback devices may be similar, whether they are dedicated-function devices or more multi-function devices such as personal computers.
FIG. 1 shows relevant components that might be used in a playback device such as a personal computer 100. The illustrated components comprise a mixture of hardware and software. The depicted architecture is similar to that used in the Microsoft Windows® family of operating systems, and in particular in the DirectX® technology used within the Windows® systems. The same or similar technology might be used in a variety of devices, including seemingly dedicated-function devices such as typical stereo-system components.
The components include a DVD disc drive 105 that receives a DVD disc 107. The DVD disc 107 includes audio and video content which in some cases is at least partially encrypted. Collectively DVD disc drive 105 and the received DVD disc 107 are considered a source and will be referred to below as a multimedia source, audio/video source, or simply as source 105. Although the example depicts a DVD source, in other embodiments the multimedia content might be received from some other source such as a network source. An Internet website is an example of such a source.
In this architecture, direct interface with DVD disc drive 105 is accomplished by means of a navigation interface and component 108. Navigation interface and component 108 is responsible for reading the appropriate content from the DVD and for passing it on to other components, to be described below, that decode or transcode, and render the content.
An application or application program 110 is responsible for interacting with a user and for translating user commands into instructions for navigation interface and component 108. Application 110 might, for example, be a media player program implemented in software. Microsoft® Corporation's Windows® Media Player is an example of such a media player program. Application 110 can select different video modes, video angles, subtitle languages, menu languages, playback rates and directions, etc. Navigation interface and component 108 uses this information to select the appropriate video and audio streams.
The playback components include a video processing stack 112 and an audio processing stack 114. Video and audio content retrieved by navigation interface and component 108 are passed respectively to video processing stack 112 and audio processing stack 114 for conversion into signals to drive a video presentation device such as a video monitor and a speaker or other audio transducer (not shown). A processing stack in general comprises one or more processing components arranged linearly so that content is received by a topmost processing component and passed downward through successive components until it is finally received at a bottommost component. In this example, the content is streamed, meaning that it is flowing continuously into the stack and downward through its components. As the content passes through each component, that component processes the content before passing it on to the next component.
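The linear, streamed arrangement described above can be illustrated with a minimal sketch. The names `run_stack`, `decoder`, `interface`, and `driver` are hypothetical stand-ins chosen for illustration; they are not DirectX® interfaces and perform no real decoding or rendering.

```python
from typing import Callable, Iterable, Iterator, List

# A stage takes one chunk of content and returns the processed chunk.
Stage = Callable[[bytes], bytes]


def run_stack(chunks: Iterable[bytes], stages: List[Stage]) -> Iterator[bytes]:
    """Stream each chunk through the stages, topmost first, bottommost last."""
    for chunk in chunks:
        for stage in stages:
            chunk = stage(chunk)
        yield chunk


# Hypothetical placeholder stages; real components would decode, convert,
# and render the content rather than manipulate text.
def decoder(chunk: bytes) -> bytes:
    return chunk.upper()


def interface(chunk: bytes) -> bytes:
    return chunk.strip()


def driver(chunk: bytes) -> bytes:
    return b"[" + chunk + b"]"


source_chunks = [b" frame1 ", b" frame2 "]
output = list(run_stack(source_chunks, [decoder, interface, driver]))
```

Because `run_stack` is a generator, each chunk leaves the bottommost stage before the next chunk enters the topmost one, which loosely mirrors the continuous flow of streamed content.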
In this example, video processing stack 112 has three components: a video decoder 116, a video interface 124, and a video driver/card 126. Audio processing stack 114 has analogous components: an audio decoder 120, an audio interface 130, and an audio driver/card 132. Note that this is merely a simple example of video and audio processing components that may form video and audio processing stacks. In practice, a number of processing components might be included, either in addition to those shown or in some cases in place of those shown. Note also that any individual component might be embodied as hardware or software, although hardware components typically operate in conjunction with a software component that acts as a proxy for the hardware and that provides communications between the hardware and other components. This allows for content to be processed within the context of an Internet-based distributed operating system.
A key is code or a phrase that allows locking or unlocking of operational aspects of the protection algorithm. Public key cryptography systems may be known as asymmetric-key systems. An advantage of public key cryptography systems is that public keys are widely distributable and can be important for such actions as authentication of digital signatures. The disadvantage is that public key distribution is slow, because everyone must have access to a key generation mechanism in order for the key to be fully accessible to the public at large.
Public key cryptography has low infrastructural overhead because it has no centralized infrastructure for trusted-key management. Instead, users validate each other's public keys rigorously and manage their own private keys securely. This is difficult to do well, and causes the system to be only as secure as its users. Such a rule of operation is considered to be a compliance defect in a cryptosystem, because the rule is both difficult to follow and unenforceable.
The defects of public-key cryptography make it more suitable for server-to-server security than for desktop applications. Public-key cryptography is uniquely well-suited to certain parts of a secure global network. It is widely accepted that public key security systems are easier to administer, more secure, less trustful, and have better geographical reach than private or symmetric-key security systems. However, even in server environments, public-key cryptography relies too heavily on the security discipline of end users. Examples of public key systems include RSA, Diffie-Hellman, and ElGamal.
Private key cryptography systems are also known as symmetric-key systems. The advantages of private key systems are that they are fast and secure. The disadvantage is that the private key must be distributed in advance and must not be divulged, so the system is based on a “kept secret” and is compromised if the key is disclosed. Systems that use private keys have more stringent security requirements to protect private keys against detection, tampering, or outright theft. For example, suppose a financial institution issues a private key to a customer to access his banking records. If the private key is broken once for one transaction, all banking records for that customer are compromised.
Some types of private key systems include DES, RC4, RC5, IDEA, and SkipJack. The Data Encryption Standard (DES) is a widely published and widely used federal standard for private-key systems. Basic DES uses a 56-bit key that can be cracked in about a day with specialized hardware. The algorithm called “triple DES” uses a 112-bit key that currently cannot be cracked by known techniques.
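The gap between the two key lengths can be made concrete with a short calculation. This is a rough sketch that assumes a naive exhaustive key search proceeding at the rate implied by the "about a day" figure above; it does not model any more efficient attack, and the variable names are illustrative only.

```python
DES_KEY_BITS = 56
TRIPLE_DES_KEY_BITS = 112

# Number of possible keys for each key length.
des_keys = 2 ** DES_KEY_BITS
triple_des_keys = 2 ** TRIPLE_DES_KEY_BITS

# How many times larger the 112-bit key space is than the 56-bit one.
ratio = triple_des_keys // des_keys

# Assumption: searching the whole 56-bit space takes one day, so the
# 112-bit space takes `ratio` days at the same search rate.
days_for_triple_des = ratio
years_for_triple_des = days_for_triple_des / 365.25

print(f"56-bit key space:  {des_keys:,} keys")
print(f"112-bit key space is {ratio:,} times larger")
print(f"Naive search time at the same rate: about {years_for_triple_des:.2e} years")
```

Doubling the key length squares the size of the key space rather than doubling it, which is why the same hardware that exhausts a 56-bit space in a day makes no practical progress against a 112-bit space.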
In the examples described herein, the various depicted system components operate independently as objects, passing content to and from each other through software interfaces such as are commonly used in the Windows® programming environment and other object-oriented programming environments. The various arrows shown in FIG. 1 represent content flow through such interfaces. As shown, navigation interface and component 108 interacts with DVD disc drive 105 to retrieve content from a DVD disc 107. Application 110 interacts with navigation interface and component 108 to select various playback parameters. Navigation interface and component 108 provides video and audio content streams to video stack 112 and audio stack 114, respectively.
The first component in each of the stacks is a decoder: video decoder 116 in video stack 112 and audio decoder 120 in audio stack 114. The decoders are used to decompress and decrypt DVD content. DVD-Video typically uses a content protection scheme known as the content-scrambling system (CSS). CSS and other content protection schemes make use of encryption and cryptographic key exchange between encrypted DVD disc sectors and decrypting components. DVD-Audio uses a scheme known as content protection for prerecorded media (CPPM). In these schemes, the navigation interface and component 108 acts as an intermediary to transfer encryption keys and content between the DVD source and the appropriate decoder. When needed, a decoder (e.g., video decoder 116 or audio decoder 120) uses the decryption key to decrypt the content before decompression. Separate, secure logical communications channels are used for the video and audio streams.
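The decrypt-then-decompress ordering performed by a decoder can be sketched as follows. This is a toy illustration only: the XOR keystream cipher is a stand-in and is not CSS or CPPM, `zlib` stands in for an audio/video codec, and the names `keystream_xor`, `author_sector`, and `decode_sector` are hypothetical.

```python
import hashlib
import zlib


def keystream_xor(data: bytes, key: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHA-256-derived keystream.

    Applying it twice with the same key restores the input. It is a
    placeholder for illustration, not a real content protection cipher.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def author_sector(content: bytes, key: bytes) -> bytes:
    """Disc side: compress first, then encrypt the compressed bytes."""
    return keystream_xor(zlib.compress(content), key)


def decode_sector(sector: bytes, key: bytes) -> bytes:
    """Decoder side: decrypt first, then decompress, reversing the above."""
    return zlib.decompress(keystream_xor(sector, key))


key = b"hypothetical-title-key"
content = b"audio samples " * 32
sector = author_sector(content, key)
decoded = decode_sector(sector, key)
```

The decoder must undo the two operations in the reverse of the order in which they were applied, which is why decryption precedes decompression. The output of `decode_sector` is plaintext content.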
After decoding and decryption, the video and audio content streams are passed to subsequent processing components of the respective processing stacks. Video content is passed in this example to video interface 124 and then to video driver/card 126. Audio content is passed to audio interface 130 and then to audio driver/card 132.
Content protection schemes such as CSS, CPPM, and content protection for recordable media (CPRM) make use of a three-step process. The first step establishes an encrypted, secure logical side-band bus (i.e., a channel separate from the actual video/audio content flow channel); the logical bus is established by negotiating a common session key over possibly publicly visible communication channels. The second step is an authentication process over the bus that involves a key exchange between the source, such as DVD disc drive 105, and a decrypting component. The third step, referred to as decrypting, involves another key transfer over the secure logical bus; the transferred key is used to decrypt the encrypted video or audio content. Collectively, establishing the logical bus, authenticating, and passing the decryption key are referred to as “key exchange”. Navigation interface and component 108 sends and receives keys between the DVD source and video stack 112 through secure logical busses 133 and 135, and between the DVD source and audio stack 114 through secure logical busses 140 and 145. Decryption keys for video stack 112 are sent from DVD disc drive 105 to video decoder 116 by way of navigation interface and component 108, over secure logical busses 133 and 135. A corresponding key exchange is performed with audio decoder 120 over secure logical busses 140 and 145.
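The idea of negotiating a common session key over a publicly visible channel and then transferring a decryption key under that session key can be sketched with a toy Diffie-Hellman exchange. This is an assumption made for illustration: the text does not state that CSS, CPPM, or CPRM use Diffie-Hellman, the authentication step is omitted, the parameters `P` and `G` are far too weak for real use, and all function names are hypothetical.

```python
import hashlib
import secrets

# Toy public parameters: a Mersenne prime and a small base.
# They are illustrative only and not suitable for real security.
P = 2 ** 127 - 1
G = 3


def keypair():
    """Return (private, public); only the public value crosses the channel."""
    private = secrets.randbelow(P - 3) + 2  # value in [2, P - 2]
    return private, pow(G, private, P)


def session_key(own_private: int, peer_public: int) -> bytes:
    """Derive a session key from the shared Diffie-Hellman secret."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


# Step 1: the drive and the decoder each derive the same session key.
drive_private, drive_public = keypair()
decoder_private, decoder_public = keypair()
drive_session = session_key(drive_private, decoder_public)
decoder_session = session_key(decoder_private, drive_public)

# Step 3: the drive sends the content decryption key wrapped under the
# session key, and the decoder unwraps it on the other side.
content_key = secrets.token_bytes(16)
wrapped = xor_bytes(content_key, drive_session)
unwrapped = xor_bytes(wrapped, decoder_session)
```

An intermediary that only relays `drive_public`, `decoder_public`, and `wrapped` never holds either private value, which is the property that lets a component in the middle pass keys along without being able to read them in this toy model.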
Although not shown, DVD video may include “primary” video content and “sub-picture” video content. Primary video content may include things like movie scenes, while sub-picture video content is overlaid on top of the primary video and may include menus/menu-highlights and subtitles or graphics that can be optionally overlaid on the movie scenes. An additional decoder is typically provided in the video stack for sub-picture video content, and a video mixing renderer component may also be included in the video stack to perform the appropriate overlaying in response to control by application program 110.
The architecture shown in FIG. 1 and described above endeavors to provide copy protection for DVD content. However, it presents at least one weakness in this regard. In particular, audio and possibly video content is passed from the decoder to numerous subsequent processing components in the stack in an unencrypted state. This makes it possible for a hacker to tap into the content flow between components, and thereby obtain a decrypted version of the content.
Typically, for video, a video decoder such as video decoder 116 may apply some form of private encryption to content sent to the video hardware. Unfortunately, each video card manufacturer uses its own custom encryption technique, and each video card manufacturer must also support multiple decoder vendors' custom encryption techniques. Not only is this a costly infrastructure to support (e.g., vendors must coordinate and test), but it also prevents new vendors from working with other vendors' components.
In the case of DVD-Audio formatted content, it is assumed that the audio is of even higher value since it is of higher fidelity than audio on a DVD-Video disc. Thus, protecting its uncompressed form from unauthorized copying is of paramount importance.
The DVD-Audio encryption mechanism is also used for DVD-Video content on rewritable or write-once media. Since each piece of writable media has a unique identifier, the identifier can be used to generate an encryption key to “tie” written content to the media.
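The way a unique media identifier can “tie” content to a disc can be sketched with a keyed hash. This is only an illustration of the general idea and does not reproduce the actual key derivation of any DVD protection scheme; `device_secret`, `media_key`, and the identifier values are hypothetical.

```python
import hashlib
import hmac


def media_key(media_id: bytes, device_secret: bytes) -> bytes:
    """Derive a content key from a disc's unique identifier.

    HMAC-SHA256 is an assumed stand-in for the real derivation function.
    """
    return hmac.new(device_secret, media_id, hashlib.sha256).digest()


# Hypothetical secret held by a licensed recorder or player.
device_secret = b"hypothetical-device-secret"

original_disc_id = b"MEDIA-ID-0001"
copied_disc_id = b"MEDIA-ID-0002"

key_for_original = media_key(original_disc_id, device_secret)
key_for_copy = media_key(copied_disc_id, device_secret)

# Content encrypted under key_for_original cannot be decrypted with
# key_for_copy, so a bit-for-bit copy onto another disc does not play.
keys_differ = key_for_original != key_for_copy
```

Because the key depends on the identifier of the physical disc, the encrypted content is only useful together with the disc it was written to.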
To overcome the vulnerability of DVD-Audio-formatted content to unauthorized copying, manufacturers have relied on so-called monolithic drivers rather than the stack architecture described above. All processing, including decryption, is performed within a single component, making it difficult for a hacker to tap into a decrypted content flow. For writable DVD-Video, similar monolithic stacks have been used.
However, this solution for DVD-Audio has several drawbacks. Although a monolithic stack provides audio playback, it does not support video playback. In other words, commands cannot be sent back up the monolithic stack to control video playback. For example, there is a provision for DVD-Audio content to control a wipe (e.g., a dissolve or fade command) for video content. With a specialized audio player (i.e., a monolithic stack), a navigator (e.g., navigation interface and component 108) does not issue feedback to control video wipes.
Since the DVD-Audio monolithic stack exclusively controls playing of audio content, other PC applications such as a media player program cannot control or make use of the audio content.
Because the monolithic stack depends on proprietary protocols unique to the monolithic stack, components in the monolithic stack cannot be easily exchanged or replaced. In other words, the choice of components in the monolithic stack is limited or not allowed. This limits the options available to a PC manufacturer as to components, whether in software and/or hardware, in an audio stack. Typically, the only option to a PC manufacturer may be the monolithic stack or audio player.
Similar componentization deficiencies exist with DVD-Video approaches for writable media.