A user often uses a terminal to access protected content in order to play it. Accessing protected multimedia content generally includes loading it into memory, lifting the protection thereof, either on the fly when it is received or on a storage medium on which it has previously been stored, decoding it, and finally, transmitting it to a multimedia appliance capable of playing it, storing it, or making any other use thereof offered by the service supplying protected multimedia contents.
Here, “lifting the protection on the fly” describes the fact that fragments of the multimedia content are processed as and when they are received, without waiting for all fragments of the multimedia content to be received.
A typical terminal comprises a descrambler, a decoder, and a memory shared by the descrambler and the decoder. Typical content that can be supplied includes audiovisual content, for example television programs, audio only content, for example a radio program, or, more generally, any digital content containing video and/or audio, such as a computer application, a game, a slideshow, a picture or any set of data.
Among this content is temporal content. Temporal multimedia content is multimedia content played by a succession in time of sounds, in the case of an audio temporal content, or of pictures, in the case of a video temporal content, or of sounds and pictures temporally synchronized with one another in the case of an audiovisual temporal multimedia content. Temporal multimedia content can also comprise interactive temporal components temporally synchronized with the sounds or the pictures.
To be supplied, such content is first of all coded, or compressed. This allows transmission to require less bandwidth.
The video component of the content is coded according to a video format, such as MPEG-2. A complete presentation of this format can be found in a document published by the International Organization for Standardization under the reference ISO/IEC 13818-2:2013 and the title “Information Technology—Generic coding of moving pictures and associated audio information—Part 2: Video data”. Many other formats, such as MPEG-4 ASP, MPEG-4 Part 2, MPEG-4 AVC (or Part 10), HEVC (High Efficiency Video Coding), or WMV (Windows Media Video) can alternatively be used, and rely on the same principles. Thus, all of the following applies equally to these other video formats which rely on the same principle as the MPEG-2 coding.
MPEG-2 coding involves general data compression procedures.
For fixed pictures, MPEG-2 exploits the spatial redundancy internal to a picture, the correlation between the neighboring points and the lesser sensitivity of the eye to details.
For moving pictures, MPEG-2 exploits the strong temporal redundancy between successive pictures. The exploitation thereof makes it possible to code certain pictures of the content, here said to be “deduced pictures,” by reference to others, here said to be “source pictures.” The process of decoding the deduced pictures can be carried out by prediction or interpolation. This means that decoding the deduced pictures is possible only after decoding the source pictures.
Other pictures, referred to herein as “initial pictures,” are coded without reference to such source pictures. These pictures contain, when they are coded, all of the information necessary to their decoding. Therefore, they can be completely decoded independently of the other pictures. These initial pictures are thus the obligatory point of entry when accessing the content. The resulting coded content therefore does not comprise the data necessary to the decoding of each of the pictures independently of the others, but it is made up of “sequences” according to the MPEG-2 terminology. One sequence implements the compression of at least one “group of pictures” (or GOP in MPEG-2).
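By way of illustration only, the ordering constraint stated above (a deduced picture can be decoded only after its source pictures) can be sketched as a topological sort. The dependency map below is hypothetical and is not taken from any particular coded stream; the function name is likewise an assumption made for this sketch.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: picture index -> indices of its source pictures.
# Picture 0 has no source picture; picture 3 is deduced from picture 0;
# pictures 1 and 2 are deduced from pictures 0 and 3.
sources = {0: [], 3: [0], 1: [0, 3], 2: [0, 3]}


def decoding_order(sources):
    """Return an order in which every picture follows all of its source pictures."""
    return list(TopologicalSorter(sources).static_order())


order = decoding_order(sources)
print(order)
```

In this sketch the order in which pictures are played (0, 1, 2, 3) differs from any valid decoding order, since picture 3 must be decoded before pictures 1 and 2.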
A group of pictures is a series of consecutive pictures in which each picture is either an initial or source picture for at least one deduced picture contained in the same series of consecutive pictures, or a deduced picture such that each of the source pictures necessary to its decoding belongs to the same series of consecutive pictures.
A group of pictures does not contain any smaller series of consecutive pictures having the same properties as above. The group of pictures is thus the smallest part of content that can be accessed without having to previously decode another part of this content.
A sequence is delimited by a “header” and an “end,” each identified by a specific code. The header comprises parameters that characterize properties expected of the decoded pictures, such as the horizontal and vertical sizes, the aspect ratio, and the picture frequency. The standard recommends repeating the header between the groups of pictures of the sequence in such a way that the successive occurrences thereof are spaced apart by approximately a few seconds in the coded content.
For example, a group of pictures typically comprises more than 5 or 10 pictures and, generally, fewer than 12 or 20 or 50 pictures. For example, in a system that displays 25 pictures per second, a group of pictures typically represents a playing time greater than 0.1 or 0.4 seconds and, generally, less than 0.5 or 1 or 10 seconds.
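The relationship between the number of pictures in a group and its playing time is a simple division, sketched below under the assumption of a constant display rate. The function name and the sample picture counts are chosen for illustration only.

```python
def gop_playing_time_seconds(picture_count, pictures_per_second=25):
    """Playing time of a group of pictures at a constant display rate."""
    return picture_count / pictures_per_second


# Illustrative picture counts; at 25 pictures per second,
# 10 pictures last 0.4 s and 12 pictures last 0.48 s.
for count in (10, 12):
    print(count, "pictures:", gop_playing_time_seconds(count), "s")
```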
Temporal multimedia content can comprise several video components. In this case, each of these components is coded as described above.
The audio component of the content is moreover coded according to an audio format such as MPEG-2 audio. A complete presentation of this format can be found in a document published by the International Organization for Standardization under the reference ISO/IEC 13818-3:1998 and the title “Information Technology—Generic coding of moving pictures and associated audio information—Part 3: Audio”. Many other formats, such as MPEG-1 Layer III, better known by the name MP3, AAC (Advanced Audio Coding), Vorbis or WMA (Windows Media Audio), can alternatively be used, and rely on the same principles. Thus, all of the following applies equally to these other audio formats which rely on the same principles as the MPEG-2 audio coding.
MPEG-2 audio coding obeys the same principles described above for that of temporal video content. The resulting coded content therefore, likewise, consists of “frames.” A frame is the analog, in audio, of a group of pictures in video. The frame is therefore the smallest part of audio content that can be accessed without having to decode another part of this audio content. The frame further contains all the information useful to the decoding thereof.
A frame typically comprises more than 100 or 200 samples each coding a sound and, generally, less than 2000 or 5000 samples. Typically, when it is played by a multimedia appliance, a frame lasts more than 10 ms or 20 ms and, generally, less than 80 ms or 100 ms. In some examples, a frame comprises 384 or 1152 samples, each coding a sound. Depending on the signal sampling frequency, such a frame represents a playing time of 8 to 12 milliseconds or of 24 to 36 milliseconds, respectively.
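The playing times quoted above follow from dividing the number of samples in a frame by the sampling frequency. The sketch below assumes sampling frequencies of 48 kHz, 44.1 kHz and 32 kHz, which are not stated in the text and are used here only because they reproduce the quoted bounds.

```python
def frame_playing_time_ms(samples_per_frame, sampling_frequency_hz):
    """Playing time, in milliseconds, of one audio frame."""
    return 1000.0 * samples_per_frame / sampling_frequency_hz


# Assumed sampling frequencies (not given in the text).
ASSUMED_FREQUENCIES_HZ = (48_000, 44_100, 32_000)

for samples in (384, 1152):
    for frequency in ASSUMED_FREQUENCIES_HZ:
        duration = frame_playing_time_ms(samples, frequency)
        print(f"{samples} samples at {frequency} Hz: {duration:.2f} ms")
```

Under these assumptions, a frame of 384 samples lasts between 8 ms (at 48 kHz) and 12 ms (at 32 kHz), and a frame of 1152 samples lasts between 24 ms and 36 ms.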
Temporal multimedia content can comprise several audio components. In this case, each of these components is coded as described above.
The coded components of content, also qualified as elementary data streams, are then multiplexed, that is to say, notably, temporally synchronized, then combined into a single stream, or flow, of data.
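As a simplified illustration of multiplexing, the sketch below interleaves two time-ordered elementary streams into a single time-ordered stream. The `Packet` type, its fields and the sample timestamps are hypothetical; actual multiplexing as specified for MPEG-2 involves considerably more than this merge.

```python
import heapq
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical unit of an elementary stream, stamped with its playing instant."""
    timestamp_ms: int
    component: str
    payload: bytes


def multiplex(*elementary_streams):
    """Combine time-ordered elementary streams into one time-ordered stream."""
    return list(heapq.merge(*elementary_streams, key=lambda p: p.timestamp_ms))


video = [Packet(0, "video", b"GOP-0"), Packet(400, "video", b"GOP-1")]
audio = [Packet(0, "audio", b"frame-0"),
         Packet(24, "audio", b"frame-1"),
         Packet(48, "audio", b"frame-2")]

stream = multiplex(video, audio)
for packet in stream:
    print(packet.timestamp_ms, packet.component)
```

Each input must already be sorted by timestamp. The merge then keeps components that share a playing instant adjacent in the combined stream.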
Such content, notably when it is the subject of rights such as copyrights or similar rights, is supplied in protected form by a multimedia content protection system. This system makes it possible to ensure the observance of conditions for access to the content that derive from these rights.
The content is typically supplied in encrypted form, for its protection, by a digital rights management (“DRM”) system. This encryption is generally performed by means of an encryption key and a symmetrical algorithm. It is applied to the stream resulting from the multiplexing or, before multiplexing, to the components of the coded content.
A DRM system is in fact a multimedia content protection system. The terminology of the field of digital rights management systems is thus used hereinafter in this document. A more comprehensive description thereof is found in the document: DRM Architecture, Draft version 2.0, OMA-DRM-ARCH-V2_0-20040518-D, Open Mobile Alliance, 18 May 2004.
In such a digital rights management system, obtaining a license enables a terminal to access a protected multimedia content. Such a license comprises a content key. This content key is necessary to decrypt the multimedia content that has been protected by the symmetrical encryption algorithm.
The content key is generally inserted into the license in the form of a cryptogram obtained by the encryption of the content key with an encryption key, which is called a “terminal” key, that is specific to the terminal or known thereto.
To access the content, the terminal extracts the content key from the license by decrypting its cryptogram using its terminal key.
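The key hierarchy described above can be sketched as follows. The cipher used here is a toy construction (XOR with a keystream derived from SHA-256) that stands in for a real symmetrical algorithm; it is an assumption made so that the sketch is self-contained, and it is not suitable for actual protection. The key values, the layout of the license and the function names are likewise hypothetical.

```python
import hashlib


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetrical cipher, for illustration only: XOR with a SHA-256 keystream.

    Applying it twice with the same key restores the original data.
    """
    keystream = bytearray()
    counter = 0
    while len(keystream) < len(data):
        keystream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ k for d, k in zip(data, keystream))


# Supply side (hypothetical values): protect the content with the content key,
# then place the cryptogram of the content key in the license.
terminal_key = b"key specific to the terminal"
content_key = b"content key 0001"
protected_content = toy_cipher(content_key, b"coded multimedia content")
license_data = {"content_key_cryptogram": toy_cipher(terminal_key, content_key)}


def extract_content_key(license_data, terminal_key):
    """Terminal side: decrypt the cryptogram carried by the license."""
    return toy_cipher(terminal_key, license_data["content_key_cryptogram"])


recovered_key = extract_content_key(license_data, terminal_key)
unscrambled = toy_cipher(recovered_key, protected_content)
print(unscrambled)
```

A terminal that does not hold the terminal key cannot recover the content key from the license, and therefore cannot decrypt the content.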
Next, the terminal's descrambler descrambles, or decrypts, the content by means of the content key duly extracted from the license, thus lifting the protection. The descrambler thus generates unscrambled multimedia content comprising at least one temporal series of video sequences or of groups of pictures, or of audio frames. This multimedia content can be played by a multimedia appliance connected to the terminal.
As used herein, “unscrambled” describes the fact that the multimedia content no longer needs to be decrypted to be played by a multimedia appliance in a way that is directly perceptible and intelligible to a human being. The term “multimedia appliance” denotes any device capable of playing the unscrambled multimedia content, for example, a television or a multimedia player.
The descrambler next transfers the unscrambled content into the shared memory.
Next, the decoder of the terminal reads the unscrambled content in the shared memory, and decodes it.
Finally, the terminal transmits the duly decoded content to a multimedia appliance.
More specifically, in the case of temporal multimedia content, the reception, the processing, then the transmission to a multimedia appliance, by the terminal, of the content, as described above, are done in fragments.
A fragment is a restricted part of the unscrambled multimedia stream that has a shorter playing time than that of the entire multimedia stream. A fragment therefore comprises a restricted part of each video and audio component of the unscrambled multimedia stream that has a shorter playing time than that of the entire multimedia stream. These restricted parts of components are synchronized in the stream to be played simultaneously. A fragment therefore comprises the restricted part of the temporal series of video sequences or of groups of pictures or of audio frames implementing the coding of this restricted part of component of the unscrambled multimedia stream. This restricted part consists, typically, of a plurality of video sequences or of groups of pictures, or of successive audio frames.
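The fragment-by-fragment processing described above can be sketched as a pipeline in which each fragment is descrambled, deposited in the shared memory, read back by the decoder, and handed on for playing before the next fragment is processed. The descrambling and decoding functions below are deliberately trivial stand-ins (a repeating-key XOR and a text decoding), and all names are hypothetical.

```python
from collections import deque


def descramble(fragment: bytes, content_key: bytes) -> bytes:
    """Stand-in for the descrambler: repeating-key XOR, for illustration only."""
    return bytes(b ^ content_key[i % len(content_key)]
                 for i, b in enumerate(fragment))


def decode(fragment: bytes) -> str:
    """Stand-in for the decoder: here, decoding is merely UTF-8 text decoding."""
    return fragment.decode("utf-8")


def access_on_the_fly(scrambled_fragments, content_key):
    """Process fragments as and when they are received."""
    shared_memory = deque()
    for scrambled in scrambled_fragments:
        # The descrambler deposits the unscrambled fragment in the shared memory.
        shared_memory.append(descramble(scrambled, content_key))
        # The decoder reads it from the shared memory and decodes it.
        yield decode(shared_memory.popleft())


content_key = b"k3y"
clear_fragments = [b"fragment-1", b"fragment-2", b"fragment-3"]
# XOR is its own inverse, so the same stand-in also produces the scrambled input.
received = (descramble(f, content_key) for f in clear_fragments)

played = list(access_on_the_fly(received, content_key))
print(played)
```

Because both the input and the pipeline are generators, no fragment is processed before it has been received, and the first fragment can be played without waiting for the last.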
In the method for accessing the content by the terminal as described above, the descrambled content is transferred into the shared memory of the terminal by its descrambler to then be read therein by its decoder. The unscrambled content is therefore present in this memory, at least between the instant when it is deposited therein by the descrambler and when it is read by the decoder. It is then easy, in such an open environment, to read the descrambled content therein and to thus obtain illegal access to the content. Additional protection of the content during its presence in the shared memory is therefore necessary.
Known cryptographic solutions to this problem rely on the sharing of keys by the descrambler and the decoder. In these solutions, the descrambler encrypts the descrambled content, before transferring it into the shared memory. Then, the decoder reads the duly encrypted content in the shared memory, decrypts it and, before it is possibly stored in a memory of the terminal, decodes it and transmits it to a multimedia appliance capable of playing it. These cryptographic solutions are safe but have the drawback of being complex and of requiring significant computing resources to successfully conduct the cryptographic computations involved.
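A minimal sketch of such a key-sharing solution is given below. The cipher is a toy construction (XOR with a keystream derived from SHA-256) standing in for a real algorithm, and the key, the memory model and the function names are hypothetical. In particular, the sketch reuses one keystream for every write, which a real implementation would have to avoid.

```python
import hashlib


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetrical cipher, for illustration only; it is its own inverse."""
    keystream = bytearray()
    counter = 0
    while len(keystream) < len(data):
        keystream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ k for d, k in zip(data, keystream))


shared_key = b"key shared by descrambler and decoder"  # hypothetical value
shared_memory = []  # stands in for the memory shared by descrambler and decoder


def descrambler_write(unscrambled: bytes) -> None:
    """The descrambler encrypts the content before depositing it."""
    shared_memory.append(toy_cipher(shared_key, unscrambled))


def decoder_read() -> bytes:
    """The decoder reads the encrypted content and decrypts it before decoding."""
    return toy_cipher(shared_key, shared_memory.pop(0))


descrambler_write(b"unscrambled group of pictures")
observed_by_third_party = shared_memory[0]  # what a read of the memory yields
recovered_by_decoder = decoder_read()
print(observed_by_third_party != recovered_by_decoder)
```

Every fragment is thus encrypted once and decrypted once more than in the unprotected method, which illustrates the computing cost noted above.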
Other solutions are known which share this drawback, consisting for example in the translation of the content into other content formats.
Solutions consisting in implementing security mechanisms such as the verification, by the DRM module of the terminal, of the destination application of the unscrambled content, are also known. These solutions have the drawback of being unsafe in open environments such as terminals.