Most conventional digital rights management (DRM) schemes, including content delivery and link protection protocols, use the same key and nonce for both the audio and video substreams of a premium content stream and typically provide no means of telling the audio and video streams apart. Examples of such DRM schemes include Google Widevine, Microsoft PlayReady, and High-bandwidth Digital Content Protection (HDCP).
Many devices, including mobile phones, set top boxes, and other devices configured to handle DRM protected content, include a trusted content protection module and implement a protected video processing path that is intended to prevent the DRM protections on the video content from being subverted. However, similar protections are typically not provided on the audio processing path. The audio content may be decrypted and released to the high level operating system (HLOS) of the device without verification. An attacker can exploit this weakness to obtain decrypted video content, because the audio processing path does not determine whether the content being decrypted is actually audio content. An attacker could circumvent the protection on video content by indicating to the trusted content protection module of the device that the desired video content is audio content, or by interleaving video content with audio content. The video content will then be decrypted and provided to the HLOS, thereby circumventing the DRM protections on the video content and allowing unrestricted access to the decrypted video content. An attacker could potentially obtain the entirety of the DRM protected video content by interleaving portions of the video content with audio content to obtain the decrypted portions and then reassembling the decrypted video content. This approach may require the attacker to interleave portions of the video content with audio content multiple times to obtain the entire decrypted video content, but once the entire video content has been obtained, the attacker could freely distribute the content without any DRM protections.
Conventional solutions that can be used with MPEG-1 or MPEG-2 Audio Layer III (MP3) or Advanced Audio Coding (AAC) content include limiting the overall bandwidth of content streams and detecting the frame starts within the content stream. Frames have a maximum length, and if a frame header does not occur within the expected length for the type of content being streamed, then the data can be flagged as non-audio content. However, data corruption needs to be taken into account when monitoring for frame headers, so multiple frames' worth of data should be monitored before determining whether to flag the content stream as comprising non-audio content. After a predetermined number of frames are flagged as non-audio, the streaming of the content can be aborted. Another solution is to model video as random data. However, this approach is very computationally intensive, and may not be suitable for use on mobile devices that may have limited processing power and a limited onboard power supply.
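The frame-header check described above can be sketched as follows. This is a minimal illustration rather than any particular implementation: the names `has_sync` and `should_abort`, the 2048-byte bound on frame length, and the threshold of three consecutive flagged windows are all assumptions chosen for the example. The sketch looks only for the 11-bit MPEG audio frame sync pattern (a 0xFF byte followed by a byte whose top three bits are set), which also matches the 12-bit ADTS syncword used with AAC. A practical detector would likely also validate the remaining header fields, since the bare pattern can appear by chance in arbitrary data.

```python
MAX_FRAME_BYTES = 2048  # assumed upper bound on frame length (illustrative)
ABORT_THRESHOLD = 3     # assumed number of consecutive flagged windows


def has_sync(window: bytes) -> bool:
    """Return True if the window contains an 11-bit MPEG audio sync pattern."""
    for i in range(len(window) - 1):
        if window[i] == 0xFF and (window[i + 1] & 0xE0) == 0xE0:
            return True
    return False


def should_abort(stream: bytes,
                 max_frame_bytes: int = MAX_FRAME_BYTES,
                 abort_threshold: int = ABORT_THRESHOLD) -> bool:
    """Flag each full window that lacks a frame start; abort after a run of them."""
    flagged = 0
    for start in range(0, len(stream) - max_frame_bytes + 1, max_frame_bytes):
        # One extra byte so a sync pattern straddling the boundary is still seen.
        window = stream[start:start + max_frame_bytes + 1]
        if has_sync(window):
            flagged = 0  # an isolated miss is treated as data corruption
        else:
            flagged += 1
            if flagged >= abort_threshold:
                return True
    return False


# Synthetic "audio": 417-byte frames, each starting with a sync pattern.
audio = (b"\xff\xfb" + bytes(415)) * 20
non_audio = bytes(MAX_FRAME_BYTES * 3)  # no frame headers at all
```

Resetting the counter whenever a frame start is found is one way to tolerate occasional corruption; counting flagged windows cumulatively would be an equally plausible reading of the text.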
However, the techniques discussed above do not work for the Pulse-Code Modulation (PCM) audio format, and in particular for situations where audio content is streamed as uncompressed linear PCM while the video is still transmitted in a compressed format. PCM audio content is raw data that does not include any identifiable headers, such as those of the MP3 or AAC formats, which could be used to distinguish audio content streams from video content streams. Attempts have been made to analyze the content stream to determine the stream's spectrum and to classify the content stream as non-audio content if the stream has a spectrum that is too close to noise. However, this approach is computationally prohibitive, particularly in mobile devices, which may be constrained in both processing resources and power consumption. This approach will also result in the rejection of audio content that includes portions that are similar to white or pink noise or that present significant distortion. Audio with such characteristics does occur occasionally in audio content, and the playback of such content would be mistakenly marked as video content and interrupted using conventional techniques.
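To make the spectrum-based classification concrete, the sketch below uses spectral flatness (the ratio of the geometric mean to the arithmetic mean of the power spectrum) as one possible measure of how close a block of PCM samples is to noise. The source does not say which measure such attempts used, so the choice of spectral flatness, the function names, and the 0.3 threshold are all assumptions made for illustration. The discrete Fourier transform is computed naively to keep the example self-contained.

```python
import cmath
import math
import random

NOISE_FLATNESS_THRESHOLD = 0.3  # assumed cutoff, not taken from the source


def power_spectrum(samples):
    """Naive O(n^2) DFT power for bins 1 .. n/2 - 1 (DC and Nyquist skipped)."""
    n = len(samples)
    spectrum = []
    for k in range(1, n // 2):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_flatness(samples, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum, in (0, 1]."""
    power = [p + eps for p in power_spectrum(samples)]  # eps avoids log(0)
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))


def looks_like_noise(samples, threshold=NOISE_FLATNESS_THRESHOLD):
    """Classify a block as noise-like (and hence suspect) if its spectrum is flat."""
    return spectral_flatness(samples) > threshold


# A pure tone has a highly peaked spectrum; random samples have a flat one.
tone = [math.sin(2 * math.pi * 8 * t / 256) for t in range(256)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(256)]
```

The sketch also shows the false-positive problem noted above: a legitimate audio passage consisting of white noise would be classified the same way as the `noise` block and rejected. A fast Fourier transform would reduce the per-block cost, but the analysis must still run continuously on the stream.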