Packetized data transmission and streaming media environments present many challenges for the system designer. For instance, clients can have different display, power, communication, and computational capabilities. In addition, communication links can have different maximum bandwidths, quality levels, and time-varying characteristics. A successful video streaming system, for example, must be able to stream video to different types of clients over time-varying communication links, and this streaming must be performed in a scalable and secure manner. Scalability is needed to enable streaming to a multitude of clients with different device capabilities, and security is important, particularly in wireless networks, to protect content from eavesdroppers.
It is noted here that the term “data,” as used in this application, can refer to any form of electronic information. “Data” can mean streamed media, such as video or audio; computer information communicated between parts of a network or within a computer's architecture; or information communicated between computers in a network or via the Internet, whether such information is carried over wire cables, over optical links, wirelessly, or by some other means. Data is commonly communicated in a packetized format. The terms “media stream” and “data stream” are used herein to indicate a sequence of data packets being communicated.
It is also noted that, although the ensuing discussion specifically mentions the shortcomings of prior art approaches with respect to the transmission of data, such shortcomings are not limited solely to the communication of any particular type of data. Instead, the problems of the prior art span various types of media data including, but not limited to, audio-based data, speech-based data, image-based data, graphic data, web page-based data, and the like.
In an environment of streamed media data, in order to achieve scalability and efficiency, it is necessary to adapt, or transcode, the compressed media stream at intermediate network nodes. A transcoder takes a compressed media stream as the input, then processes it to produce another compressed media stream as the output. Exemplary transcoding operations include bit rate reduction, rate shaping, spatial down-sampling, frame rate reduction, and changing compression formats. Network transcoding can improve system scalability and efficiency, for example, by adapting the spatial resolution of a video stream for a particular client's display capabilities or by dynamically adjusting the bit rate of a video stream to match a wireless channel's time-varying characteristics.
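As a rough illustration of one of the transcoding operations named above, the following sketch performs frame rate reduction by keeping one frame out of every few. It is a minimal sketch under stated assumptions, not a description of any particular transcoder: the `Frame` type, the `reduce_frame_rate` and `stream_bits` helpers, and the assumption that every frame can be dropped independently are all hypothetical. Streams that use inter-frame compression generally cannot drop arbitrary frames this way.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    index: int      # position of the frame in the original sequence
    payload: bytes  # compressed frame data, treated as opaque here


def reduce_frame_rate(frames: List[Frame], keep_every: int) -> List[Frame]:
    """Keep one frame out of every `keep_every` frames and drop the rest.

    Assumes (hypothetically) that each frame is independently decodable.
    """
    if keep_every < 1:
        raise ValueError("keep_every must be >= 1")
    return frames[::keep_every]


def stream_bits(frames: List[Frame]) -> int:
    """Total payload size of the stream in bits."""
    return 8 * sum(len(f.payload) for f in frames)
```

Halving the frame rate of a stream of equally sized frames in this way roughly halves the number of bits to be carried, which is the kind of adaptation a node might apply for a lower-bandwidth link.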
By way of example, a streaming media video clip may be part of a presentation of a web page. Large and powerful desktop receivers on a high-bandwidth connection may receive and decrypt a full-resolution, full-frame-rate video stream of high-definition television (HDTV), for instance. However, a wireless adjunct to the same network may only be able to connect wireless users at a much lower bandwidth. The stream must be converted to a lower-bandwidth signal in order to be carried over that connection. This conversion is called transcoding.
While network transcoding facilitates scalability in video streaming systems, it also poses a serious threat to the security of the streaming system. This is because conventional transcoding operations performed on encrypted streams generally require decrypting the stream, transcoding the decrypted stream, and then re-encrypting the result. Specifically, the transcoder requires the encryption key and the content is decrypted, and in plain form, at the transcoder. Because every transcoder must decrypt the stream, each network transcoding node presents a possible breach in the security of the entire system.
Furthermore, there may be strategically placed nodes in a network that are ideally located for performing transcoding but cannot be trusted. These untrusted nodes may be individual computers, client intranets at remote locations, or any other node that is interposed between an original sender and an intended receiver.
More specifically, in conventional video streaming approaches that employ application-level encryption, for example, video is first encoded, or compressed, into a bitstream using inter-frame compression algorithms. The resulting bitstream can then be encrypted, and the resulting encrypted stream is packetized and transmitted over the network using a transport protocol such as User Datagram Protocol (UDP).
It is noted here that, in this discussion of background, the terms “encode, decode, encoding, decoding, encoded, decoded,” etc. refer to the compression or other encoding of data into forms suitable for transport over network carriers, whether those carriers are cable, optical fiber, wireless carrier, or other network connection. “Encrypt, decrypt, encrypting, decrypting, encryption, decryption,” etc. refer to cryptographic encoding that is used to protect the security of data from unauthorized recipients or to verify that the data received is exactly what was originally sent.
FIG. 1A is a block diagram, 100, which illustrates the order in which conventional application-level encryption is performed (i.e., Compression Encoding 102, Encryption/Checksum Computation 104, and Packetization 106). One difficulty with this conventional approach arises when a packet is lost. Specifically, error recovery is hard because, without the data from the lost packet, decryption and/or decoding may be difficult if not impossible.
FIG. 1B illustrates the resultant packetized data stream as produced by process 100 of FIG. 1A. Data stream 111 is compressed by compression encoding function 102 and encrypted, and cryptographic checksum (CCS) 112 is appended by Encryption/CCS function 104. Packetization 106 separates the signal, consisting of the data and the CCS, into packets 113 of the network's required size. All of the packets must be reassembled into the encrypted data stream in order to decrypt the data, or to verify it if it is unencrypted. If one of the packets 113 is lost, then the entire message is lost, because the CCS cannot be validated without the missing packet.
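The conventional ordering of FIGS. 1A and 1B, and its sensitivity to packet loss, can be sketched as follows. This is a minimal sketch under several assumptions that do not come from the figures: zlib stands in for the compression encoder, HMAC-SHA256 stands in for the CCS, the CCS is assumed to be computed over the encrypted stream, and `toy_cipher` is a hypothetical keystream cipher used only for illustration and not suitable for real use. The function names `build_packets` and `receive` are likewise hypothetical.

```python
import hashlib
import hmac
import zlib
from typing import List, Optional

CCS_LEN = 32  # size of a SHA-256 digest in bytes


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Illustrative XOR keystream cipher (NOT secure). Applying it twice decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def build_packets(data: bytes, enc_key: bytes, mac_key: bytes,
                  packet_size: int) -> List[bytes]:
    """Compress (102), encrypt and append one CCS (104), then packetize (106)."""
    compressed = zlib.compress(data)
    encrypted = toy_cipher(enc_key, compressed)
    ccs = hmac.new(mac_key, encrypted, hashlib.sha256).digest()
    signal = encrypted + ccs
    return [signal[i:i + packet_size] for i in range(0, len(signal), packet_size)]


def receive(packets: List[bytes], enc_key: bytes, mac_key: bytes) -> Optional[bytes]:
    """Reassemble the packets, check the single CCS, then decrypt and decompress.

    Returns None when the CCS does not validate.
    """
    signal = b"".join(packets)
    if len(signal) < CCS_LEN:
        return None
    encrypted, ccs = signal[:-CCS_LEN], signal[-CCS_LEN:]
    expected = hmac.new(mac_key, encrypted, hashlib.sha256).digest()
    if not hmac.compare_digest(ccs, expected):
        return None
    return zlib.decompress(toy_cipher(enc_key, encrypted))
```

With all packets present, `receive` returns the original data. If any one packet is removed from the list before calling `receive`, the single checksum over the whole stream no longer validates and nothing is recovered, which mirrors the loss behavior described above.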
It is noted here that encryption and CCS computation are related but distinct operations. A CCS can be computed, or calculated, and appended to an unencrypted data stream, and the CCS can be used to verify the integrity and authenticity of the stream at the receiver. Integrity of data means that the set of data sent is the set of data received and that the receiving device can have a relatively high degree of trust in the received data. Authenticity of data means that the set of data received was actually sent by the purported sender, e.g., that no hostile alias has been used. Again, the term relates to the degree of trust the receiver can have in the received data.
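To make the distinction concrete, the sketch below appends a CCS to unencrypted data and verifies it at the receiver. It assumes, for illustration only, that the CCS is an HMAC-SHA256 computed with a key shared by sender and receiver; the helper names `append_ccs` and `verify_ccs` are hypothetical. The data itself remains readable throughout, since no encryption is applied.

```python
import hashlib
import hmac
from typing import Optional

CCS_LEN = 32  # size of a SHA-256 digest in bytes


def append_ccs(key: bytes, data: bytes) -> bytes:
    """Return the unencrypted data followed by its CCS."""
    return data + hmac.new(key, data, hashlib.sha256).digest()


def verify_ccs(key: bytes, message: bytes) -> Optional[bytes]:
    """Return the data if its CCS validates under `key`, otherwise None."""
    if len(message) < CCS_LEN:
        return None
    data, ccs = message[:-CCS_LEN], message[-CCS_LEN:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    return data if hmac.compare_digest(ccs, expected) else None
```

A message altered in transit fails verification (integrity), and so does a message whose CCS was produced with a different key (authenticity, to the extent that only the purported sender and the receiver hold the shared key).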
FIG. 1C illustrates a functional block diagram of a transcoding process in which encrypted data must be transcoded for reasons discussed previously. In process 120, the media stream is decrypted at 122, transcoded at 124, then re-encrypted at 126. During the period in which the data is unencrypted, it is accessible to unauthorized reading or manipulation at an insecure or untrusted node.
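The exposure in process 120 can be seen in a short sketch. The following is an illustration under stated assumptions rather than an actual transcoder: `toy_cipher` is a hypothetical XOR keystream cipher that is not secure, `conventional_transcode` is a hypothetical name, and the transcoding operation is passed in as an arbitrary function. The second return value exists only to make visible what the node was able to see.

```python
import hashlib
from typing import Callable, Tuple


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Illustrative XOR keystream cipher (NOT secure). Applying it twice decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def conventional_transcode(
    encrypted: bytes,
    key: bytes,
    transcode: Callable[[bytes], bytes],
) -> Tuple[bytes, bytes]:
    """Decrypt (122), transcode (124), and re-encrypt (126) at a network node.

    The node must be given `key`, and the content exists in plain form
    between the decrypt and re-encrypt steps.
    """
    plaintext = toy_cipher(key, encrypted)   # step 122: node holds plain content
    reduced = transcode(plaintext)           # step 124: operates on plain content
    re_encrypted = toy_cipher(key, reduced)  # step 126
    return re_encrypted, plaintext
```

For example, calling `conventional_transcode(enc, key, lambda d: d[:50])` returns a re-encrypted 50-byte stream together with the full original content in plain form, which is exactly what an untrusted node should not be able to obtain.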
In hybrid wired/wireless networks, it is often necessary to simultaneously stream media to fixed clients on a wired network and to mobile clients on a wireless network. In such a hybrid system, it may often be desirable to send a full-bandwidth, high-resolution data stream to the fixed, wired, client, and a lower-bandwidth, medium-resolution media stream to the mobile wireless receiver. Conventional media streaming approaches, however, do not achieve the efficiency, security, and scalability necessary to readily accommodate the streaming corresponding to hybrid wired/wireless networks.
Furthermore, the growing sophistication and hostility of unwanted senders and transcoders of data and the ever-increasing need for speed of data communication in the complex data environment work against each other. The need for speed pushes data transmission to be simpler, ideally without any cryptographic checksums, while heightened security needs require more complex cryptographic checksums, calculated more often.
What is needed, then, is a method and/or system that can enable a potentially untrusted transcoder in the middle of a network to transcode a stream of packetized data while still preserving the end-to-end security of the rest of the stream. Such a method and system should also provide communication integrity checking without undue drain on the speed of communication. Specifically, what is needed is a means for computing and applying the cryptographic checksum that allows a potentially untrusted transcoder to perform the transcoding in an appropriate manner, while still allowing the intended receiver to validate the integrity of the transmitted data and allowing any encryption of the transmitted data to remain uncompromised and trusted to as high a degree as possible.