Use of the Internet continues to grow, and much of that use involves accessing files. The bandwidth available to the average user has reached a level at which people routinely access remote media, e.g., video and/or audio, via the Internet.
The original technology used on the Internet for accessing remote files was developed for files whose content represented text or static images, i.e., non-video. Before a recipient could view the text and/or image file, the entire file had to be downloaded. When media was first downloaded, it was likewise necessary to download an entire media file before any part of it could be viewed and/or listened to by the recipient.
It was, and remains, a problem that media in digital format is relatively easy to copy and/or distribute, depending upon the available bandwidth. To control access to the media being downloaded, providers of the media began to encrypt the media. Just as for viewing and/or listening to a downloaded file (that was not encrypted), decryption typically began once the entire file had been downloaded.
It was then realized that some media is generally serial in the manner in which it is perceived by the recipient. In other words, the recipient does not perceive the whole of the media at the same time. Rather, for the example of a movie, the recipient typically begins watching the movie at the start and watches it continuously until the end. Also, certain large text files are read from the beginning to the end. Thus, the Background Art realized that such files can be downloaded in a manner in which reconstruction and viewing-of/listening-to the files can begin before all of the data representing the file is received, which corresponds generally to sending such a file in order from its beginning to its end. This has been referred to as streaming the data to the recipient. The file being downloaded begins to be played/displayed moments after the first data is received and continues to play so long as additional data is received.
Conventional cryptographic algorithms are designed for text files (called messages in cryptography). To ensure high security, the entire text file (message) is encrypted before delivery. Compared to text data, media data, especially video and audio data, typically represents much larger files. Due to the high data rate needed for real-time delivery of those types of media files, encryption may add substantial processing overhead, making media streaming or real-time access difficult to achieve. That is, the encryption techniques according to the Background Art are incompatible with real-time or streaming data.
In addition, if the media is compressed, encrypting the entire media file using conventional cryptographic algorithms may cause serious error propagation problems. If one bit is lost during transmission, the entire file may have to be retransmitted, which greatly increases the bandwidth demand.
A solution to the general problem of encryption overhead is proposed by U.S. Pat. No. 5,303,303, which discloses encrypting only the message body and leaving the header and trailer information unencrypted for data transmission over a packet switching network. The header information can then be used for proper data delivery. In the case of video and audio, the video and audio content data are still fully encrypted since only the header and trailer data are separated out. While addressing the problem of bit loss, this does not provide a sufficient solution for real-time delivery, e.g., it is not compatible with streaming data transmission.
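The body-only approach described above can be illustrated with a minimal sketch. It assumes a packet made of a fixed-length header, a body, and a fixed-length trailer. The function names (`protect_packet`, `recover_body`), the packet layout, and the toy SHA-256 keystream are all hypothetical stand-ins chosen for illustration; they are not taken from the '303 patent, and the keystream is not a secure cipher.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). Illustration only, NOT secure."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_bytes(data: bytes, stream: bytes) -> bytes:
    """XOR data with an equally long keystream."""
    return bytes(a ^ b for a, b in zip(data, stream))


def protect_packet(header: bytes, body: bytes, trailer: bytes,
                   key: bytes, nonce: bytes) -> bytes:
    """Encrypt only the body; header and trailer stay readable for delivery."""
    encrypted_body = xor_bytes(body, keystream(key, nonce, len(body)))
    return header + encrypted_body + trailer


def recover_body(packet: bytes, header_len: int, trailer_len: int,
                 key: bytes, nonce: bytes) -> bytes:
    """Strip the clear header and trailer, then decrypt the body."""
    body = packet[header_len:len(packet) - trailer_len]
    return xor_bytes(body, keystream(key, nonce, len(body)))


# Hypothetical example values.
header, body, trailer = b"HDR:dst=42;", b"video payload bytes", b";END"
key, nonce = b"secret key", b"pkt-0001"
packet = protect_packet(header, body, trailer, key, nonce)
```

Note that every byte of the body is still processed by the cipher, which is why this arrangement does not reduce the encryption workload for video or audio content.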
U.S. Pat. No. 4,172,213 teaches how to encrypt a message on a byte-by-byte basis to support selective encryption. However, the '213 patent presumes operation upon a conventional message in which all parts of the message are treated as being equally important, which may result in over-encryption or under-encryption, in other words, over-protection or under-protection.
U.S. Pat. No. 6,070,245 teaches a system and method for switching encryption on or off during a connection-oriented session between a client and a server over the Internet. A keyword or command carrying a unique structured field is provided to indicate when an encrypted mode should be made active or inactive. Because the command/keyword insertion has to be a pre-processing step, it is not adequate for real-time and streaming media applications.
When a data stream is broadcast through multiple types of networks to different types of devices, it is often necessary to adapt the amount of data transmitted and the amount of encryption, i.e., the amount of bandwidth and additional processing corresponding to the encryption, to each different type of network and device. A fixed field with a fixed indicator for toggling encryption on and off, such as disclosed in the '245 patent, will not provide the capability needed.
U.S. Pat. No. 6,415,031 teaches selective encryption for video on demand. A selective encryption field is used to indicate the data to be encrypted, for instance whether only I-frames are to be encrypted or whether both I-frames and P-frames are to be encrypted. Since a field is used to indicate the payload to be encrypted, it has to be format compliant. The I-, P-, B-frame based selective encryption of the '031 patent will be useful for video in MPEG format but not for audio or video in Jmovie or another format. Further, the technique of the '031 patent does not provide sufficient scalability to accommodate data transmission over different types of networks to different types of devices. Because conventional encryption algorithms are used, significant computational power is needed, and a significant bit rate increase may be observed due to a change in the statistical properties of the data if the encryption is done in the frequency domain. Moreover, at most three levels of security can be achieved unless the number of I-frames is increased, which will increase the bit rate significantly.
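Frame-type selective encryption of the general kind described above can be sketched as follows. The `Frame` class, the `selectively_encrypt` function, the sample group of pictures, and the toy XOR cipher are hypothetical and are not the field layout or algorithm of the '031 patent. The sketch only shows that choosing among I, I+P, and I+P+B yields three distinct protection levels, which is one reading of the three-level limit noted above.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Frame:
    kind: str       # "I", "P", or "B"
    payload: bytes  # compressed frame data


def toy_cipher(data: bytes, key: bytes, index: int) -> bytes:
    """XOR with a SHA-256 counter keystream. Illustration only, NOT secure.
    Applying it twice with the same key and index restores the input."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        block = key + index.to_bytes(4, "big") + counter.to_bytes(4, "big")
        stream += hashlib.sha256(block).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def selectively_encrypt(frames, key, kinds_to_encrypt):
    """Encrypt only frames whose kind is selected; pass the rest through."""
    out, encrypted_bytes = [], 0
    for i, frame in enumerate(frames):
        if frame.kind in kinds_to_encrypt:
            out.append(Frame(frame.kind, toy_cipher(frame.payload, key, i)))
            encrypted_bytes += len(frame.payload)
        else:
            out.append(frame)
    return out, encrypted_bytes


# Hypothetical group of pictures: I-frames are largest, B-frames smallest.
gop = [Frame("I", b"i" * 900), Frame("B", b"b" * 100), Frame("B", b"b" * 100),
       Frame("P", b"p" * 300), Frame("B", b"b" * 100), Frame("B", b"b" * 100)]
key = b"secret key"

level1, cost1 = selectively_encrypt(gop, key, {"I"})
level2, cost2 = selectively_encrypt(gop, key, {"I", "P"})
level3, cost3 = selectively_encrypt(gop, key, {"I", "P", "B"})
```

Because the selection is tied to MPEG frame types, the same code has nothing to select on for audio or for video formats without I-, P-, and B-frames.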
U.S. Pat. No. 5,805,700 teaches selectively encrypting compressed video data based on policy. Specifically, it teaches how to selectively encrypt the start code of a GOP (group of pictures) or an I-, P-, or B-frame in an MPEG-formatted video to achieve video image degradation with substantially less processing than full encryption requires. Because it is policy based, a policy has to be set via either human interaction or preprocessing of the video data via a statistical analyzer. Further, scalability is not achieved.
Network channel capacity at any given time, depending on the traffic and the type of connection, varies over a wide range. According to Li, “Overview of Fine Granularity Scalability in MPEG-4 Video Standard”, in IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 3, March 2001, an objective of video coding and transmission for Internet streaming video is to optimize the video quality over a given bit rate range. The bitstream should be partially decodable at any bit rate within the bit rate range to reconstruct a video signal, i.e., fine granularity scalability should be provided. As a result, media, video, image, audio, graphics, or voice encryption should be able to provide compatible fine granularity scalability; otherwise, an encrypted media stream will not be able to be transmitted or reconstructed efficiently or correctly.
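The compatibility requirement can be made concrete with a small sketch. It assumes a stream split into a base layer and an enhancement layer that may be cut at any byte to fit a bit budget. The function names, the layer contents, and the toy position-wise XOR cipher are hypothetical; the sketch is not the MPEG-4 FGS codec and not a method taken from any of the cited patents. It only shows one property that keeps an encrypted stream usable after truncation: the bytes that survive can be decrypted without the bytes that were dropped.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). Illustration only, NOT secure."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Position-wise XOR: byte n of the output depends only on byte n of the input."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def truncate_to_budget(base: bytes, enhancement: bytes, budget: int) -> bytes:
    """Keep the whole base layer plus as much enhancement data as the budget allows."""
    if budget < len(base):
        raise ValueError("budget is below the base layer size")
    return base + enhancement[:budget - len(base)]


# Hypothetical layers and key.
base = b"BASE" * 5
enhancement = bytes(range(100))
key = b"secret key"

# The sender encrypts the enhancement layer once; the network later truncates it.
encrypted_enhancement = xor_cipher(enhancement, key)
sent = truncate_to_budget(base, encrypted_enhancement, len(base) + 37)

# The receiver decrypts whatever prefix arrived.
recovered = xor_cipher(sent[len(base):], key)
```

A cipher whose output for one position depends on the whole file would not survive this truncation, which is the failure mode the paragraph above warns about.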