In general, data compression reduces the size of a digital file. A compression algorithm typically makes the digital file smaller by representing strings of bits (i.e., logical 1s and 0s), which make up the digital file, with smaller strings of bits. For example, in some systems, this may be accomplished by using a dictionary, or so-called codebook. This reduction typically happens at the encoding stage prior to transmission or storage. So, when such a reduced-size string is received at the decoding stage for playback, the decoding algorithm uses the codebook to reconstruct the original content from the compressed representation generated by the encoding algorithm. Whether the reconstructed content is an exact match of the original content or an approximation thereof depends on the type of compression employed. Lossless compression algorithms allow the original content to be reconstructed exactly from the compressed message, while lossy compression algorithms only allow for an approximation of the original message to be reconstructed. Lossless compression algorithms are typically used where any loss of original content is problematic (as is the case with executable files, text files, and digital data files, where loss of even a single bit may change the meaning of the content). Lossy compression algorithms are typically used for images, audio, video, and other such digital files where a degree of intentional data loss is imperceptible or otherwise at an acceptable level. With respect to lossy compression, note that the bit loss is not random; rather, the loss is purposeful (bits representing imperceptible sound or visual distinctions or noise can be targeted for exclusion by the lossy compression algorithm).
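The difference between lossless and lossy reconstruction can be illustrated with a minimal sketch. It assumes Python's standard `zlib` module (DEFLATE, a dictionary-style scheme) as a stand-in for a lossless coder, and a simple drop-the-low-bits quantizer as a stand-in for a lossy one. The sample values and the 4-bit quantization step are illustrative assumptions, not taken from any particular system.

```python
import zlib

# Lossless: the decoder recovers the original bytes exactly.
original = b"ABABABAB" * 64           # highly repetitive, so it compresses well
packed = zlib.compress(original)
unpacked = zlib.decompress(packed)

# Lossy (illustrative assumption): keep only the top 4 bits of each 8-bit sample.
samples = [12, 13, 200, 203, 77]
quantized = [s >> 4 for s in samples]          # 4 bits each instead of 8
approx = [(q << 4) + 8 for q in quantized]     # reconstruct at the bin midpoint

# Largest reconstruction error introduced by the lossy step.
max_error = max(abs(a - s) for a, s in zip(approx, samples))

print(len(original), len(packed), unpacked == original)
print(samples, approx, max_error)
```

The lossless round trip returns the input unchanged, whereas the lossy round trip returns values that differ from the input by a bounded amount, which is acceptable only when that difference is imperceptible or otherwise tolerable.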
Data compression is commonly used in applications where the storage space or bandwidth of a transmission path is constrained. For example, images and video transmitted via a communication network such as the Internet are typically compressed. One such example case is the so-called “cloud DVR” service, which allows for streaming of compressed digital video content from a remote digital video recorder to a user's playback device, such as a television, desktop or laptop computer, tablet, smartphone, or other such playback device. Numerous compression schemes are available for streamed video including, for example, the various MPEG compression algorithms, as well as codebook-based Vector Quantization (VQ) techniques.
Codebook-based vector quantization generally begins with vectorization of a video stream by breaking the stream into smaller chunks of 1s and 0s (i.e., vectors) and then comparing each input vector to vectors of a given codebook to find a closest match. The index of the entry in the codebook providing the closest match to the input vector can then be used to represent that input vector. Additionally, a residual vector may be generated which represents a mathematical difference between the given input vector and the most similar codebook vector. The residual vector, paired with the codebook index, allows for lossless compression. Once coded, the content can be more efficiently stored and transmitted (i.e., use less storage space and transmission bandwidth), since only the indexes and residuals are stored and transmitted rather than the longer vectors.
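The encode and decode steps described above can be sketched as follows. This is a minimal illustration rather than any particular codec: it assumes small integer tuples as vectors, squared Euclidean distance as the closeness measure, and a hypothetical three-entry codebook. The names `encode`, `decode`, and `squared_distance` are chosen here for illustration only.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[int, ...]

def squared_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Squared Euclidean distance (assumed closeness measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def encode(vec: Vector, codebook: List[Vector]) -> Tuple[int, Vector]:
    """Return the index of the closest codebook vector and the residual."""
    index = min(range(len(codebook)),
                key=lambda i: squared_distance(vec, codebook[i]))
    residual = tuple(x - c for x, c in zip(vec, codebook[index]))
    return index, residual

def decode(index: int, residual: Vector, codebook: List[Vector]) -> Vector:
    """Rebuild the input vector as codebook entry plus residual."""
    return tuple(c + r for c, r in zip(codebook[index], residual))

# Hypothetical codebook and input vectors, for illustration only.
codebook = [(0, 0, 0, 0), (10, 10, 10, 10), (20, 0, 20, 0)]
inputs = [(9, 11, 10, 10), (19, 1, 20, 0), (1, 0, 0, 2)]

coded = [encode(v, codebook) for v in inputs]
restored = [decode(i, r, codebook) for i, r in coded]

print(coded)
print(restored == inputs)
```

Because the residual is the exact element-wise difference, adding it back to the indexed codebook vector recovers the input exactly. When the codebook is a good match for the content, the residual entries tend to be small in magnitude, which is what makes the index-plus-residual pair cheaper to represent than the original vector.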
For codebook-based compression schemes, such as VQ, the quality and degree of compression achieved are, at least to some extent, dependent on the representativeness of the codebook with respect to the input content to be compressed. To this end, codebooks used in such compression schemes are typically trained across multiple videos or channels over a period of time. The channels are generally controlled by a given content provider. The so-trained codebooks can then be used for compressing new data in those channels.
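To illustrate why representativeness matters, the following sketch trains a small codebook and measures how well it fits new content. The text above does not say which training method is used; k-means style (Lloyd) iteration is assumed here as one common choice. The training vectors, the naive initialization from the first `k` vectors, and the function names are all hypothetical.

```python
from typing import List, Sequence, Tuple

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(vectors: List[Tuple[float, ...]], k: int,
                   iterations: int = 10) -> List[Tuple[float, ...]]:
    """Assumed k-means style training: assign to nearest entry, then average."""
    # Naive initialization: the first k training vectors.
    codebook = [tuple(float(x) for x in v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters: List[list] = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: sq_dist(v, codebook[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # leave an entry unchanged if nothing was assigned to it
                codebook[i] = tuple(sum(col) / len(members)
                                    for col in zip(*members))
    return codebook

def average_distortion(vectors, codebook) -> float:
    """Mean squared distance from each vector to its closest codebook entry."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)

# Hypothetical training data drawn from one "channel".
training = [(0, 0), (10, 10), (1, 1), (9, 11), (0, 2), (11, 9)]
trained = train_codebook(training, k=2)

similar = average_distortion([(1, 0), (10, 11)], trained)    # same kind of content
unrelated = average_distortion([(5, 5), (20, 0)], trained)   # different content

print(trained)
print(similar, unrelated)
```

New vectors that resemble the training data land close to a codebook entry and yield small residuals, while content unlike the training data yields much larger ones, which is the dependence on representativeness described above.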
In any case, because the content is stored in the cloud-based DVR, the user does not need to keep the content in storage local to the playback device. As will be further appreciated, because compression makes the given digital file smaller (i.e., fewer bits), that file can be stored using less memory space and transmitted faster, relative to storing and transmitting that file in its uncompressed state.
However, there are a number of non-trivial problems associated with cloud-based DVR services. One such problem is related to the legal requirement that each user's recordings stored in the cloud DVR must be a distinct copy associated with that user only. In other words, even though multiple users have recorded the same program (some piece of digital content), the cloud DVR service provider is required to save a separate copy of that program for each of those users. Thus, a storage-conserving technique such as data deduplication, which avoids content storage redundancy by leveraging a common copy of content that is accessible to all users by operation of a pointer-based system, is unacceptable where the one-copy-per-user requirement applies. This requirement of a distinct copy per user is grounded in copyright laws related to the right of an individual to legally record content for the purpose of time-shifting the personal viewing of that content. Thus, a content service provider that is tasked with providing the same content item to multiple users may still be constrained from a storage perspective and may particularly benefit from improved compression schemes.