In general, data compression reduces the size of a digital file. A compression algorithm typically makes the digital file smaller by representing the strings of bits (i.e., logical 1s and 0s) that make up the file with shorter strings of bits, using a dictionary, or so-called codebook. This reduction typically happens at the encoding stage, prior to transmission or storage. When such a reduced-size string is received at the decoding stage for playback, the decoding algorithm uses the codebook to reconstruct the original content from the compressed representation generated by the encoding algorithm. Whether the reconstructed content is an exact match of the original content or an approximation thereof depends on the type of compression employed. Lossless compression algorithms allow the original content to be reconstructed exactly from the compressed message, while lossy compression algorithms allow only an approximation of the original message to be reconstructed. Lossless compression algorithms are typically used where any loss of original content is problematic (as is the case with executable files, text files, and digital data files, where the loss of even a single bit may change the meaning of the content). Lossy compression algorithms are typically used for images, audio, video, and other such digital files where a degree of intentional data loss is imperceptible or otherwise at an acceptable level. With respect to lossy compression, note that the bit loss is not random; rather, the loss is purposeful (bits representing imperceptible sound or visual distinctions, or noise, can be targeted for exclusion by the lossy compression algorithm).
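By way of illustration only, the codebook idea and the lossless/lossy distinction described above can be sketched as follows. This is a minimal toy example under stated assumptions, not any particular standard or the scheme of any real codec: the function names (build_codebook, encode, decode, quantize), the 4-byte block size, and the sample data are all hypothetical choices made for this sketch.

```python
from collections import Counter


def build_codebook(data: bytes, block: int = 4) -> dict[bytes, int]:
    """Map each distinct block of bytes to a small integer index (toy codebook)."""
    blocks = [data[i:i + block] for i in range(0, len(data), block)]
    # Assumption for this sketch: more frequent blocks get smaller indices.
    ranked = [b for b, _ in Counter(blocks).most_common()]
    return {b: idx for idx, b in enumerate(ranked)}


def encode(data: bytes, codebook: dict[bytes, int], block: int = 4) -> list[int]:
    """Encoding stage: replace each block with its codebook index."""
    return [codebook[data[i:i + block]] for i in range(0, len(data), block)]


def decode(indices: list[int], codebook: dict[bytes, int]) -> bytes:
    """Decoding stage: use the same codebook to rebuild the original bytes."""
    reverse = {idx: b for b, idx in codebook.items()}
    return b"".join(reverse[i] for i in indices)


def quantize(samples: bytes, keep_bits: int = 4) -> bytes:
    """Lossy step: deliberately discard the low-order bits of each sample."""
    mask = (0xFF << (8 - keep_bits)) & 0xFF
    return bytes(s & mask for s in samples)


# Lossless round trip: the decoder recovers the content exactly.
original = b"ABCDABCDABCDWXYZABCD"
codebook = build_codebook(original)
encoded = encode(original, codebook)
restored = decode(encoded, codebook)

# Lossy step: nearby sample values collapse to one value and cannot be recovered.
approximated = quantize(bytes([200, 201, 203]))
```

In this sketch the five 4-byte blocks of the sample are represented by five small indices, and decoding reproduces the input exactly. The quantize step, by contrast, is purposeful loss: the discarded low-order bits stand in for distinctions assumed to be imperceptible.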
Data compression is commonly used in applications where the storage space or the bandwidth of a transmission path is constrained. For example, images and video transmitted via a communication network such as the Internet are typically compressed. One such example case is the so-called “cloud DVR” service, which allows for streaming of compressed digital video content from a remote digital video recorder to a user's playback device, such as a television, desktop or laptop computer, tablet, smartphone, or other such playback device. A standard compression scheme for streamed video is MPEG compression, although there are numerous other compression standards that can be used. In any case, because the content is stored in the cloud-based DVR, the user doesn't need to have the content maintained in a storage local to the playback device. As will be further appreciated, because compression makes the given digital file smaller (i.e., fewer bits), that file can be stored using less memory space and transmitted faster, relative to storing and transmitting that file in its uncompressed state. However, there are a number of non-trivial problems associated with cloud-based DVR services. One such problem is related to the legal requirement that each user's recordings stored in the cloud DVR must be a distinct copy associated with that user only. In other words, even though multiple users have recorded the same program (some piece of digital content), the cloud DVR service provider is required to save a separate copy of that program for each of those users. Thus, a storage-conserving technique such as data deduplication, which avoids content storage redundancy by leveraging a common copy of content that is accessible to all users by operation of a pointer-based system, is unacceptable where the one copy per user requirement applies.
This requirement of a single copy per user is based in copyright laws related to the right of an individual to legally record content for the purpose of time-shifting the personal viewing of that content. Thus, even with compression schemes in place, a content service provider that is tasked with providing the same content item to multiple users may still be constrained from a storage perspective.
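The storage consequence of the one copy per user requirement, relative to pointer-based deduplication, can be illustrated with a rough sketch. The classes DedupStore and PerUserStore, the user names, and the 1000-byte stand-in for a recorded program are hypothetical and exist only for this comparison; they do not describe any actual cloud DVR implementation.

```python
import hashlib


class DedupStore:
    """Pointer-based deduplication: one shared copy, many pointers to it."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}                # content hash -> shared copy
        self.pointers: dict[tuple[str, str], str] = {}   # (user, title) -> content hash

    def record(self, user: str, title: str, content: bytes) -> None:
        digest = hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)           # stored once, however many users record it
        self.pointers[(user, title)] = digest

    def bytes_stored(self) -> int:
        return sum(len(b) for b in self.blobs.values())


class PerUserStore:
    """One copy per user: every recording is stored separately."""

    def __init__(self) -> None:
        self.copies: dict[tuple[str, str], bytearray] = {}

    def record(self, user: str, title: str, content: bytes) -> None:
        self.copies[(user, title)] = bytearray(content)  # a distinct copy for this user

    def bytes_stored(self) -> int:
        return sum(len(c) for c in self.copies.values())


program = b"\x00" * 1000  # stand-in for one compressed program
dedup, per_user = DedupStore(), PerUserStore()
for user in ("alice", "bob", "carol"):
    dedup.record(user, "evening-news", program)
    per_user.record(user, "evening-news", program)
```

With three users recording the same program, the deduplicated store holds the content once, while the per-user store holds it three times. Under this sketch's assumptions, per-user storage grows linearly with the number of users who record a given program, which is the constraint that remains even after compression is applied.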