Effective and flexible storage of collections of audio and video content objects has always been a challenge because of the large number of such objects retained, even by an individual or family. The migration of such continuous media audio or video content to digitally coded forms, and the related convergence of devices for storage and use of such content has stimulated the development of a wide range of storage systems and devices. Various devices have been employed using both fixed media, such as computer-style electronic storage and hard disks, and removable media, such as video cassette recordings (VCR), compact disc (CD), Digital Versatile Disk (DVD), removable electronic storage (such as flash memory or EEPROM, Electrically Erasable Programmable Read-Only Memory), magnetic cards, floppy disks, and the like.
Historically, different forms of audio and video have been handled by different systems and devices, but “digital convergence” is leading toward common and interconnected systems for reception, storage, and playback of all kinds of media. For example, current Digital Video Recorders (DVRs) such as TIVO, REPLAY TV, and ULTIMATETV use PC-style hard disks; software exists to provide DVR functions on a personal computer (PC); and some such devices are packaged with TV set-top boxes, such as satellite or cable decoders.
The large number of data bits needed to store audio and video with satisfactory quality of reproduction presents a particular challenge in both the storage and transmission of digital format media objects. This has been particularly critical in transmission, due to rigid bandwidth limits and cost constraints that apply to radio, television, online services, wireless, and the like. In response to this, the use of data compression techniques has become an essential tool. Similar issues apply to other media objects, such as still images.
A variety of methods are used in an effort to gain maximum reduction of object size with minimum loss of quality. Compression has long been used in the coding of audio into digital form for digital telephony as well as in modems and other forms of online transmission, in the coding of image files and video for faxing and online distribution, and more recently for digital radio and television. Compression has also been used in data processing systems, with the early impetus again coming from transmission constraints in early remote processing systems. The storage benefits of compression have generally followed as a secondary benefit from work relating to transmission needs.
A key difference between compression in data processing systems and in media systems is in the applicability of lossy compression techniques that can reduce object size by discarding some information that causes little or no perceptible loss in quality. Lossless compression is usually required in data processing environments, because every bit of data can be highly significant. Lossless compression techniques, such as PKZip and Huffman entropy codes, work by transforming inefficient coding schemes to more efficient schemes, such as by exploiting repeating patterns, without loss of any bits in the inversely transformed, decompressed result.
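The lossless property described above can be illustrated with a minimal sketch. It uses Python's standard zlib module (a DEFLATE implementation, in the same family as the PKZip format named above) on hypothetical sample data with an obvious repeating pattern; the data and variable names are assumptions made for illustration only.

```python
import zlib

# Hypothetical sample data with an obvious repeating pattern.
original = b"ABCD" * 1000

# Forward transform: recode the data more efficiently.
compressed = zlib.compress(original, level=9)

# Inverse transform: recover the data from the compressed form.
restored = zlib.decompress(compressed)

# Lossless means every byte of the original is recovered.
is_lossless = restored == original
size_ratio = len(compressed) / len(original)
```

In this sketch the repeating pattern is what allows the size reduction; data without such redundancy would compress far less.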
Lossy schemes are used mainly for audio, image, and video data, and are typified by standards such as JPEG (the Joint Photographic Experts Group) image format and MPEG (the Moving Picture Experts Group) video format, including the MP3 audio coding defined within MPEG. These lossy schemes generally work by exploiting knowledge of human perception of sound and sight, so that some selected detail is discarded in compression and permanently lost in the decompressed result, but the perceptible impact of that loss is minimized. Such methods involve a trade-off in the amount of reduction obtained and the level of perceptible loss of quality. The level of these tradeoffs can be controlled by the designer of the compression scheme, or left as a parameter to be controlled at the time compression is done.
A further technique of progressive or scalable, layered compression has been applied to exploit these tradeoffs, particularly as they relate to transmission and rendering of digital audio and video. JPEG standards include an optional progressive format, in which the image file is structured to contain a layered sequence of scans, so that the first scan contains a low quality rough version of the image, followed by successive scans that add more detail. This enables an online transmission in which the start of the file can be received and used to present the low quality image, while successive scans are still being received and then progressively rendered to enhance quality. The viewer benefits by not having to wait to see a rough form of the image, gaining both real utility, in having some information, and psychological benefit in not waiting with a blank screen while all of the image data is received and decoded.
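The trade-off between size reduction and loss of quality can be sketched with uniform scalar quantization, in which a single step-size parameter controls how much detail is discarded. This is an illustrative simplification and not the actual JPEG or MPEG procedure; the sample values and the function names quantize, dequantize, and max_error are hypothetical.

```python
def quantize(samples, step):
    """Map each sample to an integer level; detail finer than `step` is discarded."""
    return [round(s / step) for s in samples]


def dequantize(levels, step):
    """Approximate the original samples from the stored levels."""
    return [q * step for q in levels]


# Hypothetical sample values, for illustration only.
samples = [12, 15, 18, 200, 203, 199, 57, 60]


def max_error(step):
    """Largest absolute difference between a sample and its reconstruction."""
    restored = dequantize(quantize(samples, step), step)
    return max(abs(a - b) for a, b in zip(samples, restored))
```

A larger step leaves fewer distinct levels to store, at the price of a larger reconstruction error, which in this sketch is bounded by half the step size. Leaving the step as a parameter corresponds to deferring the quality choice to the time compression is done.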
As transmission speeds have increased and video transmission has become a greater challenge than transmission of stills, a variation on this technique that addresses motion compensation has been applied in MPEG, with some attention in MPEG-2 and greater focus in MPEG-4. Motion video compression applies to both the spatial dimension of images and the time dimension of the series of images. The multiple images of video data are organized into groups of frames (GOFs) and the layering of gross and refined data is done for each GOF, addressing both spatial frame image data and temporal inter-frame motion differences. Layering here enables achievement of a high degree of flexibility, in that: 1) if bandwidth is limited, even momentarily, the low-significance layers can be discarded by the transmitter or ignored by the receiver in a highly adaptive fashion, and 2) if the receiver has limited processing or working storage resources for decoding (or a low resolution/low fidelity output device that will not benefit from high quality input), it can limit itself to the high-significance layers, all while minimizing the perceived quality loss. Thus such layered coding supports flexible scalability of a single coded form of the data to use for varying bandwidth and decoder or viewer requirements. Fine Granular Scalability (FGS) methods are intended to allow for very flexible scalings of data transmission to the dynamic availability of transmission bandwidth.
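The layering principle can be sketched as follows, assuming a much simplified scheme in which the 8-bit values of one GOF are split into bit-plane groups, most significant first. Actual MPEG layering operates on transformed and motion-compensated data; the functions split_layers and reconstruct and the sample values are hypothetical.

```python
def split_layers(values, n_layers=4, bits=8):
    """Split each value into n_layers bit groups, most significant first."""
    per = bits // n_layers
    mask = (1 << per) - 1
    layers = []
    for i in range(n_layers):
        shift = bits - per * (i + 1)
        layers.append([(v >> shift) & mask for v in values])
    return layers


def reconstruct(layers, n_layers=4, bits=8):
    """Rebuild values from whichever leading layers are available."""
    per = bits // n_layers
    out = [0] * len(layers[0])
    for i, layer in enumerate(layers):
        shift = bits - per * (i + 1)
        out = [o | (x << shift) for o, x in zip(out, layer)]
    return out


# Hypothetical data for one GOF.
gof = [200, 201, 37, 90]
layers = split_layers(gof)

full = reconstruct(layers)        # all four layers: exact values
coarse = reconstruct(layers[:2])  # two high-significance layers only
```

Here the transmitter or the receiver can drop the trailing layers without re-encoding, and the decoder still obtains a coarse approximation (in this example, the values with their low four bits cleared).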
A further method of exploiting this scalability in video transmission, as proposed by Radha and by McCanne and others (including much of a Special Issue on Streaming Video in the March 2001 IEEE Transactions on Circuits and Systems for Video Technology), is that of Receiver-driven Layered Multicast (RLM). This involves transmitting each of multiple such layers of video in separate multicast streams, so that different receivers can subscribe to only the number of streams (layers) that they require to obtain the quality level they desire. This provides a high quality transmission to receivers that will benefit from it, while accommodating those that will not. Such methods teach that a layered data stream be partitioned at the time of transmission to create separate layered streams for simultaneous multicast, and that these streams then be reassembled on receipt for combined presentation.
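The receiver-side choice in such a scheme can be sketched as below, assuming that layers are cumulative (each enhancement layer is useful only together with all layers beneath it) and that each layer is carried in its own stream at a known rate. The function name layers_to_subscribe and the rates shown are hypothetical, and the sketch does not implement the protocol described in the cited work.

```python
def layers_to_subscribe(layer_kbps, available_kbps):
    """Count how many layers, taken in order from the base layer, fit in the bandwidth."""
    total = 0
    count = 0
    for rate in layer_kbps:
        if total + rate > available_kbps:
            break
        total += rate
        count += 1
    return count


# Hypothetical per-layer stream rates, base layer first.
layer_kbps = [128, 256, 512, 1024]
```

With these assumed rates, a receiver with 500 kbps available would subscribe to the first two streams, while one with 2000 kbps would subscribe to all four.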
As noted, the emphasis on media compression has been for transmission, rather than for storage management. Data processing systems have, in contrast, seen some attention to compression-oriented storage management, but efforts to apply such methods broadly have been disappointing, and have fallen into disuse. For example, PC storage compression systems like STACKER, DRIVESPACE, and DOUBLESPACE were introduced some time ago to apply compression across various file types commonly stored on a PC disk. Such tools were based on general-purpose compression algorithms to be used across all files on a disk, and they used lossless compression because that was required for data processing files. Such methods are trouble-prone, and do not produce good results on audio and video files, which are increasingly the dominating use of storage space. The result has been that file-type and format-specific compression techniques have been recognized as saving storage, but attempts to apply compression broadly to storage management have not been seen as promising areas for development.
Thus, specific media compression methods have enabled corresponding benefits in storage systems, even though that has not been a primary objective in their development. For example, storage systems based on JPEG compression in digital cameras, and MPEG compression in Digital Video Recorders (DVRs), commonly give the user a choice of quality/compression levels. These allow the user to determine either globally or on an object-by-object basis whether they should be stored at high quality levels that consume large amounts of storage, or in more highly compressed but lower quality form.
Unfortunately, in current systems, these quality/size trade-offs must be made at the time that an object is stored. This is a problem, because the value of the object and the availability of storage space may change over time. As a collection grows and the older items in it age and possibly fall into disuse, their importance may diminish. At the same time, the total amount of storage used will grow, and constraints may make it desirable to reclaim space. In current systems, no provision is made to change the level of compression of an object. The only way to reclaim space is to reduce quality to zero by deleting the object in its entirety. There is, therefore, a need for methods to provide for less draconian, more progressive and gradual ways to adjust the space allocated for stored media objects in a memory device.
A simple way to do this is to take the existing objects, decompress them, and recompress them into a more compressed form, but this takes considerable processing resources, and depending on the specific formats involved, may produce unwanted losses of quality without compensating reductions in size, which can result from non-reversible transformations in the decompression-recompression process. Chandra et al. at Duke University disclose work that addresses such transcoding of images, with applications to Web browsing for transmission, and to digital cameras for storage. Chandra recognizes the value of being able to reduce the size of images stored in a digital camera, so that additional pictures can be stored in cases when additional storage is not available. The technique applied is to partially decompress the image, and then re-compress from that intermediate form to a lower quality level. This involves the steps of entropy decoding, dequantization, requantization, and entropy recoding. Successive reductions may be done with successive cycles of such transcodings. The problem is that this method requires significant processing resources and time for the transcoding, and for the reading and rewriting of the stored object, each time such a reduction is made.
Layered, scalable compression techniques have the potential to facilitate such an objective to the extent that a storage system can reduce the number of layers that are stored for an object, without having to decompress and recompress the objects. Castor et al. in U.S. Pat. No. 6,246,797 disclose work that addresses image file size reduction in still and video cameras. With regard to still images, Castor observes that “[a] feature of the present invention is that the image quality level of an image file can be lowered without having to reconstruct the image and then re-encode it, which would be costly in terms of the computational resources used. Rather, the data structures within the image file are pre-arranged so that the image data in the file does not need to be read, analyzed or reconstructed. The image quality level of an image file is lowered simply by keeping an easily determined subset of the data in the image file and deleting the remainder of the data in the image file, or equivalently by extracting and storing in a new image file a determined subset of the data in the image file and deleting the original image file. Alternately, data in an image file may in some implementations be deleted solely by updating the bookkeeping information for the file, without moving any of the image data.” Even so, the key difficulties in the handling of video data remain unrecognized and unaddressed in this type of system.
Castor describes a video image management system in which similar “wavelet or wavelet-like transforms” are applied to “each set of N (e.g., sixteen) successive images (i.e., frames).” More particularly, “In all embodiments, the image file (or files) representing the set of frames is stored so as to facilitate the generation of smaller image files with minimal computational resources” (relative to transcoding). While this method eliminates the need for transcoding processing, it does not avoid other significant costs involved in continuous media.
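The four transcoding steps named above can be sketched as follows. The sketch is illustrative only and is not the method of Chandra: the standard zlib module stands in for the entropy coder, JSON is used merely to serialize the integer levels, and the function names and step sizes are hypothetical.

```python
import json
import zlib


def entropy_encode(levels):
    """Stand-in entropy coder: serialize the integer levels and compress them."""
    return zlib.compress(json.dumps(levels).encode("ascii"))


def entropy_decode(blob):
    """Inverse of the stand-in entropy coder."""
    return json.loads(zlib.decompress(blob).decode("ascii"))


def transcode(blob, old_step, new_step):
    """Reduce quality without a full decode to pixels or samples."""
    levels = entropy_decode(blob)                         # 1. entropy decoding
    coeffs = [q * old_step for q in levels]               # 2. dequantization
    new_levels = [round(c / new_step) for c in coeffs]    # 3. requantization
    return entropy_encode(new_levels)                     # 4. entropy recoding


# Hypothetical quantized coefficients stored with a step size of 2.
stored = entropy_encode([50, -18, 6, 2, -2, 0, 0, 0])
reduced = transcode(stored, old_step=2, new_step=16)
reduced_levels = entropy_decode(reduced)
```

Even in this simplified form, every reduction must read the whole stored object, process every coefficient, and write the result back, which is the cost noted above.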
The problem is that a video object of more than a few seconds will contain large numbers of GOFs. Since the layering is computed and stored on a GOF-by-GOF basis, there will be many small sets of low-significance data scattered throughout the file structure used to store the compressed video. The reduction requires “extracting a subset of the data in the set of image data structures and forming a lower quality version of the set of image data structures that occupies less space in the memory device.” This means that the file must be read, restructured, and rewritten, which can be a significant cost for large files, even though the costly decompression-recompression steps of transcoding are avoided.
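The scattering problem can be made concrete with a small sketch that lays out a file in which all layers of each GOF are stored together and then lists the byte ranges that would have to be removed to drop one layer. The layer sizes, the 30 frames-per-second rate, and the function names are assumptions made for illustration; the 16-frame GOF follows the example value quoted from Castor.

```python
def interleaved_layout(n_gofs, layer_sizes):
    """Return (gof, layer, offset, size) records for a GOF-by-GOF file layout."""
    records = []
    offset = 0
    for gof in range(n_gofs):
        for layer, size in enumerate(layer_sizes):
            records.append((gof, layer, offset, size))
            offset += size
    return records


def ranges_to_drop(records, layer):
    """Byte ranges (offset, size) occupied by the given layer across the file."""
    return [(off, size) for (_, lyr, off, size) in records if lyr == layer]


# Assumed: one hour of video at 30 frames per second, 16 frames per GOF.
n_gofs = 3600 * 30 // 16
# Assumed layer sizes in bytes per GOF, most significant layer first.
layer_sizes = [4000, 2000, 1000]

records = interleaved_layout(n_gofs, layer_sizes)
dropped = ranges_to_drop(records, layer=2)
```

Under these assumptions, dropping the least significant layer means removing one small range per GOF, each separated from the next by data that must be kept, so the retained data has to be read and rewritten to close the gaps.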
There is evidently no recognition that similar methods might be applied to audio, but in any case, similar problems may be expected to arise there as well. MP3 audio and similar formats are, like MPEG video, designed with real-time transmission and play as a primary objective, and thus must store all data equivalent to a “GOF” or small set of audio samples together within a short “window” period or transmission packet frame. So reducing the size of such an audio file would involve a similar elimination of low-significance layer data that is scattered throughout a large audio file, with similar processing and input/output costs.
It should also be noted that there has been attention to maintaining the ability to support functionality like that of a Video Cassette Recorder (VCR), such as random access, fast-forward, backward, fast-backward, stop, pause, step-forward, slow-motion, or other “trick” functions when using layered data formats. Consistent with the orientation of layered methods to transmission rather than storage, this has been in the context of data stored at a remote source server site and transmitted in layers to the recipient site, as in the Lin paper in the IEEE Special Issue, not the possibility of local storage at the recipient/playback site, such as is typical of a consumer VCR or DVR.
The underlying broad challenge is the difficulty of simultaneously meeting the needs of content creation, initial storage at the content or distribution source, real-time transmission and presentation to a potentially large number of distant viewers or listeners with varying communications facilities and player devices (whether by appointment or on demand, and/or batch transmission and asynchronous viewing), plus local storage at the receiving site and deferred presentation of stored content. The primary orientation of most work on media compression is toward the needs of real-time transmission and play, and that objective will remain critical. However, as the retention of media objects in a long-term storage system, library, or archive at the user's site begins to involve large collections and substantial resources, the problem of managing local storage over the life-cycle of media objects will also become increasingly important.