The present invention relates to video coding and decoding, and in particular, to systems and methods for perceptually lossless video compression.
High-quality digital video content is usually represented with 24 to 30 bits per pixel: 8 or 10 bits for each of the three basic color components, namely Red (R), Green (G) and Blue (B) in the RGB color space, or luminance (Y) and the two chrominance components (Cb and Cr) in the YCbCr color space. In the uncompressed state, high-definition content at typical spatial resolution (1920×1080) and temporal resolution (60 frames/sec) requires around 62 Mbits of storage per frame at 30 bits per pixel, which corresponds to a source data rate of 3.73 Gbps. These requirements may lead to long transmission times and large memory requirements in broadcast and/or storage applications that need to support high-definition video content.
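The storage and data-rate figures above follow directly from the frame geometry. The short sketch below reproduces the arithmetic; the function name and its parameters are illustrative and do not come from any standard or library.

```python
def raw_video_requirements(width, height, bits_per_pixel, frames_per_second):
    """Return (bits per frame, bits per second) for uncompressed video."""
    bits_per_frame = width * height * bits_per_pixel
    bits_per_second = bits_per_frame * frames_per_second
    return bits_per_frame, bits_per_second


# 1920x1080 content at 10 bits per component (30 bits per pixel), 60 frames/sec.
bits_per_frame, bits_per_second = raw_video_requirements(1920, 1080, 30, 60)
print(f"{bits_per_frame / 1e6:.1f} Mbits per frame")  # 62.2 Mbits per frame
print(f"{bits_per_second / 1e9:.2f} Gbps")            # 3.73 Gbps
```

At 8 bits per component (24 bits per pixel), the same calculation gives about 49.8 Mbits per frame and about 2.99 Gbps.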
Compression is essential for the transmission and storage of such digital video content. Several compression standards and proprietary techniques (e.g. MPEG-2, H.264 and VC-1) have been developed and deployed to realize effective compression of such video content. These compression schemes strike a balance between transmission time and memory requirements on the one hand and image quality on the other. One class of compression schemes is referred to as “lossless” schemes, wherein the signal after decompression is exactly the same, on a bit-for-bit basis, as the signal that was input to the compression process. The other class is commonly referred to as “lossy” schemes, since the compression method discards information within the content that is deemed to be visually insignificant. Compression standards such as MPEG-2 and H.264 fall into the latter category. Lossy techniques offer higher compression than lossless techniques because they can both approximate the signal by discarding certain of its components and reduce the redundancies in the compressed signal representation, whereas a lossless technique is only able to reduce redundancies in the signal representation.
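The distinction between the two classes can be illustrated with a toy example. The sketch below is not any of the codecs named above: the lossless path uses the general-purpose zlib compressor from the Python standard library, and the lossy path merely simulates information loss by dropping the least significant bits of each sample, without performing any actual compression. Both function names are hypothetical.

```python
import zlib


def lossless_roundtrip(samples: bytes) -> bytes:
    """Compress and decompress; the output is bit-for-bit identical to the input."""
    return zlib.decompress(zlib.compress(samples))


def lossy_roundtrip(samples: bytes, dropped_bits: int = 2) -> bytes:
    """Simulate a lossy scheme by zeroing the lowest bits of every sample."""
    return bytes((s >> dropped_bits) << dropped_bits for s in samples)


samples = bytes(range(256))            # stand-in for 8-bit pixel values
restored = lossless_roundtrip(samples)
approx = lossy_roundtrip(samples)

# Largest per-sample error introduced by discarding two bits.
max_error = max(abs(a - b) for a, b in zip(samples, approx))

print(restored == samples)  # True
print(approx == samples)    # False
print(max_error)            # 3
```

The lossless path recovers every sample exactly, while the lossy path recovers only an approximation whose error is bounded by the number of bits discarded.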
The choice of a lossless or lossy scheme is application dependent and involves tradeoffs among memory size, transmission time and the target image quality. The desired image quality is often not achievable if the memory requirement or the transmission time requirement has to be strictly met, so the designer frequently specifies the memory and/or transmission time requirement and accepts whatever image quality the codec delivers. The inability to achieve the desired image quality may render many compression schemes unusable in display-centric application scenarios.
The problem with most of the currently available video compression methods is that they are designed for one type of data and generally do not work well for mixed mode content or for data that does not fit the source model assumed by the compression method. MPEG-2 and H.264, for instance, assume that the source is natural video content, and such a scheme may not work well if the source is a mix of natural video and graphics and/or text.
Another problem with these compression methods is that they are designed primarily for efficient storage and transmission of natural video content, and thus are optimized for the high compression ratios needed by typical transmission schemes and/or typical storage devices for such content, such as DVD media. At these high compression ratios, the artifacts induced by the “lossy” coding process are often visible when the content is displayed. Such compression schemes are agnostic to display quality: they consider only the transmission bandwidth or the storage capacity in realizing the desired compression.
Yet another problem associated with the deployment of such compression methods is that they are primarily tailored for application scenarios wherein it is assumed that there will be a plurality of decoders with only a few instances of encoding. This approach is well matched to broadcast applications and playback of stored content such as from a DVD player. Thus, these compression standards tend to have fairly complex encoding strategies and a simple decoding process. These schemes may not be suited for applications wherein both encoding and decoding resources may have to be available within the same system.
An application scenario where the compression scheme is incorporated within a display system framework imposes a different set of requirements on the compression scheme. Specifically, the primary constraint on the compression scheme is that it provides a perceptually lossless rendering of the content on the display. In this context the compression scheme is essentially transparent to the end user. In previous work related to compression schemes for display, the decompressed representation of the compressed frame within the display controller's frame buffer has been used only to derive a control signal, and has never been used for display itself. Thus, these schemes did not strive to achieve a visually lossless representation for the decompressed signal. Furthermore, in this application scenario, the complete encode/decode processing chain should possess very low latency, since a display application needs to demonstrate real-time performance. In addition, both encoding and decoding are needed within the same system. An example is a display module in which the content maintained within a frame buffer of the display is kept in compressed form and then decompressed at the time of rendering. A display system capable of handling video content often has to support a wide range of spatial resolutions. Most compression schemes that process the content in a block-by-block manner tend to have a fixed block geometry, which may be optimal for one resolution of the video content but suboptimal for another resolution of the incoming video content. Using suboptimal block sizes will not give the best image quality in the decoded video.
Another such application scenario is a networked display device wherein the content has to be compressed just enough to match the bandwidth requirements of the networking protocol, and the display itself decompresses the content coming over the network interface and then renders it. In a storage setup implemented within a display system, it may be cost effective to have a 16 Mbits memory module. Thus, a compression factor of 4-6 is sufficient for the high-definition video content. Note that H.264 and MPEG-2 strive for compression factors around 200:1, and at these compression factors, the artifacts introduced during the lossy coding process prevent the resulting displayed image from being perceptually lossless. Using very low compression factors such as 4-6 within these methods makes the scheme very inefficient and memory intensive. In such application scenarios, what is also needed is a simple encoder and decoder. Furthermore, the bandwidth constraint in such applications is less severe than in storage or broadcast applications, and thus it may be feasible to compress the content so that, when decompressed and rendered on the display device, the image is deemed to be perceptually lossless.
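As a rough check on the compression factor discussed above, the following sketch assumes that the 16 Mbits memory module must hold one compressed 1920×1080 frame and that 16 Mbits means 16×10^6 bits. Both are assumptions made for illustration, and the function name is hypothetical.

```python
def required_compression_factor(width, height, bits_per_pixel, memory_bits):
    """Return the minimum compression factor for one frame to fit in memory."""
    uncompressed_bits = width * height * bits_per_pixel
    return uncompressed_bits / memory_bits


MEMORY_BITS = 16_000_000  # assumed meaning of "16 Mbits"

factor_30bpp = required_compression_factor(1920, 1080, 30, MEMORY_BITS)
factor_24bpp = required_compression_factor(1920, 1080, 24, MEMORY_BITS)

print(f"{factor_30bpp:.2f}")  # 3.89
print(f"{factor_24bpp:.2f}")  # 3.11
```

Under these assumptions, a frame at 30 bits per pixel needs a factor of about 3.9 and a frame at 24 bits per pixel about 3.1, so a codec operating at a factor of 4 or above would fit a frame in the module.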
Using the methods developed in the compression standards directly within these application scenarios would not provide high image quality while at the same time providing a simple implementation capable of handling the mixed mode content that has to be supported by the display system.
Thus, it is desirable to improve video coding and decoding techniques. The present invention addresses these and other problems by providing systems and methods for perceptually lossless video compression.