It is well-known that video signals are represented by large amounts of digital data, relative to the amount of digital data required to represent text information or audio signals. Digital video signals consequently occupy relatively large bandwidths when transmitted at high bit rates and especially when these bit rates must correspond to the real-time digital video signals demanded by video display devices.
In particular, the simultaneous transmission and reception of a large number of distinct video signals, over such communications channels as cable or fiber, is often achieved by frequency-multiplexing or time-multiplexing these video signals in ways that share the available bandwidths in the various communication channels.
Digitized video data are typically embedded with the audio and other data in formatted media files according to internationally agreed formatting standards (e.g. MPEG2, MPEG4, H264). Such files are typically distributed and multiplexed over the Internet and stored separately in the digital memories of computers, cell phones, digital video recorders and on compact discs (CDs) and digital video discs DVDs). Many of these devices are physically and indistinguishably merging into single devices.
In the process of creating formatted media files, the file data is subjected to various levels and types of digital compression in order to reduce the amount of digital data required for their representation, thereby reducing the memory storage requirement as well as the bandwidth required for their faithful simultaneous transmission when multiplexed with multiple other video files.
The Internet provides an especially complex example of the delivery of video data in which video files are multiplexed in many different ways and over many different channels (i.e. paths) during their downloaded transmission from the centralized server to the end user. However, in virtually all cases, it is desirable that, for a given original digital video source and a given quality of the end user's received and displayed video, the resultant video file be compressed to the smallest possible size.
Formatted video files might represent a complete digitized movie. Movie files may be downloaded ‘on demand’ for immediate display and viewing in real-time or for storage in end-user recording devices, such as digital video recorders, for later viewing in real-time.
Compression of the video component of these video files therefore not only conserves bandwidth, for the purposes of transmission, but it also reduces the overall memory required to store such movie files.
At the receiver end of the abovementioned communication channels, single-user computing and storage devices are typically employed. Currently-distinct examples of such single-user devices are the personal computer and the digital set top box, either or both of which are typically output-connected to the end-user's video display device (e.g. TV) and input-connected, either directly or indirectly, to a wired copper distribution cable line (i.e. Cable TV). Typically, this cable simultaneously carries hundreds of real-time multiplexed digital video signals and is often input-connected to an optical fiber cable that carries the terrestrial video signals from a local distributor of video programming. End-user satellite dishes are also used to receive broadcast video signals. Whether the end-user employs video signals that are delivered via terrestrial cable or satellite, end-user digital set top boxes, or their equivalents, are typically used to receive digital video signals and to select the particular video signal that is to be viewed (i.e. the so-called TV Channel or TV Program). These transmitted digital video signals are often in compressed digital formats and therefore must be uncompressed in real-time after reception by the end-user.
Most methods of video compression reduce the amount of digital video data by retaining only a digital approximation of the original uncompressed video signal. Consequently, there exists a measurable difference between the original video signal prior to compression and the uncompressed video signal. This difference is defined as the video distortion. For a given method of video compression, the level of video distortion almost always becomes larger as the amount of data in the compressed video data is reduced by choosing different parameters for those methods. That is, video distortion tends to increase with increasing levels of compression.
As the level of video compression is increased, the video distortion eventually becomes visible to the HVS and eventually this distortion becomes visibly-objectionable to the typical viewer of the real-time video on the chosen display device. The video distortion is observed as so-called video artifacts. A video artifact is observed video content that is interpreted by the HVS as not belonging to the original uncompressed video scene.
Methods exist for significantly attenuating visibly-objectionable artifacts from compressed video, either during or after compression. Most of these methods apply only to compression methods that employ the block-based Two-dimensional (2D) Discrete Cosine Transform (DCT) or approximations thereof. In the following, we refer to these methods as DCT-based. In such cases, by far the most visibly-objectionable artifact is the appearance of artifact blocks in the displayed video scene.
Methods exist for attenuating the artifact blocks typically either by searching for the blocks or by requiring a priori knowledge of where they are located in each frame of the video.
The problem of attenuating the appearance of visibly-objectionable artifacts is especially difficult for the widely-occurring case where the video data has been previously compressed and decompressed, perhaps more than once, or where it has been previously re-sized, re-formatted or color re-mixed. For example, video data may have been re-formatted from the NTSC to PAL format or converted from the RGB to the YCrCb format. In such cases, a priori knowledge of the locations of the artifact blocks is almost certainly unknown and therefore methods that depend on this knowledge do not work.
Methods for attenuating the appearance of video artifacts must not add significantly to the overall amount of data required to represent the compressed video data. This constraint is a major design challenge. For example, each of the three colors of each pixel in each frame of the displayed video is typically represented by 8 bits, therefore amounting to 24 bits per colored pixel. For example, if pushed to the limits of compression where visibly-objectionable artifacts are evident, the H.264 (DCT-based) video compression standard is capable of achieving compression of video data corresponding at its low end to approximately 1/40th of a bit per pixel. This therefore corresponds to an average compression ratio of better than 40×24=960. Any method for attenuating the video artifacts, at this compression ratio, must add therefore an insignificant number of bits relative to 1/40th of a bit per pixel. Methods are required for attenuating the appearance of block artifacts when the compression ratio is so high that the average number of bits per pixel is typically less than 1/40th of a bit.
For DCT-based and other block-based compression methods, the most serious visibly-objectionable artifacts are in the form of small rectangular blocks that typically vary with time, size and orientation in ways that depend on the local spatial-temporal characteristics of the video scene. In particular, the nature of the artifact blocks depends upon the local motions of objects in the video scene and on the amount of spatial detail that those objects contain. As the compression ratio is increased for a particular video, MPEG, DCT-based video encoders allocate progressively fewer bits to the so-called quantized basis functions that represent the intensities of the pixels within each block. The number of bits that are allocated in each block is determined on the basis of extensive psycho-visual knowledge about the HVS. For example, the shapes and edges of video objects and the smooth-temporal trajectories of their motions are psycho-visually important and therefore bits must be allocated to ensure their fidelity, as in all MPEG, DCT-based methods.
As the level of compression increases, and in its goal to retain the above mentioned fidelity, the compression method (in the so-called encoder) eventually allocates a constant (or near constant) intensity to each block and it is this block-artifact that is usually the most visually objectionable. It is estimated that if artifact blocks differ in relative uniform intensity by greater than 3% from that of their immediately neighboring blocks, then the spatial region containing these blocks is visibly-objectionable. In video scenes that have been heavily-compressed using block-based DCT-type methods, large regions of many frames contain such block artifacts.