1. Field of the Invention
This invention relates to a communications system for transmission of compressed video and, more particularly, to a circuit, communications system and method for reducing visible flicker in transmitted images.
2. Description of the Related Art
A communications system generally contains at least two nodes interconnected by a transmission line. The transmission line can be a copper wire, an optical fiber, or a wireless transmission medium. In many cases, the transmission line is within a network, which can include wire or cable links, wireless links, or a combination of these.
To allow large amounts of data to be transmitted at reasonable speeds, many communications systems employ data compression. Compression reduces the number of information bits used to represent a data file, so that transmission of the file is faster, less of a burden on the network, or both. The data is compressed by an encoder (encoded using a compression scheme) prior to transmission. After passing across the network, the compressed data is converted by a decoder back to an uncompressed form for further use or presentation. The encoder and decoder are often implemented as “codecs” on each end, where the codec can either compress or decompress the data stream.
Depending upon the particular compression scheme employed, the codec on the receiving end may not be able to fully restore the compressed data to its original uncompressed state. Some compression schemes are “lossy,” meaning that some information is irreversibly discarded by the encoder during the compression process. When a lossy compression scheme is used for transmission over a network, the amount of information discarded is often adjusted as a means of controlling the bit rate of the transmission. In the case of video images, for example, the density of bits needed to represent the video is lower when the image is spatially simple or moves slowly, and higher for more complex or fast-moving images.
Video compression encoders typically compress one “block” of digital image data at a time, where a block contains the data corresponding to either a 4×4 or 16×16 array of displayed pixels. The 16×16 sized block is also referred to as a “macroblock.” Because of the above-described variation in bit density with image complexity, an encoder moving through blocks of image data will output more bits per block, or a higher bit rate, for complex video portions than for simpler portions. Network and buffer requirements typically demand a relatively constant bit rate for transmission, however, necessitating use of a bit rate controller.
The effect of a bit rate controller may be more clearly understood in view of a simplified discussion of encoder operation. Three fundamental processes involved in many video compression encoding schemes are illustrated in FIG. 1. Digitized video frame block 102 in FIG. 1 represents the input to an encoder. Although the input data 102 is a stream of digital data, block 104 is shown as an example of what a block of video data might look like as displayed. Block 104 is shown as a 4×4 array of pixels for simplicity, but could also be a 16×16 macroblock, or any other size conveniently handled by a compression encoder. Input data 102 initially undergoes prediction process 106. The goal of prediction is to minimize the entropy of the data passed to the transform process.
The prediction block derived from the best prediction mode, illustrated by block 108 in FIG. 1, may be generated in different ways, depending on the prediction method used. In a method known as “inter prediction,” prediction block 108 is derived from a corresponding block in a different frame of video than the frame containing block 104. In “intra prediction,” prediction block 108 is instead derived from one or more blocks in the same frame as block 104. Commonly used prediction modes include vertical, horizontal, DC (mean or average), and plane prediction. The plane prediction mode, for example, uses a linear function between the neighboring samples to the left and to the top in order to predict the current samples. Each prediction mode is assigned a uniquely identifiable code by the video compression standard, and the encoder and the decoder can reconstruct the same predicted pixel values given the prediction mode and the neighboring pixel samples.
Prediction block 108 is a 4×4 (or 16×16) approximation of the same area in the original picture. The value of each pixel in the prediction block may not match the original pixel value perfectly, but one prediction mode is chosen among all possible modes so that the overall difference between the prediction block and the original block is most beneficial to the rest of the encoding process. Prediction block 108 is subtracted from block 104. The result of the subtraction is called a residual block, illustrated as block 110 in FIG. 1. The digitized residual block is forwarded to the next stage of the encoder.
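The prediction and subtraction steps described above can be illustrated with a minimal Python sketch. It is not taken from any particular standard or encoder: the function names, the restriction to three modes (vertical, horizontal, and DC), and the use of the sum of absolute differences (SAD) as the selection criterion are all assumptions made for illustration. A real encoder may use a different cost measure to decide which mode is most beneficial.

```python
def predict(mode, top, left, size=4):
    """Build a size x size prediction block from neighboring samples."""
    if mode == "vertical":      # copy the row of samples above into every row
        return [list(top) for _ in range(size)]
    if mode == "horizontal":    # copy each left-hand sample across its row
        return [[left[r]] * size for r in range(size)]
    if mode == "dc":            # fill with the rounded mean of all neighbors
        mean = (sum(top) + sum(left) + size) // (2 * size)
        return [[mean] * size for _ in range(size)]
    raise ValueError(f"unknown prediction mode: {mode}")


def residual(block, pred):
    """Subtract the prediction block from the original block."""
    return [[b - p for b, p in zip(brow, prow)]
            for brow, prow in zip(block, pred)]


def sad(res):
    """Sum of absolute differences (assumed cost measure)."""
    return sum(abs(v) for row in res for v in row)


def choose_mode(block, top, left):
    """Pick the mode whose residual has the smallest SAD."""
    size = len(block)
    candidates = {m: residual(block, predict(m, top, left, size))
                  for m in ("vertical", "horizontal", "dc")}
    best = min(candidates, key=lambda m: sad(candidates[m]))
    return best, candidates[best]


# A block of vertical stripes is predicted exactly by the vertical mode.
block = [[10, 20, 30, 40] for _ in range(4)]
top = [10, 20, 30, 40]       # reconstructed samples above the block
left = [12, 12, 12, 12]      # reconstructed samples to the left
mode, res = choose_mode(block, top, left)
print(mode, sad(res))        # prints "vertical 0"
```

In this example the residual block is all zeros, so almost nothing remains to be transformed and coded. Less regular image content leaves a larger residual.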
The next fundamental process in the encoder is transformation process 112. A mathematical transformation is used to represent the residual block as a combination of known basis patterns using a set of weighting coefficients. In the illustration of FIG. 1, residual block 110 could be represented as a sum of N basis patterns 114, each weighted by a corresponding coefficient C. The number N of basis patterns and the particular basis patterns used may vary depending on the block size and on the specific transform employed. The particular set of coefficients 116 used in the transform of the residual block being processed then undergoes quantization process 118.
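As one concrete illustration of such a transformation, the sketch below applies a two-dimensional discrete cosine transform (DCT) to a residual block. The choice of an orthonormal floating-point DCT is an assumption made for clarity; the text above does not name a specific transform, and practical codecs often use integer approximations. The helper names are hypothetical.

```python
import math


def dct_basis(n):
    """Return the n x n orthonormal DCT-II matrix (rows are 1-D basis vectors)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_transform(block):
    """Weighting coefficients C = T * X * T^T for residual block X."""
    t = dct_basis(len(block))
    return matmul(matmul(t, block), transpose(t))


def inverse_transform(coeffs):
    """Rebuild the block as a weighted sum of basis patterns: X = T^T * C * T."""
    t = dct_basis(len(coeffs))
    return matmul(matmul(transpose(t), coeffs), t)


# A flat residual block needs only one of the N = 16 basis patterns.
flat = [[3] * 4 for _ in range(4)]
coeffs = forward_transform(flat)
rebuilt = inverse_transform(coeffs)
```

For a 4×4 block this transform uses N = 16 basis patterns. The flat block in the example concentrates all of its energy in the first coefficient, and before quantization the inverse transform recovers the block to within floating-point error.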
In quantization process 118, the coefficients 116 are divided by an integer related to a quantization parameter QP. The coefficients are either truncated toward zero or rounded to the nearest integer after division, so that the effect of the quantization is generally to make more of the coefficients go to zero. This loss of coefficients represents a loss of image data that cannot be recovered in the decoding process. The specific relationship between QP and the quantization step depends on the particular compression scheme, but in general larger values of QP result in more data lost. A bit rate controller can therefore adjust the value of QP used by the encoder in order to control the bit rate of the encoder output. A higher QP reduces the bit rate of data from the encoder, which may keep the transmission within the bandwidth requirements of the network, but at the cost of more lost data and a lower image quality.
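The effect of QP on the coefficients can be seen in a short sketch. The mapping from QP to an integer step used here, in which the step doubles every six QP values, is a hypothetical choice for illustration only; as noted above, the actual relationship depends on the compression scheme. The sketch uses truncation toward zero rather than rounding.

```python
def qp_to_step(qp):
    """Hypothetical QP-to-step mapping: the integer step doubles every 6 QP."""
    return 1 << (qp // 6)


def quantize(coeffs, qp):
    """Divide each coefficient by the step and truncate toward zero."""
    step = qp_to_step(qp)
    return [int(c / step) for c in coeffs]


def dequantize(levels, qp):
    """Decoder side: scale the levels back up; truncated detail is not restored."""
    step = qp_to_step(qp)
    return [v * step for v in levels]


coeffs = [52, -17, 9, -4, 3, 1, 0, -1]   # illustrative transform coefficients

for qp in (0, 12, 24):
    levels = quantize(coeffs, qp)
    print(qp, levels, "zeros:", levels.count(0))
# prints:
# 0 [52, -17, 9, -4, 3, 1, 0, -1] zeros: 1
# 12 [13, -4, 2, -1, 0, 0, 0, 0] zeros: 4
# 24 [3, -1, 0, 0, 0, 0, 0, 0] zeros: 6
```

With QP equal to 12 in this sketch, dequantization returns [52, -16, 8, -4, 0, 0, 0, 0] rather than the original coefficients. The difference is the information irreversibly discarded by the encoder.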
In further processing not shown in FIG. 1, the coefficients resulting from quantization process 118, along with other information describing the compression process, are coded into an efficient transmission format and output to a buffer. The data stream is then fed from the buffer to a network interface for transmission.
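The interaction between the output buffer and the QP adjustment performed by a bit rate controller can be sketched as a simple feedback rule. The rule below is an assumed, deliberately simplified controller and not a description of any particular encoder: the function name, the single-step adjustment, the deadband, and the QP range of 0 to 51 are all hypothetical.

```python
def next_qp(qp, buffer_bits, target_bits, qp_min=0, qp_max=51, deadband=0.1):
    """Nudge QP up when the buffer is over target and down when it is under.

    buffer_bits: bits currently waiting in the output buffer
    target_bits: desired buffer occupancy (assumed set point)
    deadband:    fractional error tolerated before QP is changed
    """
    error = (buffer_bits - target_bits) / target_bits
    if error > deadband:        # too many bits queued: quantize more coarsely
        qp += 1
    elif error < -deadband:     # buffer draining: quality can be raised
        qp -= 1
    return max(qp_min, min(qp_max, qp))


print(next_qp(30, 1200, 1000))   # prints 31
print(next_qp(30, 800, 1000))    # prints 29
print(next_qp(30, 1050, 1000))   # prints 30
```

A rule of this kind holds the transmitted bit rate roughly constant, at the cost of image quality that varies with QP from one block or frame to the next.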
In many cases, the image quality reduction caused by a lossy compression scheme as described above is minimally perceptible to a viewer, if perceptible at all. This is particularly the case for data lost during bit rate control of frames containing rapidly-moving images. One type of distortion is perceptible and bothersome, however. An annoying flicker can be observed in some cases, particularly in image regions with little or no motion.
Previous methods of addressing this flicker problem include a determination of whether flicker is likely to occur. The determination may be done on a block-by-block basis for a frame being processed using inter prediction, as described in U.S. Publication No. 2008/0025397, hereby incorporated by reference herein. A block-by-block investigation done within the encoder (i.e., between prediction process 106 and transform process 112 in FIG. 1) adds undesirable complexity to the encoding process, however. In addition, a flicker determination based on inter-predicted blocks cannot be used for those encoders that utilize only intra prediction.
Another previous method, described in U.S. Publication No. 2009/0046092, hereby incorporated by reference herein, determines the likelihood of flicker by a calculation at the pixel level of a complexity parameter of a reconstructed (noncompressed) image, followed by normalization with complexity data calculated for a previous frame. This calculation for a noncompressed image at the pixel level also adds undesirable computational complexity to the encoder, and relies on previous image data that may not be available to encoders using intra prediction only.
Upon determining that flicker is likely to occur, the previous approaches referenced above include a modification of the encoder's process for selecting a prediction mode. In U.S. Publication No. 2008/0025397, a cost function to be minimized as part of the prediction mode selection is modified to include a comparison of reconstructed blocks from the current frame and from the previous frame. Similarly, in U.S. Publication No. 2009/0046092 the prediction mode selection process is modified to favor use of previous frame and/or future frame image data. These previous approaches not only add complexity, but again rely on image data from different frames which may be unavailable to intra-prediction-only encoders.
It would be desirable to have a way of detecting video frames likely to exhibit flicker that does not add computational complexity to the encoder and does not require storage of reference image frames. It would further be desirable to have a way of mitigating perceived flicker that neither adds computational complexity to the encoder nor requires storage of reference image frames.