The present invention relates to video compression technology, and more particularly to tiling or blockiness detection based on spectral power signature.
Video signals from an original source, such as a television camera, represent a great amount of data when digitized. To transmit this data to a receiver, the video signals are compressed by coders/decoders (codecs) using one of the well-known video compression techniques, such as H.264 or MPEG2. These compression techniques divide each frame, or video image, in the sequence represented by the video signal into blocks of data, each of which is compressed to produce compressed video. However, transmitting the compressed video to the receiver most often requires a data bit-rate so low that information is lost, i.e., the compression is a “lossy” process. If the loss becomes too high, then when the compressed video is decompressed at the receiver by an appropriate codec, the resulting frames or video images have visible artifacts along the edges of the blocks of data that were originally compressed, an effect commonly referred to as tiling or blockiness.
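As a minimal illustration of why block coding produces tiling, the Python sketch below takes the most extreme lossy case, in which only the average (DC) value of each 8×8 tile survives, and applies it to a smooth ramp. The function name `block_mean_quantize` and the test frame are hypothetical and are not part of any codec; real H.264 or MPEG2 quantization keeps more coefficients, so the steps at tile edges are usually smaller.

```python
def block_mean_quantize(image, block=8):
    """Replace every block x block tile by its mean value (hypothetical helper).

    This mimics the extreme case of lossy block coding in which only the DC
    (average) coefficient of each tile survives, leaving visible tile edges.
    """
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            r1, c1 = min(r0 + block, rows), min(c0 + block, cols)
            tile = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            mean = sum(tile) / len(tile)
            for r in range(r0, r1):
                for c in range(c0, c1):
                    out[r][c] = mean
    return out


# A smooth horizontal ramp (pixel value = column index) on a 16 x 16 frame.
ramp = [[float(c) for c in range(16)] for _ in range(16)]
blocky = block_mean_quantize(ramp)

# Inside a tile neighbouring pixels are now equal; across the tile edge at
# column 8 there is a jump of 8 levels.
print(blocky[0][6], blocky[0][7], blocky[0][8], blocky[0][9])  # 3.5 3.5 11.5 11.5
```

The gradual ramp has become a staircase whose steps fall exactly on the tile boundaries, which is the pattern a tiling detector looks for.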
Broadcasting compressed video streams on a radio frequency (RF) signal, either over the air (OTA) or via cable (CATV), or as data over Internet Protocol (IP) networks, can also cause intermittent additional data loss. This transient data loss may also make blockiness visually apparent on frames for which incomplete transport data is received and decoded, i.e., some of the compressed video data is dropped.
In both cases, over-compression or data loss, the tiling or blockiness may be visually apparent and distracting to a viewer. To determine the severity of the resulting impairment of the video signal in a measurement environment, one current method compares the alternating current (AC) energy within each compression block with the AC energy between that block and the neighboring block to the right (horizontal edge, or H-edge) and between that block and the neighboring block below (vertical edge, or V-edge). The H and V energy ratios are summed to create a tiling value for each block. These tiling values are then summed over the blocks in each of several regions within a frame to form a grid of tiling values for the image, and typically the largest of these values is reported as the tiling value for the image or frame. Note that this method detects only tiling that lies on a block grid aligned with pixel 0,0, i.e., with the upper left corner of the image, which leads to the problems described below.
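The block-based measure described above can be sketched as follows. This is a hypothetical reconstruction, not the exact prior-art implementation: it assumes that "AC energy" is approximated by the mean squared difference between adjacent pixels, uses 8×8 blocks on a grid anchored at pixel 0,0, and groups blocks into square regions of `region_blocks`×`region_blocks` blocks. The names `block_tiling_values`, `frame_tiling` and `_mean` are illustrative only.

```python
def _mean(values):
    return sum(values) / len(values)


def block_tiling_values(image, block=8, eps=1e-6):
    """Per-block tiling values on a grid anchored at pixel (0, 0) (hypothetical).

    For each block, the energy of the pixel differences across its right edge
    and its bottom edge is divided by the energy of the same kind of differences
    inside the block, and the two ratios are summed.
    """
    rows, cols = len(image), len(image[0])
    values = {}
    for r0 in range(0, rows - block, block):      # a block below must exist
        for c0 in range(0, cols - block, block):  # a block to the right must exist
            inner_h = [(image[r][c + 1] - image[r][c]) ** 2
                       for r in range(r0, r0 + block)
                       for c in range(c0, c0 + block - 1)]
            inner_v = [(image[r + 1][c] - image[r][c]) ** 2
                       for r in range(r0, r0 + block - 1)
                       for c in range(c0, c0 + block)]
            edge_h = [(image[r][c0 + block] - image[r][c0 + block - 1]) ** 2
                      for r in range(r0, r0 + block)]
            edge_v = [(image[r0 + block][c] - image[r0 + block - 1][c]) ** 2
                      for c in range(c0, c0 + block)]
            h_ratio = _mean(edge_h) / (_mean(inner_h) + eps)
            v_ratio = _mean(edge_v) / (_mean(inner_v) + eps)
            values[(r0 // block, c0 // block)] = h_ratio + v_ratio
    return values


def frame_tiling(image, block=8, region_blocks=2):
    """Sum block values over square regions and report the largest region sum."""
    regions = {}
    for (br, bc), v in block_tiling_values(image, block).items():
        key = (br // region_blocks, bc // region_blocks)
        regions[key] = regions.get(key, 0.0) + v
    return max(regions.values()) if regions else 0.0


# 8-pixel tiles (steps of 10) plus mild texture, aligned with pixel 0,0 ...
aligned = [[10 * (c // 8) + (r + c) % 2 for c in range(32)] for r in range(32)]
# ... and the same pattern with the tile edges moved 4 pixels off the grid.
shifted = [[10 * ((c + 4) // 8) + (r + c) % 2 for c in range(32)] for r in range(32)]
print(round(block_tiling_values(aligned)[(0, 0)], 2))  # 102.0
print(round(block_tiling_values(shifted)[(0, 0)], 2))  # 1.07
```

The aligned frame scores about 102 per block, while the identical tiling displaced by 4 pixels scores about 1, because the large steps now fall inside the blocks and inflate the denominator instead of the numerator. This is the alignment limitation noted above.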
In MPEG2 compression coding, each image or frame in the video signal is compressed either individually, as an I-frame, or by prediction from neighboring frames using estimated translational motion, as a P- or B-frame. Tiling aligned with pixel 0,0 is typical of decoded I-frames. However, the related P- and B-frames from the same decode may also contain tiling whose tiles have been displaced from the pixel 0,0 grid by motion vectors, so the tiling severity in these frames is not properly indicated. There could also be tiling from a previously coded/decoded image that was re-sampled, or shifted and cropped, as part of a second coding; this tiling would go undetected because it is not aligned to pixel 0,0. Finally, if an image carrying tiling from a previous coding/decoding process has been resized, such as in a 1080i (interlaced) to 720p (progressive) conversion, the tiling goes undetected at the decoder output because the block or tile boundaries no longer lie on the same grid spacing as in the original compression process.
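The alignment problem can be made concrete with a one-dimensional sketch. Below, one row of a decoded frame has 8-pixel tiles displaced 3 pixels to the right, as a motion vector might displace them in a P- or B-frame, and a measure that inspects only the boundaries at multiples of 8 sees no steps at all. The helper `boundary_energy` and the test row are hypothetical; searching all eight offsets, as in the last lines, is just one brute-force workaround shown for illustration, and it does nothing for resized tiles whose pitch is no longer 8.

```python
def boundary_energy(profile, period=8, offset=0):
    """Mean squared step between pixels i-1 and i at i = offset, offset+period, ...

    An offset of 0 corresponds to a block grid aligned with pixel 0; the first
    boundary is then at i = period, since pixel 0 has no left neighbour.
    """
    start = offset % period or period
    steps = [(profile[i] - profile[i - 1]) ** 2
             for i in range(start, len(profile), period)]
    return sum(steps) / len(steps)


# One 64-pixel row whose 8-pixel tiles start at column 3 instead of column 0.
row = [10 * ((c - 3) // 8) for c in range(64)]

print(boundary_energy(row, 8, 0))  # 0.0: the grid-aligned check sees nothing
print(boundary_energy(row, 8, 3))  # 100.0: the steps sit 3 pixels off the grid
best = max(range(8), key=lambda k: boundary_energy(row, 8, k))
print(best)  # 3
```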
FIG. 1 represents a typical situation in which tiling cannot be detected by the current technology described above. An MPEG2 signal, representing original video that has been compressed, is input to a decoder to produce component video signals, such as Y, U and V signals. The decoder creates tiling, as shown in the Y-frame image representing the baseband video from the decoder. Since MPEG2 is based upon 8×8 tiles (or possibly 16×16 tiles), the resulting Y-frame image is a composite of decoded 8×8 tiles. In this situation the Y-frame represents a 1080i video signal (1080 lines by 1920 columns). An intermediary, such as a cable television company, may then resize the 1080i video signal to produce a 720p video signal (720 lines by 1280 columns). Because both dimensions are scaled by 2/3, the visible tiling artifacts on the boundaries of the original 8×8 tiles now outline ⅔(8×8) tiles, i.e., tiles about 5.33 pixels on a side. This 720p video signal is then encoded to produce an MPEG2 signal that is transmitted to an end user's television set. The end user's decoder decodes the MPEG2 signal to produce the final version for viewing, which includes both the visible artifacts of the resized ⅔(8×8) tiles and the 8×8 tiles from this final decoder. The result may be an image with many visible ⅔(8×8) tile artifacts that the current techniques do not detect.
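The arithmetic behind the ⅔(8×8) tiles can be checked with exact fractions. The sketch below assumes an ideal resize that simply scales positions (real resampling filters also smear each edge over neighboring pixels) and maps the original 8-pixel tile boundaries of the 1080i frame onto the 720p frame. The helper `resized_boundaries` is hypothetical.

```python
from fractions import Fraction


def resized_boundaries(src_len, dst_len, block=8):
    """Output-pixel positions of the original block edges after ideal resizing."""
    scale = Fraction(dst_len, src_len)
    return [k * block * scale for k in range(src_len // block + 1)]


rows_edges = resized_boundaries(1080, 720)    # 1080i -> 720p, vertical
cols_edges = resized_boundaries(1920, 1280)   # 1080i -> 720p, horizontal

print(rows_edges[1], cols_edges[1])  # 16/3 16/3, i.e. about 5.33-pixel tiles

# Only original edges with k divisible by 3 land on the output's 8-pixel grid.
on_grid = [e for e in rows_edges if e % 8 == 0]
print(len(on_grid), "of", len(rows_edges))  # 46 of 136
```

In this ideal model a detector locked to the output 8-pixel grid examines only about one in three of the resized tile edges, namely those at multiples of 16 output lines, and the remaining edges fall at fractional positions between pixels.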
What is needed is a method of detecting the severity of tiling or blockiness at a decoder output, caused by over-compression or data loss in a decoded compressed video signal, that is insensitive to the phase shift, or alignment, of the tiling pattern relative to pixel 0,0 and that is responsive to at least some of the typical image resizing ratios.
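One reason a spectral approach suits the phase-insensitivity requirement is the DFT shift theorem: circularly shifting a signal changes only the phase of its Fourier coefficients, not their magnitude. The sketch below illustrates just that property and is not the method of the present invention. It builds a hypothetical edge profile (absolute differences between horizontally adjacent pixels, treated as periodic) for a row with 8-pixel tiles and for the same row shifted by 3 pixels, then evaluates the DFT magnitude at the bin whose period is 8 pixels. The helpers `edge_profile` and `dft_magnitude` are illustrative names.

```python
import cmath


def edge_profile(row):
    """|x[i] - x[i-1]| for every i, wrapping around so the row is periodic."""
    return [abs(row[i] - row[i - 1]) for i in range(len(row))]


def dft_magnitude(signal, k):
    """Magnitude of the k-th DFT coefficient, computed directly from the sum."""
    n = len(signal)
    return abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                   for i, x in enumerate(signal)))


n = 64
row = [10 * ((c // 8) % 2) for c in range(n)]  # alternating 8-pixel tiles
shifted = row[-3:] + row[:-3]                  # same tiles moved 3 pixels right

k = n // 8  # DFT bin whose period is 8 pixels
print(round(dft_magnitude(edge_profile(row), k), 6))      # 80.0
print(round(dft_magnitude(edge_profile(shifted), k), 6))  # 80.0
print(sum(edge_profile(shifted)[0::8]))  # 0: nothing on the pixel 0,0 grid
```

Both rows give the same magnitude at the period-8 bin, while the grid-aligned samples of the shifted profile sum to zero. By the same reasoning, a tile pitch of 16/3 pixels, as in the resizing example, would concentrate power near bin 3n/16 rather than n/8, so examining more than one bin is one plausible way to respond to resizing ratios.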