Real-time transmission of moving pictures is employed in several applications, e.g. video conferencing, net meetings and video telephony.
However, representing moving pictures requires a large amount of information, as digital video is typically represented by up to 60 pictures each second, each picture being represented by a large number of pixels, each of which is in turn represented by at least one byte of digital data. Such uncompressed video data results in large data volumes and cannot be transferred over conventional communication networks and transmission lines in real time, as it would require an unrealistic network bandwidth.
Thus, real-time video transmission requires video compression, where the main goal is to represent the video information with as few bits as possible and with as low latency as possible, without compromising video quality too much.
The most common video coding methods are described in the MPEG* and H.26* standards. The video data undergoes four main processes before transmission, namely prediction, transformation, quantization and entropy coding.
The prediction process significantly reduces the number of bits required for each picture in a video sequence to be transferred. It takes advantage of the similarity of parts of the sequence with other parts of the sequence. Since the predictor is known to both encoder and decoder, only the difference between the actual data and the predictor has to be transferred. This difference typically requires much less capacity for its representation, and is usually referred to as the residual.
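The relationship between a block, its predictor and the residual can be illustrated with a minimal Python sketch. The function names `residual` and `reconstruct` and the pixel values are hypothetical and chosen only for illustration; how the predictor itself is obtained (e.g. by motion search) is not shown.

```python
def residual(current, predicted):
    """Element-wise difference between a block and its predictor."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, predicted)]


def reconstruct(predicted, res):
    """Decoder side: add the received residual back onto the predictor."""
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(predicted, res)]


# Illustrative 4x4 block of pixel values and a predictor that is close to it.
current = [[52, 55, 61, 66],
           [63, 59, 55, 90],
           [62, 59, 68, 113],
           [63, 58, 71, 122]]
predicted = [[50, 55, 60, 66],
             [63, 60, 55, 88],
             [62, 59, 68, 112],
             [64, 58, 70, 120]]

res = residual(current, predicted)
# The residual values are small compared with the pixel values themselves.
largest = max(abs(v) for row in res for v in row)
```

In this example the pixel values range up to 122 while no residual value exceeds 2 in magnitude, which is why the residual can typically be represented with less capacity.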
The residual, represented as a block of data (e.g. 4×4 pixels), still contains internal correlation. A well-known method for taking advantage of this is to perform a two-dimensional block transform. ITU-T Recommendation H.264 uses a 4×4 integer transform derived from the DCT. This transforms 4×4 pixels into 4×4 transform coefficients, which can usually be represented by fewer bits than the pixel representation.
Transforming a 4×4 array of pixels with internal correlation will typically result in a 4×4 block of transform coefficients with far fewer non-zero values than the original 4×4 pixel block.
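The energy-compacting effect of the transform can be sketched as follows. The matrix `CF` is the forward core matrix of the H.264 4×4 integer transform; in H.264 the associated scaling is folded into the quantization stage and is omitted here. The function names and the sample blocks are hypothetical and serve only as illustration.

```python
# Forward core matrix of the H.264 4x4 integer transform (scaling omitted).
CF = ((1, 1, 1, 1),
      (2, 1, -1, -2),
      (1, -1, -1, 1),
      (1, -2, 2, -1))


def forward_transform_4x4(block):
    """Compute Y = CF * X * CF^T for a 4x4 block X using integer arithmetic."""
    # Vertical pass: tmp = CF * X
    tmp = [[sum(CF[i][m] * block[m][k] for m in range(4)) for k in range(4)]
           for i in range(4)]
    # Horizontal pass: Y = tmp * CF^T
    return [[sum(tmp[i][k] * CF[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]


def count_nonzero(block):
    return sum(1 for row in block for v in row if v != 0)


# A flat block and a smooth horizontal ramp: both are strongly correlated.
flat = [[3, 3, 3, 3] for _ in range(4)]
ramp = [[0, 1, 2, 3] for _ in range(4)]

flat_coeffs = forward_transform_4x4(flat)
ramp_coeffs = forward_transform_4x4(ramp)
```

For the flat block, all 16 pixels are non-zero but only the single DC coefficient survives the transform. For the ramp, 12 non-zero pixels are reduced to 3 non-zero coefficients, all in the first row.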
Direct representation of the transform coefficients is still too costly for many applications. The transform coefficients therefore undergo quantization to further reduce the data representation. The possible value range of the transform coefficients is divided into value intervals, each limited by an uppermost and a lowermost decision value and assigned a fixed quantization value. Each transform coefficient is then quantized to the quantization value associated with the interval within which it resides. Coefficients lower than the lowest decision value are quantized to zero. It should be mentioned that, as a result of this quantization process, the reconstructed video sequence differs somewhat from the uncompressed sequence.
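The interval-based quantization described above can be sketched with a simple uniform quantizer. This is a simplified illustration and not the exact H.264 quantizer, which uses position-dependent multiplication factors controlled by a quantization parameter. The names `quantize`, `dequantize` and `step`, and the sample values, are assumptions made for this example.

```python
def quantize(coeff, step):
    """Map a coefficient to the index of the interval in which it lies.

    Intervals are `step` wide; any coefficient whose magnitude is below the
    lowest decision value (`step`) is quantized to zero.
    """
    sign = -1 if coeff < 0 else 1
    return sign * (abs(coeff) // step)


def dequantize(level, step):
    """Return the fixed quantization value assigned to an interval."""
    return level * step


step = 8  # hypothetical interval width
coeffs = [48, -28, 7, -4, 0]
levels = [quantize(c, step) for c in coeffs]
reconstructed = [dequantize(q, step) for q in levels]
```

Here the coefficient -28 is reconstructed as -24, and the small coefficients 7 and -4 are reconstructed as zero, which shows why the reconstructed sequence differs from the original.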
One characteristic of video content to be coded is that the number of bits required to describe the sequence varies strongly. For several applications, it is well known to a person skilled in the art that the content in a considerable part of the picture is unchanged from frame to frame. H.264 widens this definition so that parts of the picture with constant motion can also be coded without use of additional information. Regions with little or no change from frame to frame require a minimum number of bits to be represented. The blocks included in such regions are defined as “skipped” or to be in “skip mode”, reflecting that no changes, or only predictable motion relative to the corresponding previous blocks, occur. Hence no data is required for representing these blocks other than an indication that the blocks are to be decoded as “skipped”. This indication may be common to several macro blocks.
As H.264 is a decoding specification, it does not describe any methods for detecting regions with marginal or no changes prior to the transformation and quantization process. As a result, these regions could undergo motion search, transformation and quantization even if they would finally be defined as skipped and not be represented with any data. As these operations require processing capacity, this is an unnecessary consumption of resources in the encoder.
Video encoding for HD formats increases the demands for memory and data processing, and requires efficient, high-bandwidth memory organizations coupled with compute-intensive capabilities. Due to these multiple demands, a flexible parallel processing approach must be found to meet the demands in a cost-effective manner.
Video codecs are typically installed on customized hardware in video endpoints with DSP-based processors. However, it has recently become more common to install video codecs in general-purpose processors with a SIMD processor environment.
Normally, detecting skipped blocks before full encoding, the so-called “early skip” process, is complicated and computationally expensive, since each of the 16 4×4 blocks of a macro block has to be transformed and quantized individually, one by one, using 16-bit and 32-bit precision, which leads to extensive register usage in a SIMD processor environment.
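The block-by-block check described above can be sketched as follows. This is a scalar illustration of the conventional, expensive approach only; it is neither the simplified method of the cited patent nor a SIMD implementation. It assumes a 16×16 residual macro block, the H.264 4×4 forward core matrix without scaling, and a hypothetical 4×4 table `thresholds` holding the lowest decision value for each coefficient position.

```python
# Forward core matrix of the H.264 4x4 integer transform (scaling omitted).
CF = ((1, 1, 1, 1),
      (2, 1, -1, -2),
      (1, -1, -1, 1),
      (1, -2, 2, -1))


def transform_4x4(block):
    """Compute CF * X * CF^T for a 4x4 residual block X."""
    tmp = [[sum(CF[i][m] * block[m][k] for m in range(4)) for k in range(4)]
           for i in range(4)]
    return [[sum(tmp[i][k] * CF[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]


def macroblock_is_skippable(residual16, thresholds):
    """Return True if every coefficient of all 16 sub-blocks quantizes to zero.

    A coefficient is taken to quantize to zero when its magnitude is below
    the (hypothetical) lowest decision value for its position.
    """
    for by in range(0, 16, 4):
        for bx in range(0, 16, 4):
            block = [row[bx:bx + 4] for row in residual16[by:by + 4]]
            coeffs = transform_4x4(block)
            for i in range(4):
                for j in range(4):
                    if abs(coeffs[i][j]) >= thresholds[i][j]:
                        return False
    return True


def single_pixel_residual(value):
    """16x16 residual that is zero except for one pixel at the top-left."""
    mb = [[0] * 16 for _ in range(16)]
    mb[0][0] = value
    return mb


thresholds = [[16] * 4 for _ in range(4)]  # illustrative values only
skip_unchanged = macroblock_is_skippable(single_pixel_residual(0), thresholds)
skip_small = macroblock_is_skippable(single_pixel_residual(1), thresholds)
skip_large = macroblock_is_skippable(single_pixel_residual(10), thresholds)
```

With these illustrative thresholds, an unchanged macro block and one with a residual of 1 in a single pixel are both classified as skippable, while a residual of 10 in a single pixel produces a coefficient of magnitude 40 and is not. In the worst case the loop performs 16 full transforms and 256 comparisons per macro block, which is the cost the passage above refers to.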
A simplification of the “early skip” process is described in U.S. Pat. No. 7,295,613, “Early detection of zeros in the transform domain”, by Gisle Bjøntegaard. However, this simplification does not take the SIMD processor environment into account, and uses a Hadamard transform instead of the DCT in the detection of “early skip”. In addition, Bjøntegaard calculates just a few coefficients and compares them with a scalar threshold. This may lead to inaccurate results that degrade perceived video quality, without achieving any significant improvement in SIMD processor utilization.
Therefore, there is a need for a time- and processor-efficient “early skip” method that takes advantage of the nature of general-purpose processors in a SIMD processor environment without compromising data quality.