High Efficiency Video Coding (HEVC) is a recent video compression standard and the successor to H.264/MPEG-4 AVC (Advanced Video Coding). It was jointly developed by the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG), and is published as ISO/IEC 23008-2 (MPEG-H Part 2) and ITU-T H.265.
A video input signal has multiple frames. HEVC divides a frame into rectangular blocks called LCUs (largest coding units), also referred to as macro-blocks, of 16×16, 32×32 or 64×64 pixels. An optimal size of the LCU is selected based on the video content. HEVC also provides for dividing a video frame into multiple tiles and slices to enable parallel processing. In this scheme, discontinuities known as blocking artifacts can occur in a filtered video signal at the LCU boundaries. The blocking artifacts can arise, for instance, from different intra predictions of the blocks, quantization effects and motion compensation. Loop filters are used in the HEVC encoder/decoder to combat blocking artifacts.
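As an illustration of the frame division described above, the following minimal sketch counts how many LCUs cover a frame for each of the three LCU sizes. The function name `lcu_grid` and the 3840×2160 example frame are assumptions chosen for illustration; partial LCUs at the right and bottom edges are counted as whole units.

```python
def lcu_grid(frame_width, frame_height, lcu_size):
    """Return (columns, rows, total) of LCUs needed to cover a frame.

    Hypothetical helper for illustration only; it is not part of any
    HEVC library. Edge LCUs that only partly overlap the frame are
    still counted, hence the ceiling division.
    """
    if lcu_size not in (16, 32, 64):
        raise ValueError("LCU size must be 16, 32 or 64")
    cols = (frame_width + lcu_size - 1) // lcu_size   # ceiling division
    rows = (frame_height + lcu_size - 1) // lcu_size
    return cols, rows, cols * rows


if __name__ == "__main__":
    # Assumed example: a 4K (3840x2160) frame.
    for size in (16, 32, 64):
        cols, rows, total = lcu_grid(3840, 2160, size)
        print(f"LCU {size}x{size}: {cols} x {rows} = {total} LCUs")
```

Larger LCUs reduce the number of blocks, and therefore the number of LCU boundaries, in a frame of a given size.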
HEVC promises half the bit-rate of the current de facto video standard, H.264, at similar video quality, and is expected to be deployed in a wide variety of video applications, including cell phones, broadcast, set-top boxes, video conferencing, video surveillance and automotive. HEVC is enabling the industry's transition to 4K (ultra high-definition (HD)) resolutions because of its better compression efficiency and transparent quality. The performance requirement for an HEVC video solution can vary widely with the application area. This poses a new challenge to architects designing HEVC hardware and/or software solutions.
Designing a single monolithic engine for ultra-HD resolution results in a complex hardware and software design. Also, a single monolithic engine is a non-optimal solution for lower-resolution video, for example HD (high definition) or D1 (standard definition).
An alternative approach to performance up-scaling is to use multiple copies of video hardware engines and/or processor cores. This solution has difficulty partitioning frames across the multiple cores because of loop filter dependencies across slices and tiles.
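The partitioning problem can be sketched as follows, under the assumption that a frame is split into horizontal bands of LCU rows, one band per core. The helper names `split_lcu_rows` and `shared_boundaries` are hypothetical. The sketch only lists the row pairs where the loop filter would need data from two different cores; it does not perform any filtering.

```python
def split_lcu_rows(num_lcu_rows, num_cores):
    """Assign contiguous LCU rows to cores as evenly as possible.

    Returns a list of (first_row, last_row) tuples, inclusive, one per
    core that receives at least one row. Illustrative assumption: one
    horizontal band of LCU rows per core.
    """
    base, extra = divmod(num_lcu_rows, num_cores)
    ranges = []
    start = 0
    for core in range(num_cores):
        count = base + (1 if core < extra else 0)
        if count == 0:
            continue  # more cores than rows: this core stays idle
        ranges.append((start, start + count - 1))
        start += count
    return ranges


def shared_boundaries(ranges):
    """Return (upper_row, lower_row) pairs that straddle two partitions.

    Filtering the edge between these two rows needs data owned by two
    different cores, which is the cross-partition dependency.
    """
    return [(ranges[i][1], ranges[i + 1][0]) for i in range(len(ranges) - 1)]


if __name__ == "__main__":
    # Assumed example: 34 LCU rows (a 2160-line frame with 64x64 LCUs), 4 cores.
    bands = split_lcu_rows(34, 4)
    print("Rows per core:", bands)
    print("Cross-core boundaries:", shared_boundaries(bands))
```

Each additional core adds one more boundary at which neighbouring partitions must exchange data before loop filtering can complete.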
The prior approaches to handling the loop filter dependencies have several drawbacks. A first approach is to disable loop filtering. This approach degrades the quality of the video at slice/tile boundaries. A second approach is to enable loop filtering and control the rate of encoding at the boundaries of the slices/tiles. The controlled encoding rate in this approach degrades video quality in other portions of the frame in addition to the boundaries of the slices/tiles.
A third approach is to provide multiple video processing engines, with each engine processing a separate frame. This approach introduces frame latency and hence is not efficient for applications such as video conferencing, video surveillance and gaming. A fourth approach is to use multiple video processing engines for processing a video together with a separate loop filter. The multiple video processing engines perform functions such as motion estimation, transform and quantization. After these processing operations, the separate loop filter performs loop filtering. This approach increases system overhead, since additional memory bandwidth is required for the input and output of the separate loop filter, and it also increases the number of processing cycles used for video encoding/decoding.