Modern consumers can receive high definition (HD) television on their Personal Data Appliance. They expect HD video to be delivered to their cell phones, along with the window-like view provided by DVD movies. High definition video processing has migrated into all aspects of communication and entertainment. Many high definition broadcasts bring a realism that can be matched only by looking through a real window and watching the actual event unfold before the viewer.
In order to make the transfer of high definition video more efficient, different video coding schemes have tried to get the best picture from the least amount of data. The Moving Pictures Experts Group (MPEG) has created standards that allow an implementer to supply as good a picture as possible based on a standardized data sequence and algorithm. The emerging H.264 (MPEG-4 Part 10)/Advanced Video Coding (AVC) standard typically delivers an improvement in coding efficiency by a factor of two over MPEG-2, the most widely used video coding standard today. The quality of the video depends on the manipulation of the data in the picture and on the rate at which the picture is refreshed. If the rate falls below about 30 pictures per second, the human eye can detect “unnatural” motion.
Among the many important techniques in the AVC standard, Intra/Inter mode selection plays an important role in improving compression efficiency. To date, most of the work on mode selection focuses on rate distortion, namely how to obtain a better Peak Signal to Noise Ratio (PSNR) at the same bit rate, or how to keep the same PSNR at a lower bit rate. Although this approach can improve the visual quality of low bit rate, low resolution video sequences, it is not optimal from the point of view of the human visual system (HVS) when the focus shifts to high resolution (HD), high bit rate video sequences.
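For readers unfamiliar with the metric, the following is a minimal sketch of how PSNR is commonly computed, assuming 8-bit samples (peak value 255) and pictures supplied as flat sequences of sample values. The function name `psnr` and its parameters are illustrative and are not taken from any standard or encoder.

```python
import math


def psnr(reference, decoded, peak=255):
    """Return the PSNR in dB between two equal-length sequences of samples.

    Assumption: samples are plain numbers and `peak` is the largest
    possible sample value (255 for 8-bit video).
    """
    if len(reference) != len(decoded):
        raise ValueError("pictures must contain the same number of samples")
    if len(reference) == 0:
        raise ValueError("pictures must not be empty")

    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical pictures

    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    original = [100, 100, 100, 100]
    one_large_error = [102, 100, 100, 100]    # error concentrated in one sample
    many_small_errors = [101, 99, 101, 99]    # error spread over all samples
    print(psnr(original, one_large_error))
    print(psnr(original, many_small_errors))
```

The two distorted pictures in the usage example have the same mean squared error and therefore the same PSNR, even though the error is distributed very differently. This is one reason PSNR alone may not track perceived quality.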
Due to the coding structure of the current video compression standard, picture rate control consists of three steps: 1. Group of Pictures (GOP) level bit allocation; 2. Picture level bit allocation; and 3. Macro block (MB) level bit allocation. Picture level rate control involves distributing the GOP budget among the picture frames to achieve maximal and uniform visual quality. Although PSNR does not fully represent visual quality, it is the measure most commonly used to quantify it. However, it has been observed that the AVC encoder tends to blur fine texture details even at relatively high bit rates. Although AVC can obtain a better PSNR, this phenomenon adversely affects the visual quality of some video sequences.
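The three allocation levels can be sketched as follows. This is only an illustration under stated assumptions: the per-type weights (I = 4, P = 2, B = 1) and the activity-proportional macro block split are hypothetical choices made for the example, not values or methods prescribed by the AVC standard or described in this document.

```python
def gop_budget(bit_rate, frame_rate, gop_length):
    """Step 1: bits available to one GOP at a constant target bit rate."""
    return bit_rate * gop_length / frame_rate


def picture_budgets(gop_bits, picture_types, weights=None):
    """Step 2: split the GOP budget among pictures by picture type.

    Assumption: the default weights are illustrative only.
    """
    if weights is None:
        weights = {"I": 4.0, "P": 2.0, "B": 1.0}
    total = sum(weights[t] for t in picture_types)
    return [gop_bits * weights[t] / total for t in picture_types]


def macroblock_budgets(picture_bits, activities):
    """Step 3: split a picture budget among macro blocks.

    Assumption: each macro block gets bits in proportion to a
    non-negative activity measure; a flat picture is split evenly.
    """
    total = sum(activities)
    if total == 0:
        return [picture_bits / len(activities)] * len(activities)
    return [picture_bits * a / total for a in activities]


if __name__ == "__main__":
    gop_bits = gop_budget(bit_rate=6_000_000, frame_rate=30, gop_length=15)
    per_picture = picture_budgets(gop_bits, list("IBPBP"))
    per_mb = macroblock_budgets(per_picture[0], [1.0, 3.0, 0.0, 4.0])
    print(gop_bits, per_picture, per_mb)
```

In this sketch each level only redistributes the budget handed down from the level above, so the bits assigned to the pictures sum to the GOP budget and the bits assigned to the macro blocks sum to the picture budget.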
A GOP is made up of a series of pictures starting with an Intra picture. The Intra picture is the reference picture that the GOP is based on. It may represent a video sequence that has a similar theme or background. The Intra picture requires the largest amount of data because it cannot be predicted from other pictures and all of the detail for the sequence is based on the foundation that it represents. The next picture in the GOP may be a Predicted picture or a Bidirectional predicted picture. The names may be shortened to I-picture, P-picture and B-picture, or I, P, and B. The P-picture has less data content than the I-picture, and some of the change between the two pictures is predicted based on certain references in the picture.
The use of P-pictures maintains a level of picture quality based on small changes from the I-picture. The B-picture has the least amount of data to represent the picture. It depends on information from two other pictures, the I-picture that starts the GOP and a P-picture that is within a few pictures of the B-picture. The P-picture that is used to construct the B-picture may come earlier or later in the sequence. The B-picture requires “pipeline processing”, meaning the data cannot be displayed until information from a later picture is available for processing.
In order to achieve the best balance of picture quality and picture rate performance, different combinations of picture sequences have been attempted. The MPEG-2 standard may use an Intra-picture followed by a Bidirectional predicted picture followed by a Predicted picture (IBP). The combination of the B-picture and the P-picture may be repeated as long as the quality is maintained (IBPBP). When the scene changes or the quality and/or picture rate degrades, another I-picture must be introduced into the sequence, starting a new GOP.
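Because a B-picture cannot be decoded until the later picture it depends on is available, the order in which pictures are coded differs from the order in which they are displayed. The sketch below reorders a display-order sequence such as IBPBP into a possible coding order. It assumes that each B-picture depends on the nearest I- or P-picture before it and after it; the function name `coding_order` is hypothetical.

```python
def coding_order(display_types):
    """Return display indices in a possible coding order.

    Assumption: every B-picture is coded immediately after the next
    I- or P-picture (its later reference) in display order.
    """
    order = []
    pending_b = []  # B-pictures waiting for their later reference
    for index, picture_type in enumerate(display_types):
        if picture_type == "B":
            pending_b.append(index)
        else:
            # An I- or P-picture is coded first, then the B-pictures
            # that were waiting for it.
            order.append(index)
            order.extend(pending_b)
            pending_b = []
    # Trailing B-pictures have no later reference inside this sequence;
    # in this sketch they are simply appended at the end.
    order.extend(pending_b)
    return order


if __name__ == "__main__":
    display = "IBPBP"
    order = coding_order(display)
    print(order)                                  # [0, 2, 1, 4, 3]
    print("".join(display[i] for i in order))     # IPBPB
```

For the IBPBP sequence, the I-picture is coded first, then each P-picture ahead of the B-picture that precedes it in display order. This is the delay that the "pipeline processing" of B-pictures refers to.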
To improve compression efficiency, de-blocking filters and 4×4 transforms are included in the H.264/AVC standard. The optimal Intra/Inter mode decision cannot be obtained without considering them. According to the history of the AVC standard, these tools were optimized for low bit rate, low resolution video sequences in Quarter Common Intermediate Format (QCIF) and Common Intermediate Format (CIF). When the focus shifted to high resolution Standard Definition (SD) and High Definition (HD) video sequences, the de-blocking filters and 4×4 transforms naturally became targets for revision. Following this trend, an 8×8 transform and quantization weighting matrices have been adopted by the Professional Extensions Profile of the AVC standard.
Most of the work on adaptive transform type selection focuses on how to obtain a better PSNR at the same bit rate, or how to keep the same PSNR at a lower bit rate. Although this approach can improve the visual quality, it is not optimal from the point of view of the human visual system (HVS). The HVS is a luminance and contrast profile that represents human visual processing capabilities.
Thus, a need still remains for a video encoding system that can deliver high quality video to the high definition video market. In view of the ever-increasing demand for high definition video, it is increasingly critical that answers be found to these problems. In view of the ever-increasing commercial competitive pressures, along with growing consumer expectations and the diminishing opportunities for meaningful product differentiation in the marketplace, it is critical that answers be found for these problems as soon as possible.
Solutions to these problems have long been sought but prior developments have not taught or suggested any solutions and, thus, solutions to these problems have long eluded those skilled in the art.