As electronics, in general, and consumer electronics, in particular, are equipped with faster chips and larger memories, consumers have come to expect them to be able to handle ever greater amounts of data and information. The data files that challenge device capacity typically include varieties of visual and pictorial contents, e.g., motion pictures, videos; and other complex computer graphics, e.g., those used in computer games. Technologies necessary for the handling, compression, and decompression of such data files related to visual and pictorial works are in great demand by consumers worldwide.
Compression of digital video signals for transmission or for storage has become widely practiced in a variety of contexts, especially in multimedia environments for motion pictures, video communication, computer games, Internet image/video streaming, digital television, and the like. Coding and decoding are accomplished with coding processors, which may be general computers, special hardware or multimedia boards, and other suitable processing devices. Standards for compression processes have been developed by the International Telecommunication Union (ITU), which has developed the H series standards used for real time communications such as those used in videophones, and the International Organization for Standardization (ISO), which has developed the Moving Picture Experts Group (MPEG) series standards, such as MPEG-1, MPEG-2, MPEG-4, and MPEG-7.
Digital video signals may include a sequence of pictorial data. The term “motion picture(s)” shall be used for convenience of expression, and shall mean any sequence of data amenable to quantization, at least a portion of which is pictorial. A motion picture usually includes many frames and contains a large amount of information. However, the storage space and bandwidth available for storing and transmitting such signals are often limited. Therefore, compression (coding) processes are used to achieve more efficient handling, transmission, or storage of the pictorial data.
Compression processes typically involve removing redundancy in the data; see U.S. Pat. Nos. 6,438,166 and 6,445,825. In motion pictures, consecutive frames are usually very similar. To remove this temporal redundancy, the technology of “motion compensation” is usually used. Another redundancy that can be exploited arises from the psycho-visual characteristics of the human visual system. Human viewers cannot notice many minor variations in the images; thus the components that correspond to these unnoticeable variations can be coded approximately via a process called “quantization”. Quantization is a process in which transform coefficients of sample signal data values, such as color and luminance, are represented by, or mapped onto, a few values predefined by a quantizer. The quantized signal is composed of quantized values that are, in fact, approximations of the actual signal values. Therefore, encoding the signal data onto the quantized values necessarily produces some loss in accuracy and generates some distortion of the signal after the decoding process.
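The quantization step described above can be illustrated with a minimal sketch of a uniform scalar quantizer. The function names, the step size, and the sample coefficient values are illustrative assumptions rather than part of any particular codec; practical encoders typically add refinements such as quantization matrices and dead zones.

```python
import math


def quantize(coeffs, step):
    """Map each coefficient to the index of the nearest multiple of `step`."""
    return [math.floor(c / step + 0.5) for c in coeffs]


def dequantize(levels, step):
    """Reconstruct approximate coefficient values from quantizer indices."""
    return [q * step for q in levels]


# Hypothetical transform coefficients (not taken from any real picture).
coeffs = [103.7, -41.2, 12.9, -3.4, 1.1, 0.4]
step = 8  # assumed quantization step

levels = quantize(coeffs, step)     # [13, -5, 2, 0, 0, 0]
recon = dequantize(levels, step)    # [104, -40, 16, 0, 0, 0]
errors = [abs(c - r) for c, r in zip(coeffs, recon)]

print(levels)
print(recon)
print(max(errors) <= step / 2)  # rounding error never exceeds half a step
```

The small coefficients collapse to zero, which is where most of the saving comes from, while every reconstructed value differs from its original by at most half the step size.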
The value of the quantization scale used for the quantization strongly affects the compression ratio achieved in the coding process. The larger the value of the quantization scale, the higher the compression ratio, resulting in a greater reduction in the quantity of the coded data. The smaller the value of the quantization scale, the lower the compression ratio, resulting in a smaller reduction in the quantity of the coded data. The value of the quantization scale can be set and changed in the course of data processing, thereby controlling the quality of the generated code. A small quantizer means a fine quantization step, hence a high data bit rate and fine resolution. A large quantizer means a coarse quantization step, hence a low data bit rate and poor video quality.
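The trade-off between quantization scale, bit rate, and quality can be sketched numerically. The example below is only an illustration under stated assumptions: the coefficients are synthetic Laplacian-like random values (not real video data), the bit cost is approximated by the empirical zeroth-order entropy of the quantizer indices, and quality is approximated by mean squared error. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def quantize(coeffs, step):
    """Uniform scalar quantizer: index of the nearest multiple of `step`."""
    return [math.floor(c / step + 0.5) for c in coeffs]


def entropy_bits(levels):
    """Empirical zeroth-order entropy of the indices, in bits per symbol."""
    n = len(levels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(levels).values())


def mse(coeffs, levels, step):
    """Mean squared error between original and reconstructed coefficients."""
    return sum((c - q * step) ** 2 for c, q in zip(coeffs, levels)) / len(coeffs)


# Synthetic data (assumption): the difference of two exponential variates
# with mean 20 is Laplace-distributed, a common model for residual coefficients.
rng = random.Random(0)
coeffs = [rng.expovariate(1 / 20) - rng.expovariate(1 / 20) for _ in range(5000)]

results = []
for step in (2, 8, 32):
    levels = quantize(coeffs, step)
    results.append((step, entropy_bits(levels), mse(coeffs, levels, step)))

for step, bits, err in results:
    print(f"step={step:2d}  approx. {bits:.2f} bits/coeff  MSE={err:.2f}")
```

On this synthetic input the estimated bits per coefficient fall as the step grows, while the error rises, which mirrors the relationship stated above.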
Typical motion pictures consist of stretches of scenes with various amounts of motion. Scenes may contain slow motion, e.g., a sunset or a couple walking on the beach; or high speed motion, e.g., a high speed car race. Under the same quantizer, the video frames in high speed motion scenes require many more bits to code than those in low speed motion scenes. Moreover, the mixture of high speed motion and low speed motion scenes, and the transitions among them, differ drastically from one video sequence to another. A high bit rate often exceeds the processing capacity of a video signal transmission/display/record device. On the other hand, a low bit rate often does not fully utilize the capacity of the transmission channel and recording media to achieve the maximum potential quality. Thus, in the encoding of digital video data, a persistent problem has been how to accommodate any given movie sequence, and how to allocate the available bits among the scenes to achieve the maximum overall quality.
Some rate control methods simply use a fixed quantizer for all the different scenes, regardless of the degree of motion. As described above, these methods generate a coded bit stream whose coded frame sizes and bit rates vary greatly from scene to scene. Since the fixed quantizer is selected before the encoding, a video sequence with many high speed motion scenes will generate a coded bit stream of very large size, and a video sequence with mostly low speed motion scenes will generate a coded bit stream of relatively small size. Thus, in a method employing a fixed quantizer, there is no effective control over the bit rate.
One approach for controlling the bit rate in a data compression process uses a second order Rate/Distortion model to emulate the properties of the video scenes. After the motion compensation of each video frame is performed, a sum of absolute differences (SAD) value is calculated to measure the residual error after motion compensation. The quantizer value is computed from the SAD value and the statistics of a short history of the last few frames. The image is then coded with the computed quantizer value. This approach is adopted in the Mobile Multimedia Systems (MoMuSys) software, the standard reference implementation used by ISO during its development of the MPEG-4 standard. Although this approach has been shown to work in low bit rate mobile application environments, it does not give satisfactory results for the high bit rate encoding of motion pictures. Another challenge is that in dynamic coding, the coder knows only the past, i.e., the scenes that have already been encoded. The coder does not know the future, i.e., the scenes it has not yet processed.
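The text does not give the equation of the second order model. As a hedged illustration, the sketch below assumes one commonly cited quadratic form, R = X1·S/Q + X2·S/Q², where R is the bit budget for the frame, S is the SAD of the motion-compensated residual, Q is the quantizer, and X1 and X2 are model parameters that would be fitted from recently coded frames. The function names, the parameter values, and the 1 to 31 quantizer range are assumptions made for the example, and SAD is taken to be the sum of absolute differences.

```python
import math


def sad(block, prediction):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(a - b) for a, b in zip(block, prediction))


def model_bits(q, s, x1, x2):
    """Bits predicted by the assumed quadratic model for quantizer q."""
    return x1 * s / q + x2 * s / (q * q)


def quantizer_from_model(target_bits, s, x1, x2, q_min=1.0, q_max=31.0):
    """Solve target_bits = x1*s/q + x2*s/q**2 for q, then clamp the result."""
    if s <= 0:
        return q_min  # no residual to code: the finest quantizer is affordable
    if target_bits <= 0:
        return q_max  # no budget left: fall back to the coarsest quantizer
    disc = (x1 * s) ** 2 + 4.0 * target_bits * x2 * s
    if x2 == 0 or disc < 0:
        q = x1 * s / target_bits  # degrade to the first order model
    else:
        q = (x1 * s + math.sqrt(disc)) / (2.0 * target_bits)
    return max(q_min, min(q_max, q))


# Hypothetical residual samples and model parameters, for illustration only.
residual_sad = sad([10, 20, 30, 40], [12, 18, 33, 40])  # 2 + 2 + 3 + 0 = 7
s, x1, x2 = 1000.0, 2.0, 8.0

q_full = quantizer_from_model(1000.0, s, x1, x2)  # budget of 1000 bits
q_half = quantizer_from_model(500.0, s, x1, x2)   # half the budget

print(residual_sad, q_full, q_half)
```

With these assumed numbers a budget of 1000 bits yields a quantizer of 4, and halving the budget pushes the quantizer higher, i.e., toward coarser quantization. Because the parameters are re-estimated only from a short window of past frames, a model of this kind reacts to recent history and has no view of the scenes still to come.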
Accordingly, it would be advantageous to have a data compression and encoding method and system that permit governed responses to scene activity and that are sensitive to the contrast between high action and low action scenes. It is desirable for the method and the system to perform dynamic optimization of bit rate distribution among the frames for best overall quality. It is also desirable for the method and system to employ a long term memory, as opposed to reacting only to the immediate level of action in a scene, so as to ensure overall bit rate convergence for the picture as a whole. It is further desirable for the method and system to use the data regarding the preceding scenes, i.e., the history of any given encoding project, to generate decisions as to the remainder of the scenes, i.e., the future settings. It would be of further advantage if the method and system were able to react quickly to low speed motion scenes to achieve superior quality, especially for those low speed motion scenes that immediately follow high motion scenes.