The present invention relates to digital video signal compression and, in particular, to a particularly efficient signal encoding mechanism for encoding digital video signals according to digital video standards such as the ITU standard H.263.
With the advent of digital video products and services, such as Digital Satellite Service (DSS) and storage and retrieval of video streams on the Internet and, in particular, the World Wide Web, digital video signals are becoming ever present and drawing more attention in the marketplace. Because of limitations in digital signal storage capacity and in network and broadcast bandwidth, compression of digital video signals has become paramount to digital video storage and transmission. As a result, many standards for compression and encoding of digital video signals have been promulgated. For example, the International Telecommunication Union (ITU) has promulgated the H.261 and H.263 standards for digital video encoding. Additionally, the International Standards Organization (ISO) has promulgated the Motion Picture Experts Group (MPEG) standards MPEG-1 and MPEG-2 for digital video encoding.
These standards specify with particularity the form of encoded digital video signals and how such signals are to be decoded for presentation to a viewer. However, significant discretion is left as to how the digital video signals are to be transformed from a native, uncompressed format to the specified encoded format. As a result, many different digital video signal encoders currently exist and many approaches are used to encode digital video signals with varying degrees of compression achieved.
In general, greater degrees of compression are achieved at the expense of video image signal loss and higher quality motion video signals are achieved at the expense of lesser degrees of compression and thus at the expense of greater bandwidth requirements. It is particularly difficult to balance image quality with available bandwidth when delivery bandwidth is limited. Such is the case in real-time motion video signal delivery such as video telephone applications and motion video on demand delivery systems. It is generally desirable to maximize the quality of the motion video signal as encoded without exceeding the available bandwidth of the transmission medium carrying the encoded motion video signal. If the available bandwidth is exceeded, some or all of the sequence of video images are lost and, therefore, so is the integrity of the motion video signal. If an encoded motion video signal errs on the side of conserving transmission medium bandwidth, the quality of the motion video image can be compromised significantly.
The format of H.263 encoded digital video signals is known and is described more completely in “ITU-T H.263: Line Transmission of Non-Telephone Signals, Video Coding for Low Bitrate Communication” (hereinafter “ITU-T Recommendation H.263”). Briefly, in H.263 and other encoded video signal standards, a digital motion video image signal, which is sometimes called a video stream, is organized hierarchically into groups of pictures which include one or more frames, each of which represents a single image of a sequence of images of the video stream. Each frame includes a number of macroblocks which define respective portions of the video image of the frame. An I-frame is encoded independently of all other frames and therefore represents an image of the sequence of images of the video stream without reference to other frames. P-frames are motion-compensated frames and are therefore encoded in a manner which is dependent upon other frames. Specifically, a P-frame is a predictively motion-compensated frame and depends only upon one I-frame or, alternatively, another P-frame which precedes the P-frame in the sequence of frames of the video image. The H.263 standard also describes BP-frames; however, for the purposes of description herein, a BP-frame is treated as a P-frame.
All frames are compressed by reducing redundancy of image data within a single frame. Motion-compensated frames are further compressed by reducing redundancy of image data within a sequence of frames. Since a motion video signal includes a sequence of images which differ from one another only incrementally, significant compression can be realized by encoding a number of frames as motion-compensated frames, i.e., as P-frames. However, errors from noise introduced into the motion video signal or artifacts from encoding of the motion video signal can be perpetuated from one P-frame to the next and therefore persist as a rather annoying artifact of the rendered motion video image. It is therefore desirable to periodically send an I-frame to eliminate any such errors or artifacts. Conversely, I-frames require many times more bandwidth, e.g., on the order of ten times more bandwidth, than P-frames, so encoding I-frames too frequently consumes more bandwidth than necessary. Accordingly, determining when to include an I-frame, rather than a P-frame, in an encoded video stream is an important consideration when maximizing video image quality without exceeding available bandwidth.
Another important consideration when maximizing video image quality within limited signal bandwidth is the compromise between image quality of and bandwidth consumed by the encoded video signal as represented by an encoding parameter λ. In encoding a video signal, a particular value of encoding parameter λ is selected as a representation of a specific compromise between image detail and the degree of compression achieved. In general, a greater degree of compression is achieved by sacrificing image detail, and image detail is enhanced by sacrificing the degree of achievable compression of the video signal. In the encoding standard H.263, a quantization parameter Q effects such a compromise between image quality and consumed bandwidth by controlling a quantization step size during quantization in an encoding process.
However, a particular value of encoding parameter λ which is appropriate for one motion video signal can be entirely inappropriate for a different motion video signal. For example, motion video signals representing a video image which changes only slightly over time, such as a news broadcast (generally referred to as “talking heads”), can be represented by relatively small P-frames since successive frames differ relatively little. As a result, each frame can include greater detail at the expense of less compression of each frame. Conversely, motion video signals representing a video image which changes significantly over time, such as fast motion sporting events, require larger P-frames since successive frames differ considerably. Accordingly, each frame requires greater compression at the expense of image detail.
Determining an optimum value of encoding parameter λ for a particular motion video signal can be particularly difficult. Such is especially true for some motion video signals which include both periods of little motion and periods of significant motion. For example, a motion video signal representing a football game includes periods where both teams are stationary awaiting the snap of the football from the center to the quarterback and periods of sudden extreme motion. Selecting a value of encoding parameter λ which is too high results in sufficient compression that frames are not lost during high motion periods but also in unnecessarily poor image quality during periods where players are stationary or moving slowly between plays. Conversely, selecting a value of encoding parameter λ which is too low results in better image quality during periods of low motion but likely results in loss of frames due to exceeded available bandwidth during high motion periods.
A third factor in selecting a balance between motion video image quality and conserving available bandwidth is the frame rate of the motion video signal. A higher frame rate, i.e., more frames per second, provides an appearance of smoother motion and a higher quality video image. At the same time, sending more frames in a given period of time consumes more of the available bandwidth. Conversely, a lower frame rate, i.e., fewer frames per second, consumes less of the available bandwidth but provides a motion video signal which is more difficult for the viewer to perceive as motion between frames and, below some threshold, the motion video image is perceived as a “slide show,” i.e., a sequence of discrete, still, photographic images. However, intermittent loss of frames resulting from exceeding the available bandwidth as a result of using an excessively high frame rate provides a “jerky” motion video image which is more annoying to viewers than a regular, albeit low, frame rate.
I-frame placement and encoding parameter λ value selection combine to represent a compromise between motion video image quality and conservation of available bandwidth. However, to date, conventional motion video encoders have failed to provide satisfactory motion video image quality within the available bandwidth.
In accordance with the present invention, a primary open loop rate control selects an optimized encoding parameter λ by determining a desired size for an individual frame and comparing the size of the frame as encoded to the desired size. Encoding parameter λ represents a compromise between the distortion introduced into a motion video signal as a result of encoding the motion video signal and the amount of data required to represent the motion video signal as encoded and therefore the amount of bandwidth consumed in delivering the encoded motion video signal. A specific value of encoding parameter λ represents a specific compromise between image quality and consumed bandwidth. Encoding a motion video signal in accordance with encoding parameter λ effects the compromise between consumed bandwidth and video image quality represented by encoding parameter λ. If the encoded frame size is greater than the desired size, encoding parameter λ is increased to reduce the size of subsequently encoded frames to consume less bandwidth at the expense of image quality. Conversely, if the encoded frame size is less than the desired size, encoding parameter λ is reduced to increase the size of subsequently encoded frames to improve image quality and to fully consume available bandwidth. As a result, each frame is encoded in a manner which maximizes image quality while approaching full consumption of available bandwidth and guarding against exceeding available bandwidth.
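The comparison performed by the primary open loop rate control can be illustrated with a minimal sketch. The function name, the multiplicative step of ten percent, and the clamping bounds are assumptions chosen for illustration only; the description above states only the direction in which the encoding parameter (named `lam` here) is adjusted, not the amount.

```python
def adjust_lambda_open_loop(lam, encoded_size, desired_size,
                            step=0.1, lam_min=0.01, lam_max=100.0):
    """Return the encoding parameter to use for subsequently encoded frames.

    step, lam_min and lam_max are hypothetical tuning values, not taken
    from the description.
    """
    if encoded_size > desired_size:
        # Frame came out too large: raise lambda so later frames are smaller.
        lam *= 1.0 + step
    elif encoded_size < desired_size:
        # Frame came out too small: lower lambda so later frames carry more detail.
        lam *= 1.0 - step
    # Keep lambda inside an assumed valid range.
    return min(max(lam, lam_min), lam_max)
```

Called once per encoded frame, such a routine nudges the encoding parameter toward the value at which encoded frames match their desired size.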
Further in accordance with the present invention, a secondary closed loop rate control ensures that overall available bandwidth is never exceeded. Encoding parameter λ is selected by accumulating a cumulative bandwidth error which represents the amount by which bandwidth consumed by encoding a motion video signal deviates from the amount of bandwidth which is available for encoding of the motion video signal. The cumulative bandwidth error accumulates as time passes and is consumed by encoded frames which are transmitted through the communication medium whose bandwidth is measured. Encoding frames which are consistently slightly too large results in incremental reductions in the cumulative bandwidth error, which can have a negative value and which can grow in magnitude as a result of such reductions. In response to the reduction of the cumulative bandwidth error, encoding parameter λ is increased to reduce the size of subsequently encoded frames to consume less bandwidth at the expense of image quality. Encoding frames which are consistently slightly too small results in incremental increases in the cumulative bandwidth error. In response to the increases in the cumulative bandwidth error, encoding parameter λ is decreased to increase the size of subsequently encoded frames to improve image quality and to fully consume available bandwidth. As a result, gradual trends of the primary open loop rate control which allow available bandwidth to accumulate or to be exceeded are thwarted. In addition, secondary closed loop rate control contributes to selecting an optimum compromise between image quality and available bandwidth.
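The accumulation of the cumulative bandwidth error can be sketched as follows. The class name, the proportional gain, the normalization by the per-frame budget, and the limits on the adjustment factor are all assumptions made for illustration; the description specifies only that the error grows with elapsed time, shrinks with each transmitted frame, and drives the encoding parameter (named `lam` here) in the opposite direction.

```python
class ClosedLoopRateControl:
    """Tracks a cumulative bandwidth error and adjusts lambda from it."""

    def __init__(self, bits_per_second, frame_rate, gain=0.05):
        # Bandwidth that becomes available during one frame interval.
        self.budget_per_frame = bits_per_second / frame_rate
        self.cumulative_error = 0.0
        self.gain = gain  # hypothetical tuning value

    def update(self, lam, encoded_bits):
        # Time passing adds available bandwidth; the encoded frame consumes it.
        self.cumulative_error += self.budget_per_frame - encoded_bits
        # Express the error in units of one frame's budget.
        normalized = self.cumulative_error / self.budget_per_frame
        # Negative error (over budget) gives a factor above 1, raising lambda;
        # positive error (under budget) gives a factor below 1, lowering it.
        factor = 1.0 - self.gain * normalized
        factor = min(max(factor, 0.5), 2.0)  # assumed limits on a single change
        return lam * factor
```

Because the error is carried from frame to frame, a run of frames that are each only slightly oversized still produces a growing correction, which is the behavior the per-frame comparison alone cannot provide.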
Further in accordance with the present invention, motion video images which change from a slow changing scene to a rapidly changing scene are detected and encoding parameter λ is adjusted to more quickly adapt to the changing motion video signal and to continue to provide a particularly desirable compromise between image quality and available bandwidth. In particular, the absolute pixel difference between two consecutive frames is measured. Previously measured absolute pixel differences corresponding to previously encoded frames of the motion video signal are filtered to form a filtered previous absolute pixel difference. Encoding parameter λ is adjusted in accordance with the absolute pixel difference and the filtered previous absolute pixel difference independently of changes to encoding parameter λ as determined by the primary open loop rate control and secondary closed loop rate control described above. In particular, if the current absolute pixel difference is greater than the filtered previous absolute pixel difference, showing an increase in the rate of change between frames, encoding parameter λ is increased to reduce the size of subsequently encoded frames and to thereby make additional bandwidth available for such encoded frames. Conversely, if the current absolute pixel difference is less than the filtered previous absolute pixel difference, a decrease in the rate of change between frames is detected and encoding parameter λ is decreased to improve image quality and to more fully consume available bandwidth. As a result, the optimum compromise achieved by the primary open loop rate control and the secondary closed loop rate control is more stable, i.e., reaches equilibrium more quickly, when the rate of change between frames of a motion video image changes significantly and rapidly.
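The measurement and comparison described above can be sketched as follows, with frames represented as flat sequences of pixel values. The exponential moving average used as the filter, its weight `alpha`, and the ten percent adjustment step are assumptions made for illustration; the description does not specify the form of the filter or the size of the adjustment. The encoding parameter is named `lam` here.

```python
def absolute_pixel_difference(frame_a, frame_b):
    """Sum of absolute differences between corresponding pixels."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b))


def filter_difference(filtered_prev_diff, current_diff, alpha=0.25):
    """Fold a new measurement into the filtered history (assumed filter)."""
    return (1.0 - alpha) * filtered_prev_diff + alpha * current_diff


def adjust_lambda_for_motion(lam, current_diff, filtered_prev_diff, step=0.1):
    """Adjust lambda according to whether motion is rising or falling."""
    if current_diff > filtered_prev_diff:
        # Frames are changing faster than before: compress harder.
        return lam * (1.0 + step)
    if current_diff < filtered_prev_diff:
        # Frames are changing more slowly: spend bandwidth on detail.
        return lam * (1.0 - step)
    return lam
```

In use, the current difference is first compared against the filtered history to adjust the encoding parameter, and only then folded into that history for the next frame.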
Further in accordance with the present invention, a scene change between frames of a motion video signal is detected and the first frame of the new scene is encoded as an I-frame. The I-frame so encoded is only slightly larger than an equivalent P-frame, since a scene change represents a particularly large change between the current frame and the previous frame. In addition, the encoding of the next I-frame is postponed until the expiration of a full I-frame interval which starts with the encoding of the scene change I-frame, even if the previous I-frame interval had partially elapsed but had not expired prior to encoding of the scene change I-frame. A scene change is detected by measuring the absolute pixel difference between the current frame and the previous frame, filtering the absolute pixel difference with a previously filtered absolute pixel difference, and comparing the newly filtered absolute pixel difference to a threshold. The threshold is proportional to the previously filtered absolute pixel difference. If the newly filtered absolute pixel difference is greater than the threshold, the current frame is determined to be the first frame of a new scene and is therefore encoded as an I-frame.
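The scene change test and the restart of the I-frame interval can be sketched as follows. The filter weight `alpha`, the proportionality constant `ratio`, and the frame counter convention are assumptions made for illustration; the description states only that the threshold is proportional to the previously filtered absolute pixel difference and that a scene change I-frame begins a new, full I-frame interval.

```python
def is_scene_change(current_diff, prev_filtered, alpha=0.5, ratio=1.5):
    """Return (scene_changed, new_filtered) for the current frame."""
    # Filter the current measurement with the previously filtered value.
    new_filtered = alpha * current_diff + (1.0 - alpha) * prev_filtered
    # The threshold is proportional to the previously filtered value.
    threshold = ratio * prev_filtered
    return new_filtered > threshold, new_filtered


def choose_frame_type(scene_change, frames_since_i, i_frame_interval):
    """Return (frame_type, updated_counter).

    frames_since_i counts frames encoded since, and not including, the
    most recent I-frame. An I-frame is forced once that count reaches
    i_frame_interval, or earlier when a scene change is detected.
    """
    if scene_change or frames_since_i >= i_frame_interval:
        # Resetting the counter starts a full new I-frame interval here,
        # regardless of how much of the previous interval had elapsed.
        return "I", 0
    return "P", frames_since_i + 1
```

Resetting the counter on a scene change is what postpones the next periodic I-frame, so that two costly I-frames are not sent in close succession.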
Each of these mechanisms represents a significant improvement over the prior art and enhances the quality of a motion video image without exceeding available bandwidth. These mechanisms can be used individually or in combination.