A. Technical Field
The present invention relates generally to the encoding of data for transmission along a communications link, and more particularly, to the selection of modes in which data is encoded prior to transmission.
B. Background of the Invention
The burden that high-bandwidth applications, such as voice and video, place on networks is continually increasing. To support these bandwidth-hungry applications, compression technology and standards are evolving so that such applications can be communicated more effectively across a network to a client. Two such standards are H.264 and MPEG-4, which relate to the encoding of video signals. Although these standards generally improve the method in which data is encoded, they may also place a significant strain on the processing resources of the encoder itself. Because of the time-sensitive nature of transmitting and receiving video data, an encoder has a limited amount of time to select an appropriate encoding method for a video frame, encode the video frame, and transmit the frame onto a network. The quality of the video signal may be jeopardized if the encoder is unable to complete, within the requisite time, all of the computations required to encode and transmit the video signal.
FIG. 1 illustrates a typical communications link 120 on which an encoded video signal may be communicated. As illustrated, a video camera 110 generates a video signal which is sent to an encoder 115. This encoder 115 may be software located on a computer or server that is connected to the communications link 120. The encoder 115 receives a video frame which may be divided into macroblocks. These macroblocks may be further divided into luma components (e.g., 4×4 blocks and 16×16 blocks) and chroma components (e.g., 8×8 blocks). These blocks may be encoded in either an inter or intra mode that is selected by the encoder 115. Intra mode encoding means that encoding occurs relative to data within the same video frame. Inter mode encoding means that encoding occurs relative to one or more reference frames outside the current video frame. After the blocks are encoded, they are transmitted, via the communications link 120, to a receive-side decoder 125. The decoder 125 reconstructs the video signal and provides it to the display device 130.
FIG. 2 illustrates a typical method in which a video signal may be encoded. A video frame 205 is transformed 210 using a discrete cosine transformation (‘DCT’) into a set of spatial frequency coefficients 212; this DCT is analogous to a transformation from a time domain signal into a frequency domain signal. The frequency coefficients are then quantized 215, resulting in a scaled signal 220. In effect, the quantization process divides the frequency coefficients by an integer scaling factor and thereafter truncates the result. This process of transforming and quantizing the frame 205 introduces error, such as lost data, into the video signal.
The amount of error introduced into the video signal by the encoding process may be determined by reconstructing the encoded frame. Reconstruction occurs by reverse quantizing 225 the video signal, which results in a rescaled signal 230. This rescaled signal 230 is then inversely transformed 235 by an inverse discrete cosine transform to produce a reconstructed frame 245. This reconstructed frame 245 may be compared to the original video frame 205 to identify the error introduced by the encoding process. The video frame 205 may be encoded in one of multiple different prediction modes, each mode typically having a different error level than the other modes.
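The transform, quantize, and reconstruct loop of FIG. 2 can be sketched as follows. This is a minimal illustration only, assuming a 4×4 block, a floating-point orthonormal DCT, and a single scalar scaling factor; the function names and the `step` parameter are hypothetical, and the H.264 standard itself specifies an integer transform rather than the floating-point one used here.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def forward_dct(block):
    # 2-D transform: C * X * C^T
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def inverse_dct(coeffs):
    # Inverse of the orthonormal transform: C^T * Y * C
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

def quantize(coeffs, step):
    # Divide by the scaling factor, then truncate toward zero.
    return [[int(v / step) for v in row] for row in coeffs]

def dequantize(levels, step):
    # Reverse quantization: rescale the truncated levels.
    return [[v * step for v in row] for row in levels]

def ssd(a, b):
    # Sum of squared differences between two equally sized blocks.
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def reconstruct(block, step):
    levels = quantize(forward_dct(block), step)
    return inverse_dct(dequantize(levels, step))
```

Comparing `ssd(block, reconstruct(block, step))` for different values of `step` gives a rough picture of how the scaling factor trades fidelity for fewer nonzero coefficients.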
Each macroblock may be coded in one of several coding modes depending on the slice-coding type: seven different block modes for motion compensation in the inter mode, and various spatial directional prediction modes in the intra modes. In all slice-coding types, two classes of intra coding modes are supported, which are denoted as Intra4×4 and Intra16×16 in the following. When using the Intra4×4 mode, each 4×4 block of the luminance component utilizes one of nine prediction modes. When using the Intra16×16 mode, four prediction modes are supported. The chrominance samples of a macroblock are always predicted using a unique DC prediction regardless of what intra-coding mode is used for luminance. Intra prediction across slice boundaries is not allowed.
FIG. 3A shows three exemplary prediction mode diagrams for a 4×4 luma video block according to the H.264 standard. This standard defines a total of nine different prediction modes, mode 0 through mode 8, in which a 4×4 luma block may be encoded. Mode 0 305 is a vertical mode in which pixel data is extrapolated from upper samples within the block. Mode 1 310 is a horizontal mode in which pixel data is extrapolated from left samples within the block. Mode 8 315 is a horizontal-up mode in which pixel data is extrapolated from left and lower samples within the block. Modes 2 through 7 are not shown, but detailed descriptions of all 4×4 luma block modes are available in the H.264 standard.
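The vertical and horizontal modes described above can be illustrated with a short sketch. It assumes the reference samples are supplied as a row of four `top` samples and a column of four `left` samples; the function names and the sample values in the usage note are hypothetical and are not taken from the H.264 standard.

```python
def predict_vertical(top, size=4):
    """Mode 0: each column repeats the reference sample above it."""
    return [list(top[:size]) for _ in range(size)]

def predict_horizontal(left, size=4):
    """Mode 1: each row repeats the reference sample to its left."""
    return [[left[r]] * size for r in range(size)]

def residual(block, prediction):
    """Difference between the actual block and its prediction."""
    return [[b - p for b, p in zip(row_b, row_p)]
            for row_b, row_p in zip(block, prediction)]
```

For a block whose rows are all `[10, 20, 30, 40]` and whose `top` samples are also `[10, 20, 30, 40]`, the vertical prediction leaves an all-zero residual, whereas a horizontal prediction from `left = [10, 10, 10, 10]` does not. A smaller residual generally costs fewer bits to encode, which is why the choice of mode matters.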
FIG. 3B shows three exemplary mode diagrams for a 16×16 luma video block or 8×8 chroma video block according to the H.264 standard. This standard defines a total of four different prediction modes, mode 0 through mode 3, in which either a 16×16 luma or 8×8 chroma block may be encoded. Mode 0 330 is a vertical mode in which pixel data is extrapolated from upper samples within the block. Mode 1 335 is a horizontal mode in which pixel data is extrapolated from left samples within the block. Mode 3 340 is a “plane” mode in which a linear plane function is fitted to the upper and left-hand samples within the block. Mode 2 is not shown, but detailed descriptions of all 16×16 luma and 8×8 chroma block modes are available in the H.264 standard.
The selection of a prediction mode for a particular block may require significant processing resources because of the time-sensitive characteristics of a video signal. In particular, an encoder has a limited amount of time to select a prediction mode, encode a block according to the prediction mode, and transmit the block onto a network. If the selection of a prediction mode requires a large number of processor computations, the encoder may have difficulty encoding and transmitting the video signal in a timely manner.
The selection of a prediction mode may include rate-distortion computations for each of the potential prediction modes. An analysis of each mode's rate-distortion value allows for the selection of an optimal prediction mode for a particular block. However, these rate-distortion computations may be processor intensive and place a burden on the encoder to timely encode the video signal. The rate-distortion value is defined as:
J(s,c,m|QP,λm) = SSD(s,c,m|QP) + λm * R(s,c,m|QP),
where QP is the macroblock quantization parameter, λm is the Lagrange multiplier for mode decisions, SSD is the sum of the squared differences between the original block and a reconstructed block, and R represents the number of bits associated with the mode.
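As a rough sketch of how the rate-distortion value drives mode selection, the code below evaluates J = SSD + λm * R for a set of candidate modes and picks the minimum. The function names, the dictionary layout, and the numbers in the usage note are hypothetical; in a real encoder the SSD and bit count for each mode would come from actually encoding and reconstructing the block at the given QP.

```python
def rd_cost(ssd, rate_bits, lagrange_multiplier):
    """Rate-distortion value: distortion plus weighted bit cost."""
    return ssd + lagrange_multiplier * rate_bits

def select_mode(candidates, lagrange_multiplier):
    """Return the mode with the lowest rate-distortion value.

    candidates maps a mode number to a (ssd, rate_bits) pair.
    """
    return min(
        candidates,
        key=lambda mode: rd_cost(*candidates[mode], lagrange_multiplier),
    )
```

With the made-up candidates `{0: (100, 10), 1: (60, 30), 8: (90, 12)}`, a multiplier of 1 selects mode 1 (costs 110, 90, 102), while a multiplier of 4 selects mode 8 (costs 140, 180, 138). A larger multiplier penalizes bits more heavily, so the preferred mode shifts toward cheaper encodings.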
The complexity of the rate-distortion computation, and the number of times the computation is performed, directly affect the time and resources required to identify a prediction mode for a block. Depending on the encoder, and the system in which the encoder operates, these computations may overload the encoder, resulting in a degradation of the encoded video signal that it generates.
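To give a sense of scale, the following sketch counts the rate-distortion evaluations for the luma component of one macroblock. It assumes an exhaustive search that evaluates every Intra4×4 and Intra16×16 mode exactly once, and it ignores chroma and inter modes; the constant and function names are hypothetical.

```python
LUMA_4X4_BLOCKS_PER_MACROBLOCK = 16  # a 16x16 macroblock holds sixteen 4x4 blocks
INTRA_4X4_MODES = 9                  # modes 0 through 8
INTRA_16X16_MODES = 4                # modes 0 through 3

def exhaustive_luma_evaluations():
    """Rate-distortion evaluations for one macroblock under exhaustive search."""
    return (LUMA_4X4_BLOCKS_PER_MACROBLOCK * INTRA_4X4_MODES
            + INTRA_16X16_MODES)

print(exhaustive_luma_evaluations())  # 148
```

Each of these evaluations involves a transform, quantization, reconstruction, and bit count, and the total is repeated for every macroblock in every frame.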
Accordingly, it is desirable to provide a device and method that address the above-described problems.