In transforming an analogue signal into a digital signal, the analogue signal is sampled at an adequate sampling rate. The magnitude of each sample, or the value of a predefined function of each sample, is approximated by one of a number of discrete levels, often referred to as quantization levels. The larger the number of quantization levels, or equivalently the smaller the quantization step, the more accurate the digital representation. A video signal transformed into digital format is further organized into a succession of sets of samples, where each set of samples, often called a frame, may be displayed as an image using a display device. Each image is defined by a number of “picture elements” (pixels); this number is often called the display resolution.
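The relationship between quantization step and accuracy can be illustrated with a minimal sketch. It assumes a uniform quantizer that rounds each sample to the nearest multiple of the step; the function names and the sine-wave test signal are illustrative choices, not taken from any particular codec.

```python
import math

def quantize(sample: float, step: float) -> float:
    """Approximate a sample by the nearest multiple of `step` (uniform quantizer)."""
    return step * round(sample / step)

def mean_squared_error(samples, step):
    """Average squared difference between the samples and their quantized values."""
    return sum((s - quantize(s, step)) ** 2 for s in samples) / len(samples)

# One period of a sine wave stands in for the sampled analogue signal (assumption).
samples = [math.sin(2 * math.pi * n / 64) for n in range(64)]

coarse_error = mean_squared_error(samples, step=0.5)   # few quantization levels
fine_error = mean_squared_error(samples, step=0.05)    # many quantization levels
```

With these inputs, `fine_error` comes out smaller than `coarse_error`, which is the behaviour the paragraph describes: a smaller step yields a more accurate digital representation.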
Naturally, the higher the spatial density of the pixels, the closer the image is to the original picture it represents. A displayed image persists until replaced by a succeeding image. The rate at which displayed images are replaced is the image rate, also called the frame rate.
Each video frame is compressed by exploiting spatiotemporal redundancies. In the encoding process, spatiotemporal predictions are performed to reduce redundancies, and the differences from these predictions (also called residual information) are transformed (often using a discrete cosine transform or a similar transform), quantized, entropy coded, and transmitted. As in the quantization of analogue signals, the quantization of transformed residual information affects the fidelity (visual quality) as well as the bit rate of the encoded signal. A smaller quantization step leads to better signal fidelity and to a higher bit rate. The three parameters above (display resolution, frame rate, and quantization step) affect the flow rate (bit rate) or file size, as well as the fidelity, of a video sequence. The higher the display resolution, the higher the flow rate (or size) and fidelity. The lower the quantization step, the higher the flow rate (or size) and fidelity. There are several methods of encoding video signals which aim at reducing the size of an encoded video recording and/or the flow rate of an encoded video signal.
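The predict, transform, and quantize steps described above can be sketched on a single row of eight samples. This is a simplified illustration under several assumptions: a 1-D orthonormal DCT-II stands in for the 2-D transform of a real codec, the sample values are invented, entropy coding is omitted, and the count of non-zero quantized coefficients is used only as a crude proxy for bit rate.

```python
import math

def dct_1d(values):
    """Orthonormal DCT-II of a list of samples."""
    n = len(values)
    coeffs = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        total = sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, v in enumerate(values))
        coeffs.append(scale * total)
    return coeffs

def encode_residual(current, prediction, step):
    """Subtract the prediction, transform the residual, and quantize the coefficients."""
    residual = [c - p for c, p in zip(current, prediction)]
    return [round(c / step) for c in dct_1d(residual)]

def nonzero_count(levels):
    """Number of quantized coefficients that would still need to be coded."""
    return sum(1 for level in levels if level != 0)

# Hypothetical sample values: a prediction and the actual row it approximates.
prediction = [100, 102, 104, 106, 108, 110, 112, 114]
current = [101, 104, 103, 109, 107, 113, 111, 118]

fine = encode_residual(current, prediction, step=1.0)
coarse = encode_residual(current, prediction, step=8.0)
```

A larger step can only turn coefficients into zeros, never the reverse, so `nonzero_count(coarse)` never exceeds `nonzero_count(fine)`. A perfect prediction leaves a residual of zero and nothing to code, which is why better prediction reduces the bit rate.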
An encoder of a video signal may encode the signal according to a specific quantization step, a specific display resolution, and a specific frame rate compatible with a target receiving node. Alternatively, an encoder of a video signal may encode the signal according to a nominal quantization step, a nominal display resolution, and a nominal frame rate to be further transcoded into a different quantization step, a different display resolution, and/or a different frame rate.
Transcoding may be necessitated by the capability of a receiving node, the capacity of a communication path to a receiving node, or both. Several originating nodes, each having a respective encoder, may direct initially encoded video signals to a shared signal adaptation node to be individually re-encoded (transcoded) and directed to respective receiving nodes.
Encoding a video signal, or transcoding an already encoded video signal, for delivery to a target receiving node sometimes requires initial processes of acquiring characteristics of the receiving node to determine an upper bound of display resolution and an upper bound of frame rate. It is also desirable to determine properties of the video signal, such as a classification according to the rate of temporal image variation, which may influence the selection of the encoding parameters. Classification of a video signal may be based on a representative rate of temporal variation, a quantifier of spectral content in terms of bandwidth occupied by the signal, or some indicator of scene variation rate. In other cases, transcoding may be performed from an older format to a more recent one with better compression capabilities, regardless of the characteristics of the receiving node, to reduce the size of an encoded video recording and/or the flow rate of an encoded video signal.
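The receiver-driven case, in which upper bounds on display resolution and frame rate constrain the output, might be sketched as follows. The `VideoParams` type, the `target_params` function, and the policy of preserving aspect ratio without upscaling are all assumptions made for illustration; the text does not prescribe how the bounds are applied.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class VideoParams:
    width: int
    height: int
    frame_rate: float

def target_params(source: VideoParams, max_width: int, max_height: int,
                  max_frame_rate: float) -> VideoParams:
    """Fit the source within the receiver's upper bounds (hypothetical policy)."""
    # Exact rational scale factor: shrink to fit both bounds, never enlarge.
    scale = min(Fraction(1),
                Fraction(max_width, source.width),
                Fraction(max_height, source.height))
    return VideoParams(
        width=int(source.width * scale),
        height=int(source.height * scale),
        frame_rate=min(source.frame_rate, max_frame_rate),
    )

# Example: a 1080p, 60 frames/s source sent to a receiver limited to 720p at 30 frames/s.
source = VideoParams(1920, 1080, 60.0)
target = target_params(source, max_width=1280, max_height=720, max_frame_rate=30.0)
```

`Fraction` is used so that the scaled dimensions are computed exactly rather than through floating-point rounding.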
In H.264, the basic processing unit is the macroblock (MB), which represents a block of 16×16 samples. Each MB has a prediction mode (intra, inter or skip). An intra MB supports two partition modes, 16×16 and 4×4, which support 4 and 9 spatial prediction modes, respectively. An inter MB must be partitioned into 16×16, 16×8, 8×16 or 8×8 blocks. An 8×8 block can be sub-partitioned into 8×4, 4×8 or 4×4 blocks. Each inter block has its own motion vector (MV). The skip MB is a special case of an inter MB encoded with the predicted MV and without residual data.
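The inter partitioning rules above determine how many motion vectors a macroblock carries. The sketch below counts them, assuming one MV per block as stated in the text; the table and function names are illustrative and do not come from any H.264 implementation.

```python
# Number of blocks produced by each inter MB partition mode.
MB_PARTITIONS = {"16x16": 1, "16x8": 2, "8x16": 2, "8x8": 4}
# Number of blocks produced by each sub-partition mode of one 8x8 block.
SUB_PARTITIONS = {"8x8": 1, "8x4": 2, "4x8": 2, "4x4": 4}

def motion_vector_count(mb_partition, sub_partitions=None):
    """Count the motion vectors of an inter MB, one per block."""
    if mb_partition not in MB_PARTITIONS:
        raise ValueError(f"unknown MB partition: {mb_partition}")
    if mb_partition != "8x8":
        return MB_PARTITIONS[mb_partition]
    # Each of the four 8x8 blocks chooses its own sub-partition mode.
    subs = sub_partitions or ["8x8"] * 4
    if len(subs) != 4:
        raise ValueError("an 8x8-partitioned MB has exactly four 8x8 blocks")
    return sum(SUB_PARTITIONS[s] for s in subs)

fewest = motion_vector_count("16x16")
most = motion_vector_count("8x8", ["4x4"] * 4)
```

Under these rules an inter MB carries between 1 and 16 motion vectors, which gives a sense of how finer partitioning trades prediction accuracy for signalling overhead.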
In HEVC, the basic processing unit is the coding tree unit (CTU), which has a maximum block size of 64×64 pixels and is represented by a quadtree structure. Each node of the quadtree is associated with a coding unit (CU) denoted C_{i,j}, where i is the depth level and j is the index of the CU at that depth. The quadtree maximum depth is 4 (depth levels i = 0 to 3 for a 64×64 CTU) and the CU minimum size is 8×8. When a CU C_{i,j} is split, its children correspond to the sub-CUs C_{i+1, 4j+k}, with k = 0, ..., 3.
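The CU indexing scheme can be made concrete with a short sketch. It assumes a 64×64 CTU with depth levels 0 to 3, which is what an 8×8 minimum CU implies, and represents each CU as a `(depth, index)` pair; the function names are illustrative.

```python
CTU_SIZE = 64     # maximum block size stated in the text
MIN_CU_SIZE = 8   # minimum CU size stated in the text

def cu_size(depth):
    """Side length of a CU at the given depth; it halves at every level."""
    if depth < 0 or (CTU_SIZE >> depth) < MIN_CU_SIZE:
        raise ValueError(f"depth {depth} is outside the quadtree")
    return CTU_SIZE >> depth

def children(depth, index):
    """Sub-CUs of the CU at (depth, index): (depth + 1, 4 * index + k), k = 0..3."""
    if cu_size(depth) == MIN_CU_SIZE:
        raise ValueError("a CU of minimum size cannot be split")
    return [(depth + 1, 4 * index + k) for k in range(4)]

def parent(depth, index):
    """Inverse of children(): the CU that was split to produce (depth, index)."""
    if depth == 0:
        raise ValueError("the root CU has no parent")
    return (depth - 1, index // 4)

sizes = [cu_size(d) for d in range(4)]   # CU side length at each depth level
kids = children(1, 2)                    # sub-CUs of the CU with i = 1, j = 2
```

Because a child index is `4 * index + k`, integer division by 4 recovers the parent index, so the tree can be traversed in either direction without storing explicit links.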