State-of-the-art video coders use adaptive block partitioning in order to best classify visual data for efficient coding. For this purpose, most coding algorithms divide each frame into a uniform grid of blocks of a given size (macroblocks) and then, depending on the video data, further divide each block into smaller partitions in order to best adapt to the video data. An inherent limitation of this approach is that, in state-of-the-art standards (such as the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Sector (ITU-T) H.264 recommendation (hereinafter the “MPEG-4 AVC Standard”)), the size of the blocks of the initial grid is fixed and is independent of the type of content being encoded. Depending on the content, the video resolution and/or the desired compression, the prefixed block size of the initial grid may be appropriate. However, there is a great variety of situations in which this prefixed initial block size limits the maximum achievable efficiency. Indeed, signal structures (e.g., areas with similar or equal motion) bigger than the initial grid block size cannot be identified and jointly encoded. Coding such bigger areas is typically of interest when coding high-resolution content at low and/or medium rates. Indeed, one would like to find the best distortion-coding cost optimized compression method for areas as big as possible, in order to reduce signal redundancy as much as possible. As a possible solution, one may simply set a bigger size for the initial block used to initialize the tree-based frame partitioning when needed. In the case of the MPEG-4 AVC Standard, as previously proposed, this would be equivalent to doubling the dimensions of the block modes.
However, this implies that the smallest block size is lost and, consequently, the capacity to adapt to very small details is reduced. Another possibility is, in addition to doubling the dimensions of the block modes (hereinafter the “first case”), to also increase the depth of the coding tree (hereinafter the “second case”). In either case, a major problem arises in that the coding and decoding architectures must undergo major modifications in order to adapt them to the new initial block size. Hence, a full redesign of the encoding/decoding system is required whenever different families of block sizes are desired. In the second case, in addition, smaller block-size coding modes will probably be penalized in terms of information cost, which has a negative impact on low-resolution sequences that cannot profit from the biggest introduced partitions. Indeed, in a “general purpose” encoder used for a wide range of resolutions, it is desirable to be able to use additional frame partition types when necessary and not to consider them when unnecessary, in order to save bits. This requires a flexible way of enabling or disabling certain frame partition sizes.
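The interplay between the initial block size and the depth of the coding tree can be illustrated with a generic quadtree sketch. It is not the method of any standard or of the prior art approaches discussed here: `available_sizes`, `best_partition` and the `leaf_cost` callback are hypothetical names, `leaf_cost` merely stands in for whatever distortion-coding cost an encoder would evaluate, and only square splits are considered.

```python
from typing import Callable, List, Tuple

# A block is (x, y, size); a partition is the list of leaf blocks that are kept.
Block = Tuple[int, int, int]


def available_sizes(root: int, levels: int) -> List[int]:
    """Square block sizes reachable from a root block with `levels` tree levels."""
    return [root >> d for d in range(levels)]


def best_partition(block: Block, min_size: int,
                   leaf_cost: Callable[[Block], float]) -> Tuple[float, List[Block]]:
    """Choose recursively between coding `block` whole or splitting it into four.

    `leaf_cost` is a hypothetical cost of coding one block as a single unit.
    Ties are resolved in favor of the bigger (unsplit) block.
    """
    x, y, size = block
    whole = leaf_cost(block)
    if size <= min_size:
        return whole, [block]
    half = size // 2
    split_cost, split_leaves = 0.0, []
    for dy in (0, half):
        for dx in (0, half):
            cost, leaves = best_partition((x + dx, y + dy, half), min_size, leaf_cost)
            split_cost += cost
            split_leaves.extend(leaves)
    if split_cost < whole:
        return split_cost, split_leaves
    return whole, [block]
```

With a 16×16 root and three levels, `available_sizes(16, 3)` gives 16, 8 and 4. Doubling the root without deepening the tree (the first case) gives 32, 16 and 8, so the 4×4 size disappears; adding one level (the second case) recovers it, at the price of one more level of split decisions to signal.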
The use of different block partition sizes has been progressively introduced into video coding approaches in order to achieve efficient compression. Early video standards and/or recommendations, for example, those preceding the International Telecommunication Union, Telecommunication Sector (ITU-T) H.263 recommendation (hereinafter the “H.263 Recommendation”), mostly used frame partitioning based on a single block size (most typically 16×16). Adaptive frame partitioning was introduced within the H.263 Recommendation family in order to increase compression efficiency (8×8 blocks could also be used in addition to 16×16 blocks). Partitioning was based on a tree-structured set of partitions. The use of adaptive, tree-based frame partitioning was consolidated in the MPEG-4 AVC Standard by means of a large set of possible partitions: 16×16; 16×8; 8×16; 8×8; 8×4; 4×8; and 4×4. On some occasions, partitions bigger than 16×16 are needed in order to “pack” and code information more efficiently. One method of doing this is addressed by a first prior art approach in which, depending on the need, a reduced-resolution partitioning of frames is used by doubling the size of all possible partitions. For example, according to the first prior art approach, all 16×16 and 8×8 modes in the MPEG-4 AVC Standard would be modified such that they work as 32×32 and 16×16 modes, respectively. This approach has two primary problems. The first problem is that encoder and decoder implementations typically need to be redesigned to cope with such a structural change. The second problem is that partition resolution is lost, since the smallest partitions are no longer available.
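The effect of the doubling used by the first prior art approach on the partition list above can be checked with a few lines of code. `AVC_PARTITIONS`, `doubled` and `lost_sizes` are names introduced only for this illustration; the sketch simply scales each (width, height) pair by two.

```python
from typing import List, Tuple

Size = Tuple[int, int]  # (width, height) in samples

# Partition sizes of the MPEG-4 AVC Standard as listed in the text.
AVC_PARTITIONS: List[Size] = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]


def doubled(partitions: List[Size]) -> List[Size]:
    """Scale every partition by two in each dimension (first prior art approach)."""
    return [(2 * w, 2 * h) for (w, h) in partitions]


def lost_sizes(original: List[Size], scaled: List[Size]) -> List[Size]:
    """Partition sizes of `original` that no longer exist in `scaled`."""
    return sorted(set(original) - set(scaled))
```

Here `lost_sizes(AVC_PARTITIONS, doubled(AVC_PARTITIONS))` returns the 4×4, 4×8 and 8×4 sizes, and the smallest available partition grows from 16 to 64 samples, which is the loss in partition resolution described above.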
A more general way of generating arbitrarily shaped partitions from an initial tree-based partitioning, such as the one produced by the MPEG-4 AVC Standard, is addressed in a second prior art approach. In accordance with the second prior art approach, additional syntax data is sent for every block and sub-block in order to indicate whether that block is jointly coded with a neighbor or coded individually (possibly, the neighbor selected for joint coding is indicated as well). Although very flexible, this approach has the following main drawbacks/disadvantages. One such drawback/disadvantage is that the second prior art approach tries to generate arbitrarily shaped regions by means of block merging. Hence, additional data needs to be transmitted for every block or sub-block within the frame that has at least one neighbor to merge with. This makes the signaling complicated and, in some applications, the number of partition possibilities can simply be overwhelming. This also introduces unnecessary overhead. Another such drawback/disadvantage is that the second prior art approach loses the hierarchical structure of partitions after merging and does not handle “super-macroblock-like” partitions. Yet another drawback/disadvantage is that the second prior art approach needs to code each macroblock type mode, as it does not impose a hierarchical structure of partitions.
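How the merge signaling grows with the number of blocks can be estimated with a rough count. This is only a model, not the syntax of the second prior art approach: it assumes, hypothetically, that all blocks have the same size, that a block may only merge with its left or top (already coded) neighbor, that one merge flag is sent for each block having at least one such neighbor, and that one extra flag selects the neighbor when both exist. The function name `merge_signaling` is illustrative, the counts are raw flags before entropy coding, and the neighbor-selection count is a worst case.

```python
from typing import Dict


def merge_signaling(width: int, height: int, block: int) -> Dict[str, int]:
    """Count merge-related flags for a frame tiled with equal square blocks (hypothetical model)."""
    cols, rows = width // block, height // block
    blocks = cols * rows
    # Every block except the top-left one has a left or a top neighbor.
    merge_flags = blocks - 1
    # Blocks outside the first row and first column have both neighbors, so a merged
    # block there must also say which neighbor it joins (worst case: every block merges).
    max_neighbor_flags = (cols - 1) * (rows - 1)
    return {"blocks": blocks, "merge_flags": merge_flags,
            "max_neighbor_flags": max_neighbor_flags}
```

For a 1920×1088 frame, 8×8 blocks give 32640 blocks, 32639 merge flags and up to 32265 neighbor-selection flags, while 16×16 blocks give 8160 blocks, 8159 merge flags and up to 7973 selection flags; under this model the overhead is roughly one flag per block, at every granularity at which merging is allowed.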
Direct prediction modes can be seen as a way of extending the use of motion information from a single block to bigger regions, as if the blocks making up the bigger region were coded together. However, motion information is not optimized over the whole region at the encoder side. Moreover, the shape and structure of regions predicted with direct prediction modes are not controlled. Indeed, the relationships between different blocks, or macroblocks, depend on the motion median predictor typically used and do not necessarily keep a hierarchical structure.
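The median predictor mentioned above can be sketched as follows. In the MPEG-4 AVC Standard, the motion vector predictor is, in the basic case, the component-wise median of the motion vectors of the left, top and top-right neighboring blocks. This simplified version ignores neighbor availability (including the substitution of the top-left neighbor), reference-index matching and the directional rules for 16×8 and 8×16 partitions; the function names are illustrative.

```python
from typing import Tuple

MV = Tuple[int, int]  # (horizontal, vertical) motion vector components


def median3(a: int, b: int, c: int) -> int:
    """Median of three integers."""
    return max(min(a, b), min(max(a, b), c))


def median_mv_predictor(left: MV, top: MV, top_right: MV) -> MV:
    """Component-wise median of three neighboring motion vectors (simplified)."""
    return (median3(left[0], top[0], top_right[0]),
            median3(left[1], top[1], top_right[1]))
```

For neighbors (4, 0), (-2, 6) and (10, 2), the predictor is (4, 2), a vector equal to none of the three. Because the result is formed component by component from whichever neighbors happen to surround a block, the region over which a motion vector effectively propagates does not follow any predetermined, hierarchical shape.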
Turning to FIG. 1, a video encoder capable of performing video encoding in accordance with the MPEG-4 AVC Standard is indicated generally by the reference numeral 100.
The video encoder 100 includes a frame ordering buffer 110 having an output in signal communication with a non-inverting input of a combiner 185. An output of the combiner 185 is connected in signal communication with a first input of a transformer and quantizer 125. An output of the transformer and quantizer 125 is connected in signal communication with a first input of an entropy coder 145 and a first input of an inverse transformer and inverse quantizer 150. An output of the entropy coder 145 is connected in signal communication with a first non-inverting input of a combiner 190. An output of the combiner 190 is connected in signal communication with a first input of an output buffer 135.
A first output of an encoder controller 105 is connected in signal communication with a second input of the frame ordering buffer 110, a second input of the inverse transformer and inverse quantizer 150, an input of a picture-type decision module 115, an input of a macroblock-type (MB-type) decision module 120, a second input of an intra prediction module 160, a second input of a deblocking filter 165, a first input of a motion compensator 170, a first input of a motion estimator 175, and a second input of a reference picture buffer 180.
A second output of the encoder controller 105 is connected in signal communication with a first input of a Supplemental Enhancement Information (SEI) inserter 130, a second input of the transformer and quantizer 125, a second input of the entropy coder 145, a second input of the output buffer 135, and an input of the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS) inserter 140.
A first output of the picture-type decision module 115 is connected in signal communication with a third input of the frame ordering buffer 110. A second output of the picture-type decision module 115 is connected in signal communication with a second input of the macroblock-type decision module 120.
An output of the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS) inserter 140 is connected in signal communication with a third non-inverting input of the combiner 190.
An output of the inverse transformer and inverse quantizer 150 is connected in signal communication with a first non-inverting input of a combiner 119. An output of the combiner 119 is connected in signal communication with a first input of the intra prediction module 160 and a first input of the deblocking filter 165. An output of the deblocking filter 165 is connected in signal communication with a first input of the reference picture buffer 180. An output of the reference picture buffer 180 is connected in signal communication with a second input of the motion estimator 175. A first output of the motion estimator 175 is connected in signal communication with a second input of the motion compensator 170. A second output of the motion estimator 175 is connected in signal communication with a third input of the entropy coder 145.
An output of the motion compensator 170 is connected in signal communication with a first input of a switch 197. An output of the intra prediction module 160 is connected in signal communication with a second input of the switch 197. An output of the macroblock-type decision module 120 is connected in signal communication with a third input of the switch 197. The third input of the switch 197 (i.e., the control input) determines whether the “data” input of the switch is provided by the motion compensator 170 or by the intra prediction module 160. The output of the switch 197 is connected in signal communication with a second non-inverting input of the combiner 119 and with an inverting input of the combiner 185.
Inputs of the frame ordering buffer 110 and the encoder controller 105 are available as inputs of the encoder 100, for receiving an input picture 101. Moreover, an input of the Supplemental Enhancement Information (SEI) inserter 130 is available as an input of the encoder 100, for receiving metadata. An output of the output buffer 135 is available as an output of the encoder 100, for outputting a bitstream.
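The closed prediction loop formed by the combiner 185, the transformer and quantizer 125, the inverse transformer and inverse quantizer 150, the combiner 119 and the switch 197 can be sketched on one-dimensional blocks of samples. This is a simplified illustration under stated assumptions: the transform is omitted, a uniform rounding quantizer with step `step` stands in for the quantizer, the predictions are passed in rather than computed, and `encode_block`, `quantize` and `dequantize` are hypothetical names. What it shows is that the encoder reconstructs each block from the quantized levels and the selected prediction, which is exactly the information the decoder has.

```python
from typing import List, Tuple


def quantize(residual: List[int], step: int) -> List[int]:
    """Uniform scalar quantizer, rounding half away from zero (stand-in for 125)."""
    return [(1 if r >= 0 else -1) * ((abs(r) + step // 2) // step) for r in residual]


def dequantize(levels: List[int], step: int) -> List[int]:
    """Inverse quantization (stand-in for 150; no inverse transform)."""
    return [lvl * step for lvl in levels]


def encode_block(block: List[int], mb_type: str, inter_pred: List[int],
                 intra_pred: List[int], step: int) -> Tuple[List[int], List[int]]:
    # Switch 197: the macroblock-type decision selects the prediction source.
    pred = inter_pred if mb_type == "inter" else intra_pred
    # Combiner 185: the prediction enters the inverting input, so residual = input - prediction.
    residual = [s - p for s, p in zip(block, pred)]
    levels = quantize(residual, step)  # these levels go to the entropy coder 145
    # Combiner 119: local reconstruction = prediction + dequantized residual,
    # which feeds the intra prediction module 160 and the deblocking filter 165.
    recon = [p + r for p, r in zip(pred, dequantize(levels, step))]
    return levels, recon
```

For example, coding `[100, 104, 97, 90]` in inter mode against the prediction `[98, 100, 100, 92]` with `step=4` produces the levels `[1, 1, -1, -1]` and the reconstruction `[102, 104, 96, 88]`, which differs from the input by at most half a quantization step.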
Turning to FIG. 2, a video decoder capable of performing video decoding in accordance with the MPEG-4 AVC Standard is indicated generally by the reference numeral 200.
The video decoder 200 includes an input buffer 210 having an output connected in signal communication with a first input of an entropy decoder 245. A first output of the entropy decoder 245 is connected in signal communication with a first input of an inverse transformer and inverse quantizer 250. An output of the inverse transformer and inverse quantizer 250 is connected in signal communication with a second non-inverting input of a combiner 225. An output of the combiner 225 is connected in signal communication with a second input of a deblocking filter 265 and a first input of an intra prediction module 260. A second output of the deblocking filter 265 is connected in signal communication with a first input of a reference picture buffer 280. An output of the reference picture buffer 280 is connected in signal communication with a second input of a motion compensator 270.
A second output of the entropy decoder 245 is connected in signal communication with a third input of the motion compensator 270 and a first input of the deblocking filter 265. A third output of the entropy decoder 245 is connected in signal communication with an input of a decoder controller 205. A first output of the decoder controller 205 is connected in signal communication with a second input of the entropy decoder 245. A second output of the decoder controller 205 is connected in signal communication with a second input of the inverse transformer and inverse quantizer 250. A third output of the decoder controller 205 is connected in signal communication with a third input of the deblocking filter 265. A fourth output of the decoder controller 205 is connected in signal communication with a second input of the intra prediction module 260, with a first input of the motion compensator 270, and with a second input of the reference picture buffer 280.
An output of the motion compensator 270 is connected in signal communication with a first input of a switch 297. An output of the intra prediction module 260 is connected in signal communication with a second input of the switch 297. An output of the switch 297 is connected in signal communication with a first non-inverting input of the combiner 225.
An input of the input buffer 210 is available as an input of the decoder 200, for receiving an input bitstream. A first output of the deblocking filter 265 is available as an output of the decoder 200, for outputting an output picture.
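The order of operations in the decoder 200 can be sketched on one-dimensional blocks as well. As in the figure, intra prediction uses the output of the combiner 225 before deblocking, while the deblocked picture is both stored in the reference picture buffer 280 and output. Everything else is a simplifying assumption: the decoded symbols are given as hypothetical `(mb_type, levels, mv)` tuples instead of being entropy decoded, there is no inverse transform, intra prediction simply repeats the last reconstructed sample (or 128 for the first block), motion compensation is an integer shift into the latest reference picture, and `deblock` is a crude boundary-averaging placeholder rather than the adaptive filter of the MPEG-4 AVC Standard.

```python
from typing import List, Tuple

# Hypothetical decoded symbols per block: (mb_type, quantized levels, integer motion offset).
CodedBlock = Tuple[str, List[int], int]


def deblock(samples: List[int], block_len: int) -> List[int]:
    """Placeholder for deblocking filter 265: average the two samples at each block boundary."""
    out = list(samples)
    for b in range(block_len, len(samples), block_len):
        avg = (samples[b - 1] + samples[b] + 1) // 2
        out[b - 1] = out[b] = avg
    return out


def decode_picture(blocks: List[CodedBlock], step: int, block_len: int,
                   reference_buffer: List[List[int]]) -> List[int]:
    recon: List[int] = []  # output of combiner 225, before deblocking
    for i, (mb_type, levels, mv) in enumerate(blocks):
        start = i * block_len
        if mb_type == "inter":
            # Motion compensator 270: shifted samples from the latest reference picture.
            ref = reference_buffer[-1]
            lo = start + mv
            if lo < 0 or lo + block_len > len(ref):
                raise ValueError("motion vector points outside the reference picture")
            pred = ref[lo:lo + block_len]
        else:
            # Intra prediction module 260: repeat the last unfiltered reconstructed sample.
            pred = [recon[-1] if recon else 128] * block_len
        residual = [lvl * step for lvl in levels]  # inverse quantizer 250, no transform
        # Switch 297 and combiner 225: prediction + residual, clipped to 8 bits.
        recon.extend(min(255, max(0, p + r)) for p, r in zip(pred, residual))
    picture = deblock(recon, block_len)  # deblocking filter 265
    reference_buffer.append(picture)  # stored in reference picture buffer 280
    return picture  # also available as the decoder output
```

For instance, with a flat reference picture of value 100, an inter block with levels `[1, 0, -1, 0]` followed by an intra block with levels `[5, 5, 5, 5]` (step 2, blocks of 4 samples) reconstructs the intra block from the unfiltered value 100, and only afterwards are the two samples at the boundary smoothed to 105.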