Video is ubiquitous on the Internet. In fact, many people today watch video exclusively online. And, according to the latest statistics, almost 90% of Internet traffic is attributable to video. All of this is possible, in part, due to sophisticated video compression. Video compression thus plays an important role in the modern world's communication infrastructure. By way of illustration, uncompressed video at standard resolution (i.e., 640×480) would require 240 Mbps of bandwidth to transmit. This amount of bandwidth, for just a single standard-resolution video, significantly exceeds the capacity of today's infrastructure and, for that matter, of the widely available infrastructure of the foreseeable future.
Modern video compression techniques take advantage of the fact that the information content of video exhibits significant redundancy. Video exhibits temporal redundancy in that most of the content of a new frame was already present in earlier frames. Video also exhibits significant spatial redundancy in that, within a given frame, pixels have color values similar to those of their neighbors. The first commercially widespread video coding methods, MPEG1 and MPEG2, took advantage of these forms of redundancy and were able to reduce bandwidth requirements substantially.
For high quality encoding, MPEG1 generally cut the bandwidth requirement for standard definition resolution from 240 Mbps to 6 Mbps. MPEG2 brought the requirement down further to 4 Mbps. As a result, MPEG2 is used for digital television broadcasting all over the world. MPEG1 and MPEG2 each took advantage of temporal redundancy by using block-based motion compensation. To compress using block-based motion compensation, the encoder breaks up each new frame to be encoded into fixed-size, 16×16 pixel blocks called macroblocks. These macroblocks are non-overlapping and form a homogeneous tiling of the frame. When encoding, the encoder searches, for each macroblock in the new frame, for the best matching 16×16 block of a previously encoded frame. In fact, in MPEG1 and MPEG2 up to two previously encoded frames can be searched. Once a best match is found, the encoder establishes and transmits a displacement vector, known in this case as a motion vector, that references the matching block and thereby approximates the macroblock.
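The block-matching search described above can be sketched as follows. This is only a minimal illustration under stated assumptions: frames are grayscale and stored as lists of rows, the search is exhaustive over a small window, and the matching error is the sum of absolute differences. The names `sad` and `best_motion_vector` and the `search` parameter are hypothetical and are not taken from MPEG1 or MPEG2.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the block at (bx, by) in the
    current frame and the block displaced by (dx, dy) in the reference frame."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(cur, ref, bx, by, size=16, search=4):
    """Exhaustively test every displacement within +/- search pixels and
    return ((dx, dy), error) for the best matching reference block."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall partly outside the reference frame.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + size > width or by + dy + size > height:
                continue
            err = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or err < best[0]:
                best = (err, (dx, dy))
    return best[1], best[0]
```

Calling `best_motion_vector(cur, ref, 8, 8)` returns the motion vector and its matching error for the macroblock whose top-left corner is at pixel (8, 8). A practical encoder would typically replace the exhaustive loop with a faster search strategy.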
MPEG1 and MPEG2, as international standards, specified the format of the motion vector coding but left the means of determining the motion vectors to the designers of the encoder algorithms. Originally, the motion vector search sought to minimize the absolute error between the actual macroblock and its approximation. Later implementations, however, also took into account the cost of encoding the motion vector. Although MPEG1 and MPEG2 represented significant advances in video compression, their effectiveness was limited, largely because real video scenes are not composed of moving square blocks. In practice, certain macroblocks in a new frame are not represented well by any block from a previous frame and have to be encoded without the benefit of temporal redundancy. With MPEG1 and MPEG2, these macroblocks could not be compressed well and contributed disproportionately to the overall bitrate.
The newer generation of video compression standards, such as H.264 and Google's VP8, has addressed this temporal redundancy problem by allowing the 16×16 macroblocks to be partitioned into smaller blocks, each of which can be motion compensated separately. Partitions can be as small as 4×4 pixels. The finer partitioning potentially allows for a better match of each partition to a block in a previous frame. However, this approach incurs the cost of coding extra motion vectors. Encoders operating within the standards have the flexibility to decide how the macroblocks are partitioned and how the motion vectors for each partition are selected. Whichever path is chosen, the results are ultimately encoded in a standards-compliant bitstream that any standards-compliant decoder can decode.
Determining precisely how to partition and motion compensate each macroblock is complex, and the original H.264 test model used an approach based on rate-distortion optimization. In rate-distortion optimization, a combined cost function, including both the error for a certain displacement and the coding cost of the corresponding motion vector, is minimized. To evaluate a particular partitioning of a macroblock, the total cost function is analyzed. The total cost function contains the errors from motion compensating each partition and the costs of encoding all the motion vectors associated with that partitioning. The cost is given by the following equation:

$$F(v_1, \ldots, v_N) = \sum_{\text{partitions}} \text{Error}_{\text{partition}} + \alpha \sum_{\text{partitions}} R(v_{\text{partition}}) \qquad (1)$$

where $\alpha$ is the Lagrange multiplier relating rate and distortion, $\sum_{\text{partitions}} \text{Error}_{\text{partition}}$ is the cost associated with the mismatch of the source and the target, and $\sum_{\text{partitions}} R(v_{\text{partition}})$ is the cost associated with encoding the corresponding motion vectors.
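As a rough numerical illustration of Equation (1), the sketch below evaluates the cost of one partitioning from per-partition errors and motion vectors. The rate term R is an assumption made for illustration: each motion vector component is charged the length of a signed Exp-Golomb codeword, which stands in for whatever rate model a real encoder uses. The function names `se_golomb_bits`, `mv_bits`, and `partition_cost` are hypothetical.

```python
def se_golomb_bits(v):
    """Length in bits of the signed Exp-Golomb codeword for integer v
    (assumed rate model, used here only for illustration)."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * ((code_num + 1).bit_length() - 1) + 1


def mv_bits(mv):
    """Assumed rate R(v): bits for both components of a motion vector."""
    return se_golomb_bits(mv[0]) + se_golomb_bits(mv[1])


def partition_cost(errors, motion_vectors, alpha):
    """Equation (1): total error plus alpha times total motion vector rate."""
    distortion = sum(errors)
    rate = sum(mv_bits(v) for v in motion_vectors)
    return distortion + alpha * rate
```

For example, two partitions with errors 100 and 50, motion vectors (0, 0) and (1, -1), and α = 4 give a distortion of 150, a rate of 2 + 6 = 8 bits, and a total cost of 182.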
For each possible partitioning, the cost function F is minimized as a function of the motion vectors v. For the final decision, the minimized costs of the candidate partitionings are compared, and the partitioning with the lowest overall cost is selected. The macroblocks are encoded in raster scan order, and this choice is made for each macroblock as it is encoded. Previously encoded macroblocks affect the current macroblock because its motion vectors are predicted from theirs and coded differentially, which changes the coding cost of each candidate motion vector. This approach is now the de facto standard in video compression encoders for H.264 and VP8.
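The selection step described above can be sketched as a comparison of candidate partitionings whose motion vectors have already been chosen. The sketch makes several simplifying assumptions: a single predicted motion vector (derived from previously encoded macroblocks) is shared by all partitions, the rate of a motion vector is the signed Exp-Golomb length of its difference from that prediction, and the names `mv_rate` and `select_partitioning` are hypothetical.

```python
def se_golomb_bits(v):
    """Length in bits of the signed Exp-Golomb codeword for integer v
    (assumed rate model, used here only for illustration)."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * ((code_num + 1).bit_length() - 1) + 1


def mv_rate(mv, predicted):
    """Bits to code a motion vector differentially against its prediction."""
    return (se_golomb_bits(mv[0] - predicted[0])
            + se_golomb_bits(mv[1] - predicted[1]))


def select_partitioning(candidates, predicted_mv, alpha):
    """candidates maps a partitioning name to a list of (error, motion_vector)
    pairs, one per partition. Returns (name, cost) with the lowest total cost."""
    best_name, best_cost = None, None
    for name, partitions in candidates.items():
        cost = sum(err + alpha * mv_rate(mv, predicted_mv)
                   for err, mv in partitions)
        if best_cost is None or cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, best_cost
```

With a small α the finer partitioning tends to win because its lower error outweighs the extra motion vectors; with a large α the rate term dominates and the single 16×16 block is preferred.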
In an exemplary and non-limiting embodiment, aspects of the disclosure are embodied in a method of encoding video including determining objects within a frame at least partially based on movement characteristics of underlying pixels and partitioning the frame into blocks by considering a plurality of partitioning options, such partitioning favoring options that result in different objects being placed in different blocks.
In another example, aspects of the present disclosure are embodied in a partitioner operable to partition a frame into blocks by considering a plurality of partitioning options, such partitioning favoring options that result in different objects being placed in different blocks.
In yet another example, aspects of the present disclosure are embodied in a computer readable medium having instructions thereon that, when interpreted by a processor, cause the processor to determine objects within a frame at least partially based on movement characteristics of underlying pixels; and partition a frame into blocks by considering a plurality of partitioning options, such partitioning favoring options that result in different objects being placed in different blocks.
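One way to picture a partitioner that favors options placing different objects in different blocks is the toy sketch below. It is purely illustrative and is not the disclosed implementation. It assumes that an object label per pixel is already available for a square macroblock (for example, from grouping pixels by their movement), considers only four hypothetical partitioning options, and prefers the option with the fewest blocks containing more than one object, breaking ties in favor of fewer blocks. All names are hypothetical.

```python
# Hypothetical partitioning options: name -> (block width, block height).
OPTIONS = {"16x16": (16, 16), "16x8": (16, 8), "8x16": (8, 16), "8x8": (8, 8)}


def mixed_blocks(labels, block_w, block_h):
    """Count blocks of the given size that contain more than one object label."""
    size = len(labels)
    count = 0
    for by in range(0, size, block_h):
        for bx in range(0, size, block_w):
            objects = {labels[y][x]
                       for y in range(by, by + block_h)
                       for x in range(bx, bx + block_w)}
            if len(objects) > 1:
                count += 1
    return count


def choose_partitioning(labels, options=OPTIONS):
    """Pick the option with the fewest mixed-object blocks; among equals,
    pick the one with the fewest blocks (fewer motion vectors to code)."""
    size = len(labels)

    def key(name):
        w, h = options[name]
        n_blocks = (size // w) * (size // h)
        return (mixed_blocks(labels, w, h), n_blocks)

    return min(options, key=key)
```

For a 16×16 macroblock whose left half belongs to one object and whose right half belongs to another, this sketch selects the "8x16" option, since it separates the two objects with only two blocks.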