Video encoding is becoming increasingly efficient, enabling video data to be stored on hard drives and transmitted over communications networks. In the surveillance industry, large amounts of video data are generated every day. The video data in surveillance systems largely consists of live video data. This live video data needs to be encoded on the fly, in order to be delivered substantially instantaneously and be useful to the systems and persons depending on the video for swift and correct decisions. Generally, the video data in a surveillance system needs to be encoded in the device capturing the video, e.g. in a video camera, or in a networked encoder arranged close to the camera, in order to keep the network load low. Hence, the devices performing the encoding often have limited processing power and/or storage.
Most schemes for encoding video were developed for the motion picture industry, where the encoding is not time critical and the video can therefore be processed over an extended period of time to achieve high compression and high image reproduction quality. Accordingly, many encoding schemes for motion pictures require multiple-pass encoding, i.e. the video is encoded consecutively more than once. Such time-consuming encoding is not viable in a surveillance system. In addition, multiple-pass encoding requires more memory and processing cycles than most surveillance cameras are designed for. Thus, surveillance video systems require other encoding schemes to keep the network load, storage requirements, and processing requirements low.
Many of the most widely used and most efficient video encoding schemes today are block based, i.e. the image processing of each image frame in the video is performed on blocks or units of the image. In video encoding schemes such as Motion JPEG, H.261, MPEG-1 Part 2, H.262/MPEG-2 Part 2, and H.264/MPEG-4 AVC, the block structure used includes macroblocks (called MCUs in JPEG), which are the basic blocks of these schemes. The macroblocks may then be partitioned into macroblock partitions.
In H.264 a coded picture consists of a plurality of macroblocks, each containing 16×16 luma samples and associated chroma samples. A macroblock may be partitioned in four ways, see FIG. 1: as one 16×16 macroblock partition, as two 8×16 macroblock partitions, as two 16×8 macroblock partitions, or as four 8×8 macroblock partitions. The 8×8 partitions may be further partitioned into one 8×8 sub-macroblock partition, two 4×8 sub-macroblock partitions, two 8×4 sub-macroblock partitions, or four 4×4 sub-macroblock partitions. H.264 thus allows for variable block sizes within a frame. This feature makes it possible to represent video using even less data, since different characteristics of a scene in a video frame are most efficiently coded using different macroblock sizes.
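The partitioning options listed above can be summarised in a short sketch. The following Python snippet is purely illustrative (the names `MB_PARTITIONS` and `SUB_MB_PARTITIONS` are hypothetical, not part of any standard); it merely enumerates the H.264 partition shapes and checks that each partitioning tiles the full macroblock:

```python
# Illustrative enumeration of the H.264 partition choices described above.
# Each entry lists the (width, height) of the blocks that together tile
# one 16x16-luma macroblock or one 8x8 partition.
MB_PARTITIONS = {
    "16x16": [(16, 16)],
    "8x16":  [(8, 16)] * 2,
    "16x8":  [(16, 8)] * 2,
    "8x8":   [(8, 8)] * 4,
}
SUB_MB_PARTITIONS = {
    "8x8": [(8, 8)],
    "4x8": [(4, 8)] * 2,
    "8x4": [(8, 4)] * 2,
    "4x4": [(4, 4)] * 4,
}

def covers_area(parts, width, height):
    # A valid partitioning must tile the full area exactly.
    return sum(w * h for w, h in parts) == width * height

assert all(covers_area(p, 16, 16) for p in MB_PARTITIONS.values())
assert all(covers_area(p, 8, 8) for p in SUB_MB_PARTITIONS.values())
```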
In many encoders, this data reduction is achieved by test-encoding each macroblock using all possible combinations of block sizes and then comparing the quality achieved by the different block size combinations. If the quality of a macroblock of size 16×16 is only marginally worse than the quality achieved with the smaller sizes, then the macroblock of size 16×16 is selected. However, if there is a relevant difference in quality, an appropriate one of the smaller block sizes is selected. The selection of block sizes may be performed using rate-distortion (RD) cost calculations for the different sizes.
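A minimal sketch of such an RD-cost-based selection follows. It is not any specific encoder's implementation; the candidate names, distortion/rate numbers, and lambda value are hypothetical, chosen only to show how the lowest-cost partitioning is picked:

```python
# Illustrative sketch: select the partitioning with the lowest
# rate-distortion cost J = D + lambda * R (a common formulation).
def rd_cost(distortion, rate, lam):
    return distortion + lam * rate

def select_partitioning(candidates, lam):
    # candidates: mapping of partitioning name -> (distortion, rate in bits)
    return min(candidates, key=lambda name: rd_cost(*candidates[name], lam))

# Hypothetical numbers: 16x16 is slightly worse in quality but far cheaper
# in bits, so at lambda = 1.0 it wins, mirroring the text above.
candidates = {
    "16x16":    (120.0, 48),
    "four 8x8": (100.0, 210),
}
print(select_partitioning(candidates, lam=1.0))  # prints "16x16"
```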
In video encoding schemes such as H.265/HEVC (High Efficiency Video Coding) the block structure includes coding units (CUs), prediction units (PUs), and transform units (TUs). The CU is the basic unit of region splitting and is used for inter prediction and intra prediction. A CU may be recursively subdivided into four equally sized blocks, forming a quadtree structure of up to four depth levels. In other words, a CU having an original size of 64×64 pixels may be subdivided into blocks of 32×32 pixels, 16×16 pixels, and 8×8 pixels, see FIG. 2.
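The recursive quadtree split described above can be sketched as follows. This is an illustration of the splitting geometry only (the function name and fully-split traversal are assumptions for the example, not part of the HEVC specification):

```python
# Sketch of the recursive quadtree subdivision of a 64x64 CU: each block
# may be split into four equally sized sub-blocks, down to 8x8, giving the
# four depth levels (64, 32, 16, 8) mentioned in the text.
def split_cu(x, y, size, min_size=8):
    """Yield (x, y, size) for every block of a fully split quadtree."""
    yield (x, y, size)
    if size > min_size:
        half = size // 2
        for dx in (0, half):
            for dy in (0, half):
                yield from split_cu(x + dx, y + dy, half, min_size)

blocks = list(split_cu(0, 0, 64))
print(sorted({s for _, _, s in blocks}))  # prints [8, 16, 32, 64]
```

A fully split 64×64 quadtree contains 1 + 4 + 16 + 64 = 85 nodes, which hints at how many candidate blocks an encoder may have to consider.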
Each CU is then partitioned into one or more PUs, each of which is predicted using intra or inter prediction. A PU is the basic unit for carrying information related to the prediction processes. An inter CU may have four types of PUs: N×N partition, N×N/2 partition, N/2×N partition, and N/2×N/2 partition, where the size of the corresponding CU is N×N. An intra CU may have two types of PUs: N×N partition and N/2×N/2 partition. The TUs may be of any one of the partition sizes 32×32 pixels, 16×16 pixels, 8×8 pixels, or 4×4 pixels, depending on the size of the corresponding PU.
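The PU types named above can be listed in a small sketch, using the N×N notation of this text (the HEVC specification itself usually writes these as 2N×2N, 2N×N, etc.; the function below is a hypothetical illustration, not standard API):

```python
# Illustrative listing of the PU partition types for a CU of size N x N,
# in the notation used in the text: inter CUs have four types, intra CUs two.
def pu_partitions(n, intra):
    common = {"NxN": [(n, n)], "N/2xN/2": [(n // 2, n // 2)] * 4}
    inter_only = {"NxN/2": [(n, n // 2)] * 2, "N/2xN": [(n // 2, n)] * 2}
    return common if intra else {**common, **inter_only}

print(sorted(pu_partitions(16, intra=False)))
# Every type must tile the full N x N CU area:
assert all(sum(w * h for w, h in parts) == 16 * 16
           for parts in pu_partitions(16, intra=False).values())
```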
Now referring to FIG. 3, in both H.264 and HEVC/H.265 a possible structure for a coded image may be as follows. An image frame 300 may be partitioned into slices 302, which in turn may be divided into macroblocks 304 (in H.264) or coding tree units (CTUs) 304 (in HEVC/H.265). An image frame may include a plurality of slices, and each slice includes a plurality of macroblocks 304 or CTUs 304, as indicated by the boxes drawn in dashed lines.
In H.264 each macroblock 304 may then be partitioned, as described above, into macroblock partitions 306 and sub-macroblock partitions 308. The partitions 310 in the figure are not relevant for H.264. In HEVC/H.265 each CTU 304 may then be partitioned, as described above, into CUs 306, 308, 310, which in turn may include further partitions in the form of PUs and TUs, not shown.
The selection of macroblock partition sizes or sub-macroblock partition sizes was generally described above. In H.264, this selection is often made through exhaustive testing, i.e. all combinations of sizes are encoded and a cost is calculated for each combination. The most appropriate combination of block sizes within each macroblock is then selected based on the calculated cost.
In HEVC/H.265 a cost is calculated, following a similar concept as described above, for all possible combinations of CU, PU, and TU sizes, in order to select the optimal size combination for the various units. The cost may be a rate-distortion (RD) cost, which is a cost function describing the trade-off between quality and bitrate. This exhaustive search for optimal sizes results in high computational complexity and will consume an unacceptably high amount of the processing power and memory capacity of a device in which these resources are restricted.
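To give a feel for the size of this search space, the following sketch counts the number of distinct quadtree partitionings of a single 64×64 CTU into CUs between 8×8 and 64×64. This is a simplified combinatorial model of the CU split decision alone; the PU and TU choices per CU would multiply the count further:

```python
# Rough illustration of why exhaustive size search is expensive: count the
# distinct ways a CTU can be partitioned when each block is either kept
# whole or split into four, down to the minimum CU size.
from functools import lru_cache

@lru_cache(maxsize=None)
def partitionings(size, min_size=8):
    if size == min_size:
        return 1  # cannot split further
    # either keep this block, or split it into four independent sub-blocks
    return 1 + partitionings(size // 2, min_size) ** 4

print(partitionings(64))  # prints 83522 partitionings per 64x64 CTU
```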
Moreover, in live viewing applications, the time for encoding is also important, in addition to keeping the amount of data used to represent the video low. The video must be encoded and delivered with minimal delay and latency, so that it arrives at a receiver within a reasonable time limit. To reduce the computational burden of H.265 encoders, a plethora of encoding methods has been suggested, arranged to reduce the number of CU and PU sizes to be tested. Many approaches use all-zero-block checking, motion homogeneity, RD cost, or tree pruning to skip motion estimation for unnecessary CU sizes. Other approaches include early TU decision algorithms (ETDA).
Hereinafter, the term “base coding block” will represent features like macroblocks in H.264, CTUs in HEVC/H.265, and corresponding structures in other coding schemes. Further, the term “coding block” will hereinafter represent features like the macroblock partitions and sub-macroblock partitions found in H.264 coding schemes, the CUs, PUs, and TUs found in HEVC/H.265 coding schemes, and corresponding structures in other coding schemes.
From the above we may conclude that many block-based encoding schemes implementing some kind of coding tree structure waste a lot of processing power, encoding time, and data storage in order to achieve encoding with high image quality and few data bits. The reason for this is, as mentioned above, that most encoding schemes address the problem of achieving the highest possible video quality using as few bits as possible by encoding all combinations of coding block sizes for each base coding block in an image frame, and then evaluating a cost function based on image quality and data usage.
Such encoding schemes may be used for non-time-critical applications, where the encoding may be performed on powerful computers with access to large data storage areas. However, in applications for capturing live video using a device having limited computational resources, limited access to power, and limited data storage, these encoding schemes are not applicable. The problem has been recognised for HEVC encoders in the research article “An Effective Transform Unit Size Decision Method for High Efficiency Video Coding” by Chou-Chen Wang, Chi-Wei Tung, and Jing-Wein Wang, published in “Mathematical Problems in Engineering”, Volume 2014 (2014), Article ID 718189, http://dx.doi.org/10.1155/2014/718189, from Hindawi Publishing Corporation.