Video compression methods are used within digital television broadcasting systems to reduce the data rate per channel while maintaining picture quality. It is a primary objective of these compression methods that the instantaneous demand of the moving television picture sequence for transmission capacity is substantially met at all times despite its varying complexity.
Typical transmission channels used to convey audio-visual material have fixed bit rates and so the varying demand of the picture sequence for capacity may not always be satisfied. It is an inevitable result of the compression process that for extremes of highly complex picture behaviour, the picture quality may occasionally be compromised in order that the bit rate criteria are met. By choosing a bit rate that is too low, poor quality will result for a significant proportion of the time. Conversely, a chosen bit rate that is too high will meet quality needs, but will waste transmission capacity for a significant proportion of the time. Thus, some kind of control mechanism is required that evens out the peaks and troughs of demand so that a given fixed bit rate is adequate to deliver good picture quality at all times. Part of such control ideally should take some objective measure of the picture quality into account so that the distortion in the picture is known to some degree. The optimisation of this process is called Rate Distortion Optimisation (RDO) and is an inherent part of practical realisations of modern compression methods.
The complex compression methods currently employed have become very sophisticated and use a variety of techniques in concert to achieve the objective of coding complex picture sequences using minimum bit rate. Typically, in such methods the compressed picture sequence of the television signal is hierarchically structured at a number of levels, each enabling the full set of coding tools available to be applied efficiently.
At the highest of these levels, the picture sequence is organised into contiguous Groups of Pictures (GOP) and each group is further organised so that the first picture of each GOP is coded without reference to any other picture in the sequence. This is known as Intra-picture coding, and the resultant picture is called an I picture. Subsequent pictures in the GOP are coded differentially with respect to other pictures in the GOP including this I picture.
For example, the second picture in the GOP is typically predicted directly from the first I picture, and the differences between the prediction and the actual picture, typically being small, are then coded with the consequence that the bit rate requirement is reduced. The resultant picture is known as a Predicted or P picture.
The next picture of the GOP may also be predicted in turn from this P picture and this pattern may repeat for the remainder of the GOP. These P predictions are uni-directional and use past pictures to predict future ones in a sequence of mutual dependence. It is also possible to code pictures in the GOP using Bi-directional prediction (i.e. using both past and future pictures), which effectively predictively interpolates the current picture. These pictures are known as B pictures. Thus a typical GOP may have a structure such as IPPBPPB or IBBPBBP, etc., and this structure and the GOP length are arbitrary and set by the system operator to suit the needs of a given application.
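Because a B picture refers to a future picture, that future reference must be coded (and transmitted) before the B picture itself, so the coding order differs from the display order. The following is a minimal sketch of that reordering, assuming that each B picture references only the nearest I or P picture on either side; the function name coding_order is hypothetical, and B pictures left at the end of the GOP are simply emitted last here rather than referencing the next GOP.

```python
def coding_order(gop: str) -> list[int]:
    """Return display indices in an assumed coding (transmission) order.

    Assumption: each B picture uses the nearest I/P picture before it and the
    nearest I/P picture after it, so that later anchor is coded first.
    """
    order: list[int] = []
    pending_b: list[int] = []  # B pictures waiting for their future anchor
    for idx, ptype in enumerate(gop):
        if ptype in "IP":
            order.append(idx)        # code the anchor picture first...
            order.extend(pending_b)  # ...then the B pictures that needed it
            pending_b = []
        elif ptype == "B":
            pending_b.append(idx)
        else:
            raise ValueError(f"unknown picture type {ptype!r}")
    order.extend(pending_b)  # trailing B pictures (simplification)
    return order


print(coding_order("IBBPBBP"))  # prints [0, 3, 1, 2, 6, 4, 5]
```

For IBBPBBP the P picture at display position 3 is sent ahead of the two B pictures that interpolate between it and the I picture.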
In typical digital video systems, the two dimensional image of a scene that forms each picture in the moving sequence is resolved to a rectilinear array of picture elements, or pixels, each holding the value of the image intensity (luminance) and colour (chrominance) at a given point in the picture. This array is usually scanned in a raster fashion from top left to bottom right in a series of so-called horizontal lines, and then each scan is repeated regularly to produce a sequence. The resolution or sharpness of the picture is determined by the number of pixels allocated to the scan. The shape of the picture, its aspect ratio, determines the relationship between the number of horizontal and vertical pixels. In broadcast systems these numbers are standardised.
It is typical of television pictures that their representation takes one of two forms. Either the individual picture scans are completed using only one pass of the image or they can be done in two parts. The former scan type is called Progressive or Sequential scan, and the latter is called an Interlaced scan: half the scan is done in a first pass, in which only the odd-numbered horizontal lines are taken, and the second half is done in a second pass, in which the remaining even-numbered lines are taken. The first pass of the interlaced scan produces the so-called Top Field and the second pass the Bottom Field. The two fields together cover the same number of pixels as the complete Progressive scan, and the complete picture is called a Frame.
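The relationship between a Frame and its two Fields can be sketched as follows, assuming a picture held as a list of pixel rows. Lines are numbered from 1 in the text above, so the odd-numbered lines (1, 3, 5, ...) are the 0-based rows 0, 2, 4, .... The helper names split_fields and weave are hypothetical.

```python
def split_fields(frame):
    """Split a Frame (list of rows) into its Top and Bottom Fields."""
    top = frame[0::2]     # 1-based odd lines 1, 3, 5, ... (0-based rows 0, 2, 4, ...)
    bottom = frame[1::2]  # 1-based even lines 2, 4, 6, ...
    return top, bottom


def weave(top, bottom):
    """Interleave two Fields back into a complete Frame."""
    frame = []
    for t, b in zip(top, bottom):
        frame.extend([t, b])
    if len(top) > len(bottom):  # odd line count: Top Field has one extra line
        frame.append(top[-1])
    return frame


frame = [[r] * 4 for r in range(6)]  # row r filled with the value r
top, bottom = split_fields(frame)
print([row[0] for row in top], [row[0] for row in bottom])  # prints [0, 2, 4] [1, 3, 5]
```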
It is clear that any movement in the picture during the Interlace scan will result in a degree of dislocation between the pixels of each Field and that degree of dislocation will be more severe the greater the speed of motion. This dislocation can cause a significant loss of efficiency in the compression of moving pictures and so it is better to code rapidly moving picture sequences Field by Field. All currently used compression methods recognise this and allow both Field and Frame modes to be chosen as the picture behaviour demands.
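The dislocation between Fields can be detected by comparing vertical pixel differences measured across the interleaved Frame lines with those measured within each Field. The sketch below shows one simple heuristic of this kind; it is an illustrative assumption, not a method taken from this description or mandated by any standard, and the names vertical_activity and prefer_field_mode are hypothetical. A practical encoder would normally make the final Field/Frame choice as part of its Rate Distortion Optimisation.

```python
def vertical_activity(rows):
    """Sum of absolute differences between vertically adjacent rows."""
    return sum(
        abs(a - b)
        for upper, lower in zip(rows, rows[1:])
        for a, b in zip(upper, lower)
    )


def prefer_field_mode(block):
    """Heuristic: choose Field coding if lines correlate better within each Field."""
    frame_act = vertical_activity(block)
    field_act = vertical_activity(block[0::2]) + vertical_activity(block[1::2])
    return field_act < frame_act


static = [[2 * r] * 4 for r in range(8)]                        # smooth, no motion
moving = [[0] * 4 if r % 2 == 0 else [100] * 4 for r in range(8)]  # Fields dislocated
print(prefer_field_mode(static), prefer_field_mode(moving))  # prints False True
```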
The ITU-T H.264 (MPEG-4 Part 10) compression standard, used widely in the most recent commercial video compression products, includes among its features the use of GOPs and a Field/Frame mode. In particular, the coding of both P and B pictures in the GOP uses Inter-Field or Inter-Frame predictive methods. In order to extract the best performance, the standard divides each complete picture, either a Frame or a Field, into a large number of contiguous, rectilinear blocks of pixels. The most significant of these blocks is a square group of pixels called a macroblock (MB), which is typically 16×16 luminance pixels.
The encoding of each macroblock must be completed entirely within its allotted time period. This period depends on the television standard to which the image sequence conforms, since that standard defines both the number of macroblocks per picture and the Frame/picture rate, which together determine the absolute time within which the processing must be completed. MPEG-4/H.264 in particular provides a significant number of options for coding each MB, each of which, in principle, requires evaluation before a final optimum choice is reached. The computing power and speed needed to do this are particularly challenging for high performance encoding equipment, and so an efficient practical method of achieving the required result is extremely valuable.
It is always possible to design a video encoder to fit given processing resources, but this may involve the incomplete implementation of some coding modes/options or even the complete absence of assessing some options. Such a design will be sub-optimal in performance, but may meet certain other criteria such as cost, power consumption and compactness. It may also be adequate for low or standard resolution applications (i.e. television formats with fewer than, say, 700 horizontal pixels) and where high picture quality is not the major requirement.
Nevertheless, it is always desirable to provide a design that will contribute to improved performance within the prevailing constraints.
For example, in a high definition encoder working on a 1920×1080 pixel progressive picture format at a Frame rate of 60 Hz (the standard known as 1080p60), the Frame period is approximately 16.7 milliseconds and there are 120×68=8160 MBs in each Frame (1920/16=120 MB columns, while the 1080 lines are rounded up to 68 rows of 16-line MBs). Therefore, each MB is allocated approximately 2 microseconds within which all the coding options need to be explored fully and a decision made on the best set of options to select for each individual MB (i.e. coding mode).
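The per-MB time budget in the example above can be reproduced with the short calculation below. It assumes 16×16 MBs and rounds a partial row or column of MBs up to a whole MB; the function name mb_time_budget is hypothetical.

```python
import math


def mb_time_budget(width, height, frame_rate_hz, mb_size=16):
    """Return (MBs per Frame, seconds available per MB)."""
    mbs_x = math.ceil(width / mb_size)   # 1920 / 16 = 120
    mbs_y = math.ceil(height / mb_size)  # 1080 / 16 = 67.5, rounded up to 68
    mbs_per_frame = mbs_x * mbs_y
    frame_period_s = 1.0 / frame_rate_hz
    return mbs_per_frame, frame_period_s / mbs_per_frame


mbs, t = mb_time_budget(1920, 1080, 60)
print(mbs, round(t * 1e6, 2))  # prints: 8160 2.04
```

The same function shows how quickly the budget shrinks as resolution or Frame rate rises, which is why the efficiency of candidate assessment matters.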
To achieve the most efficient and accurate video encoding, the comparison of the coding option candidates should ideally take into account both how high the quality of the output image will be and how many bits will be needed to encode each candidate. The Rate Distortion Optimisation (RDO) technique addresses this problem by taking into account both a video quality metric (by measuring the Distortion, which is the deviation of the coded material from the source material) and the bit cost of each possible decision outcome.
Currently known methods of RDO candidate assessment are inefficient in their use of available processing resources, necessitating more powerful processing resources than are strictly required. More powerful processing resources are more expensive to implement in a practical commercial hardware encoder, and they also bring attendant increases in running costs, such as power consumption and the associated cooling requirement.
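A common way of combining the two measures, used for example in the H.264 reference software, is a Lagrangian cost J = D + λ·R, where D is the Distortion, R the bit cost and λ a weighting that trades one against the other; the candidate with the lowest J is selected. The sketch below assumes this formulation. The Candidate type, the function rdo_select, the mode names and all the numbers are hypothetical and illustrative, not measured values.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    distortion: float  # e.g. sum of squared differences, source vs reconstruction
    bits: int          # bits needed to code the MB with this option


def rdo_select(candidates, lam):
    """Pick the candidate minimising the Lagrangian cost J = D + lam * R."""
    return min(candidates, key=lambda c: c.distortion + lam * c.bits)


cands = [
    Candidate("intra16x16", 1200.0, 40),
    Candidate("intra4x4", 700.0, 95),
    Candidate("inter16x16", 900.0, 30),
]
print(rdo_select(cands, 10.0).name)  # prints inter16x16 (J = 1200)
print(rdo_select(cands, 1.0).name)   # prints intra4x4 (J = 795)
```

A larger λ favours cheap candidates (lower bit rate), while a smaller λ favours low-distortion candidates, which is how the rate control can steer the choice towards the fixed channel bit rate.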
In the MPEG-4/H.264 standard, several coding modes are available, where predictions are made from sets of previously coded pixels, some in the same picture as the current MB (the Intra Mode), and others made by reference to neighbouring blocks in neighbouring reference pictures (the Inter Mode predictions).
In the Intra mode, each MB of 16×16 pixels may be subdivided into smaller blocks of pixels. One of these subdivisions is the 4×4 sub-block, of which there are 16 per MB, which are shown as 4×4 blocks 0 to 15 in FIGS. 1 and 2. Another common subdivision is the 8×8 block (each comprising four 4×4 sub-blocks), shown as items A, B, C and D in FIG. 1.
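In H.264 the sixteen 4×4 blocks of an MB are numbered so that blocks 0 to 3 lie in the top-left 8×8 block, 4 to 7 in the top-right, 8 to 11 in the bottom-left and 12 to 15 in the bottom-right, each group in raster order. The sketch below maps a 4×4 block index to its pixel offset within the MB and its parent 8×8 block, assuming that the numbering in FIGS. 1 and 2 follows this H.264 convention and that A, B, C and D denote the top-left, top-right, bottom-left and bottom-right 8×8 blocks; both are assumptions, since the figures are not reproduced here. The function name block4x4_position is hypothetical.

```python
def block4x4_position(idx):
    """Return (x, y, parent) for 4x4 block idx (0..15) within a 16x16 MB."""
    if not 0 <= idx < 16:
        raise ValueError("4x4 block index must be in 0..15")
    b8 = idx // 4   # parent 8x8 block, 0..3 in raster order
    sub = idx % 4   # position of the 4x4 block inside its 8x8 parent
    x = (b8 % 2) * 8 + (sub % 2) * 4
    y = (b8 // 2) * 8 + (sub // 2) * 4
    return x, y, "ABCD"[b8]  # assumed A=top-left, B=top-right, C=bottom-left, D=bottom-right


# Print the block numbers laid out as they sit inside the MB.
grid = [[0] * 4 for _ in range(4)]
for i in range(16):
    x, y, _ = block4x4_position(i)
    grid[y // 4][x // 4] = i
for row in grid:
    print(row)
```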
Also shown in FIG. 1 is the set of adjacent reference pixels used to predict each pixel of each 4×4 block. The predictions for the pixels of each one of the 4×4 blocks must be completed and evaluated before a choice is made for the coding of the parent MB based on the behaviour of all the possible MB partitions.
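To illustrate how the adjacent reference pixels are used, the sketch below forms three of the nine H.264 Intra 4×4 predictions (vertical, horizontal and DC) from the four pixels above and the four pixels to the left of a block, and picks the mode with the smallest sum of absolute differences (SAD). It assumes all neighbours are available, uses SAD as a simple stand-in for the full rate-distortion measure, and omits the six directional modes; the function names are hypothetical.

```python
def predict_4x4(above, left, mode):
    """Intra 4x4 prediction from 4 pixels above and 4 pixels to the left."""
    if mode == "vertical":      # copy the row above downwards
        return [list(above) for _ in range(4)]
    if mode == "horizontal":    # copy the left column across
        return [[left[y]] * 4 for y in range(4)]
    if mode == "dc":            # rounded mean of the 8 neighbours
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode {mode!r}")


def sad(block, pred):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(a - b) for r, p in zip(block, pred) for a, b in zip(r, p))


def best_mode(block, above, left):
    modes = ("vertical", "horizontal", "dc")
    return min(modes, key=lambda m: sad(block, predict_4x4(above, left, m)))


block = [[10, 20, 30, 40]] * 4  # vertical stripes
print(best_mode(block, above=[10, 20, 30, 40], left=[25, 25, 25, 25]))  # prints vertical
```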
The assessment of the complete set of coding options demands that the processing resources make available from a main memory, just at the right moment, all the relevant reference block pixel values as well as those of the current block. These pixel values include both luminance and chrominance values. The processing of the Intra 4×4 options is done within the same hardware resources as the other partitions of the MBs but, if done in an inefficient manner, can consume an excessive proportion of those limited resources, thus constraining the performance of the overall coding process.
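For a single 4×4 luminance block, the reference pixels to be fetched are the 13 reconstructed neighbours used by H.264: the top-left corner pixel, eight pixels above (four directly above and four above-right) and four to the left. The sketch below gathers them from a reconstructed picture held as a list of rows, returning None for positions outside the picture. It is an illustrative assumption only: it ignores the H.264 rule that above-right pixels are unavailable when that block has not yet been coded, and it does not model the memory scheduling discussed above. The function name fetch_intra4x4_refs is hypothetical.

```python
def fetch_intra4x4_refs(recon, x, y):
    """Gather the 13 neighbouring reconstructed pixels for the 4x4 block at (x, y).

    recon is a list of pixel rows; out-of-picture positions return None.
    """
    height, width = len(recon), len(recon[0])

    def px(xx, yy):
        return recon[yy][xx] if 0 <= xx < width and 0 <= yy < height else None

    corner = px(x - 1, y - 1)                     # top-left corner pixel
    above = [px(x + i, y - 1) for i in range(8)]  # 4 above, then 4 above-right
    left = [px(x - 1, y + j) for j in range(4)]   # 4 to the left
    return corner, above, left


recon = [[10 * r + c for c in range(16)] for r in range(16)]
corner, above, left = fetch_intra4x4_refs(recon, 4, 4)
print(corner, above, left)
```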