Video compression methods are used within digital television broadcasting systems to reduce the data rate per channel while maintaining picture quality. It is a primary objective of these compression methods that the instantaneous demand of the moving television picture sequence for transmission capacity is substantially met at all times despite its varying complexity.
Typical transmission channels used to convey audio-visual material have fixed bit rates and so the varying demand of the picture sequence for capacity may not always be satisfied. It is an inevitable result of the compression process that for extremes of highly complex picture behaviour, the picture quality may occasionally be compromised in order that the bit rate criteria are met. If the chosen bit rate is too low, poor quality will result for a significant proportion of the time. Conversely, a chosen bit rate that is too high will meet quality needs, but will waste transmission capacity for a significant proportion of the time. Thus, some kind of control mechanism is required that evens out the peaks and troughs of demand so that a given fixed bit rate is adequate to deliver good picture quality at all times. Part of such control ideally should take some objective measure of the picture quality into account so that the distortion in the picture is known to some degree. The optimisation of this process is called Rate Distortion Optimisation (RDO) and is an inherent part of practical realisations of modern compression methods.
The complex compression methods currently employed have become very sophisticated and use a variety of techniques in concert to achieve the objective of coding complex picture sequences using minimum bit rate. Typically, in such methods the compressed picture sequence of the television signal is hierarchically structured at a number of levels, each enabling the full set of coding tools available to be applied efficiently.
At the highest of these levels, the picture sequence is organised into contiguous Groups of Pictures (GOP) and each group is further organised so that the first picture of each GOP is coded without reference to any other picture in the sequence. This is known as Intra-picture coding, and the resultant picture is called an I picture. Subsequent pictures in the GOP are coded differentially with respect to other pictures in the GOP including this I picture.
For example, the second picture in the GOP is typically predicted directly from the first I picture. The differences between the prediction and the actual picture are typically small, so coding these differences rather than the picture itself reduces the bit rate requirement. The resultant picture is known as a Predicted or P picture.
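The differential coding described above can be illustrated with a minimal sketch. It assumes pictures are represented as flat lists of integer pixel values; the function names `residual` and `reconstruct` are hypothetical and chosen only for illustration. Real encoders additionally transform, quantise and entropy code the differences, which is omitted here.

```python
def residual(actual, prediction):
    """Return the per-pixel differences between the actual picture and its prediction."""
    return [a - p for a, p in zip(actual, prediction)]


def reconstruct(prediction, differences):
    """Rebuild the picture by adding the coded differences back onto the prediction."""
    return [p + d for p, d in zip(prediction, differences)]


# Illustrative values only: a prediction that is close to the actual picture
# leaves small differences, which are cheaper to code than the pixels themselves.
actual_picture = [100, 102, 105, 107]
predicted_picture = [100, 101, 105, 108]
differences = residual(actual_picture, predicted_picture)
```

In this lossless sketch, adding the differences back onto the prediction recovers the actual picture exactly.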
The next picture of the GOP may also be predicted in turn from this P picture and this pattern may repeat for the remainder of the GOP. These P predictions are uni-directional and use past pictures to predict future ones in a sequence of mutual dependence. It is also possible to code pictures in the GOP using Bi-directional prediction (i.e. using both past and future pictures) which effectively predictively interpolates the current picture. These pictures are known as B pictures. Thus a typical GOP may have a structure such as IPPBPPB or IBBPBBP. This structure and the GOP length are arbitrary and are set by the system operator to suit the needs of a given application.
In typical video systems, a two dimensional image of a scene is usually scanned in a raster fashion from top left to bottom right in a series of so-called horizontal lines, and then each scan is repeated regularly to produce a sequence. The resolution or sharpness of the picture is determined by the number of picture elements or pixels allocated to the scan. The shape of the picture, its aspect ratio, determines the relationship between the number of horizontal and vertical pixels. In broadcast systems these numbers are standardised.
It is typical of television pictures that their representation takes one of two forms. Either the individual picture scans are completed using only one pass of the image, or they are done in two parts: half the scan is done in a first pass, where only the odd numbered horizontal lines are taken, and the second half is done in a second pass, where the remaining even numbered lines are taken. The former scan type is called Progressive or Sequential scan, and the latter is called an Interlaced scan.
The first pass of the interlaced scan produces the so-called Top Field and the second pass the Bottom Field. The two fields together cover the same number of pixels as the complete Progressive scan, and the complete picture is called a Frame.
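The relationship between a Frame and its two Fields can be sketched as follows. This is an illustration only, assuming a Frame is held as a list of rows and that horizontal lines are numbered from 1, so the odd numbered lines are the rows at indices 0, 2, 4 and so on. The function names `split_fields` and `weave_fields` are hypothetical.

```python
def split_fields(frame):
    """Split a Frame (a list of rows) into its Top Field and Bottom Field."""
    top_field = frame[0::2]     # lines 1, 3, 5, ... (first pass)
    bottom_field = frame[1::2]  # lines 2, 4, 6, ... (second pass)
    return top_field, bottom_field


def weave_fields(top_field, bottom_field):
    """Interleave two Fields back into a complete Frame."""
    frame = []
    for top_row, bottom_row in zip(top_field, bottom_field):
        frame.append(top_row)
        frame.append(bottom_row)
    # A Frame with an odd number of lines has one extra line in the Top Field.
    if len(top_field) > len(bottom_field):
        frame.append(top_field[-1])
    return frame


# Illustrative 4-line Frame, each row labelled with its 1-based line number.
example_frame = [[1, 1], [2, 2], [3, 3], [4, 4]]
top, bottom = split_fields(example_frame)
```

Here `top` holds lines 1 and 3 and `bottom` holds lines 2 and 4; weaving them restores the original Frame, which shows that the two Fields together cover the same pixels as the full scan.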
It is clear that any movement in the picture during the Interlaced scan will result in a degree of dislocation between the pixels of each Field, and that the dislocation will be more severe the greater the speed of motion. This dislocation can cause a significant loss of efficiency in the compression of moving pictures and so it is better to code rapidly moving picture sequences Field by Field. All currently used compression methods recognise this and allow both Field and Frame modes to be chosen as the picture behaviour demands.
The ITU-T H.264 (MPEG-4 Part 10) compression standard used widely in the most recent commercial video compression products includes among its features the use of GOPs and a Field/Frame mode. In particular the coding of both P and B pictures in the GOP uses Inter-Field or Inter-Frame predictive methods. To achieve the best performance, the standard divides each complete picture, either a Frame or a Field, into a large number of contiguous, rectilinear blocks of pixels. The most significant of these blocks is a square group of pixels called a macroblock (MB), which is typically 16×16 luminance pixels.
The predictive coding process operates primarily at MB level. The coding of a given MB in a given picture is performed using a prediction from a block or blocks within another picture or pictures in the GOP, which are used as references and have already been coded. However, the H.264 Inter prediction standard allows not only whole MBs to be predicted from a number of reference pictures, but also various sub-divisions or Partitions of MBs. This added sophistication, compared to older compression standards (such as MPEG-2), contributes to the superior performance of H.264. In the particular case of encoding a B Field/Frame, the reference pictures may be from previous pictures in display order (so-called reference list0 pictures) or from later pictures in display order (so-called reference list1 pictures).
The Predictive process described above, operating at MB or Partition level, seeks to find blocks of pixels in selected reference pictures that match a given block in the picture currently being coded. Motion search methods are commonly used to identify a number of best match blocks, or candidates, from a set of reference pictures. These candidates can be combined in list0/list1 pairs to produce Bi-predicted candidates.
Furthermore, 16×16 pixel MBs and 8×8 pixel Partitions may also be predicted using the so-called Direct Mode. Hence there may be several Inter prediction candidates for each MB and each Partition, which must be compared to find the best, most efficient coding. This flexibility in the number of choices available improves the performance of the method, but at the expense of the additional processing required to evaluate each of the coding options.
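How list0/list1 pairing multiplies the number of candidates to be evaluated can be sketched as below. This is an illustration under stated assumptions, not the method of any particular encoder: blocks are flat lists of integer pixel values, every list0 candidate is paired with every list1 candidate, and the pair is combined by a rounded average, which corresponds to the default (unweighted) bi-prediction in H.264. The function names are hypothetical.

```python
from itertools import product


def bi_predict(block0, block1):
    """Combine a list0 block and a list1 block by a rounded per-pixel average."""
    return [(a + b + 1) >> 1 for a, b in zip(block0, block1)]


def bi_predicted_candidates(list0_candidates, list1_candidates):
    """Form one Bi-predicted candidate for every list0/list1 pair."""
    return [
        bi_predict(block0, block1)
        for block0, block1 in product(list0_candidates, list1_candidates)
    ]


# Illustrative values: two list0 candidates and three list1 candidates
# yield 2 x 3 = 6 Bi-predicted candidates on top of the 5 uni-directional ones.
list0 = [[10, 20], [12, 22]]
list1 = [[20, 21], [14, 24], [16, 26]]
pairs = bi_predicted_candidates(list0, list1)
```

The candidate count grows with the product of the two list sizes, which is one reason the evaluation workload per MB rises so quickly.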
Each assessment must be completed within the duration period of the MB. The computing power and speed needed to do this are challenging, and so an efficient practical method of achieving the required result is extremely valuable. For example, in a high definition encoder working on a 1920×1080 pixel picture format at 60 Hz, where a typical Frame period is 33.3 milliseconds, there are 120×68=8160 MBs per Frame, so each MB must be completely coded in approximately 4 microseconds.
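The arithmetic in this example can be checked with a short calculation. The sketch below assumes 16×16 MBs and rounds the MB count up in each dimension, since 1080 lines do not divide exactly by 16; the function name `macroblock_budget` is hypothetical.

```python
import math


def macroblock_budget(width, height, frame_period_ms, mb_size=16):
    """Return (MBs across, MBs down, total MBs, microseconds available per MB)."""
    mbs_across = math.ceil(width / mb_size)
    mbs_down = math.ceil(height / mb_size)  # a partial row of MBs still has to be coded
    total_mbs = mbs_across * mbs_down
    microseconds_per_mb = frame_period_ms * 1000.0 / total_mbs
    return mbs_across, mbs_down, total_mbs, microseconds_per_mb


across, down, total, budget_us = macroblock_budget(1920, 1080, 33.3)
```

For the figures quoted, this gives 120 MBs across, 68 MBs down and 8160 MBs in total, leaving a little over 4 microseconds for each MB.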
To achieve the most efficient and accurate video encoding, the comparison of the candidates ideally takes into account both the quality of the resulting output image and the number of bits needed to encode the candidate. The Rate Distortion Optimisation (RDO) technique addresses this by weighing, for each possible decision outcome, a video quality metric, which measures the Distortion as the deviation from the source material, against the bit cost.
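One common way of combining the two quantities, which is an assumption here since the text above does not specify a formula, is a Lagrangian cost J = D + λR, where D is the Distortion, R is the bit cost and λ is a weighting factor. The sketch below uses the sum of squared differences as the Distortion metric and selects the candidate with the lowest cost. The function names, the candidate tuple layout and the values of λ are all hypothetical.

```python
def sum_squared_differences(source, prediction):
    """Distortion metric: sum of squared per-pixel deviations from the source."""
    return sum((s - p) ** 2 for s, p in zip(source, prediction))


def rd_cost(distortion, bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed formulation)."""
    return distortion + lam * bits


def best_candidate(source, candidates, lam):
    """Pick the (name, block, bits) candidate with the lowest cost J."""
    return min(
        candidates,
        key=lambda c: rd_cost(sum_squared_differences(source, c[1]), c[2], lam),
    )


# Illustrative values: candidate A is more accurate but costs more bits than B.
source_block = [10, 20, 30, 40]
candidates = [
    ("A", [10, 20, 30, 41], 50),  # distortion 1, 50 bits
    ("B", [12, 22, 32, 42], 10),  # distortion 16, 10 bits
]
choice_low_lambda = best_candidate(source_block, candidates, lam=0.1)
choice_high_lambda = best_candidate(source_block, candidates, lam=1.0)
```

With a small λ the more accurate candidate A is chosen, while a larger λ, which penalises bits more heavily, favours the cheaper candidate B. This shows how a single weighting factor trades picture quality against bit rate.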
Currently known methods of RDO candidate assessment are inefficient in their use of available processing resources, necessitating higher powered processing resources than are strictly required. Higher powered processing resources are more expensive to implement and bring attendant increases in running costs, such as cooling requirements and power usage. Accordingly, the present invention seeks to provide an improved method and apparatus for assessing RDO candidates.