1. Field of the Invention
The present invention relates to a method and an apparatus suitable for use with differential encoding and decoding techniques, and in particular to such a method and apparatus suitable for use with video compression encoding and decoding techniques.
2. Discussion of Prior Art
As is known in the art, differential encoding involves comparing portions of data with one another and using information relating to the differences between the portions of data rather than the entire data portions themselves to represent the “original” data. This has the advantage that a smaller volume of data is required to encode a given amount of original data, which can be important where, for example, the data transmission capacity is restricted.
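The idea can be illustrated with a minimal sketch, which is not any particular codec. The function names `encode_diff` and `decode_diff` and the sample values are assumptions chosen for illustration only.

```python
def encode_diff(reference, current):
    """Return the element-wise differences (current minus reference)."""
    if len(reference) != len(current):
        raise ValueError("data portions must be the same length")
    return [c - r for r, c in zip(reference, current)]


def decode_diff(reference, diff):
    """Rebuild the current data portion from the reference and the differences."""
    return [r + d for r, d in zip(reference, diff)]


# Hypothetical sample data: two similar data portions.
reference = [100, 102, 104, 106]
current = [100, 103, 104, 105]

diff = encode_diff(reference, current)   # [0, 1, 0, -1]
restored = decode_diff(reference, diff)  # identical to `current`
```

Because the two portions are similar, the differences are zero or small in magnitude, and such values can generally be represented more compactly than the full data values.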
Such differential encoding techniques are particularly suitable for the compression of (digital) video data, because although there may be 25 to 30 video frames per second, within a given scene in a video sequence, each frame will typically be very similar to the adjacent frames, with the differences often being due only to “objects” in the frames moving to different positions. This means that much of the video data necessary to reproduce successive frames in a video sequence is substantially identical as between frames.
The MPEG video compression standards and other related algorithms, for example, therefore use differential encoding to compress video data, e.g. for transmission or storage purposes.
Generally, in differentially encoded video data each video frame is divided into a plurality of blocks (16×16 pixel blocks in the case of MPEG encoding) and each block of the frame is encoded individually. Three types of data “block” are usually used (e.g. stored or transmitted). These are commonly referred to as INTRA (I) blocks, INTER (P) blocks and bi-directionally predicted (B) blocks.
INTRA (I) blocks are coded frame blocks which contain no predicted or differenced data, i.e. are complete data blocks which are not dependent on any previous (or future) frame blocks. INTER (P) blocks and bi-directionally predicted (B) blocks are differentially coded frame blocks that describe the differences between the “current” block and a “prediction” frame block created from video data in frames before the current frame, and, in the case of B blocks, also video data in frames generated after the current frame. The “prediction” frame block that the differences encoded in P and B blocks are referenced to could, for example, simply comprise a preceding I (i.e. complete) frame block, or could be a more complex frame block predicted, e.g., from an I block and one or more preceding P blocks.
Because P and B blocks in such arrangements contain only data relating to differences between blocks in frames in the original video data, they are considerably smaller than I blocks, and so the overall amount of data that must be transmitted or stored can be reduced by using P and/or B blocks to encode the data. (However, complete, i.e. I, blocks must still be stored or transmitted at intervals to allow the complete original data to be reconstructed.)
As is known in the art, an important aspect of such differential encoding of video data is identifying which areas of the video frames being compared are most similar to each other (such that there is then a reduced or minimum number of differences to be encoded). This process is complicated by the fact that, typically, the area of the “prediction” (reference) frame that most closely matches a given block or area in the current frame will not be in the same position within the reference frame as that area is in the current frame. This is because the most closely matching areas in the video frames will tend to move between frames, as objects in the video sequence move around.
Differential encoding of video data therefore typically involves two aspects: firstly, identifying the location in a “reference” video frame of the area in that frame that most closely matches the area (block) of the video frame currently being encoded, and secondly, determining the differences between the two areas in the two frames (i.e. the current and the reference frame).
The encoded data accordingly usually comprises a vector value pointing to the area of a given reference frame to be used to construct the appropriate area (block) of the frame currently being constructed, and data describing the differences between the two areas. This thereby allows the video data for the area of the frame currently being constructed to be constructed from video data describing the area in the reference frame pointed to by the vector value and the difference data describing the differences between that area and the area of the video frame currently being constructed.
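The reconstruction described above can be sketched as follows. This is a simplified illustration only: the function name `reconstruct_block`, the `(dy, dx)` vector convention and the sample frame are assumptions, and real codecs additionally transform and quantise the difference data.

```python
def reconstruct_block(reference_frame, top, left, vector, residual):
    """Rebuild one block of the frame currently being constructed.

    reference_frame: 2D list of pixel values, indexed [row][column].
    top, left:       position of the block in the current frame.
    vector:          (dy, dx) offset from that position to the matching
                     area in the reference frame (assumed convention).
    residual:        2D list of differences (current minus reference area).
    """
    dy, dx = vector
    block = []
    for y, residual_row in enumerate(residual):
        row = []
        for x, difference in enumerate(residual_row):
            predicted = reference_frame[top + dy + y][left + dx + x]
            row.append(predicted + difference)
        block.append(row)
    return block


# Hypothetical 4x4 reference frame and a 2x2 block at the frame origin.
reference_frame = [
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
    [40, 41, 42, 43],
]
residual = [[0, 1], [-1, 0]]

block = reconstruct_block(reference_frame, top=0, left=0,
                          vector=(1, 1), residual=residual)
# block == [[21, 23], [30, 32]]
```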
The process of identifying which areas in different video frames most (or sufficiently) closely match and accordingly determining the vector to be stored to point to the relevant area in the reference video frame is usually referred to as “motion estimation”. This process is usually carried out by comparing video data values (usually luminance values) for each pixel in a given area or block (typically a 16×16 pixel block in MPEG systems) of the video frame currently being encoded with a succession of corresponding-sized pixel blocks in the reference video frame until the closest (or a sufficiently close) match in terms of the relevant video data values is found. The vector pointing to the so-identified pixel block in the reference frame is then recorded and used for the encoded data stream. The relative closeness or match between relevant video data for the pixel blocks being compared is assessed using difference comparison or cost functions, such as a mean-squared difference (MSD) function.
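A minimal software sketch of such a search is given below, assuming an exhaustive ("full") search over a square search range and the mean-squared difference cost mentioned above. The function names, the `(dy, dx)` vector convention, the 2×2 block size and the test frames are illustrative assumptions; practical encoders use 16×16 blocks and often faster search strategies.

```python
def msd(frame_a, top_a, left_a, frame_b, top_b, left_b, size):
    """Mean-squared difference between two size x size pixel blocks."""
    total = 0
    for y in range(size):
        for x in range(size):
            d = frame_a[top_a + y][left_a + x] - frame_b[top_b + y][left_b + x]
            total += d * d
    return total / (size * size)


def find_motion_vector(current, reference, top, left, size, search_range):
    """Exhaustively search the reference frame for the best-matching block.

    Returns ((dy, dx), cost) for the candidate with the lowest MSD.
    """
    height, width = len(reference), len(reference[0])
    best_vector, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ref_top, ref_left = top + dy, left + dx
            # Skip candidates that fall partly outside the reference frame.
            if ref_top < 0 or ref_left < 0:
                continue
            if ref_top + size > height or ref_left + size > width:
                continue
            cost = msd(current, top, left, reference, ref_top, ref_left, size)
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (dy, dx), cost
    return best_vector, best_cost


def make_frame(height, width, obj_top, obj_left):
    """Hypothetical test frame: black background with a bright 2x2 object."""
    frame = [[0] * width for _ in range(height)]
    for y in range(2):
        for x in range(2):
            frame[obj_top + y][obj_left + x] = 200
    return frame


reference = make_frame(6, 6, obj_top=1, obj_left=1)
current = make_frame(6, 6, obj_top=2, obj_left=3)  # object has moved

vector, cost = find_motion_vector(current, reference,
                                  top=2, left=3, size=2, search_range=2)
# vector == (-1, -2): the match lies one row up and two columns left.
```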
However, because they require a comparison between a large number of pixel video data values (e.g. 256 pixel values where 16×16 pixel blocks are being tested), such “motion estimation” processes are computationally intensive, even if the range of the search over the reference frame (i.e. the region of the reference frame over which the search for the closest matching frame area is carried out) is deliberately limited. This can be disadvantageous generally, but is particularly so where the processing power of the encoding system may be limited. This could be the case, for example, where it is desired to encode “real time” video data using a mobile device that may accordingly have limited processing capacity.
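The scale of the computation can be estimated with some simple arithmetic. The figures below are assumptions for illustration only: a 352×288 pixel frame, a search range of ±7 pixels in each direction, an exhaustive search, and 25 frames per second. Candidates that would fall outside the frame are not excluded, so the result is an upper-bound estimate.

```python
def comparisons_per_second(width, height, block, search_range, fps):
    """Estimate pixel comparisons per second for an exhaustive block search."""
    blocks_per_frame = (width // block) * (height // block)
    candidates_per_block = (2 * search_range + 1) ** 2
    pixels_per_candidate = block * block
    return (blocks_per_frame * candidates_per_block
            * pixels_per_candidate * fps)


# Assumed parameters (not taken from the text, except 16x16 blocks
# and a frame rate within the stated 25 to 30 frames per second).
estimate = comparisons_per_second(width=352, height=288, block=16,
                                  search_range=7, fps=25)
# 396 blocks x 225 candidates x 256 pixels x 25 frames = 570,240,000
```

Under these assumptions, even a modest frame size and search range require several hundred million pixel comparisons per second.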
The Applicants have recognised that it is becoming increasingly common to include in microprocessor-based devices, including mobile devices, some form of 3D graphics processor, i.e. a device that is designed specifically for carrying out the operations necessary to process and display three-dimensional graphics. (The 3D graphics processor will, as is known in the art, typically act as a slave of the main “host”, general microprocessor of the device and be used to carry out 3D graphics processing operations so that the general microprocessor of the device does not have to.)
As is known in the art, 3D graphics processing operations are usually carried out on (i.e. using) discrete graphical entities usually referred to as “fragments”. Each such fragment will usually correspond to a single pixel (picture element) in the final display (since the pixels are the singularities in the final picture to be displayed, there will usually be a one-to-one mapping between the “fragments” the 3D graphics processor operates on and the pixels in the display). However, it can be the case that there is not a direct correspondence between “fragments” and “pixels”, where, for example, particular forms of post-processing such as down-scaling are carried out on the rendered image prior to displaying the final image.
Thus, two aspects of 3D graphics processing that are typically carried out on a 3D graphics processor are the “rasterising” of graphics “primitive” (or polygon) position data to graphics fragment position data (i.e. determining the (x, y) positions of the graphics fragments to be used to represent each primitive in the scene to be displayed), and then “rendering” the “rasterised” fragments (i.e. colouring, shading, etc., the fragments) for display on a display screen.
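As a much-simplified sketch of the rasterising step, the following fragment generates the (x, y) fragment positions covering an axis-aligned rectangular primitive. The function name and the restriction to rectangles are assumptions made for brevity; a real rasteriser handles arbitrary triangles or polygons.

```python
def rasterise_rectangle(left, top, width, height):
    """Return the (x, y) fragment positions covering an axis-aligned rectangle.

    Positions are produced row by row, from the top-left corner.
    """
    return [(x, y)
            for y in range(top, top + height)
            for x in range(left, left + width)]


positions = rasterise_rectangle(left=2, top=1, width=3, height=2)
# [(2, 1), (3, 1), (4, 1), (2, 2), (3, 2), (4, 2)]
```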
(In 3D graphics literature, the term “rasterisation” is sometimes used to mean both primitive conversion to fragments and rendering. However, herein “rasterisation” will be used to refer to converting primitive data to fragment addresses only.)
The rendering process basically involves deriving a colour value for each graphics fragment to be displayed and typically is carried out in a pipelined process (the so-called “rendering pipeline”).
The rendering process (e.g. pipeline) typically receives as an input sets of graphics fragments in the form of two-dimensional arrays representing primitives to be displayed. For each fragment in the array, data necessary to display the fragment is then determined. Such data typically comprises red, green and blue (RGB) colour values for each fragment (which will basically determine the colour of the fragment on the display), and a so-called “Alpha” (transparency) value for each fragment. These RGB and alpha data values are usually referred to as being stored in RGB and alpha data channels of each graphics fragment (i.e. such that each graphics fragment has four data channels in which data values for that fragment can be stored).
In the rendering process, the individual fragments of the array (i.e. in practice their associated fragment data, e.g. RGB and alpha values) pass down the rendering pipeline one after another. As each fragment passes down the pipeline, it is firstly allocated initial RGB and alpha values, based on, e.g., colour and transparency data recorded for the vertices of the primitive to which the fragment belongs. Operations such as texturing, fogging, and blending, etc., are then carried out on the fragment data as it passes down the rendering pipeline. These operations modify the initial RGB and alpha values set for each fragment, such that each fragment emerges from the pipeline with an appropriate set of RGB and alpha values to allow that fragment to be displayed correctly on the display screen.
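The passage of a fragment down the pipeline can be modelled in software as a chain of functions, each of which returns modified RGB and alpha values. The sketch below is a toy model only: the `Fragment` class, the two stages shown, and their simple arithmetic are assumptions for illustration, not a description of any particular graphics processor.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Fragment:
    """A fragment position plus its four data channels (values in 0.0 to 1.0)."""
    x: int
    y: int
    r: float
    g: float
    b: float
    a: float


def texture_stage(fragment, texel):
    """Modulate each channel by the corresponding texture sample (texel) value."""
    tr, tg, tb, ta = texel
    return replace(fragment, r=fragment.r * tr, g=fragment.g * tg,
                   b=fragment.b * tb, a=fragment.a * ta)


def fog_stage(fragment, fog_colour, fog_factor):
    """Blend the colour towards the fog colour (0 = no fog, 1 = full fog)."""
    fr, fg, fb = fog_colour
    keep = 1.0 - fog_factor
    return replace(fragment,
                   r=fragment.r * keep + fr * fog_factor,
                   g=fragment.g * keep + fg * fog_factor,
                   b=fragment.b * keep + fb * fog_factor)


def run_pipeline(fragment, stages):
    """Pass the fragment through each stage in order."""
    for stage in stages:
        fragment = stage(fragment)
    return fragment


# Initial values, e.g. as derived from the vertices of the primitive.
initial = Fragment(x=3, y=4, r=1.0, g=0.5, b=0.0, a=1.0)
stages = [
    lambda f: texture_stage(f, (0.5, 0.5, 0.5, 1.0)),
    lambda f: fog_stage(f, (1.0, 1.0, 1.0), 0.5),
]
result = run_pipeline(initial, stages)
# result has r=0.75, g=0.625, b=0.5, a=1.0
```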
As each fragment emerges from the rendering pipeline it is stored (i.e. its final RGB and alpha values are stored) ready for display of the fragment on the display screen. This process is repeated for all the fragments in the scene area currently being rendered.
It is also the case, as is known in the art, that in 3D graphics rendering processes, it is possible (and indeed common) for a new fragment provided to the rendering pipeline to have the same fragment (e.g. pixel) position in the display as a fragment that has already passed down the pipeline (and is, e.g., stored at the end of the pipeline ready for display). When such a new fragment reaches the end of the graphics pipeline, there will then be two fragments, each having its own data (e.g. RGB and alpha) values, one at the end of the pipeline and one stored for display, having the same fragment (pixel) position. This conflict is usually resolved in 3D graphics processing operations by, e.g., rejecting one of the two fragments based on the relative depth of the fragments in the scene to be displayed.
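Depth-based rejection of this kind can be sketched as follows. The sketch assumes a "less than" depth test in which a smaller depth value means nearer to the viewer; that convention, the buffer layout and the function name are assumptions for illustration.

```python
def depth_test_write(colour_buffer, depth_buffer, x, y, colour, depth):
    """Keep the incoming fragment only if it is nearer than the stored one.

    Returns True if the incoming fragment replaced the stored fragment.
    """
    if depth < depth_buffer[y][x]:
        depth_buffer[y][x] = depth
        colour_buffer[y][x] = colour
        return True
    return False


width, height = 2, 2
colour_buffer = [[(0, 0, 0, 0)] * width for _ in range(height)]
depth_buffer = [[float("inf")] * width for _ in range(height)]

RED, GREEN, BLUE = (255, 0, 0, 255), (0, 255, 0, 255), (0, 0, 255, 255)

# Three fragments arrive for the same position (0, 0) at different depths.
first = depth_test_write(colour_buffer, depth_buffer, 0, 0, RED, 0.8)
second = depth_test_write(colour_buffer, depth_buffer, 0, 0, GREEN, 0.3)
third = depth_test_write(colour_buffer, depth_buffer, 0, 0, BLUE, 0.5)
# The nearest fragment (GREEN, depth 0.3) is the one left in the buffer.
```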
However, the Applicants have recognised that this aspect of 3D graphics processor rendering pipelines provides a facility for comparing data relating to two fragments having the same position in a given two-dimensional array of fragments, since, in effect, the rendering pipeline can be arranged to provide at its end two sets of fragment data for the same fragment position. That data could, accordingly, if desired, be compared. Furthermore, the fragment data generation is carried out for two-dimensional arrays of fragments (e.g. corresponding to a 3D graphics primitive to be displayed).
The Applicants have recognised that accordingly, and as will be explained further below, 3D graphics rendering pipelines handle two-dimensional arrays of graphics fragments in a manner that allows two different sets of fragment data for a given position in the array to be compared (e.g. by sending a first fragment for a given position in the array down the rendering pipeline such that the data for that fragment is stored at the end of the rendering pipeline and then sending a second fragment for that fragment position down the rendering pipeline such that a new set of fragment data for that fragment position is generated by the rendering pipeline). The Applicants have further recognised that this means that a 3D graphics rendering pipeline treats fragment data in a manner that is compatible with the processes required for “motion estimation” in differential encoding and video compression techniques (since such processes basically entail comparing data on a pixel-by-pixel basis for two-dimensional areas in different video frames).
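The two-pass arrangement can be modelled in software as follows. This is a toy model only, intended to show how a per-position comparison yields a block-matching cost; the class `ComparingStore`, its methods and the accumulated squared-difference measure are assumptions for illustration and do not describe the hardware mechanism.

```python
class ComparingStore:
    """Toy model of the fragment store at the end of a rendering pipeline."""

    def __init__(self, width, height):
        self.stored = [[None] * width for _ in range(height)]
        self.accumulated = 0

    def write(self, x, y, value):
        """First pass: store the fragment data value for position (x, y)."""
        self.stored[y][x] = value

    def compare(self, x, y, value):
        """Second pass: compare a new fragment with the one stored at (x, y)."""
        diff = value - self.stored[y][x]
        self.accumulated += diff * diff


# Hypothetical 2x2 luminance blocks from a reference and a current frame.
reference_block = [[10, 20], [30, 40]]
current_block = [[12, 20], [30, 37]]

store = ComparingStore(width=2, height=2)

# First pass: fragments carrying the reference block values are stored.
for y, row in enumerate(reference_block):
    for x, value in enumerate(row):
        store.write(x, y, value)

# Second pass: fragments for the same positions carry the current block values.
for y, row in enumerate(current_block):
    for x, value in enumerate(row):
        store.compare(x, y, value)

mean_squared_difference = store.accumulated / 4
# Differences are 2, 0, 0 and -3, so the sum of squares is 13 and the MSD 3.25.
```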
The Applicants have accordingly recognised that because a 3D graphics rendering pipeline carries out many of these “motion estimation” relevant functions in hardware, it provides the facility to hardware accelerate the “motion estimation” process (i.e. to allow the computationally intensive motion estimation operations to be carried out in hardware on the 3D graphics processor, rather than having to be carried out (e.g. in software) on a more general microprocessor or CPU (central processing unit)).
Thus, the Applicants have recognised that in a 3D-graphics enabled microprocessor system, the 3D graphics processor could be used to carry out “motion estimation” processes, thereby reducing the computational burden on the general microprocessor, e.g., CPU, of the system. This could also allow, for example, a mobile or less powerful device that is equipped with a 3D graphics processor still to carry out motion estimation and accordingly video compression and differential encoding in situations where the general microprocessor or CPU of the device may not in itself be able to do so. That could allow, for example, real-time video encoding and streaming by mobile devices that may not otherwise be able to carry out such functions. The present invention also removes the need to provide an additional dedicated hardware device for motion estimation acceleration where the system already includes a 3D graphics processor.