The present invention relates to the decoding of video bit-streams, particularly although not exclusively encoded according to International Standard ISO/IEC 13818-2 (commonly referred to as MPEG-2 video).
In accordance with customary terminology in the video art, the term "frame" as used herein denotes two fields which are interlaced together to provide an image, as with conventional analog television. The term "picture" is intended to mean a set of data in a bit-stream representing an image. A video encoder may choose to code a frame as a single frame picture, in which case a single picture consisting of two interlaced fields is transmitted, or as two separate field pictures for subsequent interlacing, in which case two consecutive pictures are transmitted by the encoder. In a frame picture the two fields are interleaved with one another on a line-by-line basis.
Pels ("Picture Elements") usually consist of an 8-bit (sometimes 10-bit) number representing the intensity of a given component of the image at the specific point in the image where that pel occurs. In a picture (field picture or frame picture), the pels are grouped into blocks, each block having 64 pels organised as 8 rows by 8 columns. Six such blocks are grouped together to form a "macroblock". Four of these represent a 16 by 16 area of the luminance signal. The remaining two represent the same physical area of the image but are the two colour difference signals (each sampled at half the linear resolution of the luminance). Within a picture the macroblocks are processed in the same order as words are read on a page, i.e. starting at the top-left and progressing left-to-right before going to the next row (of macroblocks) down, which is again processed in left-to-right order. This continues until the bottom-right macroblock in the picture is reached.
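The macroblock layout and processing order described above can be illustrated with a minimal sketch. The function names and the 720-pel picture width used in the usage note are assumptions made for illustration only, and the picture width is assumed to be a multiple of 16.

```python
MB_SIZE = 16     # a macroblock covers a 16 by 16 area of the luminance signal
BLOCK_SIZE = 8   # each block is 8 rows by 8 columns of pels

def pels_per_macroblock():
    """Four luminance blocks plus two colour difference blocks of 64 pels each."""
    return 6 * BLOCK_SIZE * BLOCK_SIZE

def macroblock_origin(index, picture_width):
    """Return (x, y) of the top-left luminance pel of macroblock `index`.

    Macroblocks are numbered in reading order: left-to-right along a row,
    then down to the next row of macroblocks.
    """
    mbs_per_row = picture_width // MB_SIZE
    row, col = divmod(index, mbs_per_row)
    return col * MB_SIZE, row * MB_SIZE
```

For an assumed picture width of 720 pels there are 45 macroblocks per row, so macroblock 45 is the first macroblock of the second row.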
MPEG video is composed of a number of different types of pictures, or, more properly, frames, denoted as
(a) I-frames (Intra Frames) which are compressed using intraframe coding and do not reference any other frames in the coded stream;
(b) P-frames (Predicted Frames) which are coded using motion-compensated prediction from past I-frames or P-frames; and
(c) B-frames (Bidirectionally Predicted Frames) which provide a high degree of compression and are coded using motion-compensated prediction from past and/or future I-frames or P-frames.
The present invention is particularly concerned with the decoding of B-frames, and for the purposes of this specification the I-frames and P-frames may be viewed as equivalent to one another and will be referred to herein collectively as "anchor frames". According to the MPEG-2 standard, it is necessary to maintain two decoded anchor frames, which are used to form predictions when decoding B-frames.
Referring to FIG. 1, which is a block diagram of a prior art arrangement for decoding B-frames, coded video data is input to a channel buffer 2 which feeds the data to a video decoder device 4 having a forward anchor 6 and a backward anchor 8 stored in a memory device 10. The video decoder provides in a memory region 12 of memory 10 a decoded version of a B-frame, region 12 being read out to provide an output on line 14 to a display. Typically, display of the first field of a decoded B-frame commences a little longer than a field time (half a frame time) after the video decoder has started to place it in the frame store 12. As a result of three images being stored in memory 10, it is necessary to provide a large quantity of memory, commonly implemented as DRAM or SDRAM connected externally to the video decoder.
A prior improvement to this scheme reduces the requirement for the third frame store to a requirement for an amount of storage a little larger than that required to hold a field of video (half a frame store). This is often referred to as a 2.5 frame store operation.
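The memory saving can be made concrete with a short worked calculation. The sketch below assumes a 720 by 576 picture with 8-bit pels and colour difference signals sampled at half the linear resolution of the luminance; these figures and the function names are illustrative assumptions and do not appear in the specification. Since the reduced store is described as "a little larger" than a field, the 2.5 frame figure is a lower bound.

```python
def frame_bytes(width, height, bytes_per_pel=1):
    """Bytes needed to hold one decoded frame (luminance plus two colour difference signals)."""
    luminance = width * height
    colour_difference = 2 * (width // 2) * (height // 2)
    return (luminance + colour_difference) * bytes_per_pel

def store_bytes(width, height, frame_stores):
    """Bytes needed for a given number of frame stores (may be fractional, e.g. 2.5)."""
    return int(frame_stores * frame_bytes(width, height))

three_frames = store_bytes(720, 576, 3)          # 1866240 bytes
two_and_a_half = store_bytes(720, 576, 2.5)      # 1555200 bytes
saving = three_frames - two_and_a_half           # 311040 bytes, i.e. half a frame
```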
EP-A-0732857 discloses an arrangement for reducing the amount of memory required as compared with the arrangement of FIG. 1, wherein the third frame store is eliminated and replaced by a block-to-raster buffer so that as a B-frame is decoded by the decoder, it is fed to the buffer and written to the display screen as soon as a certain number of lines of the frame have been stored in the buffer. EP-A-0732857 is particularly concerned with decoding a single frame picture consisting of two interlaced fields. Each B-frame is decoded twice during the display of the image, on a first occasion while a first field of the image is displayed and on a second occasion while the second field to be interlaced is directly displayed. The decoder processes the images in macroblocks, and a converter circuit receives the image data and supplies lines of the same field to the display. The problem with the arrangement disclosed in EP-A-0732857 is that it does not disclose a system which is able to cope with all the forms an encoded frame may take, namely a single frame picture as described above, or two consecutive field pictures.
It is an object of the invention to provide a video decoder for decoding MPEG-2 pictures which is sufficiently versatile to cope with all possibilities of encoded frame, and which will provide a decoding capability in an efficient, memory-conserving manner.
In one aspect, the present invention provides a video decoder for decoding encoded video pictures, including memory means for storing a plurality of anchor frames, the decoder employing such anchor frames for decoding intermediate frames, and including buffer means for holding intermediate frame data for display, characterised in that the decoder is operable in first and second modes of operation,
wherein in a first mode of operation a picture is encoded as a single frame and the video decoder decodes the frame twice wherein in a first decoding a set of lines of a first field are provided to the buffer means for display, whereas in a second decoding a set of lines from a second field are provided to the buffer means for display; and
wherein in a second mode of operation in which two consecutive field pictures of a frame are decoded, a first field picture is decoded and provided to the buffer means for display, and then a second field picture is decoded and provided to the buffer means for display.
In a further aspect, the present invention provides a method of decoding encoded video pictures, comprising:
storing a plurality of anchor frames in memory means, employing such anchor frames for decoding intermediate frames, and holding intermediate frame data for display in buffer means;
characterised by first and second alternative modes, wherein a first mode comprises:
providing a picture as a single frame and decoding the frame a first time and providing a set of lines of a first field to the buffer means for display, and decoding the frame a second time and providing a set of lines of a second field to the buffer means for display; and
wherein a second mode comprises providing two consecutive field pictures, and decoding a first field picture and providing the picture to the buffer means for display, and decoding a second field picture and providing the picture to the buffer means for display.
The configuration of the buffer means will usually vary according to the mode of operation. Thus, in the second mode of operation, the buffer simply has to reconstruct the data from the incoming macroblocks (where the data is encoded according to the MPEG-2 standard) and display the reconstructed data. The buffer means may therefore be configured as two separate 16 line buffers, the first a reconstruction buffer which receives decoded macroblocks, and the second a display buffer for displaying data when transferred from the reconstruction buffer. Whilst this arrangement has the advantage of simplicity, a disadvantage is the large size of buffer required. An alternative and preferred technique is therefore to configure the buffer so that only 8 lines are required. In this arrangement, in said second mode of operation, a row of macroblocks is decoded for a single field picture. Whilst all 16 lines of each macroblock belong to the current field, nevertheless half of the lines are discarded, for example those in the lower half of the block. Once the row of macroblocks has been constructed in the buffer to provide 8 lines for display, the decoder returns to the start of the macroblock row and decodes the macroblocks again, and this time the upper 8 lines are discarded and the lower 8 lines are transferred to the block-to-raster buffer. Thus, the buffer provides data to the display 8 lines at a time. A principal advantage of this 8 line method is that the amount of storage required for the block-to-raster buffer means is reduced to one half of that required by the 16 line method.
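The 8 line method for field pictures can be sketched as follows. This is a simplified illustration only: `decode_row` and `emit` are hypothetical callbacks, a macroblock is modelled as a list of 16 luminance lines of 16 pels each, and the colour difference signals are ignored.

```python
def assemble_lines(macroblocks, first_line, count):
    """Block-to-raster conversion: join `count` lines of every macroblock
    in the row, starting at `first_line`, into full-width raster lines."""
    return [
        [pel for mb in macroblocks for pel in mb[line]]
        for line in range(first_line, first_line + count)
    ]

def display_row_eight_lines(decode_row, emit):
    """Decode one row of macroblocks twice, passing 8 lines to the display each time.

    `decode_row` (hypothetical) returns the decoded macroblocks of the row;
    `emit` (hypothetical) receives a list of raster lines for display.
    """
    for first_line in (0, 8):
        macroblocks = decode_row()  # the same row is decoded again on each pass
        # Keep only 8 of the 16 lines; the other half is discarded on this pass.
        emit(assemble_lines(macroblocks, first_line, 8))
```

The trade-off in this sketch mirrors the text: the buffer never holds more than 8 lines, at the cost of decoding every row of macroblocks twice.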
In the first mode of operation for a frame picture, the picture is received as a single frame of interlaced data, and it is necessary that the video decoder decodes the entire frame twice in order to display both fields of the picture. During the first decoding of the frame the lines of one field are displayed and the lines of the other field are discarded, and in the second decoding the lines of the second field are displayed, the remaining lines being discarded. As will become clear below, the buffer means is configured to provide eight line reconstruction and display buffers.
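The first mode can be sketched in the same style. Here `decode_frame` is a hypothetical callback that yields, for each row of macroblocks in the frame picture, its 16 raster lines in frame order; the assumption that the first field occupies the even-numbered lines is made for illustration and is controlled by a parameter.

```python
def display_frame_picture(decode_frame, emit, top_field_first=True):
    """Decode a frame picture twice, displaying one field per decoding.

    `decode_frame` (hypothetical) yields a list of 16 raster lines per
    macroblock row; `emit` (hypothetical) receives the lines to display.
    """
    parities = (0, 1) if top_field_first else (1, 0)
    for parity in parities:
        # The entire frame is decoded again for each field.
        for row_lines in decode_frame():
            # Of the 16 interleaved lines, 8 belong to the wanted field;
            # the 8 lines of the other field are discarded.
            emit(row_lines[parity::2])
```

Each row of macroblocks contributes 8 lines per decoding, which is consistent with the eight line reconstruction and display buffers mentioned above.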
In order to reduce the size of the block-to-raster buffer still further, a pointer table method is used. This recognizes that the buffers described above are, on average, half empty during use. In this arrangement, when a macroblock is decoded, the data is placed in any available location in the buffer, and a table of pointers to the memory locations used is kept.
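A minimal sketch of the pointer table idea follows. The class name, the use of a free list, and the choice of key are all assumptions made for illustration; the specification states only that data is placed in any available location and that the location is recorded in a table.

```python
class PointerTableBuffer:
    """Buffer in which incoming data may occupy any free location (illustrative only)."""

    def __init__(self, slots):
        self.memory = [None] * slots
        self.free = list(range(slots))   # locations currently available
        self.table = {}                  # pointer table: key -> memory location

    def write(self, key, data):
        """Store `data` in any available location and record it in the table."""
        if not self.free:
            raise RuntimeError("no free location; decoding would have to wait")
        slot = self.free.pop()
        self.memory[slot] = data
        self.table[key] = slot
        return slot

    def read(self, key):
        """Fetch data for display and release its location for reuse."""
        slot = self.table.pop(key)
        data = self.memory[slot]
        self.memory[slot] = None
        self.free.append(slot)
        return data
```

Because a location is released as soon as its data has been read for display, newly decoded data can reuse it immediately, so reconstruction and display can share one pool of locations instead of each having a dedicated buffer.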
Methods for reducing memory buffer size are known; see for example U.S. Pat. No. 5,151,976, wherein saw tooth data is stored in memory as M stripes of N pixels. In order to avoid first and second memories in which data is alternately read and written, with consequent large memory requirements, data is read and written from the same memory section, wherein the memory is organised according to an addressing scheme wherein a memory location $A_{i,j}$ is determined by $A_{i+1,j} = (A_{i,j} + x_j) \bmod (MN-1)$, $x_{j+1} = N \cdot x_j \bmod (MN-1)$. However this method is not appropriate where the size of the memory or buffer does not match the length of the stripes.
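One possible reading of this recurrence is sketched below. The following conventions are assumptions, not taken from the text or checked against the cited patent: i counts locations within one pass over the memory, j counts passes, the first pass uses a step of 1 starting at address 0, and the final location MN-1 is always visited last. Under these assumptions each pass reads the previous block out in transposed order while writing the next block into the locations just vacated.

```python
def address_sequence(step, m, n):
    """Addresses visited in one pass: A[i+1] = (A[i] + step) mod (MN - 1),
    with the last location MN - 1 handled separately (assumed convention)."""
    size = m * n
    addresses, addr = [], 0
    for _ in range(size - 1):
        addresses.append(addr)
        addr = (addr + step) % (size - 1)
    addresses.append(size - 1)
    return addresses

def simulate(blocks, m, n):
    """Read and write a single memory of M*N locations, one block per pass."""
    size = m * n
    memory = [None] * size
    step = 1                      # assumed initial step for the first pass
    outputs = []
    for block in blocks:
        read_out = []
        for addr, value in zip(address_sequence(step, m, n), block):
            read_out.append(memory[addr])   # read the old value first
            memory[addr] = value            # then reuse the same location
        outputs.append(read_out)
        step = (step * n) % (size - 1)      # next step = N * step mod (MN - 1)
    return outputs

outputs = simulate([list(range(6)), list(range(10, 16)), list(range(20, 26))], 2, 3)
```

With M = 2 and N = 3, the second pass returns the first block as 0, 3, 1, 4, 2, 5, which is the column-by-column order of a 2 by 3 array stored row by row.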
In contrast, the present invention provides in a further aspect a video decoder for decoding encoded video pictures, including memory means for storing a plurality of anchor frames, the decoder employing such anchor frames for decoding intermediate frames, and including buffer means for holding intermediate frame data for display, characterised in that the buffer means includes a pointer table with means for distributing incoming data to any available memory location in the buffer, the address of the memory location being stored in the pointer table.