This invention relates to the field of decoding motion compensated non-intra coded compressed image data. An example of such data is non-intra coded portions of an MPEG data stream.
Conventional hardware based MPEG decoders operate on a block-by-block basis to decode the data stream. More particularly, for a non-intra coded block, predictive coding is used whereby reference needs to be made to previous image data (within a reference picture) together with the compressed data to reconstruct the block presently being decoded (within a generated picture). Typically, an area of picture data in the reference picture that matches (within limits) the block being decoded was identified during the compression process and can be referenced using a motion vector pointing to the area of previously decoded data during decompression. Once the present block has been decompressed, the hardware can start decompressing the next block.
Whilst the above described techniques yield a high degree of compression, a problem that arises is the disadvantageously frequent need to make memory accesses to reference picture data as part of the decompression process. In modern memory systems, such as SDRAM, there is a relatively high degree of latency associated with each new burst mode memory access. For example, it may take seven memory clock cycles to recover the first data word in a burst, with each remaining data word then being returned on each further memory clock cycle. Accordingly, a memory access to five data words would take eleven memory clock cycles (7+4). This represents an efficiency of less than 50% relative to the peak bandwidth of the memory system. Measures that can improve the efficiency of operation of memory access within such decoding systems are advantageous as they reduce the time taken to perform the decoding and release memory bandwidth that can be usefully employed elsewhere.
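The burst timing figures given above can be checked with a short sketch. The function names are hypothetical, and the default of seven cycles for the first word is simply the example figure quoted above rather than a property of any particular memory device.

```python
def burst_cycles(words, first_word_latency=7):
    """Memory clock cycles for one burst.

    Assumed model (taken from the example figures only): the first word
    costs first_word_latency cycles and each further word costs one cycle.
    """
    if words < 1:
        raise ValueError("a burst must fetch at least one word")
    return first_word_latency + (words - 1)


def burst_efficiency(words, first_word_latency=7):
    """Fraction of peak bandwidth achieved, where peak is one word per cycle."""
    return words / burst_cycles(words, first_word_latency)


if __name__ == "__main__":
    print(burst_cycles(5))                # 7 + 4 = 11 cycles
    print(round(burst_efficiency(5), 3))  # 5 words in 11 cycles, below 0.5
```

Under this model, longer bursts spread the fixed first-word latency over more data words, which is why combining short fetches into fewer, longer bursts raises the achieved bandwidth.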
Viewed from one aspect the present invention provides apparatus for decoding blocks of motion compensated non-intra coded compressed image data, said apparatus comprising:
a memory for storing previously decoded image data;
a decoding processor responsive to a motion vector of a block being decoded for fetching previously decoded image data from said memory for use in decoding said block; wherein
decoding of a motion vector for a block being decoded takes place before a fetch is made for decoding of a preceding block; and
said decoding processor concatenates fetches for at least one line of previously decoded data for different blocks being decoded into burst mode fetches.
The invention recognizes that whilst the decoding may take place on a block-by-block basis, the fetching of previously decoded image data need not be broken down into such a block-by-block process. Furthermore, the invention recognizes that in many cases there will be a strong correlation between the previously decoded image data fetched for the preceding block and the previously decoded image data fetched for the current block. In these circumstances it is possible to concatenate at least one of the memory fetches (which may be a burst for each line of each area in the reference picture) thereby greatly increasing the efficiency of use of the memory access channels.
Whilst memory fetches might be concatenated only in the circumstances where they exactly abutted, a net overall gain can be made even when there are spaces between the memory fetches provided these spaces are not so large as to negate the avoidance of an additional memory latency cycle. The fetches can also be combined when they overlap to even greater advantage since duplicated fetches are eliminated, or when they are in reversed order. Accordingly, in preferred embodiments of the invention said processor concatenates fetches to memory addresses within a predetermined range of each other.
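One possible way of concatenating fetches whose addresses lie within a predetermined range of each other is sketched below. This is an illustration only: the function name `concatenate_fetches`, the representation of a fetch as a (start address, length in words) pair and the `max_gap` parameter are all assumptions made for the example and are not taken from any particular embodiment.

```python
def concatenate_fetches(fetches, max_gap):
    """Merge (start, length) word-address fetches into longer bursts.

    Fetches are sorted by start address first, so requests presented in
    reversed order are handled. Two fetches are merged when the gap
    between the end of one and the start of the next is at most max_gap
    words; overlapping fetches have a negative gap and always merge.
    """
    bursts = []
    for start, length in sorted(fetches):
        end = start + length  # exclusive end address
        if bursts and start - bursts[-1][1] <= max_gap:
            # Extend the current burst; max() copes with a fetch that
            # lies wholly inside the burst already being built.
            bursts[-1][1] = max(bursts[-1][1], end)
        else:
            bursts.append([start, end])
    return [(s, e - s) for s, e in bursts]


if __name__ == "__main__":
    print(concatenate_fetches([(100, 5), (105, 5)], max_gap=0))  # abutting
    print(concatenate_fetches([(100, 5), (108, 5)], max_gap=3))  # small gap
    print(concatenate_fetches([(100, 5), (103, 5)], max_gap=0))  # overlapping
    print(concatenate_fetches([(105, 5), (100, 5)], max_gap=0))  # reversed
```

In this sketch the words lying in a gap between two merged fetches are fetched and discarded, which is the price paid for avoiding a second first-access latency.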
It will be appreciated that the block being decoded could have an individual motion vector and be completely independent of all other blocks. However, improved compression of the source data can be achieved when the individual blocks are processed as parts of a macroblock sharing a common motion vector or motion vectors (e.g. as in MPEG data). In this case the block of data being decoded could be a macroblock or a section of a macroblock composed of several smaller blocks.
In preferred embodiments, a convenient way of determining whether fetches can be concatenated is to compare the motion vectors decoded for successive blocks: if said motion vectors are within a predetermined range of one another, then said fetches are concatenated.
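The motion vector comparison might be sketched as follows. The representation of a motion vector as an integer (dx, dy) pair, the function name and the separate horizontal and vertical limits are assumptions made for illustration; in particular, requiring the vertical components to match by default is a simplifying choice of this sketch.

```python
def vectors_within_range(mv_a, mv_b, max_dx, max_dy=0):
    """Return True when two (dx, dy) motion vectors are close enough.

    Assumed test: the horizontal components differ by at most max_dx and
    the vertical components by at most max_dy (default 0, so that the
    lines of the two reference areas fall on the same picture rows).
    """
    return (abs(mv_a[0] - mv_b[0]) <= max_dx
            and abs(mv_a[1] - mv_b[1]) <= max_dy)


if __name__ == "__main__":
    previous_mv = (12, -3)  # motion vector of the preceding block
    current_mv = (14, -3)   # motion vector of the block being decoded
    if vectors_within_range(previous_mv, current_mv, max_dx=4):
        print("concatenate the fetches for the two blocks")
    else:
        print("issue separate fetches")
```

A comparison of this kind is cheap because it needs only the two decoded vectors and does not require the reference picture addresses to be computed first.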
Whilst the invention can be usefully employed in many different types of memory system, it is particularly useful when the memory is a memory having a first access time for a first access in a burst and a subsequent access time for each subsequent access within said burst, said first access time being greater than said subsequent access time. SDRAM memory is a common example of such a memory which has a high latency for the first access and yet is highly efficient for subsequent accesses within a burst. Accordingly, decoder implementations employing this type of memory particularly benefit from the use of the present invention.
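For a memory of this type, the point at which concatenation stops being worthwhile can be estimated with a simple cost model. The sketch below assumes a first access time of seven cycles and a subsequent access time of one cycle per word, and assumes that any words lying in a gap between the two fetches are read and discarded; the function names and figures are illustrative only.

```python
def separate_cost(n1, n2, first=7):
    """Cycles for two separate bursts of n1 and n2 words."""
    return (first + n1 - 1) + (first + n2 - 1)


def combined_cost(n1, n2, gap, first=7):
    """Cycles for one burst covering both fetches plus gap unused words."""
    return first + (n1 + gap + n2) - 1


def worth_concatenating(n1, n2, gap, first=7):
    """True when the single combined burst is strictly cheaper."""
    return combined_cost(n1, n2, gap, first) < separate_cost(n1, n2, first)


if __name__ == "__main__":
    for gap in (0, 5, 6):
        print(gap, separate_cost(5, 5), combined_cost(5, 5, gap),
              worth_concatenating(5, 5, gap))
```

Under these assumptions the combined burst wins whenever the gap is smaller than the first access time minus one cycle, independently of the lengths of the two fetches, so the predetermined range could be derived from the measured latency of the memory in use.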
Whilst the invention could be embodied purely in hardware, the invention is particularly suitable for systems in which software decoding of the compressed image data occurs. Software decoding generally allows a greater degree of flexibility in the ordering of the operations to be performed and so allows motion vector identification, comparison and fetching to be performed for a subsequent block before the preceding block is finally dealt with. Software embodiments also make the dynamic alteration of the processing parameters (e.g. the range over which fetches are concatenated) easier to achieve. For example, software can be made to automatically adjust itself to the surrounding hardware environment.
The image data that is decompressed could have many different formats. However, the invention is particularly useful when the image data is compressed HDTV image data. Such HDTV image data typically contains a high number of blocks sharing very similar motion vectors for which the fetches can be concatenated.
The invention is also well suited to systems in which the reference picture data is accessed at reduced resolution to produce the generated picture. Examples of this are producing a standard resolution (SDTV) picture from HDTV data or a PIP (picture-in-picture) scaled-down display from full screen resolution (SDTV or HDTV) data. In such reduced resolution memory accesses, the scaled-down bursts for an individual block are shorter and therefore less efficient, which makes the invention more useful.
The splitting of the various tasks to be performed in the decompression of a block of compressed data may be efficiently performed in preferred embodiments in which a data stream parsing processor parses said compressed image data to extract parsed data including a required fetch and other data representing each block and transfers said parsed data to said decode processor which decompresses each block. The parsing processor and the decoding processor could be the same hardware at different stages of the operation.
A pre-fetch buffer between the main memory and the decode processor may be used to further improve the efficiency of operation of the system.
Viewed from another aspect the present invention provides a method of decoding blocks of motion compensated non-intra coded compressed image data, said method comprising the steps of:
storing previously decoded image data in a memory;
in response to a motion vector of a block being decoded, fetching previously decoded image data from said memory for use in decoding said block; wherein
decoding of a motion vector for a block being decoded takes place before a fetch is made for decoding of a preceding block; and
fetches for at least one line of previously decoded data for different blocks being decoded are concatenated into burst mode fetches.