1. Field of the Invention
This invention relates to video decompression. More particularly, this invention relates to a video decoder, and to the management of the memory used to store decoded video frame pictures in an image formatter of a video decoder.
2. Description of the Related Art
Various compression standards for video data, e.g., JPEG, MPEG and H.261, are well known, as noted in U.S. Pat. No. 5,212,742. An important compression standard is the Moving Picture Experts Group Convention ("MPEG"), and more specifically MPEG-2 (ISO/IEC 13818). Circuitry used in decoders for MPEG-2 encoded video data is disclosed, for example, in European Patent Application No. 92306038.8, which is of common assignee herewith.
MPEG encoding involves three different picture types: Intra ("I"), Predicted ("P") and bidirectionally interpolated ("B"). B pictures are based on predictions from two pictures, one from the future and one from the past. I pictures require no further decoding by the Temporal Decoder, but must be stored in one of the two picture buffers for later use in decoding P and B pictures. The picture order is modified at the encoder so that I and P pictures can be decoded from the coded data before they are required to decode B pictures. Decoding P pictures requires forming predictions from a previously decoded P or I picture. The decoded P picture is stored in a picture buffer for use in decoding subsequent P and B pictures.
B pictures can require predictions from the picture buffers. As with P pictures, half pixel motion vector accuracy requires on-chip interpolation of the picture information. B pictures are not stored in the buffers; they are merely transient.
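The encoder-side reordering described above can be illustrated with a short Python sketch. It assumes a hypothetical labelling scheme in which each picture is a string such as "I0", "B1" or "P3" (picture type followed by display position), and the simple rule that each I or P picture is moved ahead of the B pictures that precede it in display order. It is an illustration of the principle, not the method of any particular encoder.

```python
def display_to_coded(display_order):
    """Reorder pictures so that each I/P anchor is coded before the B
    pictures that need it as their future reference.

    Pictures are hypothetical labels such as "I0", "B1", "P3".
    """
    coded = []
    pending_b = []  # B pictures waiting for their future anchor
    for pic in display_order:
        if pic[0] in "IP":
            coded.append(pic)         # the anchor is sent first...
            coded.extend(pending_b)   # ...then the B pictures it completes
            pending_b = []
        else:
            pending_b.append(pic)
    # Any B pictures left over (no later anchor in the input) are appended unchanged.
    coded.extend(pending_b)
    return coded


print(display_to_coded(["I0", "B1", "B2", "P3", "B4", "B5", "P6"]))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

In the coded order, P3 reaches the decoder before B1 and B2, so both of their reference pictures (I0 and P3) are already in the picture buffers when the B pictures are decoded.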
In MPEG decoding, a temporal decoder and a spatial decoder are typically provided. The Spatial Decoder employed in the present invention performs all the required processing within a single picture, reducing the redundancy within that picture. The Temporal Decoder reduces the redundancy between the subject picture and a picture that arrives before it, as well as a picture that arrives after it.
FIG. 1 illustrates how an I picture 2 is stored in a picture buffer 4, and then output. FIG. 2 shows how a P picture 6 is formed from a picture buffer 8, stored in a second picture buffer 10, and then output. FIG. 3 illustrates how a B picture 12 is constructed from information in two picture buffers 14, then output without being stored.
I and P pictures are usually not output from the temporal decoder as they are decoded. Instead, I and P pictures are written into one of the picture buffers, and are read out only when a subsequent I or P picture arrives for decoding. In other words, the temporal decoder relies on subsequent P or I pictures to flush previous pictures out of the two picture buffers. The spatial decoder can provide a dummy I or P picture at the end of a video sequence to flush out the last P or I picture. In turn, this dummy picture is flushed when a subsequent video sequence starts.
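The output behaviour of the two picture buffers can be modelled in a few lines of Python. This is a simplified sketch: pictures are hypothetical labels such as "I0", "P3" and "B1", the input is assumed to arrive already in coded (re-ordered) order, and "D" is a hypothetical label standing in for the dummy picture supplied by the spatial decoder at the end of a sequence. Only the output order is modelled, not prediction or buffer contents.

```python
def temporal_decoder_output(coded_order):
    """Model the output order of a two-buffer temporal decoder.

    I and P pictures (and the hypothetical dummy picture "D") are held in a
    picture buffer and output only when the next I or P picture arrives;
    B pictures are output as soon as they are decoded.
    Returns (pictures output, picture still held in a buffer).
    """
    output = []
    held = None  # most recent I/P picture awaiting output
    for pic in coded_order:
        if pic[0] in "IPD":
            if held is not None:
                output.append(held)  # the new anchor flushes the previous one
            held = pic
        else:
            output.append(pic)  # B pictures are transient, never stored
    return output, held


out, still_held = temporal_decoder_output(
    ["I0", "P3", "B1", "B2", "P6", "B4", "B5", "D"])
print(out)         # ['I0', 'B1', 'B2', 'P3', 'B4', 'B5', 'P6']
print(still_held)  # D
```

Holding each anchor until the next one arrives restores display order, and the dummy picture pushes out P6; the dummy itself remains held until the next video sequence flushes it.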
Peak memory bandwidth load occurs when decoding B pictures. In an example taken from the "worst case" scenario, the B frame may be formed from predictions from two picture buffers with all predictions being to half pixel accuracy. Table 1 presents performance data using a typical dynamic random access memory ("DRAM").
TABLE 1
Data bus      Read or write   Form prediction          Form prediction
width (bits)  8×8 block       (half pixel accuracy)    (integer pixel accuracy)
------------  -------------   ---------------------    ------------------------
 8            3657 ns         4907 ns                  3963 ns
16            1880 ns         2907 ns                  2185 ns
32             991 ns         1907 ns                  1741 ns
From the data in Table 1, it can be seen that the decoder's DRAM interface takes 3815 ns to read the data required for two half pixel accurate predictions (via a 32 bit wide interface). The resolution that the Temporal Decoder can support is determined by the number of these predictions that can be performed within one picture time. In this example, the Temporal Decoder can process 8737 8×8 blocks in a single 33 ms picture period (i.e., for 30 Hz video).
If the required video format is 704×480, then each picture contains 7920 8×8 blocks (taking into consideration the 4:2:0 chroma sampling). It can be seen that this video format consumes approximately 91% of the available DRAM interface bandwidth (before any other factors such as DRAM refresh are taken into consideration). Accordingly, the Temporal Decoder can support this video format.
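The B picture bandwidth arithmetic above can be checked with a short Python sketch. It uses the text's figure of 3815 ns per block for two half pixel accurate predictions (2 × 1907 ns from Table 1 is 3814 ns; the one-nanosecond difference does not change the rounded percentage) and counts six 8×8 blocks per 16×16 macroblock for 4:2:0 sampling. Note that the 8737-block figure corresponds to a picture period of 1/30 s (about 33.3 ms). The helper function names are illustrative only.

```python
import math


def blocks_per_picture_420(width, height):
    """8x8 blocks per 4:2:0 picture: 4 luma + 2 chroma blocks per 16x16 macroblock."""
    return (width // 16) * (height // 16) * 6


def blocks_per_period(period_ns, ns_per_block):
    """Number of 8x8 blocks the DRAM interface can process in one picture period."""
    return math.floor(period_ns / ns_per_block)


period_ns = 1e9 / 30   # one 30 Hz picture period (~33.3 ms)
b_block_ns = 3815      # text's figure for two half-pel predictions, 32-bit bus
capacity = blocks_per_period(period_ns, b_block_ns)
needed = blocks_per_picture_420(704, 480)
print(capacity, needed, f"{needed / capacity:.0%}")
# 8737 7920 91%
```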
When MPEG picture re-ordering is employed, the worst case is encountered while P pictures are being decoded. During this time there are three loads on the DRAM interface: (1) forming predictions; (2) writing back the result; and (3) reading out the previous P or I picture.
Using the data from Table 1, the time for each of these tasks can be determined when a 32 bit wide interface is available. Forming the prediction takes 1907 ns, while the read and the write each take 991 ns, for a total of 3889 ns per 8×8 block. This permits the Temporal Decoder to process 8485 8×8 blocks in a 33 ms period. Hence, processing 704×480 video will use approximately 93% of the available memory bandwidth (ignoring refresh).
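The three P picture loads can be totalled in a short Python sketch using the 32 bit row of Table 1. The text's 8485-block figure corresponds to a period of exactly 33 ms; with a 1/30 s period the same arithmetic gives 8571 blocks, or about 92%. The dictionary keys are illustrative labels, not names used by the decoder.

```python
import math

# Table 1, 32-bit data bus (ns per 8x8 block).
READ_OR_WRITE_NS = 991
HALF_PEL_PREDICTION_NS = 1907

# The three DRAM loads while a P picture is decoded with re-ordering.
p_loads_ns = {
    "form prediction": HALF_PEL_PREDICTION_NS,
    "write back result": READ_OR_WRITE_NS,
    "read out previous I/P picture": READ_OR_WRITE_NS,
}
per_block_ns = sum(p_loads_ns.values())  # 1907 + 991 + 991

period_ns = 33e6                          # 33 ms, as in the text
capacity = math.floor(period_ns / per_block_ns)
needed = (704 // 16) * (480 // 16) * 6    # 8x8 blocks in a 704x480 4:2:0 picture
print(per_block_ns, capacity, f"{needed / capacity:.0%}")
# 3889 8485 93%
```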
A block diagram of a conventional decoder system 16 is presented in FIG. 4. It is currently common to employ a synchronous DRAM as the DRAM 18 used in the video formatter 20. The spatial decoder 22 and the temporal decoder 24 utilize DRAMs 26 and 28, respectively. During the MPEG decoding process, up to three frame stores may need to be held in the DRAM 18. The DRAM interface 30 is particularly important in achieving acceptable performance. In the well known National Television System Committee ("NTSC") convention, this requirement amounts to 4 megabits per frame, for a total of 12 megabits. For the Phase Alternation Line ("PAL") convention, the frame size is approximately 5 megabits, so that 15 megabits of memory is needed in the DRAM 18. Commercial decoder systems have implemented the DRAM 18 as a 16 megabit random access memory ("RAM") for reasons of ready availability. However, in the worst case only 1 megabit of RAM then remains for other processing functions of the video formatter 20, which is insufficient. Providing an adequate amount of memory results in operation in a "4.3 frame store mode". Hence it has been necessary to provide another RAM (not shown), usually 4 megabits in size, to accommodate the video formatter 20. The 4 megabit memory is larger than necessary, but is utilized because, as with the 16 megabit RAM, it is an off-the-shelf component. In very large scale integrated circuit ("VLSI") realizations of an MPEG decoder, it is generally desirable to reduce the amount of memory for reasons of cost, power consumption, and space utilization.
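The three frame store memory budget described above can be tabulated with a minimal Python sketch, using the text's approximate per-frame figures of 4 megabits (NTSC) and 5 megabits (PAL) and a single 16 megabit DRAM. The function name is an illustrative assumption.

```python
# Approximate frame-store sizes quoted in the text (megabits per frame).
FRAME_MBIT = {"NTSC": 4.0, "PAL": 5.0}
DRAM_MBIT = 16  # off-the-shelf 16 megabit part


def frame_store_budget(standard, frame_stores=3.0):
    """Return (megabits used by frame stores, megabits left in the 16 Mbit DRAM)."""
    used = frame_stores * FRAME_MBIT[standard]
    return used, DRAM_MBIT - used


for std in ("NTSC", "PAL"):
    used, left = frame_store_budget(std)
    print(f"{std}: {used:g} Mbit used, {left:g} Mbit left")
# NTSC: 12 Mbit used, 4 Mbit left
# PAL: 15 Mbit used, 1 Mbit left
```

The 1 megabit left over in the PAL case is the shortfall that forces the additional RAM for the video formatter 20.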
The video formatter 20 processes data from the spatial decoder 22 and the temporal decoder 24. A digital video frame is treated as a grid of picture elements, or pixels. The pixels are grouped into 8×8 blocks, and the blocks are further grouped into 2×2 units known as macroblocks. Thus a macroblock represents a grouping of 16×16 pixels, or a grouping of 2×2 blocks. A PAL picture consists of 45×36 macroblocks, and an NTSC picture of 45×30 macroblocks. Referring to FIG. 5, each macroblock 32 comprises four luminance blocks 34 and two chrominance blocks 36, and contains the information for an original 16×16 grouping of pixels. Each of the four luminance blocks 34 and two chrominance blocks 36 is 8×8 pixels in size. The four luminance blocks 34 contain a 1 pixel to 1 pixel mapping of the luminance (Y) information from the original 16×16 grouping of pixels. One chrominance block 36 contains a representation of the chrominance level of the blue color signal (Cu/b), and the other chrominance block 36 contains a representation of the chrominance level of the red color signal (Cv/r). Each chrominance signal is subsampled so that each 8×8 chrominance block 36 contains the chrominance level of its color signal for the entire original 16×16 block of pixels.
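The per-frame memory sizes quoted earlier for NTSC and PAL frame stores can be derived from this macroblock structure. The Python sketch below assumes 8-bit samples and counts one megabit as 10^6 bits (neither is stated in the text); under those assumptions it yields about 4.15 and 4.98 megabits per frame, consistent with the approximate 4 and 5 megabit figures.

```python
BLOCK_SAMPLES = 8 * 8           # samples in one 8x8 block
BLOCKS_PER_MACROBLOCK = 4 + 2   # four luminance (Y) blocks + one Cb + one Cr
BITS_PER_SAMPLE = 8             # assumption: 8-bit samples


def frame_bits(mb_cols, mb_rows):
    """Bits needed to store one frame of mb_cols x mb_rows macroblocks."""
    macroblocks = mb_cols * mb_rows
    return macroblocks * BLOCKS_PER_MACROBLOCK * BLOCK_SAMPLES * BITS_PER_SAMPLE


for name, (cols, rows) in {"NTSC": (45, 30), "PAL": (45, 36)}.items():
    bits = frame_bits(cols, rows)
    print(f"{name}: {cols * 16}x{rows * 16} pixels, {bits / 1e6:.2f} Mbit/frame")
# NTSC: 720x480 pixels, 4.15 Mbit/frame
# PAL: 720x576 pixels, 4.98 Mbit/frame
```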
More recently it has become possible to compress one of the three frame stores noted above (the "B frame store"). When this is done, the decoder is said to operate in "2.5 frame store mode". This is desirable because only 10 megabits of memory are required in the DRAM 18 for an NTSC signal, and 12.5 megabits for PAL. A practical consequence is the ability to decode PAL pictures using a single 16 megabit memory. However, memory management in the 2.5 frame store mode has presented considerable difficulties, because the MPEG algorithm may require the video formatter 20 to process an extensively intermingled sequence of I, P, and B pictures, each type of which undergoes distinct processing. Furthermore, if the decoding of a subsequent picture is delayed, it may be necessary to redisplay one or more fields of the current picture, which places further demands on the decoder's memory management.
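The 2.5 frame store figures follow from keeping two full anchor (I/P) frame stores and compressing the B frame store to roughly half a frame. The Python sketch below assumes exactly that split together with the text's approximate per-frame sizes; the store labels and function name are illustrative only.

```python
FRAME_MBIT = {"NTSC": 4.0, "PAL": 5.0}  # approximate per-frame sizes from the text
DRAM_MBIT = 16

# Assumed 2.5 frame store split: two full anchor stores plus a B store
# compressed to about half a frame.
STORES = {
    "first anchor (I/P)": 1.0,
    "second anchor (I/P)": 1.0,
    "B frame (compressed)": 0.5,
}


def frame_store_mbit(standard):
    """Total frame-store memory, in megabits, for the assumed 2.5 store split."""
    return sum(STORES.values()) * FRAME_MBIT[standard]


for std in ("NTSC", "PAL"):
    used = frame_store_mbit(std)
    print(f"{std}: {used:g} Mbit of frame stores, {DRAM_MBIT - used:g} Mbit free")
# NTSC: 10 Mbit of frame stores, 6 Mbit free
# PAL: 12.5 Mbit of frame stores, 3.5 Mbit free
```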