Recent times have seen an acceleration in efforts by suppliers of consumer electronics to greatly expand the amount and quality of information provided to users. The expanded use of multimedia information in communications and entertainment systems along with user demands for higher quality and faster presentations of the information has driven the communications and entertainment industries to seek systems for communicating and presenting information with higher densities of useful information. These demands have stimulated the development and expansion of digital techniques to code and format signals to carry the information.
Unlike most communication systems of the past, particularly television broadcast systems and other systems used for home entertainment, in which analog signals fill the available bandwidth with single-program, real-time signals in a straightforward format that includes much redundant and humanly imperceptible information, digital transmission systems can combine and identify multiple programs and can selectively filter out redundant or otherwise useless information. This allows them to transmit programs of higher quality or with a higher density of useful information. As a result of the high technological demand for such capabilities, work toward the specification and development of digital communications formats and systems has accelerated.
In furtherance of these advances, the industry-sponsored Moving Picture Experts Group (MPEG), chartered by the International Organization for Standardization (ISO), has specified a format for digital video and two-channel stereo audio signals that has come to be known as MPEG-1 and, more formally, as ISO/IEC 11172. MPEG-1 specifies the format of data input to digital decoders, that is, the syntax of the data bitstreams that carry programs in digital form so that decoders can reliably decode them. In practice, the MPEG-1 standard has been used for recorded programs that are usually read by software systems. The program signals include digital data of various programs or program components, whose digitized data streams are multiplexed together by dividing them in the time domain into the program bitstreams. The programs include audio and video frames of data and other information.
An enhanced standard, known colloquially as MPEG-2 and more formally as ISO/IEC 13818, has more recently been agreed upon by ISO MPEG. This enhanced standard grew out of the need to specify data formats for broadcast and other higher-noise applications, such as high definition television (HDTV), where programs are more likely to be transmitted than recorded and more likely to be decoded by hardware than by software.
The MPEG standards define a structure for multiplexing and synchronizing coded digital video and audio data, both for decoding by, for example, digital television receivers and for random-access play of recorded programs. The defined structure provides syntax for parsing and synchronizing the multiplexed stream in such applications and for identifying, decoding and timing the information in the bitstreams.
The MPEG video standard specifies a bitstream syntax designed to improve information density and coding efficiency by removing spatial and temporal redundancies. For example, the transformation of blocks of 8×8 luminance pels (pixels) and the corresponding chrominance data using Discrete Cosine Transform (DCT) coding is used to remove spatial redundancies, while motion compensated prediction is used to remove temporal redundancies. For video, MPEG provides Intra (I) frames, Predictive (P) frames and Bidirectionally Predictive (B) frames. I-frames are coded independently and are the least efficiently coded of the three frame types. P-frames are coded more efficiently than I-frames and are coded relative to the previously coded I- or P-frame. B-frames are coded the most efficiently of the three frame types and are coded relative to both the previous and the next I- or P-frame. Because a B-frame depends on a later reference frame, the coding order of the frames in an MPEG program is not necessarily the same as their presentation order. Headers in the bitstream provide the information decoders need to determine the timing and sequence of the frames for the presentation of a moving picture.
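Because a B-frame is predicted from the next I- or P-frame, that reference must reach the decoder first. The following minimal Python sketch assumes a simple group of frames given as a string of frame types in display order and produces one plausible coding order by emitting each I- or P-frame ahead of the B-frames that precede it in display order. The function name `display_to_coding_order` is hypothetical and is not taken from the standard.

```python
def display_to_coding_order(frame_types: str) -> list[tuple[str, int]]:
    """Reorder frames from display order into a coding (transmission) order.

    Each B-frame is held back until its future reference (the next I- or
    P-frame) has been emitted. Frames are returned as (type, display_index).
    """
    coding = []
    pending_b = []  # B-frames waiting for their future reference frame
    for index, kind in enumerate(frame_types):
        if kind == "B":
            pending_b.append((kind, index))
        elif kind in ("I", "P"):
            coding.append((kind, index))  # the reference frame goes first
            coding.extend(pending_b)      # then the B-frames that need it
            pending_b.clear()
        else:
            raise ValueError(f"unknown frame type: {kind!r}")
    # Trailing B-frames would really wait for the next group's first
    # reference frame; this sketch simply appends them at the end.
    coding.extend(pending_b)
    return coding


order = display_to_coding_order("IBBPBBP")
print("".join(f"{kind}{idx}" for kind, idx in order))  # I0P3B1B2P6B4B5
```

The decoder performs the inverse reordering, which is why it must hold the decoded future reference frame until the B-frames displayed before it have been output.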
The video bitstreams in MPEG systems include a Video Sequence Header containing picture size and aspect ratio data, bit rate limits and other global parameters. Following the Video Sequence Header are coded groups-of-pictures (GOPs). Each GOP usually includes only one I-picture and a variable number of P- and B-pictures. Each GOP also includes a GOP header that contains presentation delay requirements and other data relevant to the entire GOP. Each picture in the GOP includes a picture header that contains picture type and display order data and other information relevant to the picture within the picture group.
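The header hierarchy described above (sequence, then GOPs, then pictures) maps naturally onto nested records. The Python sketch below is illustrative only: the class and attribute names are hypothetical, the `temporal_reference` attribute is named after the display-order field of the MPEG picture header, and many real header fields are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class PictureHeader:
    picture_type: str        # "I", "P" or "B"
    temporal_reference: int  # display-order position within the GOP


@dataclass
class GroupOfPictures:
    # A real GOP header also carries timing data; omitted in this sketch.
    pictures: list[PictureHeader] = field(default_factory=list)

    def i_picture_count(self) -> int:
        return sum(1 for p in self.pictures if p.picture_type == "I")

    def display_order(self) -> str:
        """Picture types re-sorted from coding order into display order."""
        ordered = sorted(self.pictures, key=lambda p: p.temporal_reference)
        return "".join(p.picture_type for p in ordered)


@dataclass
class VideoSequence:
    # Global parameters of the kind carried in the Video Sequence Header.
    width: int
    height: int
    aspect_ratio: str
    bit_rate_limit: int  # bits per second (illustrative value below)
    gops: list[GroupOfPictures] = field(default_factory=list)


# Pictures listed in coding order; temporal_reference restores display order.
gop = GroupOfPictures([PictureHeader("I", 0), PictureHeader("P", 3),
                       PictureHeader("B", 1), PictureHeader("B", 2)])
seq = VideoSequence(720, 480, "4:3", 15_000_000, [gop])
print(gop.display_order())  # IBBP
```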
Each MPEG picture is divided into a plurality of Macroblocks (MBs), not all of which need be transmitted. Each MB is made up of 16×16 luminance pels, that is, a 2×2 array of four 8×8 transformed blocks of pels. MBs are coded in Slices, which are variable-length runs of consecutive MBs proceeding left to right across the picture. A Slice may begin and end at any intermediate MB position of the picture, but a Slice must end when it reaches the right margin of the picture, and a new Slice must begin at the left margin. Each Slice begins with a Slice Header that contains the vertical position of the Slice within the picture, the quantization scale of the Slice, and other information, such as information that can be used for fast forward, fast reverse, resynchronization in the event of a transmission error, or other picture presentation purposes.
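To make the MB geometry concrete, the following Python sketch counts the 16×16 MBs covering a picture and splits one luminance MB into its 2×2 array of 8×8 blocks. It assumes each MB is held as a plain 16×16 list of pel values; the helper names are hypothetical.

```python
def macroblock_grid(width: int, height: int) -> tuple[int, int]:
    """Number of 16x16 MB columns and rows needed to cover a picture."""
    # Round up so that a partial MB at the right or bottom edge is counted.
    return (width + 15) // 16, (height + 15) // 16


def split_macroblock(mb: list[list[int]]) -> list[list[list[int]]]:
    """Split a 16x16 luminance MB into its four 8x8 blocks.

    Blocks are returned top-left, top-right, bottom-left, bottom-right.
    """
    if len(mb) != 16 or any(len(row) != 16 for row in mb):
        raise ValueError("expected a 16x16 array of pels")
    blocks = []
    for by in (0, 8):
        for bx in (0, 8):
            blocks.append([row[bx:bx + 8] for row in mb[by:by + 8]])
    return blocks


cols, rows = macroblock_grid(720, 480)
print(cols, rows, cols * rows)  # 45 30 1350
```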
The Macroblock is the basic unit used for MPEG motion compensation. Each MB contains an MB header which, for the first MB of a Slice, contains the MB's horizontal position relative to the left edge of the picture and which, for subsequently transmitted MBs of the Slice, contains an address increment. Not all of the consecutive MBs of a Slice need be transmitted with the Slice; an address increment greater than one indicates that the intervening MBs were skipped.
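A decoder can recover the position of every transmitted MB by accumulating the address increments. The Python sketch below assumes, for simplicity, that the Slice's row and the first MB's column are already known as plain integers; the function and parameter names are hypothetical and do not reproduce the exact MPEG header syntax.

```python
def macroblock_positions(slice_row, first_column, increments, mb_width):
    """Recover the (row, column) of each MB transmitted in one Slice.

    slice_row:    vertical position of the Slice, from the Slice Header
    first_column: horizontal position of the first MB of the Slice
    increments:   address increment of each subsequently transmitted MB;
                  an increment greater than 1 means MBs were skipped
    mb_width:     number of MBs in one row of the picture
    """
    address = slice_row * mb_width + first_column
    positions = [divmod(address, mb_width)]
    for inc in increments:
        if inc < 1:
            raise ValueError("address increments must be positive")
        address += inc
        positions.append(divmod(address, mb_width))
    return positions


# The final increment of 4 skips the MBs at columns 6, 7 and 8.
print(macroblock_positions(2, 3, [1, 1, 4], mb_width=45))
# [(2, 3), (2, 4), (2, 5), (2, 9)]
```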
The presentation of MPEG video involves displaying video frames at a rate of, for example, twenty-five or thirty frames per second, depending on the national standard used (PAL or NTSC, for example). Thirty frames per second corresponds to a presentation interval of approximately 33 milliseconds per frame. The capacity of MPEG signals to carry the information needed for HDTV and other high-resolution video presentations is achieved in part by exploiting the high degree of correlation that typically exists between adjacent pictures, that is, by exploiting temporal redundancies in the coding of the signals. Where two consecutive video frames of a program are nearly identical, for example, communicating them requires only the transmission of one I-picture, or Reference Picture, together with a P-picture containing only the information that differs from the I-picture and the information the decoder at the receiver needs to reconstruct the P-picture from the previous I-picture. This means that the decoder must provide storage for the Reference Picture data.
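The frame interval is simply the reciprocal of the frame rate. This short Python calculation uses exact fractions, including the 30000/1001 rate actually used by NTSC-derived systems, to show the time budget per frame.

```python
from fractions import Fraction


def frame_interval_ms(frames_per_second):
    """Presentation interval of one frame, in milliseconds."""
    return float(1000 / Fraction(frames_per_second))


for name, rate in [("25 frames/s", Fraction(25)),
                   ("30 frames/s", Fraction(30)),
                   ("30000/1001 frames/s", Fraction(30000, 1001))]:
    print(f"{name}: {frame_interval_ms(rate):.2f} ms")
# 25 frames/s: 40.00 ms
# 30 frames/s: 33.33 ms
# 30000/1001 frames/s: 33.37 ms
```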
Information contained in a P-picture transmission includes blocks of video data not contained in a Reference I- or P-picture, as well as data needed to relocate within the picture any information contained in the previous I- or P-picture that has moved. The technique used in MPEG systems to construct a P-picture from a Reference Picture is Forward Prediction, in which a Motion Vector (MV), together with any remaining Prediction Error, is transmitted in lieu of the video data of a given, or Target, MB. The MV tells the decoder which MB-sized region of the I- or P-Reference Picture is to be reproduced as the Target MB.
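Forward Prediction can be sketched as "copy the block the MV points to, then add the prediction error". The Python sketch below is a simplification under stated assumptions: it handles integer-pel MVs only (MPEG also allows half-pel vectors, which require interpolation), treats the Reference Picture as a 2-D list of 8-bit pels, and uses a hypothetical function name.

```python
def predict_forward(reference, top, left, mv, residual, size=16):
    """Build a Target MB from a Reference Picture by forward prediction.

    reference: 2-D list of Reference Picture pels
    top, left: position of the Target MB in the picture being built
    mv:        (dy, dx) integer-pel motion vector
    residual:  size x size prediction error added to the copied block
    """
    dy, dx = mv
    y0, x0 = top + dy, left + dx
    if not (0 <= y0 and y0 + size <= len(reference)
            and 0 <= x0 and x0 + size <= len(reference[0])):
        raise ValueError("motion vector points outside the reference picture")
    target = []
    for r in range(size):
        row = []
        for c in range(size):
            value = reference[y0 + r][x0 + c] + residual[r][c]
            row.append(max(0, min(255, value)))  # clip to the 8-bit range
        target.append(row)
    return target
```

For example, with a 4×4 reference whose pel at (y, x) is `4 * y + x`, a 2×2 target at (0, 0) with MV (1, 1) and a zero residual copies `[[5, 6], [9, 10]]`.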
With B-pictures, a Bidirectional Temporal Prediction technique called Motion Compensated Interpolation is used. Motion Compensated Interpolation is accomplished by transmitting, in lieu of the video data for a Target MB, one or two MVs that specify whether the MB is to be copied from the previous Reference Picture or from the next future Reference Picture, or is to be formed as the average of one MB from each of the previous and next future Reference Pictures.
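The three B-picture prediction choices can be sketched as follows, assuming the forward and backward predicted blocks have already been fetched (for example, by a routine like the forward-prediction copy above). The averaged case here uses the rounded integer average `(a + b + 1) // 2`; treat that rounding rule, the mode strings and the function name as assumptions of this sketch.

```python
def b_prediction(mode, forward_block=None, backward_block=None):
    """Form the prediction for a B-picture MB.

    mode: "forward"      copy from the previous Reference Picture
          "backward"     copy from the next future Reference Picture
          "interpolated" average of one block from each Reference Picture
    """
    if mode == "forward":
        return [row[:] for row in forward_block]
    if mode == "backward":
        return [row[:] for row in backward_block]
    if mode == "interpolated":
        # Rounded integer average of the two predictions, pel by pel.
        return [[(a + b + 1) // 2 for a, b in zip(f_row, b_row)]
                for f_row, b_row in zip(forward_block, backward_block)]
    raise ValueError(f"unknown prediction mode: {mode!r}")


print(b_prediction("interpolated", [[10, 11]], [[20, 12]]))  # [[15, 12]]
```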
An MPEG Motion Compensated Prediction video decoder of a type practical for HDTV must have a Reference Picture storage capability that allows the receiver's decoder to construct the B- and P-frames, whose motion vectors specify MBs of the Reference Pictures. To provide sufficient data retrieval speed for the motion compensation calculations, on-chip static memory (SRAM) could be provided. Using SRAM with the storage capacity needed to hold a video picture is a straightforward but expensive way to provide this capability. Using an off-chip DRAM buffer as an alternative to on-chip SRAM, however, presents the problem of memory access times that exceed the time available between frames of the program. For example, the MBs that the MVs specify for retrieval during motion compensation prediction must be read from the storage medium in an order that has a substantial random component. DRAM is by nature divided into memory segments called "pages", and consecutive reads within a page require substantially less time than consecutive reads that cross page boundaries. The random memory access requirements of motion compensation prediction therefore result in many page crossings, which can raise memory access times to the point that DRAM buffers cannot be used efficiently and effectively.
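The cost of page crossings can be estimated by counting how often a block fetch moves from one DRAM page to another. This Python sketch assumes 8-bit luminance pels stored in simple raster order (pel (y, x) at byte address `y * width + x`) and a 2048-byte page; both are illustrative assumptions, since real page sizes depend on the DRAM part and the memory map.

```python
def page_crossings(width, top, left, size=16, page_bytes=2048):
    """Count DRAM page changes while reading a size x size block row by row.

    Assumes one byte per pel in raster order, so pel (y, x) is stored at
    byte address y * width + x, and that each page holds page_bytes
    consecutive bytes.
    """
    crossings = 0
    current_page = None
    for y in range(top, top + size):
        for x in range(left, left + size):
            page = (y * width + x) // page_bytes
            if page != current_page:
                if current_page is not None:
                    crossings += 1
                current_page = page
    return crossings


# One 16x16 MB fetch from a 720-pel-wide picture.
print(page_crossings(720, 0, 0))  # 5
```

Because each MV can point anywhere in the Reference Picture, every MB fetch pays several such page changes, and the count varies with the block position.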
In addition to the retrieval of MBs from Reference Pictures, the storage and retrieval of data for individual pels can adversely affect the efficiency of the decoding process. This is particularly the case with post filtering, in which low-pass filtering is applied pixel to pixel to remove rapid spatial fluctuations in values. For the presentation of video, the decoder must generate both a luminance (overall brightness or intensity) value and a chrominance (color pair) value for each pel. The MPEG-2 standard, moreover, calls for the ability to decode color video programs at bit rates as low as 4 Mbits per second, for both progressive (non-interleaved) and interleaved video.
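As an illustration of pixel-to-pixel low-pass post filtering, the Python sketch below applies a 3-tap `[1, 2, 1] / 4` kernel along one row of pels. The source does not specify a filter; this kernel, the edge handling (repeating the nearest pel) and the function name are assumptions chosen only to show why each output pel needs its neighbors in memory.

```python
def lowpass_row(row):
    """Apply a 3-tap [1, 2, 1] / 4 low-pass filter along one row of pels.

    Pels beyond either end of the row are replaced by the nearest edge pel.
    """
    n = len(row)
    out = []
    for i in range(n):
        left = row[max(i - 1, 0)]
        right = row[min(i + 1, n - 1)]
        out.append((left + 2 * row[i] + right + 2) // 4)  # rounded result
    return out


print(lowpass_row([0, 0, 255, 0, 0]))  # [0, 64, 128, 64, 0]
```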
With interleaved video, a video frame is formed of two fields, one containing the even scan lines of a picture (the "top field") and one containing the odd scan lines (the "bottom field"). The fields are output alternately to a video display in each frame cycle of approximately 33 milliseconds at thirty frames per second, allowing approximately 16.7 milliseconds for each field to be output. Certain standards, such as the CCIR-601 standard, which MPEG must support, include an interleaved format. For interleaved video motion compensation in MPEG-1, all pictures are frame pictures that include both the top field and the bottom field; in MPEG-2, however, I-frames, P-frames and B-frames may be either full video frames containing both top and bottom fields or may contain only a top field or only a bottom field.
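The relationship between a frame and its two fields is a simple split of alternate scan lines. This Python sketch represents a frame as a list of scan lines counted from 0; the function names are hypothetical.

```python
def split_fields(frame):
    """Split a frame, given as a list of scan lines, into its two fields.

    Scan lines are counted from 0, so the top field holds lines 0, 2, 4, ...
    and the bottom field holds lines 1, 3, 5, ...
    """
    if len(frame) % 2:
        raise ValueError("a frame must contain an even number of lines")
    return frame[0::2], frame[1::2]


def merge_fields(top, bottom):
    """Interleave a top field and a bottom field back into a frame."""
    if len(top) != len(bottom):
        raise ValueError("fields must have the same number of lines")
    frame = []
    for top_line, bottom_line in zip(top, bottom):
        frame.extend([top_line, bottom_line])
    return frame


top, bottom = split_fields(["line0", "line1", "line2", "line3"])
print(top, bottom)  # ['line0', 'line2'] ['line1', 'line3']
```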
Further, depending on the bitrate and format employed, one chrominance pair may be coded for each luminance value. This is referred to as the 4:4:4 chrominance format, and it requires the highest bitrate or the highest coding efficiency. Alternatively, other formats provide one chrominance pair for every two or four luminance values by subsampling chrominance 2:1 horizontally, or both horizontally and vertically; these are referred to as the 4:2:2 format and the 4:2:0 format, respectively. With interleaved pictures, where a field of alternating top rows of luminance pels is transmitted first and a field of alternating bottom rows is then transmitted, alternating rows of chrominance pair values are transmitted with the fields of luminance pels. The chrominance pels transmitted with the "top" field relate to a 2×2 array of top-field luminance values, while those transmitted with the "bottom" field relate to an interleaved 2×2 array of bottom-field luminance values. As a result, storing and retrieving luminance and chrominance data in the straightforward order in which it is received can complicate and substantially slow the decoding process.
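The chrominance format directly sets the size of each stored picture. This Python sketch computes the bytes needed for one frame (a luminance plane plus two chrominance planes) under each format, assuming 8-bit samples and picture dimensions divisible by the subsampling factors.

```python
# Chrominance subsampling factors (horizontal, vertical) for each format.
SUBSAMPLING = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}


def frame_bytes(width, height, chroma_format, bits=8):
    """Bytes needed to store one frame: a luminance plane plus two chrominance planes."""
    sx, sy = SUBSAMPLING[chroma_format]
    luma = width * height
    chroma = (width // sx) * (height // sy)  # samples in each chrominance plane
    return (luma + 2 * chroma) * bits // 8


for fmt in SUBSAMPLING:
    print(fmt, frame_bytes(720, 480, fmt))
# 4:4:4 1036800
# 4:2:2 691200
# 4:2:0 518400
```

The 4:2:0 figure is roughly the minimum storage a decoder needs for each Reference Picture at this resolution.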
In particular, MPEG-2 video decoders must decode interleaved video in the CCIR-601 color video format (also called the ITU-R format) referred to above, in which luminance is coded as 8-bit values sampled at a 13.5 MHz rate, along with a red chrominance value and a blue chrominance value of 8 bits each, each sampled at a 6.75 MHz rate. In this format, video frames have 720 pels per line and either 480 lines per frame at 30 frames per second or 576 lines per frame at 25 frames per second. Uncompressed, this requires 216 Mbits/s, but the signal may be compressed to as few as 2 Mbits/s, with 4 Mbits/s being a typical rate.
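The 216 Mbits/s figure follows directly from the sampling rates. This Python calculation reproduces it, also shows the smaller rate of the active 720×480 picture alone (the sampling rates include blanking intervals), and gives the compression ratios implied by the quoted coded rates.

```python
LUMA_RATE_HZ = 13_500_000    # luminance samples per second
CHROMA_RATE_HZ = 6_750_000   # samples per second for each chrominance signal
BITS_PER_SAMPLE = 8

# Rate implied by the sampling frequencies (includes blanking intervals).
raw_bps = (LUMA_RATE_HZ + 2 * CHROMA_RATE_HZ) * BITS_PER_SAMPLE

# Rate of the active 720 x 480 picture at 30 frames/s with 4:2:2 sampling:
# 8 bits of luminance plus, on average, 8 bits of chrominance per pel.
active_bps = 720 * 480 * 30 * (8 + 8)

print(raw_bps / 1e6, active_bps / 1e6)  # 216.0 165.888
for coded_mbps in (4, 2):
    print(f"{coded_mbps} Mbit/s -> {raw_bps / (coded_mbps * 1e6):.0f}:1")
# 4 Mbit/s -> 54:1
# 2 Mbit/s -> 108:1
```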
The formats referred to above, together with the variety of other formats that MPEG receivers must decode, make it difficult to buffer the data for the video being reproduced at the receiver effectively and efficiently. Accordingly, in the decoding and reproduction of MPEG video programs, there is a need for an effective and efficient memory usage scheme, particularly for performing Motion Compensation Prediction and post filtering.