1. Field of the Invention
The invention relates to an architecture for a memory with a wide word width and particularly, though not exclusively, one suited for use as a high definition video frame store memory, and an accompanying organization for storing data, e.g., pixel values, in such a memory to facilitate efficient macroblock and raster access therefrom.
2. Description of the Prior Art
Digital systems that display video information generally incorporate a video frame store memory. This memory essentially allows a frame of video data to be independently received and written into the memory, and read from the memory for subsequent display. In addition, the memory also provides a vehicle through which desired pixel data, e.g., a separate image, can be superimposed onto a stored image to yield a composite image for subsequent display.
Over the years, video images have been digitized with increasing resolution, thus generating a substantially increased amount of pixel data for a single video frame. As such, frame store memories have increased both in size and access speed; the former to store data for an increased number of pixels in each image frame, the latter to transfer a correspondingly increased number of bits into and from the memory in order to assure that the video images are displayed in real time.
Currently, high definition television (HDTV) and other high performance video display applications are being developed which operate at very high resolutions relative to prior video display applications. These applications generate enormous amounts of image data for a single frame and thus require inordinately high data transfer rates, such as a worst case scanning rate on the order of 800 Mbits/second for an HDTV image. As one would expect, such transfer rates, to assure real-time display, impose onerous bandwidth requirements both on the HDTV transmission channel and on the video frame store memory itself.
The art has attempted to ameliorate the bandwidth requirements, at least for transmission of an HDTV signal, through use of image compression. In that regard and to the extent relevant, a so-called MPEG-2 (Moving Picture Experts Group) standard has been adopted to provide a uniform method of image compression. In essence, this standard relies on transmitting video images on a layered hierarchical basis, for simplicity and in order of increasing granularity: a sequence made of groups of pictures (also referred to as "frames"); each group being formed of so-called "I", "P" or "B" pictures; each picture formed of so-called "slices"; each "slice" formed of so-called "macroblocks"; each macroblock formed of four 8-by-8 blocks of luminance values and a single 8-by-8 pixel block for each of two chrominance values; with finally each pixel block being an 8-by-8 array of 8-bit (1 byte) sampled pixel values. At the highest level, sequences are independent segments in an MPEG data stream; each sequence has a header followed by one or more compressed pictures. Each picture may be an "I", "P" or "B" picture. An "I" ("intra") picture is one that is composed entirely of macroblocks with no reference to any other macroblocks, i.e., without any reference to any other picture. We will refer to these self-contained macroblocks as "I" macroblocks. A "P" ("predicted") picture is one that is predicted, using motion vectors, relative to a previous picture. In that regard, a "P" picture contains macroblocks which are described using macroblocks from previous "I" or "P" pictures as reference. We will refer to these macroblocks as "P" macroblocks. Hence, a "P" picture is composed of "I" and "P" macroblocks. The position of a reference macroblock relative to a current macroblock is specified using motion vectors. In contrast, a "B" ("bi-directional") picture is one that is predicted, again through motion vectors, based on either a previous picture, a following picture, or both.
Hence, a "B" picture may contain macroblocks which are described using macroblocks from previous or following (or both) "I" or "P" pictures as reference. We will refer to these macroblocks as "B" macroblocks. Thus, a "B" picture may contain "I", "P" and "B" macroblocks. A slice is formed of a sequence of macroblocks in raster scan order, i.e., horizontally across a vertical position in a picture. Each macroblock is a square portion of a picture, and sized as a 16-pixel by 16-line area. For each macroblock, three matrices are generated: a 16-by-16 matrix of 8-bit luminance, Y, component values, and a separate 8-by-8 matrix of 8-bit sub-sampled (2:1 both horizontally and vertically) pixel values for each of two chrominance components, U and V. Once these three component matrices are generated, they are each encoded in a similar fashion. Specifically, the values for each component for each macroblock are transformed through a discrete cosine transform (DCT), then quantized and finally encoded through variable run length coding to yield corresponding compressed data. For further details on MPEG compression, the reader is referred to, e.g.: Information Technology--General Coding of Moving Pictures and Associated Audio, Recommendation H.262 ISO/IEC 13818-2, Committee Draft, ISO/IEC JTC1/SC29 WG/602, International Organization for Standardization, November 1993, Seoul, pages 1-176. Decoding basically proceeds in an inverse fashion.
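By way of illustration only, the macroblock dimensions given above can be tallied in a short Python sketch to show how many uncompressed bytes one macroblock occupies. The constant and function names are hypothetical and are not taken from the MPEG-2 standard; only the dimensions come from the description above.

```python
# Illustrative names; only the matrix dimensions come from the text.
Y_SIZE = 16            # luminance (Y) matrix is 16 x 16
C_SIZE = 8             # each chrominance (U, V) matrix is 8 x 8
BYTES_PER_SAMPLE = 1   # 8-bit sampled pixel values


def macroblock_bytes():
    """Return (luminance bytes, chrominance bytes, total bytes)."""
    luma = Y_SIZE * Y_SIZE * BYTES_PER_SAMPLE
    chroma = 2 * C_SIZE * C_SIZE * BYTES_PER_SAMPLE  # U and V together
    return luma, chroma, luma + chroma


luma, chroma, total = macroblock_bytes()
print(luma, chroma, total)  # 256 128 384
```

Under these figures, each decompressed macroblock accounts for 384 bytes of frame store traffic whenever it is written or read in full.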
While use of the MPEG-2 compression standard is expected to appreciably reduce transmission channel bandwidth requirements, this standard has no effect on the video frame store memory within which, of necessity, pixel data must be totally decompressed for display. Hence, the problem of adequate video frame store memory bandwidth still persists.
A frame reconstruction circuit for use with an MPEG-2 decoder typically accesses a macroblock of pixel data at a time. Frame store memory transactions that would occur during a process of reconstruction depend on whether the macroblock is an "I", "P" or "B" type macroblock. In the case of an "I" macroblock, all the information for its reconstruction is contained in the MPEG-2 bitstream. Hence, the frame reconstruction circuit accesses the frame store memory only for writing a decoded macroblock. In the case of "P" macroblocks, the frame reconstruction circuit reads the reference macroblock from the frame store memory, reconstructs the current macroblock, and then writes the current macroblock to the frame store memory. Hence, one read transaction and one write transaction are required. In the case of "B" macroblocks, the frame reconstruction circuit reads two reference macroblocks, reconstructs the current macroblock and then writes the current macroblock into the frame store memory. Hence, two read transactions and one write transaction are required.
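The transaction counts just described can be summarized in a minimal Python sketch. The table and function names are hypothetical, and the counts are simply those stated above (a "B" macroblock is counted with two reference reads).

```python
# (reads, writes) per macroblock type, as counted in the text above.
TRANSACTIONS = {
    "I": (0, 1),  # write only
    "P": (1, 1),  # one reference read, one write
    "B": (2, 1),  # two reference reads, one write
}


def count_transactions(macroblock_types):
    """Total (reads, writes) for a sequence such as "IPBB"."""
    reads = sum(TRANSACTIONS[t][0] for t in macroblock_types)
    writes = sum(TRANSACTIONS[t][1] for t in macroblock_types)
    return reads, writes


print(count_transactions("IPBB"))  # (5, 4)
```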
A macroblock which is reconstructed by the frame reconstruction circuit and written to the frame store memory always starts at a pixel whose horizontal and vertical positions, in a frame, are a multiple of 16. In contrast, a reference macroblock which is read by the frame reconstruction circuit can start at any arbitrary position on a grid which has twice the number of points in both horizontal and vertical directions compared to the original image. We will refer to this grid as the "half pixel" grid, as contrasted with a grid, i.e., a so-called "full pixel" grid, that has the same resolution as the original image. This means that the frame reconstruction circuit actually needs to read an array of 17×17 pixels, i.e., one more row and one more column than the macroblock itself, starting at any arbitrary position on the full pixel grid. A 16×16 array on a half pixel grid can then be reconstructed using interpolation between neighboring full pixel values.
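The reason 17×17 full pixel values suffice can be seen from a minimal Python sketch of the interpolation step. It assumes simple averaging of neighboring samples with rounding to the nearest integer; the function name and argument layout are hypothetical and are offered only as an illustration, not as the decoder's actual circuit.

```python
def predict_16x16(ref17, half_x, half_y):
    """Build a 16x16 prediction from a 17x17 full-pixel array.

    ref17  -- 17 rows of 17 full-pixel sample values
    half_x -- 1 if the reference lies half a pixel to the right, else 0
    half_y -- 1 if the reference lies half a pixel down, else 0
    """
    out = []
    for y in range(16):
        row = []
        for x in range(16):
            a = ref17[y][x]
            b = ref17[y][x + half_x]
            c = ref17[y + half_y][x]
            d = ref17[y + half_y][x + half_x]
            # With both flags 0 this returns a unchanged; with one flag
            # set it reduces to (a + b + 1) // 2.
            row.append((a + b + c + d + 2) // 4)
        out.append(row)
    return out


# Example: a horizontal ramp, sampled half a pixel to the right.
ramp = [[10 * x for x in range(17)] for _ in range(17)]
print(predict_16x16(ramp, 1, 0)[0][:3])  # [5, 15, 25]
```

The indices x + half_x and y + half_y reach 16 at most, which is why the seventeenth row and column must be fetched.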
These requirements of the frame reconstruction process can be used to derive a data organization that is optimal for the frame reconstruction process. Consider a frame store memory which is n-bytes wide. Such a frame store memory stores n bytes in each location of the memory. For purposes of illustration, we will consider a 16-byte wide frame store memory and possible ways of storing luminance, Y, data. Each memory location can store 16 pixel values. The 16 pixels can be chosen from a number of possible pixel array organizations which include: 16×1, 1×16, 2×8, 8×2, and 4×4. Of these, the 4×4 organization is optimal for the frame reconstruction process because it results in the fewest memory transactions.
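The comparison among these organizations can be checked with a short Python sketch that counts how many 16-byte memory locations a 17×17 reference read touches at every possible starting offset. The sketch assumes that pixel arrays are tiled on a grid aligned to the frame origin and that each location touched costs one transaction; the function names are hypothetical.

```python
def words_touched(x0, y0, tile_w, tile_h, size=17):
    """Memory locations covered by a size x size read starting at (x0, y0)."""
    cols = (x0 + size - 1) // tile_w - x0 // tile_w + 1
    rows = (y0 + size - 1) // tile_h - y0 // tile_h + 1
    return cols * rows


def worst_case(tile_w, tile_h, size=17):
    """Largest count over all start offsets within one tile."""
    return max(
        words_touched(x, y, tile_w, tile_h, size)
        for x in range(tile_w)
        for y in range(tile_h)
    )


# (width, height) of the pixel array held in one 16-byte location.
ORGANIZATIONS = [(16, 1), (1, 16), (2, 8), (8, 2), (4, 4)]
for w, h in ORGANIZATIONS:
    print(f"{w}x{h}: worst case {worst_case(w, h)} locations")
```

Under these assumptions the worst case is 34 locations for the 16×1 and 1×16 organizations, 27 for 2×8 and 8×2, and 25 for 4×4.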
While the frame reconstruction circuit requires macroblocks of data at a time, the scan circuitry, which obtains data for display from the frame store memory, has markedly different access requirements. The scan circuitry uses data on a raster basis which requires data representing a complete horizontal line at a time. Organizing data in the frame store memory to give optimal performance for the frame reconstruction process will result in non-optimal performance of the scanning process. This is again illustrated using a 16-byte wide frame store memory. Since each memory location stores a 4×4 array of pixels, the scanning circuit would have to retrieve 4 lines of data to display a single line, hence wasting 75% of available memory bandwidth. As the width of the frame store memory increases, the scanning and frame reconstruction processes become increasingly incompatible.
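The 75% figure follows directly from the shape of the stored pixel array, as the following minimal Python sketch shows. It assumes that each location fetched during scanning contributes only the pixels of the single line being displayed; the function name is hypothetical.

```python
def scan_waste(tile_w, tile_h):
    """Fraction of fetched bytes not used when scanning one raster line."""
    useful = tile_w            # pixels of the current line in each location
    fetched = tile_w * tile_h  # all pixels stored in that location
    return 1 - useful / fetched


print(scan_waste(4, 4))   # 0.75
print(scan_waste(16, 1))  # 0.0
```

A 16×1 organization wastes nothing during scanning but, by the same reasoning that favors 4×4, is a poor fit for macroblock reads, which is the conflict described above.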
Bandwidth requirements for HDTV frame store memory will require that the frame store memory be implemented as wide word memory if that memory is to be built using cost-effective technology. However, a conventional architecture, such as that described above, will result in less than optimal bandwidth.
Currently, video frame store memories are fabricated from large asynchronous dynamic random access memory (DRAM) integrated circuits. While these circuits provide a cost-effective memory implementation, an intrinsic operation of these memories tends to waste access time and diminish throughput. In particular, an asynchronous DRAM does not utilize a clock signal but instead requires that the accessing circuitry wait a specified amount of time until the memory completes its access operation and can provide accessed data at its output port. This forces any circuitry that is clocked faster than this memory access time to simply wait until that data is available. Synchronous DRAM (S-DRAM) circuits, e.g., 2 Mbyte × 8 bit S-DRAM part number MT48LC2M8S1 S (hereinafter the "M8S1" part) from Micron Semiconductor, Inc. in Boise, Id., will shortly become available. S-DRAMs are clocked; a finite number of clock cycles is required for a random access operation. S-DRAMs, specifically the M8S1 part, will provide an operational mode known as the "burst" mode through which 1, 2, 4 or 8 contiguous memory locations may be sequentially accessed; advantageously, only the first memory access involves a fixed amount of wait, with subsequent accesses occurring without any waiting. S-DRAMs also will have a dual memory bank architecture. If the data organization within S-DRAMs is such that a burst mode access, which is equal to or greater in duration than the random access time, can be utilized, then it should conceivably be possible to interleave data in the two memory banks and obtain full utilization (100%) of available memory bandwidth. In principle, this is possible because the waiting to complete a random access operation for each bank can occur while the other bank is being accessed in its "burst" mode.
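The interleaving argument can be illustrated with an idealized timing model in Python. The model assumes that every access consists of a fixed random access wait followed by a burst of data cycles, that the wait in one bank can fully overlap the burst in the other, and that no other device timing constraints apply. The 6-cycle wait used in the example is a hypothetical figure, not a specification of the M8S1 part.

```python
def bus_utilization(latency, burst, banks=2):
    """Steady-state fraction of cycles in which data is transferred.

    latency -- cycles of waiting before a burst can start (assumed value)
    burst   -- data cycles per burst (1, 2, 4 or 8 in the text)
    banks   -- number of banks accessed in rotation
    """
    # Each bank needs latency + burst cycles per access; the shared data
    # bus needs banks * burst cycles to carry one burst from every bank.
    period = max(banks * burst, latency + burst)
    return banks * burst / period


for burst in (1, 2, 4, 8):
    print(burst, bus_utilization(6, burst, banks=1), bus_utilization(6, burst))
```

In this model two interleaved banks reach a utilization of 1.0 once the burst is at least as long as the wait, whereas a single bank never does; shorter or partly used bursts fall below that level.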
While a potential appears to exist for very high bandwidth utilization through use of S-DRAMs, in practice, conventional data organizations that would employ S-DRAMs may not be able to provide this result for the simple reason that not every burst mode access will be fully utilized due to conflicting memory access requirements; hence, once again wasting memory access and bandwidth.
Thus, a need exists in the art for a large video frame store memory, particularly one with a wide word width, that does not exhibit appreciable access inefficiencies due to conflicting access requirements of, e.g., a macroblock-based frame reconstruction circuit and a raster-based scan circuit. Furthermore, such a memory should be amenable to implementation with S-DRAM circuits. Advantageously, such a video frame store memory should exhibit markedly increased bandwidth over conventional video frame store memories and thus be capable of operating at very high data transfer rates expected in HDTV and other high performance video applications.