1. Field of the Invention
This invention relates generally to the field of multimedia systems, and more particularly to a video decoding device having the ability to meet particular predetermined transmission and display constraints. The video decoding device is particularly suited for Moving Picture Experts Group (MPEG) data compression and decompression standards.
2. Description of the Related Art
Multimedia software applications including motion pictures and other video modules employ MPEG standards in order to compress, transmit, receive, and decompress video data without appreciable loss. Several versions of MPEG currently exist or are being developed, with the current standard being MPEG-2. MPEG-2 video is a method for compressed representation of video sequences using a common coding syntax. MPEG-2 replaces MPEG-1 and enhances several aspects of MPEG-1. The MPEG-2 standard includes extensions to cover a wider range of applications, including added syntax for more efficient coding of interlaced video and scalable extensions which permit dividing a continuous video signal into multiple coded bitstreams representing video at different resolutions, picture quality, or frame rates. The primary target application of MPEG-2 is the all-digital broadcast of TV quality video signals at coded bitrates between 4 and 9 Mbit/sec. MPEG-1 was optimized for CD-ROM or applications transmitted in the range of 1.5 Mbit/sec, and video was unitary and non-interlaced.
An encoded/compressed data stream may contain multiple encoded/compressed video and/or audio data packets or blocks. MPEG generally encodes or compresses video packets based on calculated efficient video frame or picture transmissions.
Three types of video frames are defined. An intra or I-frame is a frame of video data including information only about itself. Only one given uncompressed video frame can be encoded or compressed into a single I-frame of encoded or compressed video data.
A predictive or P-frame is a frame of video data encoded or compressed using motion compensated prediction from a past reference frame. A previous encoded or compressed frame, such as an I-frame or a P-frame, can be used when encoding or compressing an uncompressed frame of video data into a P-frame of encoded or compressed video data. A reference frame may be either an I-frame or a P-frame.
A bidirectional or B-frame is a frame of video data encoded or compressed using motion compensated prediction from a past and future reference frame. Alternately, the B-frame may use prediction from a past or a future frame of video data. B-frames are particularly useful when rapid motion occurs within an image across frames.
Motion compensation refers to the use of motion vectors from one frame to improve the efficiency for predicting pixel values of an adjacent frame or frames. Motion compensation is used for encoding/compression and decoding/decompression. The prediction method or algorithm uses motion vectors to provide offset values, error information, and other data referring to a previous or subsequent video frame.
The MPEG-2 standard requires encoded/compressed data to be encapsulated and communicated using data packets. The data stream is comprised of different layers, such as an ISO layer and a pack layer. In the ISO layer, packages are transmitted until the system achieves an ISO end code, where each package has a pack start code and pack data. For the pack layer, each package may be defined as having a pack start code, a system clock reference, a system header, and packets of data. The system clock reference represents the system reference time.
While the syntax for coding video information into a single MPEG-2 data stream is rigorously defined within the MPEG-2 specification, the mechanisms for decoding an MPEG-2 data stream are not. The decoder design is left to the designer, with the MPEG-2 specification merely providing the results which must be achieved by such decoding.
Devices employing MPEG-1 or MPEG-2 standards consist of combination transmitter/encoders or receiver/decoders, as well as individual encoders or decoders. The restrictions and inherent problems associated with decoding an encoded signal and transmitting the decoded signal to a viewing device, such as a CRT or HDTV screen, indicate that the design and realization of an MPEG-compliant decoding device is more complex than that of an encoding device. Generally speaking, once a decoding device is designed which operates under a particular set of constraints, a designer can prepare an encoder which encodes signals at the required constraints, said signals being compliant with the decoder. This disclosure primarily addresses the design of an MPEG compliant decoder.
Various devices employing MPEG-2 standards are available today. Particular aspects of known available decoders will be described.
Frame Storage Architecture
Previous systems used either three frame or two and a half frame storage in memory.
Frame storage works as follows. In order to enable the decoding of B-frames, two frames worth of memory must be available to store the backward and forward anchor frames. Most systems stored either three frames or two and a half frames to enable B-frame prediction. While the availability of multiple frames was advantageous (more information yields an enhanced prediction capability), such a requirement tends to require a larger storage buffer and more time to perform prediction functions. A reduction in the size of memory chips enables additional functions to be incorporated on the board, such as basic or enhanced graphic elements, or channel decoding capability. These elements also may require memory access, so incorporating more memory on a fixed surface space is highly desirable. Similarly, incorporating functional elements requiring smaller memory space on a chip is also beneficial.
Scaling
The MPEG-2 standard coincides with the traditional television screen size used today, thus requiring transmission having dimensions of 720 pixels (pels) by 480 pixels. The television displays every other line of pixels in a raster scan. The typical television screen interlaces lines of pels, sequentially transmitting every other line of 720 pels (a total of 240 lines) and then sequentially transmitting the remaining 240 lines of pels. The raster scan transmits the full frame in 1/30 second, and thus each half-frame is transmitted in 1/60 second.
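The field arithmetic above can be checked with a short sketch. The constant and function names are illustrative assumptions and are not part of the disclosure; the figures (480 lines per frame, 1/30 second per frame) are those stated in the text.

```python
from fractions import Fraction

FRAME_HEIGHT = 480               # lines per frame, as stated in the text
FRAME_PERIOD = Fraction(1, 30)   # seconds per full frame, as stated in the text


def field_timing(frame_height=FRAME_HEIGHT, frame_period=FRAME_PERIOD):
    """Return (lines per field, seconds per field) for a two-field interlaced frame."""
    lines_per_field = frame_height // 2   # every other line belongs to one field
    field_period = frame_period / 2       # two fields are scanned per frame
    return lines_per_field, field_period


lines_per_field, field_period = field_timing()
print(lines_per_field, field_period)
```

Exact fractions are used instead of floating point so that the 1/60 second field period is represented without rounding.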
The MPEG storage method of storing two and a half frames for prediction relates to this interlacing design. The two and a half frame store architecture stores two anchor frames (either I or P) and one half of a decoded B frame. A frame picture is made up of a top and a bottom field, where each field represents interlaced rows of pixel data. For example, the top field may comprise the first, third, fifth, and so forth lines of data, while the bottom field comprises the second, fourth, sixth, and so on lines of data. When B frames are decoded, one half of the picture (either the top field or the bottom field) is displayed. The other half picture must be stored for display at a later time. This additional data accounts for the "half frame" in the two and a half frame store architecture.
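The relationship between a frame picture and its two fields may be illustrated as follows. This is a minimal sketch that models a frame as a list of rows; the function names are hypothetical and are used only for illustration.

```python
def split_fields(frame_rows):
    """Split a frame into (top_field, bottom_field).

    The top field holds the first, third, fifth, ... lines and the bottom
    field holds the second, fourth, sixth, ... lines, as in the example above.
    """
    top_field = frame_rows[0::2]
    bottom_field = frame_rows[1::2]
    return top_field, bottom_field


def interleave_fields(top_field, bottom_field):
    """Rebuild the frame by alternating top-field and bottom-field lines."""
    frame_rows = []
    for top_line, bottom_line in zip(top_field, bottom_field):
        frame_rows.append(top_line)
        frame_rows.append(bottom_line)
    return frame_rows


# Lines are numbered from 1 here to match the wording of the text.
frame = list(range(1, 481))
top, bottom = split_fields(frame)
```

In a two and a half frame store, the field that is not displayed immediately (here `bottom`, if the top field is shown first) is the data that occupies the extra half frame of memory.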
In a two frame store architecture, there is no storage for the second set of interlaced lines that has been decoded in a B-frame. Therefore, an MPEG decoder that supports a two frame architecture must support the capability to decode the same picture twice in the amount of time it takes to display one picture. As there is no place to store decoded B-frame data, the output of the MPEG decoder must be displayed in real time. Thus the MPEG decoder must have the ability to decode fast enough to display a field worth of data.
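The memory saved by moving from three or two and a half frame stores to two frame stores can be estimated as follows. The text gives only the 720 by 480 picture size; the sketch additionally assumes 8-bit samples with 4:2:0 chroma subsampling (1.5 bytes per pel), which is an assumption made for illustration and is not stated in the disclosure.

```python
from fractions import Fraction

WIDTH, HEIGHT = 720, 480          # picture size stated in the text
BYTES_PER_PEL = Fraction(3, 2)    # ASSUMPTION: 8-bit samples, 4:2:0 chroma


def frame_store_bytes(frames):
    """Bytes needed to hold the given number of frames (may be fractional)."""
    return int(WIDTH * HEIGHT * BYTES_PER_PEL * Fraction(frames))


three_frames = frame_store_bytes(3)
two_and_a_half_frames = frame_store_bytes(Fraction(5, 2))
two_frames = frame_store_bytes(2)
print(three_frames, two_and_a_half_frames, two_frames)
```

Under these assumptions, each half frame removed from the store frees 259,200 bytes for other functions sharing the memory.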
A problem arises when the picture to be displayed is in what is called the "letterbox" format. The letterbox format is longer and narrower than the traditional format, at an approximately 16:9 ratio. Other dimensions are used, but 16:9 is most common. The problem with letterboxing is that the image is reduced in size when displayed on screen, but picture quality must remain high. The 16:9 ratio on the 720 by 480 pel screen requires picture on only ¾ of the screen, while the remaining ¼ of the screen is left blank. In order to support a two-frame architecture with a letterboxing display which takes ¾ of the screen, a B-frame must be decoded in ¾ the time taken to display a field of data.
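The ¾ figure and the resulting decode budget can be worked through numerically. The sketch below assumes the traditional screen has a 4:3 aspect ratio (the text says only "traditional format") and uses the 54 MHz clock rate named in the objects of the invention; the names are illustrative only.

```python
from fractions import Fraction

SCREEN_ASPECT = Fraction(4, 3)     # ASSUMPTION: traditional screen is 4:3
PICTURE_ASPECT = Fraction(16, 9)   # letterbox ratio stated in the text
FIELD_PERIOD = Fraction(1, 60)     # seconds per field, from the raster timing
CLOCK_HZ = 54_000_000              # 54 MHz, from the objects of the invention


def letterbox_budget(frame_height=480):
    """Return (active fraction, active lines, decode time in s, decode cycles)."""
    # A wider picture fitted to the full screen width uses less screen height.
    active_fraction = SCREEN_ASPECT / PICTURE_ASPECT
    active_lines = int(frame_height * active_fraction)
    # The B-frame must be decoded within the same fraction of one field time.
    decode_time = FIELD_PERIOD * active_fraction
    decode_cycles = int(CLOCK_HZ * decode_time)
    return active_fraction, active_lines, decode_time, decode_cycles


print(letterbox_budget())
```

Under these assumptions the picture occupies 360 of the 480 lines, and the decoder has 1/80 second, or 675,000 clock cycles at 54 MHz, to decode a B-frame.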
The requirements to perform a two frame store rather than a two and a half or three frame store coupled with the desire to provide letterbox imaging are significant constraints on system speed which have not heretofore been achieved by MPEG decoders.
It is therefore an object of the current invention to provide an MPEG decoding system which operates at 54 MHz and sufficiently decodes an MPEG data stream while maintaining sufficient picture quality.
It is a further object of the current invention to provide an MPEG decoder which supports two frame storage.
It is another object of the current invention to provide a memory storage arrangement that minimizes on-chip space requirements and permits additional memory and/or functions to be located on the chip surface. A common memory area used by multiple functional elements is a further objective of this invention.
It is yet another object of the current invention to provide an MPEG decoder which supports signals transmitted for letterbox format.
According to the current invention, there is provided a system and method for performing an inverse discrete cosine transform (IDCT) calculation based on DCT data. The system is IEEE compliant and transforms one block (8×8) of pixels in 64 cycles.
The system is a part of a larger system which comprises a macroblock core (MBCORE) and a transformation/motion compensation core (TMCCORE). The IDCT processor is a part of the TMCCORE. The TMCCORE receives the DCT input, produces the matrix (QXtQ)P, or XQP, in IDCT Stage 1 and stores the result in transpose RAM. IDCT Stage 2 performs the transpose of the result of IDCT Stage 1 and multiplies the result by P, completing the IDCT process and producing the IDCT output.
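The two-stage data flow may be sketched as below. The sketch reads "QXtQ" as the product of Q, the transpose of X, and Q, which is an interpretation of the flattened notation and not something the text confirms. The actual 8×8 Q and P matrices are not given in this section, so small placeholder matrices are used purely to show the structure; the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def idct_stage1(x, q, p):
    """Stage 1: form Q * X^t * Q * P (this result goes to transpose RAM)."""
    return matmul(matmul(matmul(q, transpose(x)), q), p)


def idct_stage2(stage1, p):
    """Stage 2: transpose the stage 1 result and postmultiply by P."""
    return matmul(transpose(stage1), p)


# Placeholder 2x2 matrices (ASSUMPTION: not the real 8x8 Q and P).
q = [[2, 0], [0, 3]]     # diagonal, as the text requires of Q
p = [[1, 2], [3, 4]]
x = [[5, 6], [7, 8]]

out = idct_stage2(idct_stage1(x, q, p), p)
```

Because Q is diagonal, the combined result equals P^t * Q * X * Q * P, so the two passes apply the same P matrix once along each dimension of the block.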
The IDCT processor receives 12 bits of DCT data input which ranges with the sign bit from −2048 to +2047. The system performs a sign change to convert to sign magnitude. If necessary, the system changes −2048 to −2047, yielding eleven bits of data and a data bit indicating sign. The system performs the matrix function QXtQ, where X represents the DCT data and Q is a predetermined diagonal matrix. The resultant value is adjusted by discarding selected bits, and the system then postmultiplies this with the elements of a predetermined P matrix, and discards selected bits.
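The input conversion described above can be illustrated in software. This is a sketch of the arithmetic only, not of the hardware; the function names are hypothetical.

```python
def to_sign_magnitude(coeff):
    """Convert a 12-bit two's complement DCT value to (sign bit, 11-bit magnitude).

    -2048 has no 11-bit magnitude, so it is first changed to -2047,
    as the text describes.
    """
    if not -2048 <= coeff <= 2047:
        raise ValueError("DCT input must fit in 12 bits")
    if coeff == -2048:
        coeff = -2047
    sign = 1 if coeff < 0 else 0
    magnitude = abs(coeff)
    return sign, magnitude


def from_sign_magnitude(sign, magnitude):
    """Convert (sign bit, magnitude) back to a signed integer."""
    return -magnitude if sign else magnitude


print(to_sign_magnitude(-2048), to_sign_magnitude(2047))
```

After the adjustment every magnitude lies in the range 0 to 2047, which is why eleven data bits plus one sign bit are sufficient.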
The system converts the sign magnitude to two's complement. The system adds four blocks into each buffer, with the buffers having 22 bits each. A sign change is performed to obtain QXtQP. This completes first stage processing, which is then passed to transpose RAM.
The system then initiates IDCT stage 2, and performs a matrix transpose of QXtQP, yielding (QXtQP)t. The system then performs a two's complement to sign-magnitude conversion, clips the least significant bit, and postmultiplies the result by the P matrix. The system then converts this value from sign-magnitude back to two's complement, adds four products into each buffer, and performs a sign switch to obtain the elements of (QXtQP)tP. The system then right shifts the data seven bits, with rounding rather than clipping, and then truncates the result to between −256 and 255.
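The final scaling step can be sketched as follows. The text specifies a seven-bit right shift with roundoff and a limit of −256 to 255, but does not state the rounding rule; the sketch assumes the common method of adding half of the divisor before shifting, and the function name is hypothetical.

```python
def round_shift_and_limit(value, shift=7, low=-256, high=255):
    """Right shift by `shift` bits with rounding, then limit to [low, high]."""
    # ASSUMPTION: rounding is done by adding half of 2**shift before the
    # arithmetic shift; the text does not specify the rounding rule.
    half = 1 << (shift - 1)
    rounded = (value + half) >> shift
    return max(low, min(high, rounded))


print(round_shift_and_limit(64), round_shift_and_limit(63))
```

A plain shift without the added half would discard the low seven bits outright, which is the clipping that the text says is avoided at this step.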
Other objects, features, and advantages of the present invention will become more apparent from a consideration of the following detailed description and from the accompanying drawings.