Personal computers may be used to generate displays including video portions. For the purposes of the present application, the term "video" refers to full motion video images (e.g., derived from TV, film, video or the like) such as Cirrus Logic MotionVideo.TM. type displays. MotionVideo.TM. Architecture (MVA.TM.) is described, for example, in co-pending U.S. patent application Ser. No. 08/483,584, entitled "DUAL DISPLAYS HAVING INDEPENDENT RESOLUTIONS AND REFRESH RATES", filed Jun. 7, 1995 and incorporated herein by reference. Such video portions may be generated from a data source (e.g., CD-ROM) where video data may be encoded in one of a number of formats (e.g., MPEG-I, MPEG-II, Indeo.TM. or the like).
Traditionally, MPEG decoding may be performed by a dedicated hardware decoder. A hardware MPEG decoder may receive MPEG encoded data from a data source (e.g., CD-ROM) and output YUV data to discrete portions of display memory of a display controller, as illustrated in FIG. 2.
FIG. 2 is a block diagram illustrating major components of a computer system 100 provided with display controller 120 (e.g., Video Graphics Adapter (VGA), Super VGA (SVGA) or the like). Display controller 120 may generate pixel data for display 180 (e.g., CRT, flat panel display or the like) at a rate characteristic of the refresh rate of display 180 (e.g., 60 Hz, 72 Hz, 75 Hz, or the like) and horizontal and vertical resolution of a display image (e.g., 640.times.480 pixels, 1024.times.768 pixels, 800.times.600 pixels or the like). A continuous stream of pixel data may be generated by display controller 120 at the characteristic rate of display 180.
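By way of illustration only, the relationship between display resolution, refresh rate, and the rate at which pixel data is generated may be sketched as follows. The function name active_pixel_rate is hypothetical, and the calculation counts visible pixels only; blanking intervals, which raise the actual pixel clock of a real display, are ignored as a simplifying assumption.

```python
def active_pixel_rate(width, height, refresh_hz):
    """Visible pixels generated per second (blanking intervals ignored)."""
    return width * height * refresh_hz


# Resolution and refresh rate pairs chosen from the examples in the text.
for width, height, refresh_hz in [(640, 480, 60), (800, 600, 72), (1024, 768, 75)]:
    rate = active_pixel_rate(width, height, refresh_hz)
    print(f"{width}x{height} at {refresh_hz} Hz: {rate / 1e6:.2f} Mpixels/s")
```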
Display controller 120 may be provided with a display memory 130 which may store pixel data in text, graphics, or video modes for output to display 180. Host CPU 110 may be coupled to display controller 120 through bus 150 and may update the contents of display memory 130 when a display image for display 180 is to be altered. Bus 150 may comprise, for example, a PCI bus or the like. System memory 160 may be provided coupled to Host CPU 110 for storing data.
Hardware MPEG decoder 140 may be provided to decode MPEG video data from an MPEG video data source (e.g., CD-ROM or the like) and output decoded video data to system memory 160 or directly to display memory 130. However, with the advent of increasingly powerful and faster microprocessors (e.g., Pentium.TM. or PowerPC.TM. processor or the like) it may be possible to implement MPEG decoding (or the like) entirely within software operating within host CPU 110. For example, future versions of Microsoft.RTM. Windows 95.TM. may include such MPEG decoding software. Intel.RTM. also offers a software video decoding technique under the trademark Indeo.TM..
Applications software or operating systems (e.g., Windows.TM. 95) may be provided with such MPEG or Indeo.TM. decoding software. Placing MPEG or Indeo.TM. decoding software within applications software or an operating system may allow a user to view video portions on a display screen without the need for purchasing additional hardware such as dedicated MPEG hardware decoder 140.
However, even with high performance microprocessors, decoding of MPEG data may be a host CPU intensive operation, which may degrade overall performance of computer system 100. A large portion of host CPU cycles required to implement MPEG decoding may be required for data transfer and formatting, rather than decoding per se.
MPEG data may be decoded and decompressed (in software and/or hardware) from an MPEG data source in several steps. Host CPU 110 (or dedicated MPEG decoder 140) may retrieve compressed/encoded MPEG data from an MPEG data source (e.g., CD-ROM or the like) and first perform a Huffman decoding, followed by inverse quantization of data, inverse DCT (Discrete Cosine Transform), and motion compensation (compression between frames). For software MPEG decoding, a 90 MHz Pentium.TM. microprocessor may be just barely able to keep up with these first four steps at a rate of 30 frames per second.
Once decoded and decompressed, MPEG data in YUV format may be converted from component YUV video (i.e., planar form) to a pixel video format (i.e., raster scan format). The pixel video YUV data may then be converted from YUV to RGB (Red, Green and Blue pixel data) and then stored in display memory 130 to be displayed on display 180. Prior art hardware video accelerators may handle the YUV to RGB conversion step to remove that task from host CPU 110. However, the step of formatting YUV component data to pixel video form may still be required.
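The YUV to RGB conversion step may be sketched, for illustration only, as shown below. The text does not specify conversion coefficients; this sketch assumes the commonly used full-range form of the CCIR 601 equations (8-bit samples, chrominance centered on 128), and the function names are hypothetical. A studio-range variant (Y from 16 to 235) would use different scale factors.

```python
def clamp8(value):
    """Round to the nearest integer and clamp into the 8-bit range 0..255."""
    return max(0, min(255, int(round(value))))


def yuv_to_rgb(y, u, v):
    """Convert one pixel using assumed full-range CCIR 601 coefficients."""
    d = u - 128  # blue chrominance difference, re-centered on zero
    e = v - 128  # red chrominance difference, re-centered on zero
    r = y + 1.402 * e
    g = y - 0.344136 * d - 0.714136 * e
    b = y + 1.772 * d
    return clamp8(r), clamp8(g), clamp8(b)


# Neutral chrominance (U = V = 128) yields a gray pixel equal to Y.
print(yuv_to_rgb(128, 128, 128))
```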
Formatting YUV component data to pixel video form may require host CPU 110 (for hardware MPEG decoding, MPEG decoder 140) to decode MPEG data, as discussed above, into a YUV 4:2:2 video format (i.e., CCIR 601 format) where groups of two pixels may be encoded as two bytes of luminance (Y) data as well as two bytes of chrominance difference (U,V) data. Display 180 and display controller 120 may require that output data be generated in a basic pixel video (i.e., scan line) format such that all data (e.g., RGB or YUV) for each output pixel is located in consecutive locations within display memory 130.
In a YUV 4:2:2 format, two bytes of Y data may be followed by one byte of U data and one byte of V data. Each double word (DWORD) read out may thus comprise information for two adjacent pixels of data which may be read by display controller 120 in sequential addresses to be consistent with pixel video methods of display and make best use of available memory bandwidth.
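A minimal sketch of such a DWORD is given below. It assumes the byte order Y0, Y1, U, V at ascending addresses, with Y0 in the least significant byte of a little-endian DWORD; the text gives only the order of the bytes, so the mapping onto register bit positions is an assumption, as are the function names.

```python
def pack_yuv422_dword(y0, y1, u, v):
    """Pack two luminance bytes and one U,V pair into one 32-bit value."""
    for sample in (y0, y1, u, v):
        if not 0 <= sample <= 255:
            raise ValueError("samples must be 8-bit values")
    return y0 | (y1 << 8) | (u << 16) | (v << 24)


def unpack_yuv422_dword(dword):
    """Return the two adjacent pixels as (Y, U, V) tuples sharing one U,V pair."""
    y0 = dword & 0xFF
    y1 = (dword >> 8) & 0xFF
    u = (dword >> 16) & 0xFF
    v = (dword >> 24) & 0xFF
    return (y0, u, v), (y1, u, v)


dword = pack_yuv422_dword(0x10, 0x20, 0x80, 0x90)
print(hex(dword))
print(unpack_yuv422_dword(dword))
```

Both pixels recovered from the DWORD carry the same U and V values, which reflects the horizontal sharing of chrominance in the 4:2:2 format.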
Prior art MPEG decoding techniques (hardware or software) may first decompress MPEG data from an MPEG data source (e.g., CD-ROM or the like) into separate Y, U, and V values. These Y, U, and V values may then be stored initially into separate Y, U, and V memory areas (planes) in system memory 160 as illustrated in FIG. 1A in a format known as YUV planar format or component YUV.
System memory 160 may comprise separate contiguous areas of memory 102, 103 and 104 for storing Y, U and V data, respectively. For video data in the CCIR 601 format, two Y values may be provided for each pair of U and V values to comprise pixel data for two adjacent pixels. Thus, the Y portion 102 of system memory 160 may be twice as large as each of the respective U and V portions 103 and 104.
To combine separate Y, U, and V data into a format convenient for prior art video accelerators, host CPU 110 may first read two bytes of data from system memory area 102 containing Y data and shift one of those bytes over to a different byte location within a 32 bit DWORD register within host CPU 110. Next, host CPU 110 may read a byte of U data from the U area 103 of system memory 160 and then read a byte of V data from the V area 104 of system memory 160. Host CPU 110 may then combine separate Y, U, and V data into a YUV 4:2:2 formatted DWORD which in turn may be transferred to display memory 130.
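The reformatting steps described above may be sketched as follows, for illustration only. The sketch assumes planar 4:2:2 input (a Y plane twice the length of each of the U and V planes) and the byte order Y0, Y1, U, V within each output DWORD; the function name is hypothetical. Each loop iteration touches three separate memory areas, which corresponds to the non-contiguous reads described in the text.

```python
def planar_to_packed_422(y_plane, u_plane, v_plane):
    """Interleave separate Y, U and V planes into Y0 Y1 U V byte groups."""
    if len(u_plane) != len(v_plane) or len(y_plane) != 2 * len(u_plane):
        raise ValueError("expected a Y plane twice the size of each chroma plane")
    packed = bytearray()
    for i in range(len(u_plane)):
        packed.append(y_plane[2 * i])      # first luminance byte of the pair
        packed.append(y_plane[2 * i + 1])  # second luminance byte of the pair
        packed.append(u_plane[i])          # shared U sample
        packed.append(v_plane[i])          # shared V sample
    return bytes(packed)


y = bytes([1, 2, 3, 4])
u = bytes([10, 11])
v = bytes([20, 21])
print(list(planar_to_packed_422(y, u, v)))
```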
Such byte shifting operations are not particularly efficient for such processors as the Pentium.RTM. processor and thus system performance may be degraded, because a significant percentage of CPU cycles would be used just for data reformatting (i.e., component YUV to pixel video). Moreover, reading separate Y, U, and V data from non-contiguous portions of system memory 160 may require a large number of random access memory cycles, which will not get page cycles across the bus, further degrading system performance.
For a PCI bus system, it may be possible to combine separate read cycles in an internal cache within host CPU 110. However, processor and read cycle overhead may prevent system 100 from taking full advantage of burst cycles available in PCI bus architecture.
Once a YUV 4:2:2 formatted DWORD has been assembled within host CPU 110, it may then be stored in display memory 130 in a rasterized (i.e., pixel video) format as illustrated in FIG. 1B. Display memory 130 may comprise graphics portion 201 for storing graphics data (e.g., Windows.TM. Graphical User Interface (GUI) data), and one or more video buffers 202 and 203 for storing video data representing full motion video images (e.g., Cirrus Logic MotionVideo.TM. images). Two video buffers 202 and 203 may be provided to prevent generation of artifacts on display 180.
If host CPU 110 were writing into the same area of display memory 130 simultaneously being used for generating an image on display 180, such writing action may be visible on display 180. A user might perceive CPU writes to display memory 130 as the image being painted, or as a tearing effect, as sometimes occurs, for example, in video games.
In prior art display controllers, such artifacts may be eliminated by double buffering video data. Separate video buffers 202 and 203 may be provided within display memory 130 to store consecutive frames of video data. Host CPU 110 may write to one video buffer 202 within display memory 130 while data from another buffer 203 is being read out to display 180. Such double buffering may not require large amounts of display memory 130, as MPEG video data may typically be rendered at a resolution of 352 by 240 pixels, which may be zoomed up to any size including full display resolution (e.g., 1024 by 768 pixels).
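The memory needed for such double buffering may be estimated as in the sketch below. It assumes the YUV 4:2:2 pixel video format described earlier, in which each pair of pixels occupies one four-byte DWORD; the function name is hypothetical.

```python
def video_buffer_bytes(width, height, bytes_per_pixel_pair=4):
    """Size of one 4:2:2 video buffer, at one DWORD per two pixels."""
    pixel_pairs = (width * height) // 2
    return pixel_pairs * bytes_per_pixel_pair


one_buffer = video_buffer_bytes(352, 240)
print(one_buffer, "bytes per buffer")
print(2 * one_buffer, "bytes for two buffers")
```

Under these assumptions one 352 by 240 buffer occupies 168,960 bytes, so two buffers together occupy 337,920 bytes.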
One difficulty encountered in double-buffering display data is that a mechanism must be provided to instruct host CPU 110 and display controller 120 to switch their respective write and read cycles alternately between video buffers 202 and 203. If display controller 120 is reading display data from the same video buffer 202 or 203 which host CPU 110 is writing to, the advantage of double buffering may be negated. Upon completion of a write cycle to fill one of video buffers 202 and 203, display controller 120 needs to be signaled to switch to reading from that buffer, while host CPU 110 switches to writing to the other of video buffers 202 and 203.
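One possible switching mechanism may be modeled in software as shown below. This is a sketch under stated assumptions, not the mechanism of any particular display controller: the class and method names are hypothetical, and the switch is performed immediately after a complete frame is written, whereas actual hardware might defer the switch to a vertical retrace interval.

```python
class DoubleBuffer:
    """Two frame buffers: one being displayed, the other available for writing."""

    def __init__(self, size):
        self.buffers = [bytearray(size), bytearray(size)]
        self.display_index = 0  # buffer currently read out to the display

    @property
    def write_index(self):
        """The buffer that is not being displayed."""
        return 1 - self.display_index

    def write_frame(self, frame):
        """Fill the hidden buffer completely, then signal the switch."""
        if len(frame) != len(self.buffers[0]):
            raise ValueError("frame size does not match buffer size")
        self.buffers[self.write_index][:] = frame
        self.display_index = self.write_index  # reader now uses the new frame

    def read_for_display(self):
        """Return the contents of the buffer currently being displayed."""
        return bytes(self.buffers[self.display_index])


db = DoubleBuffer(4)
db.write_frame(b"\x01\x02\x03\x04")
print(db.read_for_display(), "next write goes to buffer", db.write_index)
```

Because the reader's index changes only after a full frame has been written, the reader never observes a partially written buffer in this model.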
A YUV formatted DWORD may be stored in pixel video format within video buffer 202 or 203 of display memory 130. Display controller 120 may readily generate video images from pixel video YUV data stored within video buffer 202 or 203 of display memory 130.
One processor intensive portion of software MPEG decoding, therefore, is the method of transferring the planes of Y, U, and V data from system memory 160 into display memory 130 in a pixel video format. Another processor intensive portion of software MPEG decoding is the need to vertically up-sample chrominance difference (U,V) data. Data encoded in an MPEG format has the same number of luminance (Y) samples (or bytes) as there are actual pixels displayed for the resulting playback. However, chrominance difference samples (U and V) played back are sub-sampled both horizontally and vertically (e.g., one U and V data pair for each 2.times.2 block of Y data).
The MPEG encoding technique may encode pixel data from blocks of four luminance samples in a two dimensional pattern (e.g., two by two pixels) for every one pair of chrominance difference samples (U,V). Chrominance difference samples (U,V) may actually be sub-sampled from the center point of a two by two pixel block. Upon decompression, chrominance difference data (U,V) may be replicated to create chrominance difference samples for groups of two pixels in the YUV 4:2:2 format.
FIG. 1C illustrates how horizontal and vertical sub-sampling may occur to create interpolated U and V values. FIG. 1C illustrates Y, U, and V values stored in display memory 130. As data is stored in display memory 130 in a pixel video format (e.g., scan line by scan line) it may be a relatively easy task to interpolate U and V data horizontally. However, as U and V data is sub-sampled in both horizontal and vertical directions, it may be necessary to interpolate (or replicate) U and V data in a vertical direction.
Thus, for example, as illustrated in FIG. 1C, every other line of video data may require interpolation (or replication) of U and V data from adjacent lines, to create U and V values to fill in the areas indicated by the * values in FIG. 1C. Unfortunately, such vertical interpolation may be much more difficult to achieve than horizontal interpolation. Data from adjacent lines may need to be stored for later replication (or interpolation) when data for a particular line is stored in display memory 130.
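Vertical replication and interpolation of chrominance data may be sketched as follows, for illustration only. The sketch assumes a chrominance plane supplied as a list of rows, one row for every two output scan lines; the function name is hypothetical, and the two-line rounded average is only one of several possible interpolation choices. The need to hold the adjacent row while generating the missing line is visible in the loop.

```python
def upsample_chroma_vertical(rows, interpolate=False):
    """Double the number of chroma rows by replication or two-line averaging."""
    output = []
    for i, row in enumerate(rows):
        output.append(list(row))  # line that carries decoded chroma samples
        if interpolate and i + 1 < len(rows):
            next_row = rows[i + 1]  # adjacent line must be available here
            output.append([(a + b + 1) // 2 for a, b in zip(row, next_row)])
        else:
            output.append(list(row))  # replicate (also used for the last row)
    return output


chroma = [[10, 20], [30, 40]]
print(upsample_chroma_vertical(chroma))
print(upsample_chroma_vertical(chroma, interpolate=True))
```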
Such storage of adjacent U and V values may require large amounts of memory or register space and may require cumbersome processor operations. It would be desirable, therefore, to reduce data bandwidth between host CPU 110 and display memory 130 by transferring only the chrominance difference (U,V) data actually decoded and performing replication of such data within display controller 120.