The present invention relates to computer graphics apparatus, using video RAMs and conventional parallel accessed frame buffers.
In a conventional graphics display system, the display file which holds the view of a picture is placed in a refresh memory or frame buffer. A display processor reads the contents of the frame buffer and sends instructions to vector generators, which convert geometric descriptions into XY analog voltages to control the deflection of the electron beam of a cathode ray tube.
The architecture of a generalized computer graphics display system is shown in FIG. 1. The geometric pipeline subsystem 1 receives output primitives from the host and generates commands for the pixel rendering module 2. The pixel rendering module 2 receives the commands and calculates pixel data to write into memory module 3. The memory module 3 stores the pixel data ready to be displayed and is controlled by the display control module 4 to serially shift out the pixel data. The pixel data is then converted to an analog signal for display on the screen.
The memory arrangement in the memory module 3 is the key component in the display subsystem. It influences the performance of the system, and determines whether the display system can be implemented by hardware. The memory module also influences the implementation in the pixel rendering module 2 and display control module 4.
Due to the reduction in the cost of random access memory (RAM), the random access raster scan display is currently the most popular computer graphics apparatus. The European ergonomic standards, which define and measure certain factors of the CRT display/human interface, set the goal that the screen refresh rate should not fall below 60 Hz and should preferably be 70 Hz. The video rate (pixel frequency) for a given CRT can be calculated by the equation: ##EQU1##
The horizontal retrace period is approximately 10% of the horizontal scan period, and the vertical retrace period is approximately 10% of the vertical scan period.
Based on the above formula, one can obtain the following table.
TABLE 1
______________________________________
Display resolution    Video         Pixel
(# of pixels)         frequency     time
______________________________________
1024*1024             ~90 MHz       11 ns
1280*1024             ~110 MHz      9 ns
1600*1280             ~170 MHz      5 ns
2048*2048             ~350 MHz      3 ns
______________________________________
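The figures in Table 1 can be approximated with a short calculation. The sketch below is not the equation referenced above; it assumes a commonly used form in which the pixel frequency is the product of the pixels per line, the lines per frame, and the refresh rate, with each scan dimension enlarged by its retrace fraction. The 70 Hz refresh rate and the function name `pixel_frequency_hz` are likewise assumptions chosen for illustration.

```python
def pixel_frequency_hz(h_pixels, v_lines, refresh_hz=70.0,
                       h_retrace=0.10, v_retrace=0.10):
    """Approximate video rate, assuming retrace adds a fixed fraction
    to the horizontal and vertical scan periods (an assumed model)."""
    effective_pixels_per_line = h_pixels * (1.0 + h_retrace)
    effective_lines_per_frame = v_lines * (1.0 + v_retrace)
    return effective_pixels_per_line * effective_lines_per_frame * refresh_hz


if __name__ == "__main__":
    for h, v in [(1024, 1024), (1280, 1024), (1600, 1280), (2048, 2048)]:
        f = pixel_frequency_hz(h, v)
        # Pixel time is the reciprocal of the pixel frequency.
        print(f"{h}*{v}: {f / 1e6:.0f} MHz, {1e9 / f:.1f} ns per pixel")
```

Under these assumptions the results fall within a few percent of the rounded values listed in Table 1.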
If the display resolution is high, a relatively long time is required to access the frame buffer to refresh the screen image. If this refresh access time becomes large in relation to the time in which the graphics processor or the host processor accesses the frame buffer to modify the display image, the response time of the graphics display from instruction to modification becomes very long. The best approach is to use a video RAM (VRAM). A VRAM has two ports: the random port and the serial port. The random port has the same function as a standard DRAM. The serial port has the same function as a shift register. In video applications, the serial port acts as a second memory port and is used for screen refresh. In the horizontal blanking period, one line of data is transferred from the random port to the serial port, and in the display period, the data contained in the serial port is shifted out as the pixel signals. Once the video data is loaded in parallel from the random port to the serial port, no further access to the random port is required for screen refresh. The graphics processor or the host processor therefore has the full bandwidth of the random port available throughout the display period.
The 64K*4 VRAM has been developed by many IC companies with a serial clock cycle time as short as 40 ns. The 256K*4 VRAM has also been developed by many IC companies with a serial clock cycle time as short as 30 ns. The capacity has increased four times, but the serial clock cycle time has been reduced by only 10 ns. This mismatch is the impetus for the invention of a new arrangement scheme for the graphics display system.
Due to the advancement of semiconductor technology, the capacity of the VRAM has increased from 64K*4 to 256K*4. In the past, a frame buffer containing 2K*2K pixels with 8 bit planes per pixel required 128 pieces of 64K*4 VRAMs. Now it needs only 32 pieces of 256K*4 VRAMs, which reduces the number of IC chips by nearly 100 (from 128 to 32). This reduction makes the product easier to manufacture and maintain, and increases its reliability.
The design using 256K*4 VRAMs, however, causes some new problems for the designer. Firstly, although the capacity increases by four times from 64K*4 to 256K*4, the serial clock cycle time only decreases from 40 ns to 30 ns. This decrease does not match the ratio of the increase in storage. Secondly, because of the reduction in the number of VRAM chips, the number of partitionable banks is decreased. Take a 2K*2K addressable pixel frame buffer with eight bit planes per pixel as an example. It can be arranged to have 64 banks using 64K*4 VRAMs, but only 16 banks can be arranged using 256K*4 VRAMs. These two problems must be taken into consideration in the memory arrangement of a parallel accessed frame buffer.
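The chip counts and bank counts quoted above follow from simple arithmetic, sketched below. The sketch assumes that a bank consists of the chips that together supply all bit planes of one pixel (two 4-bit-wide chips for eight bit planes); this definition is inferred from the stated figures rather than given explicitly, and the function names are illustrative only.

```python
K = 1024


def vram_chip_count(width, height, bit_planes, chip_words, chip_bits=4):
    """Number of VRAM chips needed to hold the whole frame buffer."""
    total_bits = width * height * bit_planes
    return total_bits // (chip_words * chip_bits)


def bank_count(chips, bit_planes, chip_bits=4):
    """Banks available, assuming one bank = the chips covering all
    bit planes of a pixel (an assumption for this sketch)."""
    chips_per_bank = bit_planes // chip_bits
    return chips // chips_per_bank


if __name__ == "__main__":
    for chip_words in (64 * K, 256 * K):
        chips = vram_chip_count(2 * K, 2 * K, 8, chip_words)
        banks = bank_count(chips, 8)
        print(f"{chip_words // K}K*4: {chips} chips, {banks} banks")
```

For a 2K*2K frame buffer with eight bit planes this gives 128 chips and 64 banks for the 64K*4 part, and 32 chips and 16 banks for the 256K*4 part.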
To meet users' requirements for graphics performance, several graphics architectures with parallel processing have been proposed. It is evident from these architectures that if a 4*4 or 8*8 pixel area is arranged as the unit region of the parallel processing, the system can achieve better graphics performance.
To date, the capacity of frame buffers with the ability of parallel processing is mostly less than 1280*1024 pixels, and most of them use 64K*4 VRAMs.
Take a conventional 1280*1024 parallel accessed frame buffer as an example. When 64K*4 VRAMs are used, the minimum serial clock cycle is 40 ns and the pixel output rate is 110 MHz (9 ns/pixel), as indicated in Table 1.
FIG. 2-1 illustrates a conventional memory arrangement for a 1280*1024 pixel display using an interleaved frame buffer. The 64K*4 VRAMs may be divided into 20 banks. Each bank contains one fifth of the pixels in one scan line, and one fourth of the scan lines in a frame. Taking bank 0 as an example, row number 0 contains screen X=0, 5, 10, . . . , 1275 in Y=0, and column number 0 contains screen Y=0, 4, 8, . . . , 1016, 1020 in X=0. The horizontal direction of a VRAM contains 256 locations. Combining the locations from bank 0 to bank 4, there are 1280 locations, equal to one screen scan line. The vertical direction of a VRAM also contains 256 locations. Combining the banks in the vertical direction, there are 1024 locations, equal to the number of screen scan lines in one frame.
FIG. 2-2 illustrates the relationship between the pixels on screen and the pixels in the memory banks. One can select a 5*4 block area (five pixels wide by four scan lines high) at any position on the screen, and the corresponding area in the frame buffer can be accessed in parallel, as indicated by the bank number of the frame buffer in each pixel location.
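The interleaving described for FIG. 2-1 and FIG. 2-2 can be sketched as an address mapping. The bank numbering formula below is an assumption inferred from the text (banks 0 to 4 lie along one scan line, and each following scan line starts five banks later); the function names are hypothetical. The check uses a block five pixels wide and four scan lines high, which is the largest block that 20 banks can serve with one pixel per bank.

```python
def locate_pixel(x, y, banks_x=5, banks_y=4):
    """Map a screen pixel to (bank, row, column) in an interleaved
    frame buffer. The numbering scheme is an assumed illustration."""
    bank = (y % banks_y) * banks_x + (x % banks_x)
    row = y // banks_y       # VRAM row holds pixels of one scan line
    column = x // banks_x    # position of the pixel within that row
    return bank, row, column


def banks_in_block(x0, y0, banks_x=5, banks_y=4):
    """Set of banks touched by a banks_x-wide, banks_y-high block
    whose top-left corner is at (x0, y0)."""
    return {locate_pixel(x, y, banks_x, banks_y)[0]
            for y in range(y0, y0 + banks_y)
            for x in range(x0, x0 + banks_x)}


if __name__ == "__main__":
    # Any block position hits every bank exactly once, so the whole
    # block can be read or written in a single parallel access.
    print(len(banks_in_block(7, 13)))        # 1280*1024 layout, 20 banks
    print(len(banks_in_block(7, 13, 4, 4)))  # 4*4 layout, 16 banks
```

Because every pixel of such a block falls in a different bank regardless of the block's origin, no bank is addressed twice in one access. Passing `banks_x=4, banks_y=4` models a 16-bank arrangement with a 4*4 block in the same way.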
FIG. 2-3 illustrates the raster output sequence. To save power, one screen scan line is transferred at a time in a horizontal blanking period. That is to say, data from bank 0 to bank 4 for one horizontal line is transferred to the serial port, and then shifted out in the display period to be displayed on the screen. Next, data from bank 5 to bank 9 for the next horizontal line is transferred to the serial port, and so on.
For better picture quality, more addressable pixels are required, which may be achieved with 1600*1280 or 2048*2048 resolution. To reduce chip count, reduce board size, and increase reliability, 256K*4 VRAMs may be used. However, if the high resolution is to be implemented, a high clock rate is required. If 256K*4 VRAMs are used, the number of partitionable memory banks is reduced, while the serial clock cycle is only reduced from 40 ns to 30 ns. These problems must be solved if 256K*4 VRAMs are used to implement a 2K*2K resolution parallel accessed frame buffer.
Consider the operation of the conventional parallel accessed frame buffer at this resolution. The pixel clock rate is as high as 350 MHz (3 ns/pixel), as shown in Table 1, and the minimum serial clock cycle is 30 ns. FIG. 3-1 illustrates the memory arrangement. The VRAMs are divided into 16 banks. Each bank contains one fourth of the pixels in one scan line, and one fourth of the scan lines in a frame. Take bank 0 as an example. Row number 0 contains screen X=0, 4, 8, . . . , 2044 in Y=0, and column number 0 contains screen Y=0, 4, 8, . . . , 2044 in X=0. The horizontal direction of a VRAM contains 512 locations. Combining the locations from bank 0 to bank 3, there are 2048 locations, equal to one screen scan line. The vertical direction of a VRAM also contains 512 locations. Combining the banks in the vertical direction, there are 2048 locations, equal to the number of screen scan lines in one frame.
FIG. 3-2 illustrates the relationship between the pixels on screen and the corresponding positions in the memory banks. Here a 4*4 block area can be selected at any position on the screen, and the corresponding area in the frame buffer can be accessed in parallel.
FIG. 3-3 illustrates the raster output sequence. In this memory arrangement, four screen scan lines must be transferred in the blanking period preceding the first of every four horizontal scan lines. To save power, the scan lines can be transferred one at a time, in four consecutive transfers during that horizontal blanking period.
The circuit block diagram for a conventional architecture of a display sub-system is illustrated in FIG. 4. Because the pixel time is as short as 3 ns and the data are shifted simultaneously in the VRAMs, the serial port can supply only four pixels of the same screen horizontal line in each serial clock cycle. Because 3 ns/pixel * 4 pixels = 12 ns, which is less than the 30 ns serial clock cycle, a temporary buffer must be used to store excess pixels ready for display. The data in the temporary buffer are read rapidly to the digital to analog converter VDAC/RAMDAC and are displayed on the screen. If the temporary buffer has the capacity for four scan lines, the cost would be excessive, because such an access speed would require high speed circuits such as Emitter Coupled Logic (ECL). Such chips occupy more board space, consume more power, and increase the layout complexity and the hardware design complexity. Moreover, the performance would be adversely affected, because excessive time is required to update the temporary buffer for screen refresh.
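The timing mismatch behind the temporary buffer can be illustrated with a simplified model. The sketch assumes that each bank on a scan line delivers one pixel per serial clock cycle and ignores all other overheads; the function names are hypothetical and the model is only meant to restate the arithmetic above.

```python
def pixels_needed_per_serial_cycle(serial_cycle_ns, pixel_time_ns):
    """Pixels the display consumes during one serial clock cycle."""
    return serial_cycle_ns / pixel_time_ns


def sustained_pixel_time_ns(serial_cycle_ns, banks_per_line):
    """Average time per pixel the serial ports can sustain when each
    bank on the scan line supplies one pixel per serial clock cycle."""
    return serial_cycle_ns / banks_per_line


def serial_ports_keep_up(serial_cycle_ns, pixel_time_ns, banks_per_line):
    """True if the serial ports alone can feed the display in real time."""
    return sustained_pixel_time_ns(serial_cycle_ns, banks_per_line) <= pixel_time_ns


if __name__ == "__main__":
    # 1280*1024 with 64K*4 VRAMs: 5 banks per line, 40 ns cycle, 9 ns/pixel.
    print(serial_ports_keep_up(40, 9, 5))
    # 2048*2048 with 256K*4 VRAMs: 4 banks per line, 30 ns cycle, 3 ns/pixel.
    print(serial_ports_keep_up(30, 3, 4))
    print(pixels_needed_per_serial_cycle(30, 3))
```

In this model the 1280*1024 arrangement keeps pace with the display, whereas the 2048*2048 arrangement supplies four pixels in the time the display needs ten, which is why pixels from several scan lines have to be staged in a fast temporary buffer.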