1. Field of the Invention
The present invention relates generally to computer architecture, and more particularly, to memory-sharing architectures which include graphics capabilities.
2. State of the Art
As the density of solid state memories increases, oversized memories are being wastefully used for purposes which optimally require specialized memory configurations (e.g., graphics refresh). One reason for this is that manufacturers attempt to produce memory sizes which will achieve a broad range of applicability and a high volume of production. The more popular, and thus more cost-effective, memories tend to be fabricated with square aspect ratios or with tall, thin aspect ratios (i.e., a large number of fixed length words) that are not readily suited to specialized uses.
Although uses which can exploit memories with these popular aspect ratios can be implemented in a relatively cost-effective manner, specialized uses which cannot exploit these aspect ratios can be proportionately more expensive to implement. The expense associated with implementing specialized uses assumes one of two forms: (1) the increased cost associated with purchasing a memory which does not conform to a readily available and widely used memory configuration; or (2) the increased cost associated with purchasing a readily available memory which is much larger than needed to implement a specialized function (e.g., a relatively square memory which must be tall enough to obtain a desired width, even though only a relatively small number of rows in the memory are needed for the purpose at hand).
The foregoing memory capacity problem is typically referred to as the memory granularity problem: expensive memory chips can be purchased and used efficiently, or inexpensive memory chips can be purchased and used inefficiently. This problem is especially significant in computer systems which implement graphics, since these systems typically include a dedicated, high speed display memory. Specialized display memories are usually required because refresh for the graphics display (e.g., for a 1280×1024 display) typically consumes virtually all of the available bandwidth of a typical dynamic random access memory (DRAM).
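By way of illustration only, the bandwidth consumed by display refresh can be estimated with the following sketch. The pixel depth (8 bits), refresh rate (60 Hz), bus width (4 bytes) and access cycle time (50 ns) are assumed values chosen for the example and do not appear in the foregoing description; blanking intervals are ignored, and the function names are hypothetical.

```python
def refresh_bytes_per_second(width, height, bits_per_pixel, refresh_hz):
    """Bytes that must be read each second to refresh the display (blanking ignored)."""
    return width * height * bits_per_pixel // 8 * refresh_hz


def peak_bytes_per_second(bus_bytes, cycle_ns):
    """Peak transfer rate of a memory delivering bus_bytes every cycle_ns nanoseconds."""
    return bus_bytes * 1_000_000_000 // cycle_ns


# Assumed example values: 8 bits per pixel, 60 Hz refresh,
# 4-byte-wide memory bus, 50 ns access cycle.
refresh = refresh_bytes_per_second(1280, 1024, 8, 60)
peak = peak_bytes_per_second(4, 50)
refresh_fraction = refresh / peak

print(f"refresh: {refresh / 1e6:.1f} MB/s of {peak / 1e6:.1f} MB/s peak "
      f"({refresh_fraction:.0%})")
```

Under these assumed values, refresh alone accounts for roughly 98% of the peak transfer rate, which is the kind of saturation the foregoing paragraph describes. Different assumed depths, refresh rates or memory timings will yield different figures.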
To update a video line on a high resolution graphics display, a graphics refresh optimally requires a memory having a short, wide aspect ratio. Display memories used as frame buffers for high resolution graphics displays have therefore become an increasingly larger fraction of a system's overall cost due to the foregoing memory problem. For display memories, even a two megabit memory can be unnecessarily large, such that it cannot be effectively used. An exemplary display memory for a current high-end display of 1280×1024 pixels requires just over one megabyte of memory. Thus, almost one-half of the display memory remains unused.
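The frame buffer arithmetic can be sketched as follows. The pixel depth of 8 bits and the installed display memory of two megabytes are assumptions made for the example; the foregoing description states only that the display requires just over one megabyte.

```python
def frame_buffer_bytes(width, height, bits_per_pixel):
    """Bytes needed to hold one full frame at the given pixel depth."""
    return width * height * bits_per_pixel // 8


MIB = 1024 * 1024

needed = frame_buffer_bytes(1280, 1024, 8)  # 8 bits per pixel is an assumption
installed = 2 * MIB                         # assumed installed display memory
unused_fraction = (installed - needed) / installed

print(f"needed: {needed / MIB:.2f} MiB, unused: {unused_fraction:.1%}")
```

With these assumed values the frame occupies 1.25 megabytes and 37.5% of the installed memory is left idle, an outcome of the same order as the unused fraction noted above.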
For example, FIG. 1 illustrates a typical computer system 100 which includes graphics capabilities. The FIG. 1 computer system includes a central processing unit (CPU) 102, a graphics controller 104 and a system controller 106 all connected to a common bus 108 having a data portion 110 and an address portion 112.
The graphics controller 104 is connected to display memory 114 (e.g., random access memory, or RAM) by a memory bus having a memory address bus 116 and a memory data bus 118. A RAMDAC 120 performs digital-to-analog conversion (DAC) to produce signals (e.g., analog RGB color signals) used to drive a graphics display.
The system controller is connected to system memory 122 by a separate memory address bus 124. A memory data bus 126 is connected directly between the common data bus 108 and the system memory. The system memory can also include a separate cache memory 128 connected to the common bus to provide a relatively high-speed portion for the system memory.
The graphics controller 104 mediates access of the CPU 102 to the display memory 114. For system memory transfers not involving direct memory access (DMA), the system controller 106 mediates access of the CPU 102 to system memory 122, and can include a cache controller for mediating CPU access to the cache memory 128.
However, the FIG. 1 configuration suffers significant drawbacks, including the granularity problem discussed above. The display memory 114 is limited to use in connection with the graphics controller and cannot be used for general system needs. Further, because separate memories are used for the main system and for the graphics memory, the higher pin count renders integration of the FIG. 1 computer system difficult. The use of separate controllers and memories for the main system and the graphics also results in significant duplication of bus interfaces, memory control and so forth, thus leading to increased cost. For example, the maximum memory required to handle worst case requirements for each of the system memory and the graphics memory must be separately satisfied, even though the computer system will likely never run an application that would require the maximum amount of graphics and main store memory simultaneously. In addition, transfers between the main memory and the graphics memory require that either the CPU or a DMA controller intervene, thus blocking use of the system bus.
Attempts have been made to alleviate the foregoing drawbacks of the FIG. 1 system by integrating system memory with display memory. However, these attempts have reduced duplication of control features at the expense of system performance. These attempts have not adequately addressed the granularity problem.
Some attempts have been made, particularly in the area of portable and laptop systems, to unify display memory and system memory. For example, one approach to integrating display memory and system memory is illustrated in FIG. 2. However, approaches such as that illustrated in FIG. 2 suffer significant drawbacks. For example, refreshing of the display via the graphics controller requires that cycles be stolen from the main memory, rendering performance unpredictable. Further, these approaches use a time-sliced arbitration mode for allocating specific time slots among the system controller and the graphics controller, such that overall system performance is further degraded.
In other words, overall performance of the FIG. 2 system is limited by the bandwidth of the single memory block, and the high demands of the graphics refresh function alone introduce significant performance degradation. The allocation of memory bandwidth between display access and system access using fixed time slots only adds to performance degradation. Because the time slots must be capable of handling the worst case requirements for each of the system memory and display memory subsystems, the worst possible memory allocation is forced to be the normal case.
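The effect of fixed time slots can be illustrated with a simplified model, offered only as a sketch. The function name, the notion of a ten-slot arbitration period and all of the numbers are hypothetical and are not taken from the FIG. 2 system; the model merely assumes that slots reserved for the display cannot be used by the system even when the display leaves them idle.

```python
def fixed_slot_period(total_slots, display_slots, display_used, system_demand):
    """Model one arbitration period with slots statically reserved for the display.

    Returns (system_served, idle_display_slots, unmet_system_demand).
    """
    system_slots = total_slots - display_slots
    system_served = min(system_demand, system_slots)
    idle_display_slots = display_slots - min(display_used, display_slots)
    unmet_system_demand = system_demand - system_served
    return system_served, idle_display_slots, unmet_system_demand


# Hypothetical period: 10 slots, 6 reserved for worst-case display refresh,
# but the display uses only 2 of them while the system requests 7 accesses.
served, idle, unmet = fixed_slot_period(10, 6, 2, 7)
print(f"system served: {served}, idle display slots: {idle}, unmet demand: {unmet}")
```

In this hypothetical period the system is limited to its four reserved slots, leaving three requests unserved while four display slots sit idle. That is the sense in which a worst case allocation becomes the normal case.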
Examples of computers using time-slice access to an integrated memory are the Commodore and the Amiga. The Apple II computer also used a single memory for system and display purposes. In addition, the recently-released Polar™ chip set of the present assignee, for portable and laptop systems, makes provision for integrated memory.
A different approach is described in a document entitled "64200 (Wingine™) High Performance 'Windows™ Engine'", available from Chips and Technologies, Inc. In one respect, Wingine is similar to the conventional computer architecture of FIG. 1 but with the addition of a separate path that enables the system controller to perform write operations to graphics memory. The graphics controller, meanwhile, performs screen refresh only. In another respect, Wingine may be viewed as a variation on previous integrated-memory architectures. Part of system memory is replaced with VRAM, thereby eliminating the bandwidth contention problem using a more expensive memory (VRAM is typically at least twice as expensive as DRAM). In the Wingine implementation, VRAM is not shared but is dedicated for use as graphics memory. Similarly, one version of the Alpha microprocessor sold by Digital Equipment Corporation reportedly has on board a memory controller that allows VRAM to be used to alleviate the bandwidth contention problem. The CPU performs a role analogous to that of a graphics controller, viewing the VRAM frame buffer as a special section of system RAM. As with Wingine, the VRAM is not shared.
Thus, traditional computer architectures cannot efficiently integrate a single memory to accommodate the two different functions of display memory and system memory without significantly degrading system performance. What is needed, then, is a new computer architecture that allows display memory and system memory to be integrated while still achieving high system performance. Such an architecture should, desirably, allow for memory expansion and use with cache memory. Further, any such system should provide an upgrade path to existing and planned high performance memory chips, including VRAM, synchronous DRAM (SDRAM) and extended data out DRAM (EDO DRAM).