1. Field of the Invention
The present invention relates generally to computer architecture, and more particularly, to memory-sharing architectures which include graphics capabilities.
2. State of the Art
As the density of solid state memories increases, oversized memories are being wastefully used for purposes which optimally require specialized memory configurations (e.g., a graphics refresh). One reason for this is that manufacturers attempt to produce memory sizes which will achieve a broad range of applicability and a high volume of production. The more popular, and thus more cost-effective, memories tend to be fabricated with square aspect ratios or with tall, thin aspect ratios (i.e., a large number of fixed length words) that are not readily suited to specialized uses.
Although uses which can exploit memories with these popular aspect ratios can be implemented in a relatively cost-effective manner, specialized uses which cannot exploit these aspect ratios can be proportionately more expensive to implement. The expense associated with implementing specialized uses assumes one of two forms: (1) the increased cost associated with purchasing a memory which does not conform to a readily available and widely used memory configuration; or (2) the increased cost associated with purchasing a readily available memory which is much larger than needed to implement a specialized use (e.g., a relatively square memory which must be tall enough to obtain a desired width, even though only a relatively small number of rows in the memory are needed for the purpose at hand).
The foregoing memory capacity problem is typically referred to as the memory granularity problem: expensive memory chips can be purchased and used efficiently, or inexpensive memory chips can be purchased and used inefficiently. This problem is especially significant in computer systems which implement graphics functions, since these systems typically include a dedicated, high speed display memory. Specialized display memories are usually required because refresh for the graphics display (e.g., for a 1280×1024 display) typically consumes virtually all of the available bandwidth of a typical dynamic random access memory (DRAM).
To update a video line on a high resolution graphics display, a graphics refresh optimally requires a memory having a short, wide aspect ratio. Display memories used as frame buffers for high resolution graphics displays have therefore become an increasingly large fraction of a system's overall cost due to the foregoing memory problem. For display memories, even a two megabyte memory can be unnecessarily large, such that it cannot be effectively used. An exemplary display memory for a current high-end display of 1280×1024 pixels requires just over one megabyte of memory. Thus, almost one-half of the display memory remains unused.
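By way of illustration only, the display memory arithmetic above can be worked through as follows. The pixel depth of 8 bits per pixel is an assumption made for this sketch and is not stated in the text; the names `frame_buffer_bytes`, `needed`, `installed` and `unused` are hypothetical. Under that assumption the frame buffer occupies 1.25 megabytes of a two megabyte memory; the exact unused fraction depends on the pixel depth actually used.

```python
# Illustrative arithmetic only. The 8 bits-per-pixel depth is an assumption
# made for this sketch; the text does not state a pixel depth.

MIB = 1024 * 1024  # one megabyte, taken here as 2**20 bytes


def frame_buffer_bytes(width, height, bits_per_pixel=8):
    """Return the number of bytes needed to hold one full frame."""
    return width * height * bits_per_pixel // 8


needed = frame_buffer_bytes(1280, 1024)  # bytes used by the frame buffer
installed = 2 * MIB                      # a two megabyte display memory
unused = installed - needed              # bytes left idle in a dedicated memory
unused_fraction = unused / installed     # share of the memory that is wasted

print(f"needed:  {needed} bytes ({needed / MIB:.2f} MiB)")
print(f"unused:  {unused} bytes ({unused_fraction:.1%} of installed memory)")
```

The unused bytes computed here are the portion that a dedicated display memory cannot offer to the rest of the system.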
For example, FIG. 1 illustrates a typical computer system 100 which includes graphics capabilities. The FIG. 1 computer system includes a central processing unit (CPU) 102, a graphics controller 104 and a system controller 106 all connected to a common bus 108 having a data portion 110 and an address portion 112.
The graphics controller 104 is connected to display memory 114 (e.g., random access memory, or RAM) by a memory bus having a memory address bus 116 and a memory data bus 118. A random access memory digital-to-analog converter (RAMDAC) 120 provides signals (e.g., analog RGB color signals) used to drive a graphics display.
The system controller is connected to system memory 122 by a separate memory address bus 124. A memory data bus 126 is connected directly between the common data bus 108 and the system memory. The system memory can also include a separate cache memory 128 connected to the common bus to provide a relatively high-speed portion for the system memory.
The graphics controller 104 mediates access of the CPU 102 to the display memory 114. For system memory transfers not involving direct memory access (DMA), the system controller 106 mediates access of the CPU 102 to system memory 122, and can include a cache controller for mediating CPU access to the cache memory 128.
However, the FIG. 1 configuration suffers significant drawbacks, including the granularity problem discussed above. The display memory 114 is limited to use in connection with the graphics controller and cannot be used for general system needs. Further, because separate memories are used for the main system and for the graphics memory, the resulting higher pin count renders integration of the FIG. 1 computer system difficult. The use of separate controllers and memories for the main system and the graphics also results in significant duplication of bus interfaces, memory control and so forth, thus leading to increased cost. For example, the maximum memory required to handle worst case requirements for each of the system memory and the graphics memory must be separately satisfied, even though the computer system will likely never run an application that would require the maximum amount of graphics memory and main store memory simultaneously. In addition, transfers between the main memory and the graphics memory require that either the CPU or a DMA controller intervene, thus blocking use of the system bus.
Attempts have been made to alleviate the foregoing drawbacks of the FIG. 1 system by integrating system memory with display memory. However, these attempts have reduced duplication of control features at the expense of system performance. These attempts have not adequately addressed the granularity problem.
Some attempts have been made, particularly in the area of portable and laptop systems, to unify display memory and system memory. For example, one approach to integrated display memory and system memory is illustrated in FIG. 2. However, approaches such as that illustrated in FIG. 2 suffer significant drawbacks. For example, refreshing of the display via the graphics controller requires that cycles be stolen from the main memory, rendering performance unpredictable. Further, these approaches use a time-sliced arbitration mode for allocating specific time slots among the system controller and the graphics controller, such that overall system performance is further degraded.
In other words, overall performance of the FIG. 2 system is limited by the bandwidth of the single memory block, and the high demands of graphics refresh function alone introduce significant performance degradation. The allocation of memory bandwidth between display access and system access using fixed time-slots only adds to performance degradation. Because the time slots must be capable of handling the worst case requirements for each of the system memory and display memory subsystems, the worst possible memory allocation is forced to be the normal case.
Examples of computers using time-slice access to an integrated memory are the Commodore and the Amiga. The Apple II computer also used a single memory for system and display purposes. In addition, the recently-released Polar™ chip set of the present assignee, for portable and laptop systems, makes provision for integrated memory.
A different approach is described in a document entitled “64200 (Wingine™) High Performance ‘Windows™ Engine’”, available from Chips and Technologies, Inc. In one respect, Wingine is similar to the conventional computer architecture of FIG. 1 but with the addition of a separate path that enables the system controller to perform write operations to graphics memory. The graphics controller, meanwhile, performs screen refresh only. In another respect, Wingine may be viewed as a variation on previous integrated-memory architectures. Part of system memory is replaced with VRAM, thereby eliminating the bandwidth contention problem using a more expensive memory (VRAM is typically at least twice as expensive as DRAM). In the Wingine implementation, VRAM is not shared but is dedicated for use as graphics memory. Similarly, one version of an Alpha microprocessor available from Digital Equipment Corporation is believed to include on board a memory controller that allows VRAM to be used to alleviate the bandwidth contention problem. The CPU performs a role analogous to that of a graphics controller, viewing the VRAM frame buffer as a special section of system RAM. As with Wingine, the VRAM is not shared.
Thus, traditional computer architectures, even those with integrated memories, cannot efficiently share a single memory to accommodate the two different functions of display memory and system memory without significantly degrading system performance. What is needed, then, is a new computer architecture that allows display memory and system memory to be shared while still achieving high system performance. Such an architecture should, desirably, allow for memory expansion and use with cache memory. Further, any such system should provide an upgrade path to existing and planned high performance memory chips, including VRAM, synchronous DRAM (SDRAM) and extended data out DRAM (EDODRAM).
The present invention provides a low-cost computer system which includes a single shared memory that can be independently accessible as graphics memory or main store system memory without performance degradation. Because the “appetite” for main system memory (unlike that of a display memory) is difficult to satisfy, the memory granularity problem can be addressed by programmably reallocating an unused portion of a display memory for system memory use. Reallocation of the unused display memory alleviates any need to oversize the display memory, yet realizes the cost effectiveness of using readily available memory sizes. Further, reallocation of the graphics memory avoids any need to separately consider both the system memory and the display memory in accommodating worst case operational requirements.
In exemplary embodiments, performance penalties can be minimized by dynamically allocating the memory bandwidth between concurrent graphics and system memory operations on demand, thereby avoiding use of fixed time slices. By eliminating use of fixed time slices to arbitrate between display memory and system memory accesses, graphics refresh functions can be accommodated with little or no effect on system memory demands. Exemplary embodiments achieve concurrent graphics and system operations by using a memory controller for controlling access to the shared memory, and an arbiter for arbitrating among requests for access to the memory.
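By way of illustration only, the difference between fixed time slices and allocation on demand can be sketched with a simplified cycle-count model. Everything in this sketch is an assumption made for illustration: the function names, the one-access-per-cycle model, the alternating slot order and the priority given to display requests are hypothetical and do not describe the arbiter of any particular embodiment.

```python
# Simplified model: each requester has a number of pending memory accesses,
# and the memory serves at most one access per cycle. Both functions return
# the number of cycles needed to drain all pending accesses.


def cycles_fixed_slots(pending, order=("display", "system")):
    """Fixed time slices: each cycle belongs to one requester in turn.

    A slot whose owner has nothing pending is wasted.
    """
    pending = dict(pending)  # work on a copy
    cycles = 0
    while any(pending.values()):
        owner = order[cycles % len(order)]
        if pending[owner] > 0:
            pending[owner] -= 1  # owner uses its slot
        cycles += 1              # otherwise the slot goes unused
    return cycles


def cycles_on_demand(pending, priority=("display", "system")):
    """Allocation on demand: each cycle goes to a requester that needs it.

    The priority order is an assumption of this sketch.
    """
    pending = dict(pending)
    cycles = 0
    while any(pending.values()):
        for name in priority:
            if pending[name] > 0:
                pending[name] -= 1
                break
        cycles += 1
    return cycles


workload = {"display": 2, "system": 6}
print(cycles_fixed_slots(workload))  # 12
print(cycles_on_demand(workload))    # 8
```

In this example the fixed-slot scheme spends four cycles on display slots that have no work, whereas allocation on demand leaves no cycle idle while any request is pending. The model ignores refresh deadlines, queue depth and burst transfers.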
In accordance with exemplary embodiments, configuration registers can programmably configure the concurrently accessed memory such that a first portion of the memory is allocated as display memory and a second portion of the memory is allocated as main memory. Control circuitry connected to the configuration registers and responsive to one or more signals applied to the apparatus, including address, data and control signals, can be used to direct at least some of the data signals to only one or the other of first and second data paths. A first data path is connected to the arbiter and includes a first buffer store for facilitating exchange of data with the shared memory, and a second data path is connected to the arbiter and includes a second buffer store for facilitating exchange of data with the shared memory.
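By way of illustration only, the programmable partitioning of the shared memory and the steering of accesses to one of two data paths can be sketched as follows. The class `SharedMemoryConfig`, its fields, the path names and the placement of the display portion at the bottom of the memory are all assumptions made for this sketch. The sketch also routes on address alone, whereas the control circuitry described above may respond to address, data and control signals.

```python
from dataclasses import dataclass


@dataclass
class SharedMemoryConfig:
    """Hypothetical stand-in for the configuration registers."""

    total_bytes: int    # size of the shared memory
    display_bytes: int  # size of the portion allocated as display memory

    def __post_init__(self):
        if not 0 <= self.display_bytes <= self.total_bytes:
            raise ValueError("display portion must fit in the shared memory")

    @property
    def system_bytes(self):
        """Size of the portion allocated as main memory."""
        return self.total_bytes - self.display_bytes

    def route(self, address):
        """Pick the data path for an access (assumes display portion is lowest)."""
        if not 0 <= address < self.total_bytes:
            raise ValueError("address outside the shared memory")
        if address < self.display_bytes:
            return "display_path"  # first data path and its buffer store
        return "system_path"       # second data path and its buffer store


# Example: a 2 MiB shared memory with 1.25 MiB allocated as display memory.
config = SharedMemoryConfig(total_bytes=2 * 1024 * 1024, display_bytes=1310720)
print(config.system_bytes)    # 786432
print(config.route(0))        # display_path
print(config.route(1310720))  # system_path
```

Changing `display_bytes` models reprogramming the configuration registers, so that memory not needed for display becomes available as main memory.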
In accordance with further embodiments, separate buffer stores, or queues, can be provided to enhance graphics and system accesses, thereby improving latency for both graphics and system cycles. The queues are serviced in parallel and independently of each other.
In accordance with yet additional embodiments of the present invention, improved efficiency of operation can be achieved to enhance concurrency between plural banks of memory when expansion memory is included in a system. As expansion memory is added, it can be mapped to the bottom of the available system address space, and any addressable locations of prior base system memory included in the shared memory are moved above the expansion memory space. Thus, a system controller will use addressable locations of the expansion memory first, and use the base system memory only when the expansion memory is full.
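By way of illustration only, the address remapping described above can be sketched as a translation from a system address to a memory bank and an offset within that bank. The function name `map_system_address`, the bank labels and the byte-granular sizes are assumptions made for this sketch.

```python
def map_system_address(address, expansion_bytes, base_bytes):
    """Translate a system address into (bank, offset).

    Expansion memory occupies the bottom of the system address space; the
    base system memory held in the shared memory sits directly above it.
    """
    if not 0 <= address < expansion_bytes + base_bytes:
        raise ValueError("address outside the system address space")
    if address < expansion_bytes:
        return ("expansion", address)
    return ("base", address - expansion_bytes)


MIB = 1024 * 1024

# Without expansion memory, every system address falls in base memory.
print(map_system_address(0, expansion_bytes=0, base_bytes=MIB))
# ('base', 0)

# After 4 MiB of expansion memory is added, low addresses move to it and
# base memory begins at the 4 MiB boundary.
print(map_system_address(0, expansion_bytes=4 * MIB, base_bytes=MIB))
# ('expansion', 0)
print(map_system_address(4 * MIB, expansion_bytes=4 * MIB, base_bytes=MIB))
# ('base', 0)
```

Because low addresses are typically filled first, a mapping of this kind directs system accesses to the expansion memory before they reach the shared memory, leaving the shared memory freer for graphics accesses.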