This patent application is a continuation of U.S. patent application Ser. No. 09/206,677, filed Dec. 7, 1998, now U.S. Pat. No. 6,326,973, and entitled “Method And System For Allocating AGP/GART Memory From the Local AGP Memory Controller In A Highly Parallel System Architecture (HPSA).”
1. Field of the Invention
The present invention relates to computer systems using at least one accelerated graphics port (AGP) with at least two core logic chip sets, and more particularly, to allocating AGP/GART memory from the system memory local to the AGP device.
2. Description of the Related Technology
Use of computers, especially personal computers, in business and at home is becoming more and more pervasive because the computer has become an integral tool of most information workers who work in the fields of accounting, law, engineering, insurance, services, sales and the like. Rapid technological improvements in the field of computers have opened up many new applications heretofore unavailable or too expensive for the use of older technology mainframe computers. These personal computers may be stand-alone workstations (high-end individual personal computers), desktop personal computers, portable laptop computers and the like. Moreover, personal computers may be linked together in a network by a “network server” which is also a personal computer which may have a few additional features specific to its purpose in the network. The network server may be used to store massive amounts of data, and may facilitate interaction of the individual workstations connected to the network for electronic mail (“E-mail”), document databases, video teleconferencing, white boarding, integrated enterprise calendar, virtual engineering design and the like. Multiple network servers may also be interconnected by local area networks (“LAN”) and wide area networks (“WAN”).
A significant part of the ever-increasing popularity of the personal computer, besides its low cost relative to just a few years ago, is its ability to run sophisticated programs and perform many useful and new tasks. Personal computers today may be easily upgraded with new peripheral devices for added flexibility and enhanced performance. A major advance in the performance of personal computers (both workstation and network servers) has been the implementation of sophisticated peripheral devices such as video graphics adapters, local area network interfaces, SCSI bus adapters, full motion video, redundant error checking and correcting disk arrays, and the like. These sophisticated peripheral devices are capable of data transfer rates approaching the native speed of the computer system's microprocessor central processing unit (“CPU”). The peripheral devices' data transfer speeds are achieved by connecting the peripheral devices to the microprocessor(s) and associated system random access memory through high-speed expansion local buses. Most notably, a high speed expansion local bus standard has emerged that is microprocessor independent and has been embraced by a significant number of peripheral hardware manufacturers and software programmers. This high-speed expansion bus standard is called the “Peripheral Component Interconnect” or “PCI.” A more complete definition of the PCI local bus may be found in the following specifications: PCI Local Bus Specification, revision 2.1; PCI/PCI Bridge Specification, revision 1.0; PCI System Design Guide, revision 1.0; PCI BIOS Specification, revision 2.1, and Engineering Change Notice (“ECN”) entitled “Addition of ‘New Capabilities’ Structure,” dated May 20, 1996, the disclosures of which are hereby incorporated by reference. These PCI specifications and ECNs are available from the PCI Special Interest Group, P.O. Box 14070, Portland, Oreg. 97214.
A computer system has a plurality of information (data and address) buses. These buses include a host bus, a memory bus, at least one high-speed expansion local bus such as the PCI bus, and other peripheral buses such as the Small Computer System Interface (SCSI), Extension to Industry Standard Architecture (EISA), and Industry Standard Architecture (ISA). The microprocessor(s) of the computer system communicates with main memory and with the peripherals that make up the computer system over these various buses. The microprocessor(s) communicates with the main memory over a host bus to a memory bus bridge. The peripherals, depending on their data transfer speed requirements, are connected to the various buses, which are connected to the microprocessor host bus through bus bridges that detect required actions, arbitrate, and translate both data and addresses between the various buses.
Increasingly sophisticated microprocessors have revolutionized the role of the personal computer by enabling complex applications software to run at mainframe computer speeds. The latest microprocessors have brought the level of technical sophistication to personal computers that, just a few years ago, was available only in mainframe and mini-computer systems. Some representative examples of these new microprocessors are the “PENTIUM” and “PENTIUM PRO” (registered trademarks of Intel Corporation). Advanced Micro Devices, Cyrix, IBM, Digital Equipment Corp., and Motorola also manufacture advanced microprocessors.
These sophisticated microprocessors have, in turn, made possible running complex application programs using advanced three dimensional (“3-D”) graphics for computer aided drafting and manufacturing, engineering simulations, games and the like. Increasingly complex 3-D graphics require higher speed access to ever-larger amounts of graphics data stored in memory. This memory may be part of the video graphics processor system, but it is preferably (for lowest cost) part of the main computer system memory. Intel Corporation has proposed a low cost but improved 3-D graphics standard called the “Accelerated Graphics Port” (AGP) initiative. With AGP, 3-D graphics data, in particular textures, may be shifted out of the graphics controller local memory to computer system memory. The computer system memory is lower in cost than the graphics controller local memory and is more easily adapted for a multitude of other uses besides storing graphics data.
The Intel AGP 3-D graphics standard defines a high-speed data pipeline, or “AGP bus,” between the graphics controller and system memory. This AGP bus has sufficient bandwidth for the graphics controller to retrieve textures from system memory without materially affecting computer system performance for other non-graphics operations. The Intel 3-D graphics standard is a specification, which provides signal, protocol, electrical, and mechanical specifications for the AGP bus and devices attached thereto. The original specification is entitled “Accelerated Graphics Port Interface Specification Revision 1.0,” dated Jul. 31, 1996, the disclosure of which is hereby incorporated by reference. The newest specification is entitled “Preliminary Draft of Accelerated Graphics Port Interface Specification, Preliminary Draft of Revision 2.0,” dated Dec. 10, 1997. These AGP Specifications are available from Intel Corporation, Santa Clara, Calif.
The AGP interface specification uses the 66 MHz PCI (Revision 2.1) specification as an operational baseline, with three performance enhancements to the PCI specification which are used to optimize the AGP Specification for high performance 3-D graphics applications. These enhancements are: 1) pipelined memory read and write operations, 2) demultiplexing of address and data on the AGP bus by use of side-band signals, and 3) data transfer rates of 133 MHz for data throughput in excess of 500 megabytes per second (“MB/s”). The remaining AGP Specification does not modify the PCI specification, but rather provides a range of graphics-oriented performance enhancements for use by 3-D graphics hardware and software designers. The AGP Specification is not meant to replace or diminish full use of the PCI standard in the computer system. The AGP Specification creates an independent and additional high speed local bus for use by 3-D graphics devices such as a graphics controller, wherein the other input-output (“I/O”) devices of the computer system may remain on any combination of the PCI, SCSI, EISA and ISA buses.
To functionally enable this AGP 3-D graphics bus, new computer system hardware and software are required. This requires new computer system core logic designed to function as a host bus/memory bus/PCI bus to AGP bus bridge meeting the AGP Specification. Moreover, new Read Only Memory Basic Input Output System (xe2x80x9cROM BIOSxe2x80x9d) and Application Programming Interface (xe2x80x9cAPIxe2x80x9d) software are required to make the AGP dependent hardware functional in the computer system. The computer system core logic must still meet the PCI standards referenced above and facilitate interfacing the PCI bus(es) to the remainder of the computer system. In addition, new AGP compatible device cards must be designed to properly interface, mechanically and electrically, with the AGP bus connector.
AGP and PCI device cards are neither physically nor electrically interchangeable even though there is some commonality of signal functions between the AGP and PCI interface specifications. The present AGP Specification only makes allowance for a single AGP device on an AGP bus, whereas the PCI specification allows two plug-in slots for PCI devices plus a bridge on a PCI bus running at 66 MHz. The single AGP device is capable of functioning in both a 1× mode (264 MB/s peak) and a 2× mode (532 MB/s peak). The AGP bus is defined as a 32-bit bus, and may have up to four bytes of data transferred per clock in the 1× mode and up to eight bytes of data per clock in the 2× mode. The PCI bus is defined as either a 32 bit or 64 bit bus, and may have up to four or eight bytes of data transferred per clock, respectively. The AGP bus, however, has additional side-band signals which enable it to transfer blocks of data more efficiently than is possible using a PCI bus. An AGP bus running in the 2× mode provides sufficient video data throughput (532 MB/s peak) to allow increasingly complex 3-D graphics applications to run on personal computers.
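The peak figures quoted above can be reproduced with simple arithmetic. The following is a minimal sketch, not part of the AGP Specification; the function name `peak_mb_per_s` is hypothetical. It assumes the 264 MB/s figure is the nominal 66 MHz clock times four bytes, and that the 532 MB/s figure is the 133 MHz effective transfer rate times four bytes.

```python
BUS_WIDTH_BYTES = 4  # the AGP bus is defined as a 32-bit bus


def peak_mb_per_s(transfer_rate_mhz, bytes_per_transfer=BUS_WIDTH_BYTES):
    """Peak throughput in MB/s: million transfers per second times bytes per transfer."""
    return transfer_rate_mhz * bytes_per_transfer


# 1x mode: one 4-byte transfer per 66 MHz clock.
agp_1x = peak_mb_per_s(66)

# 2x mode: two 4-byte transfers per clock, i.e. a 133 MHz effective transfer rate.
agp_2x = peak_mb_per_s(133)

print(agp_1x, agp_2x)  # 264 532
```

Computing the 2× mode as eight bytes per 66 MHz clock gives 528 MB/s instead; the small difference from 532 MB/s comes only from rounding the nominal clock frequency.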
A major performance/cost enhancement using AGP in a computer system is accomplished by shifting texture data structures from local graphics memory to main memory. Textures are ideally suited for this shift for several reasons. Textures are generally read-only, and therefore problems of access ordering and coherency are less likely to occur. Shifting of textures serves to balance the bandwidth load between system memory and the local graphics memory because a well-cached host processor has much lower memory bandwidth requirements than does a 3-D rendering machine. Texture access comprises perhaps the single largest component of rendering memory bandwidth, so avoiding loading or caching textures in local graphics memory saves not only this component of local memory bandwidth, but also the bandwidth necessary to load the texture store in the first place. Furthermore, texture data must pass through main memory anyway as it is loaded from a mass store device. Texture size is dependent upon application quality rather than on display resolution, and therefore may require the greatest increase in memory as software applications become more advanced. Texture data is not persistent and may reside in the computer system memory only for the duration of the software application. Consequently, any system memory spent on texture storage can be returned to the free memory heap when the application concludes (unlike a graphics controller's local frame buffer, which may remain in persistent use). For these reasons, shifting texture data from local graphics memory to main memory significantly reduces computer system costs when implementing 3-D graphics.
Generally, in computer system memory architecture, the graphics controller's physical address space resides above the top of system memory. The graphics controller uses this physical address space to access its local memory, which holds information that is required to generate a graphics screen. In the AGP system, information still resides in the graphics controller's local memory (textures, alpha, z-buffer, etc.), but some data which previously resided in this local memory is moved to system memory (primarily textures, but also command lists, etc.). The address space employed by the graphics controller to access these textures becomes virtual, meaning that the physical memory corresponding to this address space does not actually exist above the top of the memory space. In reality, each of these virtual addresses corresponds to a physical address in system memory. The graphics controller sees this virtual address space, referenced hereinafter as “AGP device address space,” as one contiguous block of memory, but the corresponding physical memory addresses may be allocated in 4 kilobyte (“KB”), non-contiguous pages throughout the computer system physical memory.
There are two primary AGP usage models for 3D rendering that have to do with how data are partitioned and accessed, and the resultant interface data flow characteristics. In the “DMA” model, the primary graphics memory is a local memory referred to as ‘local frame buffer’ and is associated with the AGP graphics controller or “video accelerator.” 3D structures are stored in system memory, but are not used (or “executed”) directly from this memory; rather they are copied to primary (local) memory, to which the rendering engine's address generator (of the AGP graphics controller) makes references. This implies that the traffic on the AGP bus tends to be long, sequential transfers, serving the purpose of bulk data transport from system memory to primary graphics (local) memory. This sort of access model is amenable to a linked list of physical addresses provided by software (similar to operation of a disk or network I/O device), and is generally not sensitive to a non-contiguous view of the memory space.
In the “execute” model, the video accelerator uses both the local memory and the system memory as primary graphics memory. From the accelerator's perspective, the two memory systems are logically equivalent; any data structure may be allocated in either memory, with performance optimization as the only criterion for selection. In general, structures in system memory space are not copied into the local memory prior to use by the video accelerator, but are “executed” in place. This implies that the traffic on the AGP bus tends to be short, random accesses, which are not amenable to an access model based on software resolved lists of physical addresses. Since the accelerator generates direct references into system memory, a contiguous view of that space is essential. But, since system memory is dynamically allocated in, for example, random 4,096 byte blocks of the memory, hereinafter 4-kilobyte (“KB”) pages, it is necessary in the “execute” model to provide an address mapping mechanism that maps the random 4 KB pages into a single contiguous address space.
The AGP Specification supports both the “DMA” and “execute” models. However, since a primary motivation of the AGP is to reduce growth pressure on the graphics controller's local memory (including local frame buffer memory), the “execute” model is preferred. Consistent with this preference, the AGP Specification requires a virtual-to-physical address re-mapping mechanism, which ensures the graphics accelerator (AGP master) will have a contiguous view of graphics data structures dynamically allocated in the system memory. This address re-mapping applies only to a single, programmable range of the system physical address space and is common to all system agents. Addresses falling in this range are re-mapped to non-contiguous pages of physical system memory. All addresses not in this range are passed through without modification, and map directly to main system memory, or to device specific ranges, such as a PCI device's physical memory. Re-mapping is accomplished via a “Graphics Address Remapping Table” (“GART”) which is set up and maintained by GART miniport driver software, and used by the core logic chipset to perform the re-mapping. In order to avoid compatibility issues and allow future implementation flexibility, this mechanism is specified at a software (API) level. In other words, the actual GART format may be abstracted to the API by a hardware abstraction layer (“HAL”) or mini-port driver that is provided with the core logic chipset. While this API does not constrain the future partitioning of re-mapping hardware, the re-mapping function will typically be implemented in the core logic chipset.
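The re-mapping behavior described above (addresses inside one programmable range are translated page by page, all others pass through unmodified) can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name `Gart`, its fields, and the example addresses are hypothetical, and an actual GART format is chipset specific and implemented in core logic hardware rather than in software like this.

```python
PAGE_SIZE = 4096  # 4 KB pages, as described in the text


class Gart:
    """Toy remapping table: one physical page base address per page of the range."""

    def __init__(self, aperture_base, physical_pages):
        self.base = aperture_base
        # Base addresses of the (possibly non-contiguous) physical pages.
        self.pages = list(physical_pages)
        self.limit = aperture_base + len(self.pages) * PAGE_SIZE

    def translate(self, addr):
        # Addresses outside the programmable range pass through unmodified.
        if not (self.base <= addr < self.limit):
            return addr
        # Inside the range: pick the table entry, keep the offset within the page.
        index, within_page = divmod(addr - self.base, PAGE_SIZE)
        return self.pages[index] + within_page


# Three scattered physical pages presented as one contiguous 12 KB range.
gart = Gart(aperture_base=0x10000000,
            physical_pages=[0x00300000, 0x00012000, 0x02005000])

print(hex(gart.translate(0x10001004)))  # 0x12004 (second page, offset 4)
print(hex(gart.translate(0x0FFFFFFF)))  # 0xfffffff (outside the range, unchanged)
```

The device sees addresses 0x10000000 through 0x10002FFF as contiguous even though the backing pages are scattered, which is the property the “execute” model relies on.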
The contiguous AGP graphics controller's device addresses are mapped (translated) into corresponding physical addresses that reside in the computer system physical memory by using the GART which may also reside in physical memory. The GART is used by the core logic chipset to re-map AGP device addresses that can originate from either the AGP, host, or PCI buses. A software program called a “GART miniport driver” manages the GART. The GART miniport driver provides GART services for the computer software operating system.
The size of AGP device address space is always greater than or equal to the amount of physical system memory allocated to AGP. The amount of physical memory allocated to AGP is managed by the computer operating system. AGP device address space is specific to the core logic chipset. In the Compaq/RCC chipset, the AGP device address space may be from a minimum of 32 MB to a maximum of 2 GB, with a default value of, for example, 256 MB. In the Intel 440 BX chipset, the minimum is 4 MB and the maximum is 256 MB. Some AGP graphics controllers do not require the default value of 256 MB of device address space, and may only need the minimum of 32 MB.
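To give a sense of scale for the sizes quoted above, the number of 4 KB pages covered by a given AGP device address space can be computed directly. The sketch below assumes one table entry per 4 KB page, which follows from the page-granular mapping described earlier but is not a statement about any particular chipset's GART layout; the function name `gart_entries` is hypothetical.

```python
PAGE_SIZE = 4 * 1024   # 4 KB pages
MB = 1024 * 1024


def gart_entries(aperture_bytes):
    """Number of 4 KB pages (and thus table entries, one per page) in the range."""
    return aperture_bytes // PAGE_SIZE


for size_mb in (4, 32, 256, 2048):
    print(size_mb, "MB ->", gart_entries(size_mb * MB), "pages")
# 4 MB -> 1024 pages
# 32 MB -> 8192 pages
# 256 MB -> 65536 pages
# 2048 MB -> 524288 pages
```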
AGP memory issues are complicated greatly in computer systems that utilize multiple central processing units and multiple core logic chipsets to facilitate parallel processing. These memory issues are compounded if there are multiple AGP buses present in the computer system. For example, AGP or GART physical system memory can be allocated to memory residing on the non-AGP memory controller controlled through another chipset. Although these addresses can be passed between controllers, doing so consumes both valuable bandwidth and time.
What is needed is a system and method of dynamically allocating virtual address space for an AGP device in a multi-processor/multi-core logic computer system that allocates memory in the portion of the computer system local to the AGP device, thereby minimizing communications between different memory controllers on separate core logic units.
The present invention solves the problems inherent in the prior art systems by providing an advanced configuration and power interface (ACPI) control method. ACPI is an operating-system-based specification that defines a flexible and abstract hardware interface for desktop and notebook PCs. As such, the specification enables system designers to integrate power management throughout the hardware components, the operating system, and the software applications. ACPI is utilized in both the Windows NT™ and the Windows 98™ operating systems, available from the Microsoft Corporation of Redmond, Wash. The method of the present invention calls for creating a new ACPI table called the Memory Bank Address Table (MBAT). System BIOS utilizes ACPI Source Language (ASL) to build the MBAT dynamically at startup. The AGP driver and the GART miniport driver access the MBAT via an ACPI control method. The AGP driver and the GART miniport driver use the MBAT information when making calls to the operating system's memory manager to allocate new memory. For example, the drivers pass ranges derived from MBAT information to the memory manager when the Windows NT 5.0™ operating system service routine “MmAllocatePagesForMdl()” is called. The memory manager attempts to allocate local memory within the specified range. If local memory cannot be allocated within the required range, the drivers allocate memory from outside the range as a fall back. Once the memory is allocated, the AGP driver and the GART miniport driver access the allocated memory in the normal manner. If local memory is allocated, then the AGP device can utilize this memory without the need of the host bus, thereby eliminating the host bus-related latencies as well as conserving host bus bandwidth.
The MBAT is built before the need for memory allocation arises. In the preferred embodiment of the present invention, the MBAT is built dynamically by the system BIOS as needed. However, if the system configuration never changes, the MBAT may be built once at system startup and simply stored until needed. If the system BIOS does not support the building of the MBAT, the GART miniport driver can build the MBAT. The structure of the MBAT contains the ranges of memory local to the AGP device depending upon the configuration of the computer system found during startup (POST). The actual form of the MBAT is chipset specific because of the use of certain registers. In any case, by the time that the operating system has booted, the MBAT will have been created and the operating system, specifically the memory manager of the operating system, will know of the location of the MBAT. The location of the MBAT can be in system memory, in non-volatile RAM (NVRAM), on a hard disk drive, a network drive that is connected to the computer system via a network interface card, or on any volatile or non-volatile memory that can be connected to the computer system which can contain the contents of the MBAT.
In operation, when the GART miniport driver or the AGP driver needs to allocate memory, it references the MBAT first. In some cases, such as when the MBAT resides on non-local system memory, the AGP driver will receive a copy of the MBAT for future reference. Referencing the copy of the MBAT locally by the AGP driver further reduces demands upon the host bus of the computer system. Once the MBAT has been referenced, either the AGP driver or the GART miniport driver will use the ranges of the system memory that reside on the same core logic chipset as the AGP bus. It is these ranges that are used to allocate memory for the AGP driver or GART miniport driver. If the range of local memory is unavailable (i.e., it has already been allocated for other devices), then the AGP driver or GART miniport driver requests memory from any available resources as a fall back.
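The allocation flow described above (consult the MBAT ranges first, then fall back to any available memory) can be sketched as a small simulation. Everything here is hypothetical and for illustration only: `ToyMemoryManager` stands in for the operating system's memory manager, the MBAT is modeled as a plain list of (low, high) address ranges even though its actual form is chipset specific, and the sketch allows a request to be satisfied partly from local memory and partly from the fall back, which is one possible policy rather than the one the drivers necessarily use.

```python
PAGE_SIZE = 4096


class ToyMemoryManager:
    """Stand-in for an OS memory manager; tracks free physical page base addresses."""

    def __init__(self, free_pages):
        self.free = set(free_pages)

    def allocate(self, count, low=0, high=None):
        """Take up to `count` free pages whose base address lies in [low, high)."""
        candidates = sorted(p for p in self.free
                            if p >= low and (high is None or p < high))
        taken = candidates[:count]
        self.free.difference_update(taken)
        return taken


def allocate_agp_pages(mm, mbat_ranges, count):
    """Prefer pages inside the MBAT's local ranges; fall back to any free memory.

    Returns (pages, number_of_local_pages).
    """
    pages = []
    for low, high in mbat_ranges:
        if len(pages) == count:
            break
        pages += mm.allocate(count - len(pages), low, high)
    local = len(pages)
    if local < count:
        # Fall back: request the remainder from any available memory.
        pages += mm.allocate(count - local)
    return pages, local


# Two free pages local to the AGP chipset, two on another memory controller.
mm = ToyMemoryManager([0x1000, 0x2000, 0x40000000, 0x40001000])
mbat = [(0x40000000, 0x40002000)]  # hypothetical local range
pages, local = allocate_agp_pages(mm, mbat, 3)
print([hex(p) for p in pages], local)  # ['0x40000000', '0x40001000', '0x1000'] 2
```

In this example two of the three requested pages come from the local range and the third comes from the fall back path, so only accesses to that last page would need to cross between memory controllers.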
Other and further features and advantages will be apparent from the following description of presently preferred embodiments of the invention, given for the purpose of disclosure and taken in conjunction with the accompanying drawings.