The present invention relates generally to memory systems and, more particularly, to distributed translation look-aside buffers for a Graphics Address Remapping Table (GART).
Modern computer graphics applications require high-speed processing in order to generate realistic images on a display device (e.g., a computer monitor). Within a computer, the requisite processing power for modern graphics applications is provided by a host processor and a graphics controller. Large blocks of data and other information must travel to, from, and between the host processor and the graphics controller during operation.
With the accelerated graphics port (AGP) architecture, data used by both the graphics controller and the host processor can be stored in system (host) memory. The AGP architecture provides a dedicated, high speed port through which data can be moved between the graphics controller and system memory. The AGP architecture utilizes host paging. As such, blocks of memory with contiguous linear addresses may not be physically contiguous in system memory. Specifically, each linear address corresponds to some location in a “virtual” memory. In the virtual memory, data for certain structures (e.g., texture maps) are stored in contiguous locations. In the physical system memory, however, the data may actually be stored in non-contiguous locations.
Because the host processor and the graphics controller must see data structures as contiguous blocks, the AGP architecture is equipped with core logic to translate the virtual linear addresses into corresponding physical addresses. This translation is accomplished with a memory-based graphics address remapping table (GART). The GART supports a mapping function between virtual addresses and physical addresses. With this mapping in the AGP architecture, a processing device (e.g., the host processor or the graphics controller) may use a translation look-aside buffer for performing memory accesses. In general, the translation look-aside buffer functions to temporarily store data and information for performing translations.
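The mapping function described above can be illustrated with a minimal sketch. It assumes 4 KB pages and a flat, single-level table; the names PAGE_SHIFT, gart_table, and translate, as well as the page addresses, are hypothetical and chosen only for illustration. They are not taken from the AGP specification.

```python
PAGE_SHIFT = 12                      # assumption: 4 KB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1    # low bits select the byte within a page

# Hypothetical GART contents: virtual page number -> physical page base.
# Contiguous virtual pages map to non-contiguous physical pages.
gart_table = {
    0x0: 0x0009A000,
    0x1: 0x00013000,
    0x2: 0x0004F000,
}

def translate(virtual_addr: int) -> int:
    """Translate a virtual (linear) address into a physical address."""
    vpn = virtual_addr >> PAGE_SHIFT       # virtual page number
    offset = virtual_addr & PAGE_MASK      # offset is unchanged by translation
    return gart_table[vpn] | offset
```

In this sketch the virtual addresses 0x0FFF and 0x1000 are adjacent, yet they resolve to physical pages that are far apart, which is the situation the GART is meant to hide from the processing devices.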
With previously developed techniques, a single translation look-aside buffer is provided to support all processing devices. The processing devices share use of the translation look-aside buffer. With a single, shared translation look-aside buffer, contention arises between the processing devices for its use. For example, one processing device may direct that certain data be stored into the buffer for a desired translation, but before the translation has been completed, another processing device may direct that other data be stored into the buffer. This other data overwrites the previously stored data. Thus, in order to complete the translation desired by the first processing device, the first data must be re-written into the translation look-aside buffer. Accordingly, the contention between processing devices diminishes performance.
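The overwrite behavior described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the buffer uses least-recently-used replacement, its capacity is a free parameter, and the class name SharedTLB, the function count_misses, and the access trace are all hypothetical.

```python
from collections import OrderedDict

class SharedTLB:
    """One buffer used by every processing device (LRU replacement assumed)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()   # virtual page number -> placeholder entry

    def lookup(self, vpn: int) -> bool:
        """Return True on a hit; on a miss, load the entry and return False."""
        if vpn in self.entries:
            self.entries.move_to_end(vpn)        # refresh LRU position
            return True
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)     # overwrite the oldest entry
        self.entries[vpn] = True
        return False

def count_misses(trace, capacity: int = 1) -> int:
    """Count misses when all devices in the trace share a single buffer."""
    tlb = SharedTLB(capacity)
    return sum(0 if tlb.lookup(vpn) else 1 for _device, vpn in trace)

# Two devices alternate, each repeatedly touching its own single page.
trace = [("host", 0x10), ("graphics", 0x20)] * 4
```

With a one-entry buffer, every access in this trace evicts the entry the other device just loaded, so all eight accesses miss even though only two distinct pages are ever used.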
Also, with previously developed techniques, multiple interconnections are required to support all of the processing devices sharing a single translation look-aside buffer. Because each of these interconnections must run from an interface device associated with a respective processing device to the translation look-aside buffer, the interconnections may be relatively long. A longer interconnection increases the delay for any signals traveling thereon, and thus makes it more difficult to achieve design timing requirements.
In an AGP architecture utilizing a memory-based GART, the translation look-aside buffer is initially searched for information which can be used for translation. If the desired information is not found within the translation look-aside buffer, a “miss” occurs and the information must be retrieved from main memory. With previously developed techniques utilizing a single, shared translation look-aside buffer for multiple processing devices, if a miss occurs because of a search request by one processing device, any search request by another processing device is delayed while action is taken in response to the miss. Taken collectively across all processing devices, this increases the amount of time required for translation, and thus further reduces performance.
The disadvantages and problems associated with previously developed techniques have been substantially reduced or eliminated with the present invention.
In accordance with one embodiment of the present invention, a system includes a main memory device which stores information for translating a virtual address into a physical address in response to one of a plurality of processing devices. A memory control/interface device is coupled to the main memory device. The memory control/interface device, which may access the information stored in the main memory device, has a separate translation look-aside buffer for each processing device. Each translation look-aside buffer can buffer the information for use in translating in response to the respective processing device.
In accordance with another embodiment of the present invention, a memory control/interface device includes a plurality of translation look-aside buffers, each associated with a separate processing device. Each translation look-aside buffer can buffer information for use in translating a linear address received from the respective processing device. A GART walk device is coupled to the plurality of translation look-aside buffers. The GART walk device can execute a table walk process to retrieve the information from a main memory device for buffering in the translation look-aside buffers.
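The arrangement of per-device buffers served by one GART walk device can be sketched as follows. This is an illustrative software model only, not the hardware of any embodiment. It assumes 4 KB pages and a flat table, and it omits buffer capacity and eviction. The class names GartWalker and MemoryInterface and the device labels are hypothetical.

```python
PAGE_SHIFT = 12                      # assumption: 4 KB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1

class GartWalker:
    """Hypothetical stand-in for the GART walk device."""

    def __init__(self, gart_in_main_memory: dict):
        self.gart = gart_in_main_memory   # virtual page number -> physical page base
        self.walks = 0                    # number of table walks performed

    def walk(self, vpn: int) -> int:
        self.walks += 1
        return self.gart[vpn]

class MemoryInterface:
    """Hypothetical memory control/interface device with one TLB per device."""

    def __init__(self, devices, walker: GartWalker):
        self.walker = walker
        self.tlbs = {device: {} for device in devices}   # eviction omitted

    def translate(self, device: str, linear_addr: int) -> int:
        tlb = self.tlbs[device]
        vpn = linear_addr >> PAGE_SHIFT
        if vpn not in tlb:
            # Miss: walk the table and fill only the requesting device's buffer.
            tlb[vpn] = self.walker.walk(vpn)
        return tlb[vpn] | (linear_addr & PAGE_MASK)
```

A brief usage example: after `mi = MemoryInterface(["host", "graphics"], GartWalker({0: 0x9A000, 1: 0x13000}))`, a call to `mi.translate("graphics", 0x1004)` fills only the buffer associated with "graphics", leaving the entries buffered for "host" untouched.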
A technical advantage of the present invention includes providing a separate translation look-aside buffer for each processing device in an accelerated graphics port (AGP) architecture utilizing a Graphics Address Remapping Table (GART). With this arrangement, there is no contention for use of the same buffer storage space by the various processing devices. Also, the physical implementation of each translation look-aside buffer can be localized for the respective processing device, thereby eliminating the relatively long interconnections which would otherwise be needed to connect multiple processing devices to a single, shared translation look-aside buffer. This makes it easier to achieve design timing requirements. Furthermore, a better degree of concurrency is achieved when several processing devices simultaneously issue translation requests. More specifically, any “miss” which occurs because of a search request by one processing device in its respective translation look-aside buffer is masked from the other processing devices. The other processing devices are thus still able to search their own respective translation look-aside buffers. Because search requests by several processing devices can proceed simultaneously, the overall operation of the system is enhanced.
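The masking of a miss from the other processing devices can be illustrated with a rough latency model. The cycle counts below are arbitrary assumptions rather than measured values, the function names are hypothetical, and the model assumes that a table walk for one device does not contend with the searches of any other device.

```python
HIT_CYCLES = 1     # assumption: cost of a search that hits
MISS_CYCLES = 20   # assumption: cost of a miss, including the table walk

def cost(hit: bool) -> int:
    return HIT_CYCLES if hit else MISS_CYCLES

def shared_latency(requests: dict) -> int:
    """Single shared buffer: every request waits behind the ones before it."""
    return sum(cost(hit) for hits in requests.values() for hit in hits)

def distributed_latency(requests: dict) -> int:
    """One buffer per device: each device only waits on its own requests."""
    return max(sum(cost(hit) for hit in hits) for hits in requests.values())

# True marks a hit, False a miss, in the order each device issues requests.
requests = {
    "host": [False],             # one miss
    "graphics": [True, True],    # two hits
}
```

Under these assumptions the shared arrangement takes 22 cycles to finish all three requests, while the distributed arrangement takes 20, because the two hits by the graphics controller complete while the miss by the host processor is being serviced.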
Other important technical advantages of the present invention are readily apparent to one skilled in the art from the following figures, descriptions, and claims.