1. Field of the Invention
The present invention relates, in general, to a method and system to be utilized in data processing systems. In particular, the present invention relates to a method and system to be utilized in, for non-limiting example, data processing systems wherein the Accelerated Graphics Port (AGP) interface standard is utilized.
2. Description of the Related Art
Data processing systems are systems that manipulate, process, and store data and are well known within the art. Personal computer systems, and their associated subsystems, constitute well known species of data processing systems. Personal computer systems in general and IBM compatible personal computer systems in particular have attained widespread use for providing computer power to many segments of today's modern society. A personal computer system can usually be defined as a desk top, floor standing, or portable microcomputer that includes a system unit including but not limited to a system processor and associated volatile and non-volatile memory, a display device, a keyboard, one or more diskette drives, one or more fixed disk storage devices, and one or more data buses for communications between devices. One of the distinguishing characteristics of these systems is the use of a system board to electrically connect these components together. These personal computer systems are information handling systems which are designed primarily to give independent computing power to a single user (or a relatively small group of users in the case of personal computers which serve as computer server systems) and are inexpensively priced for purchase by individuals or small businesses.
A computer system or data-processing system typically includes a system bus. Attached to the system bus are various devices that may communicate locally with each other over the system bus. For example, a typical computer system includes a system bus to which a central processing unit (CPU) is attached and over which the CPU communicates directly with a system memory that is also attached to the system bus.
In addition, the computer system may include a peripheral bus for connecting certain highly integrated peripheral components to the CPU. One such peripheral bus is known as the Peripheral Component Interconnect (PCI) bus. Under the PCI bus standard, peripheral components can directly connect to a PCI bus without the need for glue logic. Thus, PCI is designed to provide a bus standard on which high-performance peripheral devices, such as graphics devices and hard disk drives, can be coupled to the CPU, thereby permitting these high-performance peripheral devices to avoid the general access latency and the bandwidth constraints that would have occurred if these peripheral devices were connected to a low speed peripheral bus. Details on the PCI local bus standard can be obtained under the PCI Bus Specification, Revision 2.1, from the PCI Special Interest Group, which is hereby incorporated by reference in its entirety.
Relatively recently, techniques for rendering three-dimensional (3D) continuous-animation graphics have been implemented within PCs which, as will be explained below, have exposed limitations in the originally high performance of the PCI bus. The AGP interface standard has been developed to both (1) reduce the load on the PCI bus systems, and (2) extend the capabilities of systems to include the ability to provide 3D continuous-animation graphics with a level of quality previously found only on high-end computer workstations. The AGP interface standard is defined by the following document: Intel Corporation, Accelerated Graphics Port Interface Specification, Revision 1.0 (Jul. 31, 1996), which is hereby incorporated by reference in its entirety.
The AGP interface standard is specifically targeted to improve the efficiency of 3D continuous-animation graphics applications which utilize a technique known in the art as “texturing.” Consequently, as background for understanding the data processing systems utilizing the AGP interface standard, it is helpful to have a brief overview of the data processing needs of 3D continuous-animation graphics applications which utilize texturing, how they degrade the performance of PCI local bus systems, and how the AGP interface standard remedies this degradation of performance.
The display device of a computing system displays data in two-dimensions (2D). In order to create a 3D continuous animation graphical display, it is first necessary to create an object such that when the object is presented on the 2D display device, the object will be perceived by a human viewer as a 3D object. There are two basic ways in which this can be done. The first way is to use color and shading techniques to trick the human visual system into perceiving 3D objects on the 2D display device (essentially the same technique used by human artists when creating what appear to be 3D landscapes consisting of trees, rocks, streams, etc., on 2D canvases). This is a very powerful technique and creates superior 3D realism. The second way is to use mutually perpendicular lines (e.g., the well-known x, y, z coordinate system) to create geometric objects which will be interpreted by the human visual system as denoting 3D (essentially the same technique used by human architects to create the illusion of 3D in perspective view architectural drawings). However, the 3D illusion created by the use of mutually perpendicular lines is generally perceived to be inferior to that produced by the coloring and shading techniques.
Subsequent to creating a 3D object, the object must be animated. Animation is the creation of the illusion of continuous motion by the rapid sequential presentation of discrete images, or frames, upon the 2D display device. Animated 3D computer graphics are generated by taking advantage of a well-known physiological property of the human visual system, which is that if a person is shown, within one second, a sequence of 15 discrete snapshots of a continuous motion, where the snapshots were taken at 1/15 second intervals, the brain will integrate the sequence together such that the person will “see,” or perceive, continuous motion. However, due to person-to-person variations in physiology, it has been found empirically that a presentation of 20 images per second is generally the minimum rate at which the majority of people will perceive continuous motion without flicker, with 30 images per second tending to be accepted as the optimal presentation speed.
The difficulty with 3D continuous-animation computer graphics is that while the color and shading techniques (which are typically accomplished via bit-mapped images) produce superior 3D realism, such techniques are not easy for a computer to translate through geometric space for the creation of the continuously varying sequential images necessary to produce the animation effect. On the other hand, the geometric shapes produced via the use of mutually perpendicular lines allow for easy computer manipulation in three dimensions, which allows the creation of the sequential images necessary to produce the animation effect, but such geometric shapes result in inferior 3D realism. Recent 3D continuous-animation computer graphics techniques take advantage of both of the foregoing noted 3D techniques via the use of a middle ground approach known in the art as “texturing.”
In the use of texturing, the gross, overall structures of an object are denoted by a 3D geometric shape which is used to do geometric translation in three space, while the finer details of each side of the 3D object are denoted by bit-mapped images (known in the art as “textures”) which accomplish the color and shading techniques. Each time a new image of an object is needed for animation, the geometric representation is pulled from computer memory into a CPU, and the appropriate translations calculated. Thereafter, the translated geometric representation is cached and the appropriate bit-mapped images are pulled from computer memory into the CPU and transformed as appropriate to the new geometric translations so as to give the correct appearance from the viewpoint of the display device, the new geometric position, and any lighting sources and/or other objects that may be present within the image to be presented. Thereafter, a device known as the graphics controller, which is responsible for creating and presenting frames (one complete computer screen) of data, retrieves both the translated geometric object data and transformed texture data, “paints” the surfaces of the geometric object with the texture data, and places the resultant object into frame buffer memory (a storage device local to the graphics controller wherein each individual frame is built before it is sent to the 2D display device). It is to be understood that the foregoing noted series of translations/transformations is done for each animated object to be displayed.
It is primarily the technique of texturing which has exposed the performance limitations of PCI bus systems. It has been found that when an attempt is made to implement a 3D continuous-animation computer graphics application wherein texturing is utilized within PCI bus systems, the texturing data results in effective monopolization of the PCI bus by the application, unless expensive memory is added to the graphics controller. That is, texturing using the PCI bus is possible. However, due to PCI bandwidth limitations, the textures must fit into the memory directly connected to the graphics card. Since there is a direct correlation between the size of textures and the realism of the scene, quality can only be achieved by adding memory to the graphics card/controller. It was this realization that prompted the development of the AGP interface specification: with the AGP interface standard, texture size can be increased using available system memory. The AGP interface standard is intended to remedy the exposed limitations of the PCI local bus systems by providing extended capabilities to PCI bus systems for performing 3D continuous-animation computer graphics, as will become clear in the following detailed description.
The AGP interface standard accomplishes the foregoing via a rather indirect process. Under the AGP interface standard, a CPU independently processes the geometric and texturing data associated with each object to be displayed in a scene. Subsequent to processing the geometric and texturing data, the CPU writes the geometric and texturing data back into system memory. Thereafter, the CPU informs a graphics processor that the information is ready, and the graphics processor retrieves the information from the system memory.
As can be seen from the foregoing, most of the traffic on the AGP bus is actually generated by the graphics controller. That is, under the dictates of the AGP interface standard, the graphics controller is typically either reading data from system memory or writing data to system memory. One of the main thrusts of the AGP interface standard is to create an AGP bus (alternatively referred to as the AGP Interconnect) which substantially optimizes transactions generated by the graphics controller, such that high rates of data throughput are provided.
One of the features supported by the AGP interface specification is that the graphics controller, when it is reading data from memory, be able to issue what are known as “pipeline cycles.” By “pipeline cycles” what is meant is that the graphics controller can issue a first memory access request, and before that first request is completed can issue a second memory access request, and before either the first or second request is completed can issue a third memory access request, etc. The AGP interface standard itself does not limit the number of pipelined requests that can be issued before completion. It is actually up to the hardware, for example an AGP graphics controller communicating with an AGP-enabled Northbridge, to negotiate what is the acceptable depth (i.e., how many transactions can be issued outstanding) of the pipeline.
It is significant that while the AGP specification does allow pipelining of memory accesses, it does not allow the transactions to be completed out of order. In other words, if a graphics controller issues four memory transactions A, B, C, and D, under the AGP interface standard the graphics controller can issue all of them without any one of them being completed. However, also under the dictates of the AGP interface specification, the requested memory transactions must complete in the same order as they were issued. This requirement of in-order completion gives rise to at least two inefficiencies: an increase in data latency and a requirement that the memory controller, located in the AGP-enabled Northbridge, work extra hard to keep things in order as required by the AGP interface standard.
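The pipelining and in-order completion behavior just described can be illustrated with a minimal sketch. The class name `InOrderPipeline`, its methods, and the depth of four are hypothetical and chosen only for illustration; they are not taken from the AGP interface specification.

```python
from collections import deque


class InOrderPipeline:
    """Toy model of pipelined requests that must complete in issue order."""

    def __init__(self, depth: int):
        self.depth = depth          # hypothetical negotiated pipeline depth
        self.outstanding = deque()  # requests issued but not yet completed

    def issue(self, request: str) -> bool:
        """Issue a request if the pipeline has room; report whether it was accepted."""
        if len(self.outstanding) >= self.depth:
            return False
        self.outstanding.append(request)
        return True

    def complete_next(self) -> str:
        """Complete the oldest outstanding request (in-order completion)."""
        return self.outstanding.popleft()


pipe = InOrderPipeline(depth=4)
# A through D are all issued before any of them completes; E exceeds the depth.
accepted = [pipe.issue(r) for r in ["A", "B", "C", "D", "E"]]
# Completions are forced into the same order in which the requests were issued.
completed = [pipe.complete_next() for _ in range(4)]
```

In this sketch the first-in, first-out queue is what enforces in-order completion: no request can be returned until every earlier request has been returned.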
As an aid to understanding these noted inefficiencies, it is helpful to have some background on the way in which memory access typically works. Typically, a computer system memory is a collection of Dynamic Random Access Memory units (DRAMs). The computer system memory, composed of DRAMs, can store data, but there is typically no intelligence in the system memory. The intelligence concerning how data is going to be stored, where the data is going to be stored, how the data is going to be read or written, etc., is contained within a module known within the art as a “memory controller” which may be contained within some other system component, typically a Northbridge.
The memory controller controls access to system memory, which as has been noted is typically composed of DRAMs. A DRAM can be thought of as a collection of cells, or storage locations, wherein data is stored. For simplicity it will be assumed here that each cell stores a byte, but those skilled in the art will recognize that other storage sizes are possible.
When a memory access, such as a read cycle, is engaged in, the memory controller is given an address by another device, such as a graphics controller. That address needs to correctly specify one of the cells where data is actually stored. Ordinarily, cells within DRAMs are arranged in row and column format (i.e., the cells are arranged like a matrix).
Consequently, an address, which for sake of illustration will be assumed to be 16 bits long, customarily is conceived of as being composed of two parts: a first 8-bit portion of the address which is associated with a row address, and a second 8-bit portion which is associated with a column address (again, the bit lengths are hypothetical and merely utilized here for illustrative purposes). This fragmentation of the address into row and column portions allows the address to correctly specify a storage location, or cell, by its row and column.
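The division of an address into row and column portions can be sketched as follows. The 8-bit widths mirror the hypothetical lengths used above; treating the high-order bits as the row portion is an assumption made here for illustration, and the function name `split_address` is likewise hypothetical.

```python
ROW_BITS = 8  # hypothetical width, matching the 16-bit illustration above
COL_BITS = 8  # hypothetical width


def split_address(addr: int) -> tuple[int, int]:
    """Split a 16-bit address into its (row, column) portions.

    Assumption: the high-order 8 bits select the row and the
    low-order 8 bits select the column.
    """
    if not 0 <= addr < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address out of range")
    row = addr >> COL_BITS
    col = addr & ((1 << COL_BITS) - 1)
    return row, col


row, col = split_address(0x12AB)  # row 0x12, column 0xAB
```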
Conventionally, a DRAM has at least two buses, or at least hypothetically what can be treated as two buses: a data bus, and an address bus. To minimize DRAM hardware, it is customary that the address bus be only eight bits wide, in order to minimize the number of pins on the DRAM, which those skilled in the art will recognize is a major constraint or limiting factor on how small one can make a DRAM chip. Due to this limitation on the width of the address bus, memory access is typically achieved by first placing the row portion of the address on the address bus, which will select the appropriate row, and second, a short time later, placing the column portion of the address on the address bus, which will select the appropriate column. This then correctly specifies the row and column location of the storage location that is desired. At some time after the row and column information have both been specified, the data from the memory location specified by the row and column address appears on the data bus.
From the foregoing, it can be seen that in order to make a single memory access there are three phases: a row address phase, a column address phase, and a data retrieval phase. In the past, it was noticed that typical programs tend to operate sequentially, so if there is a memory address accessed, it is likely that the next memory address accessed will be the very next cell, which means that the column address is likely to change, while the row address is not likely to change. Consequently, typical DRAMs are structured such that once the row address has been driven, thereafter the DRAM responds to new addresses on the address bus as if those addresses are column indicators, and thus will use such addresses as column addresses within a current row until the DRAM is notified that a new row address will be appearing on the address bus. DRAM devices using this scheme (driving the row once and then operating upon columns within the row) are known in the art as “page mode” DRAMs.
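The saving offered by page mode operation can be illustrated with a toy model that counts address phases. The class `PageModeDram` and its counters are hypothetical; the sketch models only the bookkeeping of row and column phases, not the timing or electrical behavior of a real DRAM.

```python
class PageModeDram:
    """Toy page mode DRAM: a row address phase occurs only when the row changes."""

    def __init__(self):
        self.open_row = None  # row currently open, if any
        self.cells = {}       # (row, col) -> stored byte
        self.row_phases = 0   # number of row address phases performed
        self.col_phases = 0   # number of column address phases performed

    def read(self, row: int, col: int) -> int:
        if row != self.open_row:
            # A new row must be driven onto the address bus.
            self.open_row = row
            self.row_phases += 1
        # Every access drives a column address within the open row.
        self.col_phases += 1
        return self.cells.get((row, col), 0)


dram = PageModeDram()
# Three sequential accesses within row 1, then one access to row 2.
for row, col in [(1, 0), (1, 1), (1, 2), (2, 0)]:
    dram.read(row, col)
```

Under these assumptions the four accesses need only two row address phases rather than four, because the three accesses to row 1 share a single open page.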
In light of the foregoing, in the event that a memory controller has several memory accesses to be done sequentially, then once a page is open it makes sense from an efficiency standpoint to examine pending as well as current memory accesses in order to determine which of those pending memory accesses will be to memory locations that are within a currently open page (that is, the row of the request is the row from which a memory controller is currently reading within a DRAM). In other words, assuming a page X is open; if there are four memory accesses A, B, C, and D, waiting to be performed, and assuming the first access A is to page Z, the second access B is to page X, the third access C is to page Y, and the fourth access D is to page W, it is preferable from a memory efficiency standpoint that the data access (i.e., access B) appropriate to the page that is open (X) be made first.
Current memory controllers already “look ahead” to see if pending memory accesses are destined for currently open pages. Furthermore, at any given time, more than one page of memory is typically open. For example, under the Direct RDRAM scheme (expected to be available in the near future), it is expected that up to 8 pages per RDRAM chip will be open simultaneously. Thus, if a system has eight RDRAM chips (a reasonable assumption), it will be possible to have up to 64 pages open simultaneously. Thus, when multiple memory accesses are to be sequentially executed, an efficient strategy which may be employed by the memory controller is that it selects those pending memory accesses which are intended for pages that are already open, completes those accesses first, and subsequently proceeds with the memory accesses which will require opening new pages. This greatly increases memory efficiency.
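The look-ahead strategy described above can be sketched using the earlier example of accesses A, B, C, and D with page X open. The function name `reorder_for_open_pages` and the representation of an access as a (name, page) pair are assumptions made for illustration; an actual memory controller would apply additional policies not modeled here.

```python
def reorder_for_open_pages(pending, open_pages):
    """Place accesses to already-open pages first, keeping relative order otherwise.

    pending:    list of (access_name, page) pairs in arrival order
    open_pages: set of pages currently open
    """
    hits = [access for access in pending if access[1] in open_pages]
    misses = [access for access in pending if access[1] not in open_pages]
    return hits + misses


pending = [("A", "Z"), ("B", "X"), ("C", "Y"), ("D", "W")]
order = reorder_for_open_pages(pending, open_pages={"X"})
# Access B, which targets the open page X, is moved ahead of A, C, and D.
```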
However, it is noteworthy that the effect of the foregoing alteration of the order in which memory accesses are executed, from the standpoint of a device interacting with such a memory controller, is that the memory controller will be seen to be executing accesses in an order different from that in which they were received by the memory controller. That is, the memory controller has re-ordered the accesses inside the memory controller, and will thus respond to such requests for access “out of order.” In the current AGP interface standard the AGP Interconnect (or AGP bus) does not allow or provide for such reordering of memory accesses. This means that even if a memory controller is capable of taking advantage of open pages (which is likely to be the case in most modern systems), because of the limitations of the AGP Interconnect under the AGP interface standard, the memory controller will not be allowed to do the reordering. Furthermore, since a significant percentage of existing memory controllers already take advantage of open pages, what this AGP Interconnect requirement will often actually mean is that additional hardware will need to be added to extant memory controllers such that the memory accesses are returned on the AGP Interconnect in the order in which they were received.
Various users within the art, such as graphics vendors, would like to be able to utilize the ability of memory controllers to complete requests out of order. Such out-of-order completion is particularly attractive in high-bandwidth graphics processing environments, because out-of-order access reduces the average data latency; that is, some of the accesses can be completed quicker, which means that the overall time spent on completing a series of memory accesses will be smaller, which means that an increase in the efficiency of the memory subsystem will be achieved.
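The effect on average data latency can be illustrated with a simple worked comparison. This is a toy model only: the ready times are hypothetical numbers in arbitrary time units, and the model assumes that the time at which each access's data becomes available is fixed, so that the only difference is whether data must be returned in issue order.

```python
from itertools import accumulate


def in_order_return_times(ready_times):
    """Return time of each access when data must be returned in issue order.

    An access cannot be returned before every earlier access has been
    returned, so its return time is the running maximum of the ready times.
    """
    return list(accumulate(ready_times, max))


def average(values):
    return sum(values) / len(values)


# Hypothetical ready times for accesses A, B, C, D.
# B targets an already-open page and is ready early.
ready = [40, 10, 40, 40]

in_order = in_order_return_times(ready)  # B must wait behind A
avg_in_order = average(in_order)
avg_out_of_order = average(ready)        # each access returns when ready
```

With these assumed numbers, access B is held until time 40 under in-order return, whereas under out-of-order return it is delivered at time 10, which lowers the average.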
However, as noted, under the AGP interface specification, such out-of-order completions are not provided for or allowed. It is undeniable that the AGP interface standard is highly useful and that AGP compliant devices are highly desirable. However, it is likewise clear that inefficiencies exist and arise from the AGP standard restricting the acceptable manner of completion of memory access requests to in-order completion. It is therefore apparent that a need exists in the art for a method and system which will substantially conform to the established AGP interface standards, yet also allow memory accesses to be completed by the memory controller in an order different from that in which they were received.
It has been discovered that a method and system can be produced which will substantially conform to the established AGP interface standards, yet also allow memory accesses to be completed by the memory controller in an order different from that in which they were received. The method and system especially allow for improving memory access in Accelerated Graphics Port systems, but the method and system are not limited to Accelerated Graphics Port systems. The method and system associate a transaction id with individual data transactions within a number of Accelerated Graphics Port (AGP) pipelined data transactions, and identify the individual data transactions within the number of AGP pipelined data transactions via the transaction id. In one instance, the association of a transaction id with individual data transactions includes but is not limited to associating a transaction id with each individual memory read request within a number of AGP pipelined memory read requests and associating an identical transaction id with each individual data unit, within a number of pipelined data units, corresponding to each individual memory read request within the number of AGP pipelined memory requests. In another instance, the association of a transaction id with individual memory read requests within a number of AGP pipelined memory read requests includes but is not limited to placing a transaction id on a Side Band Addressing bus substantially immediately after placing a read request on the same Side Band Addressing bus, and the association of an identical transaction id with individual data units within a number of the data units associated with pipelined data units corresponding to each of the AGP pipelined memory read requests includes but is not limited to placing a transaction id on a ST[2::0] bus while substantially simultaneously placing a data unit on an AGP Interconnect.
The foregoing summary is illustrative and is intended to be in no way limiting. Other aspects, inventive features, and advantages of the present invention, as defined solely by the claims, will become apparent in the non-limiting detailed description set forth below.
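By way of non-limiting illustration, the association of a transaction id with each read request and with its corresponding data unit can be sketched as follows. The class `TaggedRequester`, its methods, and the id allocation policy are hypothetical. The default of three id bits is an assumption inferred from the three-signal ST[2::0] bus named above; the sketch does not model the Side Band Addressing bus or the AGP Interconnect themselves.

```python
class TaggedRequester:
    """Toy requester that tags read requests so data may return in any order."""

    def __init__(self, id_bits: int = 3):
        # Assumption: a 3-bit id field gives 8 distinct transaction ids.
        self.free_ids = list(range(1 << id_bits))
        self.pending = {}  # transaction id -> requested address

    def issue(self, address: int) -> int:
        """Associate a free transaction id with a read request and return it."""
        if not self.free_ids:
            raise RuntimeError("no free transaction ids")
        tid = self.free_ids.pop(0)
        self.pending[tid] = address
        return tid

    def receive(self, tid: int, data: bytes):
        """Match a returned data unit to its request via the identical id."""
        address = self.pending.pop(tid)
        self.free_ids.append(tid)  # the id may now be reused
        return address, data


requester = TaggedRequester()
tid_a = requester.issue(0x1000)
tid_b = requester.issue(0x2000)
tid_c = requester.issue(0x3000)

# Data units arrive in a different order from that in which requests were issued.
first = requester.receive(tid_b, b"data-b")
second = requester.receive(tid_c, b"data-c")
third = requester.receive(tid_a, b"data-a")
```

Because each data unit carries the same id as its request, the requester in this sketch can identify which request a data unit answers regardless of arrival order.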