1. Field of the Invention
The present invention pertains to a crossbar switch and, more particularly, to a self-optimizing crossbar switch capable of selecting and distributing multiple concurrent memory requests to a shared memory system such that both the memory accesses and the selection of requests are optimized.
2. Description of the Related Art
The evolution of electronic computing systems has included the development of more sophisticated techniques for utilizing their computing resources. Consider, for example, a shared memory. A shared memory may be read from and written to by more than one device, e.g., several processors. The devices perform their assigned functions, reading from and writing to the shared memory. The devices request access to the shared memory through a memory controller that controls the operation of the shared memory. Typically, several devices are trying to access the shared memory in this fashion at any given time. However, for a variety of reasons, the devices generally are permitted to access the shared memory only one at a time. The memory controller, or some electronic circuitry associated with the memory controller, must select one of the access requests to process at any given time.
Consider, for instance, a graphics processing system. One memory intensive operation associated with graphics processing is “rendering.” “Rendering” is the process by which a graphics system adds realism to video data by adding three-dimensional qualities such as shadows and variations in color and shade. Because of the high rate at which the graphics data is processed, a rendering machine will typically include multiple “rendering pipelines” operating in parallel. A rendering machine may also employ multiple physical memory devices, each with its own controller, to implement a “frame buffer pixel memory,” or “frame buffer,” in conjunction with the rendering pipelines.
Management of this memory is important to the overall performance of the graphics processing system. One way to manage the memory is to restrict each rendering pipeline to a certain subset of the graphics data to process and a certain portion of the frame buffer. The assigned portion of the frame buffer is accessible through an assigned memory controller. However, higher performance can be obtained if the rendering pipelines are not restricted in this manner, i.e., if they can work on any part of the graphics data stored in any part of the frame buffer. Lifting this restriction requires instituting measures for proper management of access to the memory. As each rendering pipeline begins issuing requests to access the various portions of the memory, it will at some point try to access a portion that another rendering pipeline wishes to access at the same time. Since access can be granted to only one rendering pipeline at a time, the pipelines compete for access and one or the other is selected.
Several techniques are conventionally employed for deciding the order in which simultaneously pending access requests are processed. One conventional technique is a “round robin” method, wherein access requests are handled in some round robin order, depending on the hardware involved. Another conventional technique processes access requests in order of an assigned priority. Still other conventional techniques process access requests in random order, or on a first-come, first-served basis.
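For illustration only, two of these conventional techniques can be sketched in software. The following is a minimal sketch, not circuitry described in this document; the function names, the boolean "pending" list, and the per-port priority list are all hypothetical.

```python
from typing import List, Optional


def round_robin_select(pending: List[bool], last_granted: int) -> Optional[int]:
    """Return the first pending port after last_granted, wrapping around.

    Hypothetical helper: pending[i] is True when port i has a request waiting.
    Returns None when no port has a pending request.
    """
    n = len(pending)
    for offset in range(1, n + 1):
        port = (last_granted + offset) % n
        if pending[port]:
            return port
    return None


def fixed_priority_select(pending: List[bool], priority: List[int]) -> Optional[int]:
    """Return the pending port with the highest assigned priority value.

    Hypothetical helper: priority[i] is a static number assigned to port i.
    Returns None when no port has a pending request.
    """
    candidates = [port for port, is_pending in enumerate(pending) if is_pending]
    if not candidates:
        return None
    return max(candidates, key=lambda port: priority[port])
```

In both functions the outcome depends only on which ports are pending and on a fixed rule; neither looks at the memory address requested or at the state of any buffer.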
Each of these conventional techniques is built around and implements a rigid set of ordering rules that are predefined and then rigorously implemented. The wooden, mechanical application of the ordering rules inherent in these conventional techniques frequently adversely impacts performance. More particularly, the order in which access requests are processed can significantly impact the bandwidth of the information processed responsive to the access requests.
For instance, the internal design of the dynamic random access memory (“DRAM”) devices from which shared memories are typically constructed favors accesses to data in the same “page.” A page is a block of data that the internal DRAM control logic operates on for each access. Internal DRAM data is organized as pages, so that successive accesses to data bits that are in the same page are faster than successive accesses to data bits that are not in the same page. Because of this characteristic of DRAMs, it is more efficient to select memory requests that access data bits in the same DRAM page. Higher memory bandwidth can be achieved if successive memory requests are all accessing the same page of data. Thus, increased performance can be realized by ordering accesses to maximize the number of successive accesses to the same page(s).
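The effect of ordering on page locality can be illustrated with a small software model. This is a sketch under stated assumptions: the page size of 1024 address units, the example addresses, and the function names are hypothetical and are not taken from any particular DRAM device.

```python
PAGE_SIZE = 1024  # hypothetical page size, in address units


def page_of(address: int) -> int:
    """Map an address to the index of the page that contains it."""
    return address // PAGE_SIZE


def count_page_openings(addresses) -> int:
    """Count accesses that land in a different page than the previous access.

    The first access always counts, since no page is open beforehand.
    """
    openings = 0
    open_page = None
    for address in addresses:
        page = page_of(address)
        if page != open_page:
            openings += 1
            open_page = page
    return openings


# The same six requests, alternating between two pages...
interleaved = [0, 2048, 8, 2056, 16, 2064]
# ...and reordered so that requests to the same page are adjacent.
grouped = sorted(interleaved, key=page_of)

print(count_page_openings(interleaved))
print(count_page_openings(grouped))
```

Under this model the interleaved order opens a page on every access, while the grouped order opens each of the two pages only once.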
Similarly, the total request throughput rate may be impacted by the selection order. It is common for requesting ports to have first-in, first-out (“FIFO”) queues that buffer memory requests and FIFOs that buffer the memory data returned by read memory requests. As long as these FIFOs are not filled, additional requests may be generated and new memory read data returned. If a request FIFO is filled, then the corresponding port must stop and wait until the FIFO has room again. Thus, the request throughput rate will be lower. Likewise, if the memory read data FIFO is filled, then the memory controller must stop and wait until there is room in the FIFO. Again, the request throughput rate suffers. Because of the finite capacity of FIFOs used to store requests and memory read data, it is more efficient to select requests such that the FIFOs will not be filled. By avoiding the full condition, requests may be continually processed with no interruption. Thus, a higher request throughput rate is achieved.
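The FIFO constraint can likewise be modeled in a few lines. The sketch below is illustrative only and makes several assumptions that are not in the text: every request is treated as a read, each port has exactly one request FIFO and one read data FIFO, and the class and function names are hypothetical. It selects the port whose request FIFO is closest to full, skipping any port whose read data FIFO has no room.

```python
from collections import deque
from typing import List, Optional


class BoundedFifo:
    """Fixed-capacity first-in, first-out queue (hypothetical model)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items = deque()

    def is_full(self) -> bool:
        return len(self.items) >= self.capacity

    def is_empty(self) -> bool:
        return not self.items

    def push(self, item) -> bool:
        """Append item; return False if the FIFO is full and the caller must stall."""
        if self.is_full():
            return False
        self.items.append(item)
        return True

    def pop(self):
        return self.items.popleft()


def select_port(request_fifos: List[BoundedFifo],
                read_data_fifos: List[BoundedFifo]) -> Optional[int]:
    """Pick the eligible port whose request FIFO has the highest fill fraction.

    A port is eligible when it has a request waiting and its read data FIFO
    still has room. Returns None when no port is eligible.
    """
    best_port = None
    best_fill = -1.0
    for port, (requests, data) in enumerate(zip(request_fifos, read_data_fifos)):
        if requests.is_empty() or data.is_full():
            continue
        fill = len(requests.items) / requests.capacity
        if fill > best_fill:
            best_port, best_fill = port, fill
    return best_port
```

Draining the fullest request FIFO first reduces the chance that its port stalls, and skipping ports with a full read data FIFO avoids issuing a read whose data cannot be returned.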
To maximize efficiency and throughput rate under these types of constraints, arbitration and select logic used to decide the selection order should dynamically consider these types of factors. During each operational cycle, the requests should be examined for impact on performance and the more favorable request selected. It is also desirable to be able to adjust the importance or priority of each of these constraints. This allows the various constraints to be weighed differently in making the selection.
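One way to picture adjustable weighting is a per-cycle score that combines the factors discussed above. The following is a minimal sketch and not a description of the invention's logic; the Request fields, the linear scoring formula, and the weight parameters w_page and w_fifo are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    port: int              # requesting port (hypothetical field)
    page: int              # DRAM page the request targets
    fifo_occupancy: float  # fill fraction of the port's request FIFO, 0.0 to 1.0


def score(request: Request, open_page: int, w_page: float, w_fifo: float) -> float:
    """Weighted sum of a page-hit term and a FIFO-pressure term (assumed formula)."""
    page_hit = 1.0 if request.page == open_page else 0.0
    return w_page * page_hit + w_fifo * request.fifo_occupancy


def select_request(requests: List[Request], open_page: int,
                   w_page: float = 1.0, w_fifo: float = 1.0) -> Optional[Request]:
    """Return the highest-scoring pending request, or None if there is none."""
    if not requests:
        return None
    return max(requests, key=lambda r: score(r, open_page, w_page, w_fifo))


pending = [
    Request(port=0, page=5, fifo_occupancy=0.25),   # hits the open page
    Request(port=1, page=7, fifo_occupancy=0.875),  # FIFO nearly full
]

# Equal weights favor the page hit; raising w_fifo favors the nearly full FIFO.
print(select_request(pending, open_page=5).port)
print(select_request(pending, open_page=5, w_page=0.5, w_fifo=2.0).port)
```

Changing the weights changes which request wins for the same set of pending requests, which is the kind of adjustability the paragraph above calls for.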
However, conventional arbitration and select techniques consider none of these factors in a dynamic fashion. If they are considered at all, they are considered only in a mechanical fashion. Predetermined rules are woodenly applied. If a technique considers, for instance, two successive requests accessing the same page, then whether a third request resides in a full FIFO is considered in the same fashion every time. Thus, although the shared memory could achieve higher utilization, its performance is typically less than what it could be.