1. Technical Field
The present invention is directed generally toward improved access to a buffer memory. More particularly, the present invention relates to a method and apparatus for a configurable buffer arbiter.
2. Description of the Related Art
Most data controllers utilize external memory for temporary storage of data. For hard disk controllers in disk drive applications, this external memory is most likely Dynamic Random Access Memory (DRAM) due to the inexpensive nature of this type of memory. More particularly, Synchronous DRAM (SDRAM) is a type of dynamic RAM memory chip that is based on standard dynamic RAM chips, but has sophisticated features that make it considerably faster. Still more particularly, Single Data Rate SDRAM (SDR SDRAM) is a flavor of SDRAM that transfers data on one edge of each clock cycle of its synchronous interface, and Double Data Rate SDRAM (DDR SDRAM) is a flavor of SDRAM that transfers data on both clock edges of its interface.
DRAM memories have a row/column grid topology. Within the DRAM memory, a bank, row, and column number are used to specify a specific N-bit wide data element. Writing and reading is accomplished by specifying a bank, row, and column at which to start, bursting data to consecutive columns from the starting point until the desired data for this row is transferred, and then closing the row and preparing for the next operation.
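By way of illustration only, the bank/row/column addressing described above can be sketched as a simple decomposition of a flat word address. The field widths used below (2 bank bits, 13 row bits, 10 column bits) are assumptions chosen for the sketch and do not correspond to any particular device.

```python
# Illustrative decomposition of a flat word address into DRAM
# bank/row/column coordinates.  Field widths are assumptions for
# this sketch: 2 bank bits, 13 row bits, 10 column bits.
COL_BITS, ROW_BITS, BANK_BITS = 10, 13, 2

def decode(addr):
    """Split a flat word address into (bank, row, column)."""
    col = addr & ((1 << COL_BITS) - 1)
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    bank = (addr >> (COL_BITS + ROW_BITS)) & ((1 << BANK_BITS) - 1)
    return bank, row, col

# Consecutive addresses stay within one row until the column field
# wraps, which is why bursting to consecutive columns is cheap.
print(decode(0))      # (0, 0, 0)
print(decode(1023))   # (0, 0, 1023) -- last column of the row
print(decode(1024))   # (0, 1, 0)    -- wraps into the next row
```

The sketch shows why a burst that crosses a row boundary forces the controller to close one row and open another, incurring the protocol overhead described below.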
In order to write data to or read data from a DRAM memory, a certain protocol must be employed. This protocol adds overhead and reduces the overall time available for data transfers.
The important commands of this protocol in the context of this invention are the following:

ACTIVE — This is a command sent to the memory to indicate that the controller wants to read or write a particular row in a particular memory bank. The specified row in the specified bank is opened so any column within this row can now be accessed.

READ/WRITE — These commands inform the memory whether the controller wants to read or write, and the column within the row at which the operation will start. Since the DRAM is burst oriented, it may be necessary for the controller to issue multiple read/write commands to the DRAM to keep the burst going.

PRECHARGE — After the read/write operation is complete, it is necessary to close the row before another row in the same bank can be opened. This is done with the precharge command.
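As a non-limiting sketch, the ordering of these commands for a single read burst can be modeled as follows; the function and its arguments are hypothetical and serve only to show the ACTIVE, READ, PRECHARGE sequence.

```python
def burst_commands(bank, row, start_col, n_reads):
    """Hypothetical command sequence for one read burst: open the
    row, issue one or more READs to keep the burst going, then
    close the row with PRECHARGE so another row can be opened."""
    cmds = [("ACTIVE", bank, row)]
    cmds += [("READ", bank, start_col + i) for i in range(n_reads)]
    cmds.append(("PRECHARGE", bank))
    return cmds

print(burst_commands(bank=1, row=42, start_col=0, n_reads=2))
# [('ACTIVE', 1, 42), ('READ', 1, 0), ('READ', 1, 1), ('PRECHARGE', 1)]
```

Note that the ACTIVE and PRECHARGE entries are pure overhead: they occupy the interface without transferring data, which motivates the bank interleaving discussed next.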
In the description above, a bank is a “page” of memory locations. Single Data Rate (SDR) SDRAM memories can have either two or four banks, whereas Double Data Rate (DDR) SDRAM memories all have four banks. By having more than one bank, memory utilization can be increased since the active and precharge commands can be “hidden.” This means that the precharge of a first bank that has recently been active, or the activation of a third bank that will be accessed next, can take place while the memory is transferring data to/from a second bank. This method for hiding the overhead is called bank interleaving. Bank interleaving is not guaranteed to reduce overhead since consecutive accesses may be for rows in the same bank. For example, if the DRAM controller has four requests to transfer data to different rows in the same bank, the controller cannot use bank interleaving. However, if the DRAM controller has four requests to transfer data to different rows that are all in different banks, then the controller can use bank interleaving to possibly eliminate the overhead between these four transfers. Assuming that the accesses to the DRAM have an average distribution across the available banks, then the average overhead will be reduced using bank interleaving.
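The benefit of interleaving can be illustrated with a toy timing model. In this sketch, every transfer needs a fixed number of overhead cycles of ACTIVE/PRECHARGE work plus a fixed number of data cycles; when the next transfer targets a different bank its overhead can overlap the current burst, while a same-bank transfer pays it in full. The cycle counts are illustrative assumptions, not figures from any device.

```python
# Toy model of bank interleaving.  OVERHEAD and DATA cycle counts
# are illustrative assumptions for the sketch.
OVERHEAD, DATA = 6, 16

def total_cycles(banks):
    """Total cycles to service a sequence of transfers, one per
    entry in `banks`, where each entry names the target bank."""
    cycles = OVERHEAD + DATA              # first transfer pays in full
    for prev, cur in zip(banks, banks[1:]):
        if cur == prev:
            cycles += OVERHEAD + DATA     # overhead fully exposed
        else:
            # overhead overlapped with the previous burst's data
            # phase; any remainder beyond the data phase is exposed
            cycles += DATA + max(0, OVERHEAD - DATA)
    return cycles

print(total_cycles([0, 0, 0, 0]))  # four hits to the same bank: 88
print(total_cycles([0, 1, 2, 3]))  # four different banks: 70
```

This matches the example in the text: four requests to different rows in the same bank cannot be interleaved, while four requests spread across four banks hide most of the overhead.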
Based on the above information, it can be seen that DRAM is more efficient for transferring large chunks of data. Whether the access is for a single data element or for many data elements, the overhead to start and stop the transfer is the same. During the data burst periods between the open and precharge times, the DRAM provides data at its maximum bandwidth. For example, if the DRAM provides 32-bits every 5 ns, then it maintains 800 MBytes/sec during this data burst period. However, when the overhead on the DRAM interface of opening and precharging the DRAM is included, then the actual DRAM bandwidth can drop significantly. For example, the average bandwidth can often be reduced by 20% to 50%, depending on how often the overhead is required between data bursts.
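The arithmetic in the preceding paragraph can be checked directly. The peak figure (4 bytes every 5 ns giving 800 MBytes/sec) comes from the text; the specific burst and overhead cycle counts used to reproduce the 20% and 50% loss figures are illustrative assumptions.

```python
# Peak bandwidth from the figures in the text: a 32-bit (4-byte)
# word every 5 ns.
word_bytes, cycle_ns = 4, 5
peak_mbytes = word_bytes * 1000 / cycle_ns   # MBytes/sec
print(peak_mbytes)   # 800.0

def efficiency(burst_cycles, overhead_cycles):
    """Fraction of peak bandwidth retained when every burst pays a
    fixed overhead of open/precharge cycles."""
    return burst_cycles / (burst_cycles + overhead_cycles)

# Illustrative cycle counts showing how overhead dilutes bandwidth:
print(efficiency(32, 8))   # 0.8 -> 640 MBytes/sec, a 20% loss
print(efficiency(8, 8))    # 0.5 -> 400 MBytes/sec, a 50% loss
```

Long bursts amortize the fixed overhead, which is why the text characterizes DRAM as more efficient for transferring large chunks of data.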
The problem for the controller is that it has N channels, for example twelve in an exemplary implementation, all of which are competing for access to the same DRAM. Some of these channels normally burst large amounts of data that can exceed the width of a DRAM row, some burst only a few data elements at a time, and some transfer only one data element at a time. For the bursting channels, some of them require a certain amount of bandwidth (number of bytes per second). Some of these channels have critical latency requirements where access to the DRAM must be obtained within a certain time or else the system breaks. Some of these channels require fast access to the DRAM in order to keep the average system performance high.
Finally, some channels do not have a latency or bandwidth requirement. The bandwidth of each individual channel is less than the available bandwidth of the DRAM, but the sum of all channel bandwidths may or may not be less than the bandwidth of the DRAM.
Given channels with the characteristics above, the controller must share access to the DRAM between these channels in an intelligent way. The three main characteristics to optimize within a DDR controller are as follows:

1. Guaranteed Maximum Latency — Latency is the time that each channel must wait between accesses to the DRAM. Some channels must transfer data within a certain latency or the system fails. Latency is adversely affected by long periods of activity by other channels. This is a good reason to keep the burst of each channel small, and to use first-in-first-out buffers (FIFOs) within the channels to continue the data transfer while waiting for access to the DRAM. In order to determine the sizes of the FIFOs that are needed, it is important to guarantee that the latency seen by each channel will not exceed a certain maximum value. This value should be as low as possible under the worst-case scenario.

2. Guaranteed Minimum Bandwidth — The system has a minimum bandwidth requirement, which, when met, keeps the system data rates at their required levels. Some channels must maintain a certain bandwidth over time to keep up with their connected devices or the system fails. Other channels must maintain a certain bandwidth or the system performance is degraded to an unacceptable degree. Bandwidth is adversely affected by many small accesses, which are inefficient because they have a high overhead-to-data ratio. This is a good reason to keep bursts large when possible. This guaranteed bandwidth should be as high as possible under the worst-case scenario for the critical channels.

3. Average Bandwidth — The overall system performance is a function of many aspects, one of which is the average bandwidth that is available for channel access to the DRAM. The first and second parameters above guarantee that the system never fails or reaches unacceptable levels of performance, whereas this third parameter attempts to provide a statistical average bandwidth that is much higher than those guaranteed minimums. The ideal goal for the average bandwidth is to support all of the channels simultaneously running at their maximum rates. In order to allow the performance to be as high as possible, it is good to design this parameter to be as high as possible under typical to moderately worst-case scenarios.
The problem is that there is an inherent conflict between latency and bandwidth. It is possible to reduce the latency by reducing the burst lengths for each channel, but this could cause the bandwidth to be lower because the ratio of overhead-to-data increases. Conversely, if burst lengths are increased, the overall bandwidth will increase, but latency will increase as well. If the latency is too high, then it can cause FIFOs to overrun and/or parts of the system to be throttled. This will have a negative impact on the performance of the disk drive.
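The conflict can be made concrete with a small calculation. In this sketch, channels are assumed to be served in turn, so a channel's worst-case wait grows with every other channel's burst length, while efficiency (data cycles divided by total cycles) also grows with burst length. The channel count of twelve comes from the text; the overhead and burst cycle values are illustrative assumptions.

```python
# Latency vs. bandwidth as a function of burst length.  Twelve
# channels per the text; OVERHEAD cycles per burst is an assumption.
N_CHANNELS, OVERHEAD = 12, 8

def tradeoff(burst):
    """Return (efficiency, worst-case wait in cycles) if every
    other channel bursts for `burst` cycles before yielding."""
    efficiency = burst / (burst + OVERHEAD)
    worst_wait = (N_CHANNELS - 1) * (burst + OVERHEAD)
    return efficiency, worst_wait

for burst in (4, 16, 64):
    e, w = tradeoff(burst)
    print(f"burst={burst:3d}  efficiency={e:.2f}  worst-case wait={w} cycles")
```

Longer bursts push efficiency toward 1.0 but multiply the worst-case wait seen by every other channel, which is exactly the tension the arbiter must resolve.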
A solution is needed that can find the best compromise between these parameters to guarantee that under the worst case situation the system does not break and continues to perform at an acceptable level, and that under typical conditions the system performs at an excellent level (or as close as possible to the theoretical limit given by the external DRAM chosen). This solution should have the right static and dynamic configuration available to optimize the performance based on static and dynamic conditions in the system.
One previous solution was a strict priority arbitration scheme with a programmable bumping capability. This scheme allowed higher priority channels to bump lower priority channels to get access to the buffer if the higher priority channel allowed bumping. The bumping scheme was implemented by using “Can Bump” and “Can't Be Bumped” static configuration bits. Each channel had two bits, a “Can Bump” bit indicating whether it is allowed to bump another channel and a “Can't Be Bumped” bit indicating whether another channel can bump it. For instance, the critical single access channel mentioned above would have both its “Can Bump” and “Can't Be Bumped” bits set. This would allow it to bump other channels, but once active, it cannot be bumped by another channel. Another example would be a bursty low priority channel. It would have its “Can Bump” and “Can't Be Bumped” bits both cleared. This would allow any higher priority channel to come in and take its buffer access away. This would also prevent this channel from hurting bandwidth by bumping anyone else. Instead, it would wait until the more natural boundaries where the higher priority channels are done with their bursts and then start its burst.
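The bump decision described above can be sketched as a simple predicate over the two configuration bits. The class structure, channel names, and priority values below are assumptions for illustration only.

```python
# Sketch of the strict-priority arbiter's bump decision using the
# "Can Bump" / "Can't Be Bumped" static configuration bits described
# in the text.  Channel names and priorities are assumptions.
class Channel:
    def __init__(self, name, priority, can_bump, cant_be_bumped):
        self.name = name
        self.priority = priority        # higher number = higher priority
        self.can_bump = can_bump
        self.cant_be_bumped = cant_be_bumped

def may_preempt(requester, active):
    """True if `requester` may bump the currently `active` channel."""
    return (requester.priority > active.priority
            and requester.can_bump
            and not active.cant_be_bumped)

# The two examples from the text:
critical = Channel("critical", priority=3, can_bump=True,  cant_be_bumped=True)
bursty   = Channel("bursty",   priority=1, can_bump=False, cant_be_bumped=False)

print(may_preempt(critical, bursty))   # True: it bumps, cannot be bumped
print(may_preempt(bursty, critical))   # False: it yields at natural boundaries
```

This reproduces the two behaviors in the text: the critical channel both bumps others and resists being bumped, while the bursty low-priority channel neither bumps nor resists.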
In order to make this priority scheme as efficient as possible, channels had minimum thresholds that they had to meet before they were allowed to attempt to access the buffer. This would hold them off until they had enough data for a long burst, freeing the buffer for access by other channels. The high priority channel was a bursting channel that had two thresholds to manage its request to the arbiter. When the amount of data available to transfer exceeded a first “critical” threshold, a burst was initiated. When the amount fell below a second, lower threshold, continuing requests were made only if no other channel was requesting. This provided a “critical” indication that helped the arbiter make smarter decisions about optimizing bandwidth and minimizing latency.
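A minimal sketch of this two-threshold request policy follows. The function shape and the threshold values are hypothetical; only the policy itself (request above the critical threshold, continue below the lower threshold only while no one else is requesting) comes from the text.

```python
# Two-threshold request policy sketch.  Threshold values are
# illustrative assumptions.
CRITICAL_THRESHOLD, LOW_THRESHOLD = 64, 16

def wants_access(pending, bursting, others_requesting):
    """Hypothetical request logic for the two-threshold channel."""
    if pending >= CRITICAL_THRESHOLD:
        return True                    # enough data queued for a long burst
    if bursting and pending >= LOW_THRESHOLD:
        return True                    # keep an in-progress burst going
    if bursting and pending > 0:
        return not others_requesting   # below low threshold: only while idle
    return False

print(wants_access(80, False, True))   # True: critical threshold crossed
print(wants_access(8, True, True))     # False: yield to other requesters
print(wants_access(8, True, False))    # True: bus is otherwise idle
```

The effect is that the channel presents long, efficient bursts to the arbiter yet backs off gracefully once its remaining data no longer justifies holding the buffer.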
One advantage of a priority-based scheme is that it has a good average overall bandwidth while meeting minimum latency goals for at least the highest priority channel in the system. The good bandwidth is achieved because the highest priority channels perform long efficient bursts and then leave long holes that are filled by the lower priorities. This method also does not waste any cycles since it immediately moves to the next channel when the current channel's request has deasserted.
A disadvantage of this solution is that it does not guarantee a maximum latency and minimum bandwidth to any channel except for the highest priority channel. This is because the highest priority channel can continue to burst to the DRAM for a long time that is not limited by a predictable boundary. This was acceptable in an older system because the older system only had one critical channel that needed these parameters to be guaranteed. New and upcoming systems may have more than one of these critical channels. In the strict priority based solution, the lowest priority channel can get locked out for a long time by the channels with higher priority.
Another disadvantage of this solution is that it does not adapt to changing conditions. If the bandwidth is not enough to cover all of the channels running at the worst-case rate, then it can be useful to change the priority or other properties of the channels dynamically. For example, if a given channel that normally is not critical has suddenly been detected to be critical, then it would be useful to change its parameters so it could bump another higher priority channel that is not as critical.
One more disadvantage of the old priority arbitration scheme is the unpredictability of sizing FIFOs. Because of the many high priority channels competing for access to the DRAM, it is difficult to predict how long any lower priority channel will have to wait for access. Therefore, choosing FIFO sizes for these lower priority channels becomes very difficult. Choosing a FIFO size that is too large is costly in terms of wasted die size. Choosing a FIFO size that is too small can make the system inefficient or may break the system altogether.
Other arbitration schemes that have been implemented in the industry include the following:

Time-slot based round robin scheme — In this scheme, every channel gets a guaranteed time-slot where a token is passed around to each channel. When the time-slot is used up, the channel is pre-empted and the token is given to the next channel in line. If the channel finishes early, it gives up the token. The advantage of the strict time-slot based round robin scheme is that it does guarantee a fixed maximum latency per channel and a fixed minimum bandwidth per channel. The disadvantage is that the time-slot size is usually fixed for each channel, so if a channel does not need all of its time-slot, then part of this time is wasted. So even though the latencies per channel are fixed, they can be longer than required based on the amount of data transferred. Even if the time-slot were made to be abortable when the request deasserted, another problem would be that some channels end their requests and then put out another request, but need both requests to be serviced within one time-slot. The standard time-slot scheme with an early abort would end the time-slot when the request first goes away. Another disadvantage is that the time-slot size is not adjusted based on dynamic conditions.

Non time-slot based round robin scheme — In this scheme, every channel gets to keep the token until finished. Only when the channel is finished will the token be given to the next channel in line. The advantage is that every channel gets a turn eventually. The scheme is perfectly fair, and since there is no bumping, the bursts are nice and long, which results in a good minimum overall bandwidth. The disadvantage is that it does not guarantee a maximum latency to each channel since the burst lengths of the previous channels are not restricted. Also, it does not adjust to dynamic conditions.
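The time-slot based variant (with early abort) can be sketched as follows. The slot size, per-channel demands, and the `schedule` function are assumptions for illustration only.

```python
# Toy time-slot round robin with early abort: a token visits the
# channels in order; each channel may transfer at most SLOT cycles
# before being pre-empted, and gives the token up early when it has
# no more work.  Demand values are illustrative assumptions.
SLOT = 10

def schedule(demands, rounds=2):
    """demands: remaining cycles of work per channel.  Returns the
    (channel, cycles_used) grant sequence over `rounds` token laps."""
    grants = []
    for _ in range(rounds):
        for ch in range(len(demands)):
            used = min(SLOT, demands[ch])   # early abort if finished
            if used:
                grants.append((ch, used))
                demands[ch] -= used
    return grants

print(schedule([25, 4, 0]))
# [(0, 10), (1, 4), (0, 10)] -- channel 2 is skipped, channel 1
# aborts early, channel 0 is pre-empted at each slot boundary
```

The sketch also exposes the weakness noted in the text: channel 0's 25-cycle demand is chopped into slot-sized pieces, and an idle channel's unused slot is simply skipped rather than redistributed by any dynamic adjustment.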
First-come-first-served scheme — In this scheme, the arbiter keeps track of the order of the requesting channels. An example of this scheme (and its efficiency) is a telephone customer service system. An advantage is that the minimum overall bandwidth is good because, again, the burst lengths are not limited. Also, the sooner access is requested, the sooner it will be granted, so channels are not locked out for indefinite periods as in the strict priority scheme. This scheme is also fair since it does not favor any channel. However, the disadvantages are that critical channels are not serviced with higher priority and the scheme does not guarantee a maximum latency to each channel. Also, the scheme does not adjust to dynamic conditions.
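A first-come-first-served arbiter reduces to a queue ordered purely by arrival time; the channel names below are hypothetical.

```python
# Sketch of the first-come-first-served arbiter: grants are issued
# strictly in arrival order, regardless of any channel priority.
from collections import deque

queue = deque()
for request in ("ch3", "ch0", "ch7", "ch0"):
    queue.append(request)           # arrival order is the only order

grants = [queue.popleft() for _ in range(len(queue))]
print(grants)   # ['ch3', 'ch0', 'ch7', 'ch0']
```

Note that nothing in the structure distinguishes a critical channel from any other, which is the first disadvantage cited in the text.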
There is one example that combines known solutions. In this example, a priority arbiter is implemented with a fairness register. The fairness register has a bit per channel where a given channel's bit is set when the channel wins access, and all bits are cleared when the channel completes its transfer and no channel without a fairness bit set is requesting. This is used to make sure each channel is able to get a transfer in before a different channel performs two transfers. But it still allows higher priority channels to go first when starting from an idle condition where the fairness register is all cleared. The disadvantage of this solution is that it does not guarantee a maximum latency per channel or a minimum bandwidth per channel.
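A simplified sketch of this priority-plus-fairness arbiter follows. The clearing rule here is an approximation: the text clears the bits when the finishing channel completes and no bit-clear channel is requesting, whereas this sketch clears them at the moment no eligible requester remains. The function signature and values are assumptions for illustration.

```python
# Sketch of a priority arbiter with a per-channel fairness bit:
# the winner is the highest-priority requester whose fairness bit
# is clear; the winner's bit is set, and all bits are cleared when
# every requester has already had a turn.  Structure is assumed.
def arbitrate(requests, priorities, fairness):
    """requests/fairness: per-channel bools; higher priority wins."""
    eligible = [ch for ch, req in enumerate(requests)
                if req and not fairness[ch]]
    if not eligible:
        # every requester already had a turn: clear bits and retry
        fairness = [False] * len(fairness)
        eligible = [ch for ch, req in enumerate(requests) if req]
        if not eligible:
            return None, fairness
    winner = max(eligible, key=lambda ch: priorities[ch])
    fairness[winner] = True
    return winner, fairness

fair = [False] * 3
w1, fair = arbitrate([True, True, False], [0, 5, 9], fair)  # ch1 wins first
w2, fair = arbitrate([True, True, False], [0, 5, 9], fair)  # ch0 gets a turn
print(w1, w2)   # 1 0
```

The sketch shows the behavior described in the text: from an idle (all-clear) state the higher priority channel goes first, but it cannot win twice before the lower priority requester gets one transfer in.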
None of the above solutions provide the right configurability to provide guaranteed bandwidth/latencies to each channel while also providing an optimal average bandwidth/latency mix. Furthermore, the above solutions do not have the right configurability to adjust to new problems that arise in the field, such as the need for a channel to be given a higher level of importance while the other channels remain with a secondary level of importance.