1. Field of the Invention
The present invention relates to the field of computer memories. More particularly, the present invention relates to the field of First-In-First-Out memories for multi-processor computer systems.
2. Description of the Related Art
First-In-First-Out memories (FIFOs) are commonly used in electronic systems. FIFOs are used to transfer data between two electronic devices or modules where the data source (data writer) produces data asynchronously to a data sink's (data reader's) need for the data. The data is written into the FIFO by the writer as it is produced, and then read from the FIFO by the reader as the data is needed.
A FIFO is typically composed of a buffer where the data elements are stored, and descriptors which contain control information about the FIFO such as pointers to the storage locations where the next read or write operations should occur. An important consideration in the design of a FIFO is to prevent FIFO overflow and underflow conditions. A FIFO overflow occurs when the FIFO is full (all the buffer locations are used) and the writer attempts to insert a new data element. A FIFO underflow occurs when the FIFO is empty (there are no elements in the buffer) and the reader attempts to retrieve an element from the buffer. FIFO buffers may be implemented in many ways including arrays, linked lists and dedicated hardware memories. Similarly, FIFO descriptors may be software data structures or dedicated hardware circuits.
In many systems with multiple processors communicating among each other, each processor may have part or all of its memory accessible by other processors. Such memory will be referred to as shared memory, while processor memory that is not accessible for other processors will be called private memory. A subsystem, which includes a processor together with its private and shared memory, is typically interconnected with other subsystems by a bus. An example of such a bus is the industry standard Peripheral Component Interconnect (PCI) bus.
FIG. 1 shows a typical system having two electronic subsystem modules, each module including a processor and associated memory. A first module 10 which reads from a FIFO, and a second module 20 which writes to the FIFO, are connected via bus 30. Reader module 10 includes a processor 12, bus interface logic 14, private memory 16, and shared memory 18. Similarly, Writer module 20 includes a processor 22, bus interface logic 24, private memory 26, and shared memory 28. For each module, its shared memory is accessible by the other module. In such a system, only one subsystem may be the master of the bus at a given time. The master requests the bus, waits until the bus is granted, and then initiates a transaction resulting in data being transferred to one or more other subsystems. If another subsystem has data to transfer at the same time, that second subsystem has to wait until the current bus master terminates the transaction. The time elapsed between the bus request and the first element of data being transferred is called bus latency. The bus latency typically increases with the number of subsystems using the bus and the average length of a transaction.
The simplest implementation for a FIFO is to use dedicated hardware circuits. In this case, a write controller performs a write transaction to a fixed memory location. The hardware based FIFO manager stores the data in the order it was received and informs the writer when an overflow condition occurs via a discrete output from the FIFO. A read controller performs a read transaction from the FIFO mapped at a fixed memory location and the FIFO manager makes the data available at the FIFO output port from its internal storage, in the order it was put there by the writer. Examples of such FIFOs include fall-through FIFOs and RAM based FIFOs.
FIG. 2 illustrates a second way to implement a FIFO. In this implementation, both the FIFO buffer and the FIFO descriptor are implemented in shared memory. This implementation will be described further below.
In the discussion that follows the algorithms used to implement the FIFO will be described using the C programming language for illustration purposes. The actual implementation may employ a different language, assembly language, or a hardware FIFO manager. Similarly, the FIFO buffer will be described as a RAM-based array holding data elements, although other implementations are possible. In the example of FIG. 2, the FIFO is implemented in memory that is on-board to the writer electronic module. Alternatively, the FIFO could be implemented within shared memory on-board to the reader module, with appropriate changes to the read and write algorithms.
In a first conventional implementation A, the FIFO descriptor is composed of two pointers (a read pointer, RD, and a write pointer, WR) and two binary flags (a full flag and an empty flag). The corresponding C programming language type definition statement is:
In this implementation, an operation of retrieving an element from a FIFO may be implemented by the following GetFifo routine:
The read and write operations across the bus are indicated by the comments to the code.
The above routine is called to retrieve an element from the FIFO. If the FIFO is empty, the routine returns failure code 0. Otherwise, the data element is copied to the destination and the FIFO RD pointer is advanced. If the FIFO RD pointer comes to equal the FIFO WR pointer, the FIFO is empty. This FIFO read routine requires four read operations and two write operations across the bus.
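For illustration only, the read routine of implementation A described above may be sketched as follows. This is a minimal sketch and not the original listing: the names FifoA, FIFO_SIZE, and Element are assumptions, and the shared memory is modeled as an ordinary C structure. The comments mark the accesses that would cross the bus when the descriptor and buffer reside on the Writer module.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_SIZE 4            /* hypothetical capacity */
typedef int Element;           /* hypothetical element type */

/* Descriptor and buffer for implementation A (names are assumptions). */
typedef struct {
    unsigned rd;               /* RD: index of the next element to read */
    unsigned wr;               /* WR: index of the next free location   */
    int full;                  /* nonzero when all locations are used   */
    int empty;                 /* nonzero when no location is used      */
    Element buf[FIFO_SIZE];
} FifoA;

/* Returns 1 on success, 0 if the FIFO is empty. */
int GetFifo(FifoA *f, Element *dst)
{
    unsigned rd;

    assert(f != NULL && dst != NULL);
    if (f->empty)                  /* bus read: empty flag   */
        return 0;
    rd = f->rd;                    /* bus read: RD pointer   */
    *dst = f->buf[rd];             /* bus read: data element */
    rd = (rd + 1) % FIFO_SIZE;     /* wrap around at the end */
    f->rd = rd;                    /* bus write: RD pointer  */
    f->full = 0;                   /* bus write: full flag   */
    if (rd == f->wr)               /* bus read: WR pointer   */
        f->empty = 1;              /* extra bus write, only when drained */
    return 1;
}
```

In the common case this sketch performs four reads and two writes across the bus, which agrees with the count stated above; a third write occurs only when the read empties the FIFO.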
To write an element into the FIFO, the routine PutFifo is called:
The above routine first checks FIFO status and returns failure code 0 if the FIFO is full. Otherwise, the data element is copied to the FIFO buffer and the WR pointer is incremented. If the WR pointer equals the RD pointer, the FIFO has become full. The PutFifo routine does not require any bus transactions because the FIFO is on-board to the Writer module in this example.
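The write routine of implementation A described above may be sketched in the same way. Again this is a hedged illustration rather than the original listing; FifoA, FIFO_SIZE, and Element are assumed names. Because the FIFO is assumed to reside on the Writer module, every access in this sketch is local.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_SIZE 4            /* hypothetical capacity */
typedef int Element;           /* hypothetical element type */

/* Descriptor and buffer for implementation A (names are assumptions). */
typedef struct {
    unsigned rd;               /* RD: index of the next element to read */
    unsigned wr;               /* WR: index of the next free location   */
    int full;                  /* nonzero when all locations are used   */
    int empty;                 /* nonzero when no location is used      */
    Element buf[FIFO_SIZE];
} FifoA;

/* Returns 1 on success, 0 if the FIFO is full. */
int PutFifo(FifoA *f, const Element *src)
{
    assert(f != NULL && src != NULL);
    if (f->full)                       /* local read: full flag */
        return 0;
    f->buf[f->wr] = *src;              /* copy the element into the buffer */
    f->wr = (f->wr + 1) % FIFO_SIZE;   /* advance WR with wrap-around */
    f->empty = 0;                      /* at least one element is present */
    if (f->wr == f->rd)                /* WR caught up with RD */
        f->full = 1;
    return 1;
}
```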
FIGS. 3A through 3G illustrate the writing of data to the FIFO and the reading of data from the FIFO, and the corresponding conditions of the Full and Empty flags. FIGS. 3E and 3F illustrate how the WR pointer, for example, wraps around from the end of the FIFO back to the beginning.
In a second implementation B of a shared memory FIFO, full and empty flags are not used. For this alternate implementation the type definition statement is:
In this implementation, an operation of retrieving an element from a FIFO may be implemented by the following GetFifo routine:
The above routine is called to retrieve an element from the FIFO. If the FIFO is empty (RD pointer equals WR pointer), the routine returns failure code 0. Otherwise, the data element is copied to the destination and the FIFO RD pointer is advanced. The routine uses 3 Read and 1 Write transaction across the bus.
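A minimal sketch of the read routine of implementation B, consistent with the description above, is given below. It is not the original listing; FifoB, FIFO_SIZE, and Element are assumed names. The comments mark the three reads and one write that would cross the bus when the FIFO resides on the Writer module.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_SIZE 4            /* hypothetical capacity */
typedef int Element;           /* hypothetical element type */

/* Descriptor and buffer for implementation B: pointers only, no flags. */
typedef struct {
    unsigned rd;               /* RD: index of the next element to read */
    unsigned wr;               /* WR: index of the next free location   */
    Element buf[FIFO_SIZE];
} FifoB;

/* Returns 1 on success, 0 if the FIFO is empty (RD equals WR). */
int GetFifo(FifoB *f, Element *dst)
{
    unsigned rd;

    assert(f != NULL && dst != NULL);
    rd = f->rd;                        /* bus read: RD pointer   */
    if (rd == f->wr)                   /* bus read: WR pointer   */
        return 0;
    *dst = f->buf[rd];                 /* bus read: data element */
    f->rd = (rd + 1) % FIFO_SIZE;      /* bus write: RD pointer  */
    return 1;
}
```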
Similarly, to insert an element into a FIFO, the following PutFifo routine is called:
The above routine first checks FIFO status and returns failure code 0 if the FIFO is full (an incremented WR pointer would equal the RD pointer). Otherwise, the data element is copied to the FIFO buffer and the WR pointer is incremented.
FIGS. 4A through 4F illustrate the use of the RD and WR pointers in this implementation. Note that with this implementation there is always one element of the FIFO that remains unused.
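The write routine of implementation B, including the unused element noted above, may be sketched as follows. This is an illustrative sketch under assumed names (FifoB, FIFO_SIZE, Element), not the original listing.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_SIZE 4            /* hypothetical capacity */
typedef int Element;           /* hypothetical element type */

/* Descriptor and buffer for implementation B: pointers only, no flags. */
typedef struct {
    unsigned rd;               /* RD: index of the next element to read */
    unsigned wr;               /* WR: index of the next free location   */
    Element buf[FIFO_SIZE];
} FifoB;

/* Returns 1 on success, 0 if the FIFO is full. */
int PutFifo(FifoB *f, const Element *src)
{
    unsigned wr, next;

    assert(f != NULL && src != NULL);
    wr = f->wr;
    next = (wr + 1) % FIFO_SIZE;       /* WR as it would be after the write */
    if (next == f->rd)                 /* incremented WR would equal RD: full */
        return 0;
    f->buf[wr] = *src;                 /* copy the element into the buffer */
    f->wr = next;                      /* publish the new WR pointer */
    return 1;
}
```

With FIFO_SIZE equal to 4, this sketch accepts three elements before reporting full, because the state in which RD equals WR is reserved to mean empty.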
Each of the above two implementations is safe from race conditions due to asynchronous and overlapping execution of the GetFifo and PutFifo routines by different processors. They are also hazard free because only the Reader can modify the RD pointer and only the Writer can modify the WR pointer.
In FIG. 2 the FIFO buffer and descriptor are located in shared memory of the Writer. This is not optimal because write operations over a bus are typically much less expensive (time-consuming) than read operations both from a bus utilization perspective and in considering latency as experienced by the master device. The reason for this is that write operations are non-blocking while read operations are typically blocking. When a data element is written to a target device over a bus such as PCI, then the initiator device typically writes the data into the PCI master FIFO and then continues with other tasks. From that perspective, a write operation is a "shoot-and-forget" type of operation. When the PCI controller acquires the bus, it has data available in the internal FIFO and may stream it over the bus immediately.
In contrast to write operations, read operations are typically blocking. When an initiator has to read a data element from a target device over the PCI bus, it has to request the bus, place the target address onto the bus, and wait until the data is retrieved by the target device and placed on the bus. This may cause a significant delay due to bus latencies. When the bus is acquired and the target is selected by address decoding logic, the PCI controller has to fetch data from the target location and place it on the bus. For example, a DRAM based memory may require as much as 120 ns (four cycles) to place the data on the bus, assuming the memory is not being accessed by a different device. These four cycles are then lost as far as bus utilization is concerned.
If read latency is sufficiently high then advantages may be realized by the target device immediately disconnecting after decoding its own address. The master device is then required to retry the same operation. In the meantime the target has time to fetch data and place it in its slave read FIFO. In the interim, the bus arbiter may decide to grant the bus to another device, effectively increasing the time for the target to fetch data without adversely affecting bus utilization. However, even this method increases bus utilization since the bus must be arbitrated twice for the same data. This method also does not resolve the master blocking issue. To the contrary, it may actually make things worse due to bus rearbitration.
There are several disadvantages of the implementations described above.
A hardware-managed FIFO requires dedicated hardware resources, which increases system cost. The size of the FIFO buffer is fixed and usually cannot be changed. The memory used to implement the FIFO buffer is typically more expensive than bulk memory such as DRAM used to implement shared memories. Status information such as underflow and overflow has to be signaled by dedicated circuits which will often be incompatible with standard buses of defined width. If dedicated circuits for status signaling cannot be used, the hardware-managed FIFO must signal its status through the bus, which results in performance bottlenecks.
The shared memory implementations discussed above depend heavily on bus transfers to implement the FIFO access routines. This is a significant problem when the bus has a high latency, such as is common for a PCI bus. The cost (wait time) of accessing a shared memory location located in a different subsystem module is typically much higher than the cost of accessing a memory that is located on-board the module that is accessing the data.
Because the above-described FIFO access routines access the FIFO multiple times over the bus for a single FIFO access operation, and because many buses experience a high latency, accessing the FIFO results in a significant system performance bottleneck. FIFO accesses also can consume a high percentage of the available bus duty cycle.
Accordingly, it is an object of the present invention to provide a FIFO structure and a method of accessing a FIFO which results in much faster FIFO access times, and eliminates the performance bottlenecks typically associated with FIFO accesses across a bus.
In order to reduce the utilization of the bus, according to a first embodiment of the present invention the FIFO is provided with two separate descriptors, one per processor. Each FIFO descriptor includes a RD pointer (pointing to the next location within the FIFO to be read from), and a WR pointer (indicating the next location within the FIFO to be written into). In order to determine whether the FIFO is available for a FIFO operation, each processor need only check its own on-board descriptor. If the on-board descriptor of a first processor indicates that the FIFO is available, then the first processor performs a FIFO operation. After the FIFO operation is complete the first processor updates both its own descriptor and the second processor's descriptor to inform the second processor that the FIFO is available to it for a FIFO operation. Because a processor only has to access its own on-board descriptor to determine whether the FIFO is available, and because the processor performs an operation across the bus only after a FIFO operation, the number of bus accesses required to write a datum into the FIFO and retrieve the datum is reduced. In a system with a high latency bus, this reduction in bus transactions significantly increases FIFO throughput.
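The dual-descriptor arrangement may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the names Descr, ReaderMem, WriterMem, FIFO_SIZE, and Element are hypothetical, the buffer is assumed to reside in the Reader's shared memory, one location is left unused as in implementation B, and bus writes are assumed to arrive in the order issued.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_SIZE 4                      /* hypothetical capacity */
typedef int Element;                     /* hypothetical element type */

typedef struct { unsigned rd, wr; } Descr;   /* one descriptor per module */

typedef struct {                         /* memory on the Reader module */
    Descr d;
    Element buf[FIFO_SIZE];              /* assumption: buffer is Reader-local */
} ReaderMem;

typedef struct { Descr d; } WriterMem;   /* memory on the Writer module */

/* Writer side: checks only its own descriptor, then uses bus writes. */
int PutFifo(WriterMem *w, ReaderMem *r, const Element *src)
{
    unsigned wr, next;

    assert(w != NULL && r != NULL && src != NULL);
    wr = w->d.wr;
    next = (wr + 1) % FIFO_SIZE;
    if (next == w->d.rd)                 /* local check: full */
        return 0;
    r->buf[wr] = *src;                   /* bus write: data element */
    w->d.wr = next;                      /* local update */
    r->d.wr = next;                      /* bus write: Reader's WR copy */
    return 1;
}

/* Reader side: checks only its own descriptor, reads data locally. */
int GetFifo(ReaderMem *r, WriterMem *w, Element *dst)
{
    unsigned rd;

    assert(r != NULL && w != NULL && dst != NULL);
    rd = r->d.rd;
    if (rd == r->d.wr)                   /* local check: empty */
        return 0;
    *dst = r->buf[rd];                   /* local read */
    rd = (rd + 1) % FIFO_SIZE;
    r->d.rd = rd;                        /* local update */
    w->d.rd = rd;                        /* bus write: Writer's RD copy */
    return 1;
}
```

In this sketch neither routine reads across the bus; the only bus transactions are writes, which are the less expensive operation.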
A second embodiment eliminates the WR pointer on-board to the FIFO reader module by predefining a special value to be non-valid data. When the Reader processor finds this non-valid data value in the FIFO, the processor knows that the FIFO is empty. Since the Reader knows when the FIFO is empty merely by examining the FIFO contents, the Reader does not need its own WR pointer to compare against its RD pointer. This eliminates the bus transactions that would otherwise be required for the Writer module to update the Reader's WR pointer.
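The use of a reserved non-valid value may be sketched as follows. This is a hedged illustration only: FIFO_INVALID, InitFifo, ReaderMem, and WriterMem are hypothetical names, the reserved value -1 is an arbitrary choice, the buffer is assumed to reside on the Reader module, and a single element is assumed to be written atomically over the bus.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_SIZE 4                  /* hypothetical capacity */
#define FIFO_INVALID (-1)            /* hypothetical reserved non-valid value */
typedef int Element;                 /* hypothetical element type */

typedef struct {                     /* Reader module: no WR pointer needed */
    unsigned rd;
    Element buf[FIFO_SIZE];          /* assumption: buffer is Reader-local */
} ReaderMem;

typedef struct {                     /* Writer module */
    unsigned wr;
    unsigned rd;                     /* copy of RD, written by the Reader */
} WriterMem;

/* Every location starts out holding the non-valid value. */
void InitFifo(ReaderMem *r, WriterMem *w)
{
    unsigned i;

    r->rd = 0;
    w->wr = 0;
    w->rd = 0;
    for (i = 0; i < FIFO_SIZE; i++)
        r->buf[i] = FIFO_INVALID;
}

int PutFifo(WriterMem *w, ReaderMem *r, const Element *src)
{
    unsigned next = (w->wr + 1) % FIFO_SIZE;

    assert(*src != FIFO_INVALID);    /* valid data must differ from the marker */
    if (next == w->rd)               /* local check: full */
        return 0;
    r->buf[w->wr] = *src;            /* bus write: the only bus access here */
    w->wr = next;                    /* local update */
    return 1;
}

int GetFifo(ReaderMem *r, WriterMem *w, Element *dst)
{
    unsigned rd = r->rd;

    if (r->buf[rd] == FIFO_INVALID)  /* local check: empty */
        return 0;
    *dst = r->buf[rd];
    r->buf[rd] = FIFO_INVALID;       /* mark the location empty again */
    rd = (rd + 1) % FIFO_SIZE;
    r->rd = rd;                      /* local update */
    w->rd = rd;                      /* bus write: Writer's RD copy */
    return 1;
}
```

The Reader restores the non-valid value before it advances the Writer's RD copy, so in this sketch the Writer cannot reuse a location until the location has been marked empty.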
A third embodiment achieves still further reductions in bus accesses by passing credits from the first processor to the second. The credits represent the number of consecutive FIFO accesses that the second processor has permission to perform. For example, if the FIFO is full and the Reader reads five locations from the FIFO, there are now five locations newly freed up within the FIFO into which the Writer may write data. Accordingly, the Reader module passes five credits to the Writer module in a single bus transaction. The Writer module then has permission to write five consecutive elements into the FIFO. The passing of five credits with a single bus transaction reduces by four fifths the number of bus accesses required for the Reader module to update the Writer's RD pointer. The consecutive FIFO accessing over the bus also allows page mode accessing to be used, provided that the particular bus and FIFO memory used support page mode accessing. Where supported, page mode accessing allows multiple data to be written in much less time than would be required to write each datum individually.
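One way to pass credits in a single bus transaction is sketched below. This is an assumption-laden illustration and not necessarily the mechanism of the embodiment: it uses running counters instead of pointers, the names Credits, CREDIT_BATCH, ReaderMem, and WriterMem are hypothetical, the buffer is assumed to reside on the Reader module, and FIFO_SIZE is assumed to be a power of two so that the counters may wrap safely.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_SIZE 8              /* hypothetical; power of two assumed */
#define CREDIT_BATCH 5           /* hypothetical; must not exceed FIFO_SIZE */
typedef int Element;             /* hypothetical element type */

typedef struct {                 /* Writer module */
    unsigned written;            /* elements written so far (Writer-owned) */
    unsigned freed;              /* elements freed, written by the Reader */
} WriterMem;

typedef struct {                 /* Reader module */
    unsigned read;               /* elements consumed so far */
    unsigned reported;           /* value of `read` last sent to the Writer */
    unsigned avail;              /* elements written, updated by the Writer */
    Element buf[FIFO_SIZE];      /* assumption: buffer is Reader-local */
} ReaderMem;

/* Credits currently held by the Writer: free locations it may fill. */
unsigned Credits(const WriterMem *w)
{
    return FIFO_SIZE - (w->written - w->freed);
}

int PutFifo(WriterMem *w, ReaderMem *r, const Element *src)
{
    assert(w != NULL && r != NULL && src != NULL);
    if (Credits(w) == 0)                     /* local check: no credits left */
        return 0;
    r->buf[w->written % FIFO_SIZE] = *src;   /* bus write: data element */
    w->written++;
    r->avail = w->written;                   /* bus write: notify the Reader */
    return 1;
}

int GetFifo(ReaderMem *r, WriterMem *w, Element *dst)
{
    assert(r != NULL && w != NULL && dst != NULL);
    if (r->read == r->avail)                 /* local check: empty */
        return 0;
    *dst = r->buf[r->read % FIFO_SIZE];      /* local read */
    r->read++;
    if (r->read - r->reported >= CREDIT_BATCH) {
        w->freed = r->read;                  /* one bus write passes the batch */
        r->reported = r->read;
    }
    return 1;
}
```

With these values, a full FIFO followed by five reads results in a single bus write that grants the Writer five credits, mirroring the example in the preceding paragraph. Each counter is modified by only one module, which is a design choice of this sketch intended to avoid read-modify-write cycles across the bus.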
By reducing the number of bus transactions required per FIFO data element, and utilizing the faster transaction time for write operations than for read operations, the present invention achieves a throughput increase of five to ten times that possible with prior FIFO structures and access methods.
The above-described objects of the present invention and other features and benefits of the present invention will become clear to those skilled in the art when read in conjunction with the following detailed description of a preferred illustrative embodiment and viewed in conjunction with the attached drawings in which like reference designators refer to like parts, and appended claims.