This invention relates generally to nodes of computer networks and, more specifically, to improving the efficiency of storing packets in the nodes' computer memories by eliminating the need for additional read-modify-write (RMW) operations.
A computer network is a geographically distributed collection of interconnected subnetworks for transporting data between nodes, such as intermediate nodes and end nodes. A local area network (LAN) is an example of such a subnetwork; a plurality of LANs may be further interconnected by an intermediate network node, such as a router or switch, to extend the effective "size" of the computer network and increase the number of communicating nodes. Examples of the end nodes may include servers and personal computers. The nodes typically communicate by exchanging discrete frames or packets of data according to predefined protocols. In this context, a protocol consists of a set of rules defining how the nodes interact with each other.
Each node typically comprises a number of basic subsystems including a processor, a main memory and an input/output (I/O) subsystem. Data is transferred between the main memory ("system memory") and processor subsystem over a memory bus, and between the processor and I/O subsystems over a system bus. Examples of the system bus may include the conventional lightning data transport (or hyper transport) bus and the conventional peripheral component interconnect (PCI) bus. The processor subsystem may comprise a single-chip processor and system controller device that incorporates a set of functions including a system memory controller, support for one or more system buses and direct memory access (DMA) engines. In general, the single-chip device is designed for general-purpose use and is not heavily optimized for networking applications.
In a typical networking application developed using the single-chip device, packets are received from a framer, such as an Ethernet media access control (MAC) controller, of the I/O subsystem attached to the system bus. A DMA engine in the MAC controller is provided a list of addresses (e.g., in the form of a descriptor ring in a system memory) for buffers it may access in the system memory. As each packet is received at the MAC controller, the DMA engine obtains ownership of ("masters") the system bus to access a next descriptor ring to obtain a next buffer address in the system memory at which it may, e.g., store ("write") data contained in the packet. The DMA engine may need to issue many write operations over the system bus to transfer all of the packet data.
For example, assume the system memory comprises double data rate (DDR) synchronous dynamic random access memory (SDRAM) devices and that a portion of the memory is organized into packet buffers at system initialization. These buffers are defined to start on certain binary boundaries (e.g., 32 bytes) in order to take advantage of the system bus burst size, bus alignment and cache line size. Assume also that the system (PCI) bus has a width of 32 (or 64) bits and system memory accessed over the bus is also 32 (or 64, respectively) bits wide thereby matching the bus. Moreover, each access (transfer) of data over the system bus to the memory comprises two cycles (or four half cycles). Therefore, at 32 (64) bits per half cycle, the minimum transfer size of packet data over the system bus is 16 (32) bytes (that is 4 bytes per half cycle times 4 half cycles equals 16 bytes).
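By way of illustration only, the transfer-size arithmetic above may be sketched as follows. The function name and its parameters are hypothetical and form no part of the described system; the sketch simply assumes four half cycles per access, as stated above.

```python
def min_transfer_bytes(bus_width_bits, half_cycles_per_access=4):
    """Smallest transfer over the bus, in bytes.

    One bus-width word moves per half cycle, and each access spans
    `half_cycles_per_access` half cycles (two full cycles by default).
    """
    bytes_per_half_cycle = bus_width_bits // 8
    return bytes_per_half_cycle * half_cycles_per_access


if __name__ == "__main__":
    for width in (32, 64):
        print(width, "bit bus:", min_transfer_bytes(width), "bytes")
```

For a 32-bit bus this yields 16 bytes and for a 64-bit bus 32 bytes, matching the figures given above.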
Continuing the same example, in order to write 40 bytes of data to an address, e.g., 0x100038 in system memory, 10 half cycles are needed over a 32-bit (4 byte wide) system bus. However, to write 40 data bytes to system memory address 0x100037, 11 half cycles would be required (i.e., the first half cycle with 1 byte of data, 9 half cycles with 4 bytes and the last half cycle with 3 bytes). By aligning the buffer start address with the width of the system bus, efficient use of that bus is ensured. This same efficiency carries over to the actual system memory interface, where the data can be written into the system memory using the fewest cycles if the start of the buffer matches the granularity of accesses to the system memory. From a simplistic point of view with respect to the system memory, if a memory line is 32 bits or 4 bytes wide (herein a memory "line" includes the typical data storage word and any error correcting code extension to that word, if any), usually the entire line must be fetched in order to overwrite only one byte while preserving the other three bytes. So, in the above example, to store the 40 bytes, the system will write to 10 full width lines when starting at address 0x100038 because the memory line width and the message length match 10 complete, full memory lines exactly. However, if the starting address is 0x100037, the first byte of the 40 will be stored in the last byte of a 4 byte wide first line, and the last three bytes of the 40 being stored will be stored in the first three bytes of the 4 byte wide eleventh line. Now if the last three bytes were modified, the entire four bytes must be fetched and the first three bytes modified while keeping the last byte intact (assuming it is part of another message, etc.); that is, a read-modify-write (RMW) operation must be used for changing the last three bytes. A similar operation must be used to modify the first byte of the stored 40-byte message.
In this case, the inefficiencies of the non-aligned memory access are seen in the need to access eleven memory lines rather than ten, and in the need for read-modify-write operations rather than simple write operations.
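The half-cycle counts in the foregoing example may be reproduced with the following minimal sketch. The function name is hypothetical; the sketch assumes only that each half cycle carries at most one bus-width word and that a transfer cannot straddle a word boundary.

```python
def bus_beats(start_addr, length, width_bytes=4):
    """Return the number of payload bytes carried in each bus half cycle."""
    beats = []
    addr = start_addr
    remaining = length
    while remaining > 0:
        # Bytes available before the next bus-width boundary.
        room = width_bytes - (addr % width_bytes)
        n = min(room, remaining)
        beats.append(n)
        addr += n
        remaining -= n
    return beats


if __name__ == "__main__":
    aligned = bus_beats(0x100038, 40)
    unaligned = bus_beats(0x100037, 40)
    print(len(aligned), aligned)
    print(len(unaligned), unaligned)
```

Any entry smaller than the bus width marks a partially written line, i.e., a candidate for a read-modify-write operation at the memory. The aligned case produces ten full entries; the unaligned case produces eleven entries, the first carrying 1 byte and the last carrying 3 bytes.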
The present invention becomes even more important for large packet memories incorporating error correction codes (ECC). In these memories, it is not feasible to provide byte-write capability since the ECC covers the entire widths of the memories. For example, assume the system memory, including a system memory bus interface, is arranged to accommodate a 64-bit memory "line" width. Eight (8) additional bits are needed for ECC computation by the system memory controller such that the memory and memory bus interface are organized and aligned on 72-bit line widths. Therefore, a non-aligned start address not only could cause an extra write cycle but also the inefficient RMW operation discussed herein.
By starting the packet buffers on appropriate binary boundaries, the inefficiencies of writing packet data to the beginning of the buffers are avoided. However, there is no equivalent "work around" in conventional systems when writing the end of the packet buffers in system memory. For example, if the effective memory width is 8 bytes and the length of a packet is 63 bytes, the last transfer of the packet over the system bus requires that only 7 bytes be written to the appropriate packet buffer. As noted, the processor and system controller device is general-purpose and, accordingly, does not "know" that the portion of memory is reserved solely for packet buffers. Therefore, the processor and system controller device strictly interprets the system bus operation using a RMW operation to preserve the one byte location of the buffer that was not written with the packet data, rather than "padding out" (e.g., writing null values) to that location. This represents an inefficient use of system resources and the present invention is directed to a technique that improves the efficiency of such resources.
As noted, the RMW operation is quite expensive and consumes substantial overhead with respect to "turning around" the memory bus when writing data into an allocated buffer. That is, not only does the RMW operation double the traffic over the memory bus (by both reading and writing the data block), it also consumes overhead with respect to gaining access/ownership of the memory bus in order to avoid collisions over that bus. Therefore, not only is the operation expensive in terms of resource consumption, but it also adversely (and substantially) impacts throughput over the bus. Accordingly, the present invention is directed to improving the efficiency of memory write operations to buffers within a packet memory of an intermediate network node.
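The added memory bus traffic may be illustrated with the following simplified sketch, which counts line accesses only and does not model bus turnaround or arbitration overhead. The function name, its parameters and the start address 0x100000 are assumptions made for illustration; the 63-byte packet and 8-byte line width are taken from the example above.

```python
def memory_line_accesses(start_addr, length, line_bytes, pad_out):
    """Count memory-line accesses needed to store `length` bytes.

    A fully covered line costs one write. A partially covered line costs
    a read plus a write (RMW) unless `pad_out` is set, in which case the
    line is padded and written once.
    """
    if length <= 0:
        return 0
    end = start_addr + length
    first_line = start_addr // line_bytes
    last_line = (end - 1) // line_bytes
    accesses = 0
    for line in range(first_line, last_line + 1):
        lo = line * line_bytes
        hi = lo + line_bytes
        covered = min(end, hi) - max(start_addr, lo)
        partial = covered < line_bytes
        accesses += 2 if (partial and not pad_out) else 1
    return accesses


if __name__ == "__main__":
    print(memory_line_accesses(0x100000, 63, 8, pad_out=False))
    print(memory_line_accesses(0x100000, 63, 8, pad_out=True))
```

Under these assumptions the 63-byte packet costs nine line accesses when the final partial line is handled by an RMW operation and eight when that line is padded out. The relative saving grows as packets become smaller.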
The present invention comprises a mechanism for instructing a memory controller with respect to the performance of a write operation directed to a system memory of an intermediate network node. The memory controller is preferably embodied within a single-chip processor and system controller device having bus interface logic coupled to a system bus of the node. The bus interface logic includes conventional base address registers configured to decode addresses from system bus requests initiated by a media access control (MAC) controller coupled to the system bus. The base address registers are then used to determine those resources (e.g., lines of the system memory) targeted by the requests, such as the write operation.
According to the present invention, the mechanism comprises a novel attribute, e.g., a bit, added to each base address register. Depending upon an application executing on the processor, the attribute bit may be configured to one of two states, each of which specifies a mode of operation. In a first state, the bit may be configured to indicate that a partial write operation to a memory line within the system memory should be enforced exactly as specified by the system bus request, thereby resulting in the read-modify-write (RMW) operation described above. Alternatively, a second state of the bit may be configured to indicate that the partial write data be "padded out" to thereby overwrite, e.g., the entire memory line.
For example, assume the MAC controller issues a 7-byte write operation to the memory controller that is directed to an address 0x100030 in the system memory. If the bit is configured to specify overwriting of the memory line, the memory controller pads the data by one additional byte (at the end of the data), computes an error correction code for the entire 8 bytes and issues a single direct write operation to the 8-byte wide memory line. Similarly, assume the MAC controller issues a 6-byte external bus (write) request to the memory controller that is directed to address 0x100032. If the attribute is configured to specify overwriting of the memory line, the memory controller pads the data by 2 additional bytes starting at address 0x100030 (at the beginning of the data), computes an error correction code for the entire 8 bytes and issues a direct write operation to the 8-byte wide memory line. Although these operations destroy the previous contents of the byte at address 0x100037 (and of the bytes at addresses 0x100030-0x100031) within the packet buffer, the application did not intend to use those bytes anyway. The tradeoff is that the present invention may use more memory than strictly needed to store messages, but it gains the advantage of not requiring RMW operations.
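The two modes of operation may be illustrated by the following toy model, which is a sketch under stated assumptions rather than a description of the actual controller. The class and function names are hypothetical, the error correction code computation is omitted, zero is assumed as the padding value, and stale memory contents are marked with 0xFF so that overwritten bytes are visible.

```python
LINE_BYTES = 8  # memory line width used in the example above


class LineMemory:
    """Toy line-organized memory that counts line reads and writes."""

    def __init__(self, base, size):
        self.base = base
        self.data = bytearray(b"\xff" * size)  # 0xFF marks stale contents
        self.line_reads = 0
        self.line_writes = 0

    def read_line(self, line_addr):
        self.line_reads += 1
        off = line_addr - self.base
        return bytearray(self.data[off:off + LINE_BYTES])

    def write_line(self, line_addr, line):
        assert len(line) == LINE_BYTES
        self.line_writes += 1
        off = line_addr - self.base
        self.data[off:off + LINE_BYTES] = line


def partial_write(mem, addr, payload, pad_out):
    """Write a payload that fits inside a single memory line.

    pad_out=False: preserve untouched bytes via read-modify-write.
    pad_out=True:  fill untouched bytes with zeros and write once.
    """
    line_addr = addr & ~(LINE_BYTES - 1)
    start = addr - line_addr
    assert start + len(payload) <= LINE_BYTES, "payload must fit in one line"
    if pad_out:
        line = bytearray(LINE_BYTES)      # all-zero padding (assumed value)
    else:
        line = mem.read_line(line_addr)   # the extra read of an RMW
    line[start:start + len(payload)] = payload
    mem.write_line(line_addr, line)


if __name__ == "__main__":
    mem = LineMemory(base=0x100030, size=16)
    partial_write(mem, 0x100030, b"\x01" * 7, pad_out=True)
    print(mem.line_reads, mem.line_writes, mem.data[:8].hex())
```

With the padding mode selected, the 7-byte write to 0x100030 completes with no line read and a single line write, and the byte at 0x100037 is overwritten with padding. With the padding mode cleared, the same request incurs one line read in addition to the line write.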
In certain cases, it may be desirable to allow a particular memory region (buffer) to be operated on in both modes. Here, the MAC controller may pad out a system bus request to fill a packet buffer with original packet data to thereby avoid a RMW operation. However, the application executing on the processor may require manipulation of a packet header and, thus, not want to destroy any packet data. To accomplish these objectives, the memory can be "dual-mapped" to a virtual address space using two sets of base address registers. Both sets of registers may reference the same memory address, but one set has the novel bit configured in the first state and the other has the novel bit configured in the second state. Alternatively, the novel bit can come directly from a high order bit of the address specified in the write operation. As a result, the present invention advantageously increases the efficiency of writing packet data to system memory, particularly for small packet sizes.
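The dual-mapping arrangement, and the alternative of deriving the attribute from a high order address bit, may be sketched as follows. All names, window addresses and the choice of bit 31 are hypothetical and chosen only for illustration; the sketch assumes each base address register maps a window of bus addresses onto a region of system memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseAddressRegister:
    bus_base: int   # start of the window as seen on the system bus
    size: int       # window size in bytes
    mem_base: int   # start of the backing region in system memory
    pad_out: bool   # attribute bit: True = pad out, False = enforce RMW


def decode(bars, bus_addr):
    """Return (memory address, pad_out) for the first matching window."""
    for bar in bars:
        if bar.bus_base <= bus_addr < bar.bus_base + bar.size:
            return bar.mem_base + (bus_addr - bar.bus_base), bar.pad_out
    raise LookupError(f"no base address register claims {bus_addr:#x}")


# Two windows onto the same memory region, differing only in the attribute.
BARS = [
    BaseAddressRegister(0x00100000, 0x10000, 0x100000, pad_out=False),
    BaseAddressRegister(0x80100000, 0x10000, 0x100000, pad_out=True),
]

PAD_BIT = 1 << 31  # hypothetical high order address bit carrying the attribute


def decode_high_bit(bus_addr):
    """Alternative: take the attribute directly from a high order bit."""
    return bus_addr & ~PAD_BIT, bool(bus_addr & PAD_BIT)


if __name__ == "__main__":
    print(decode(BARS, 0x00100032))
    print(decode(BARS, 0x80100032))
    print(decode_high_bit(0x80100032))
```

In this sketch a MAC controller would direct its packet writes to the second window so that partial lines are padded out, while an application modifying a packet header would use the first window so that neighboring packet data is preserved.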