1. Field of the Invention
The present invention relates to computer networking and specifically to pre-fetching and invalidating packet information held in a cache memory.
2. Background Information
A computer network is a geographically distributed collection of interconnected network links and segments for transporting data between nodes, such as computers. Many types of network segments are available, with the types ranging from local area networks (LANs) to wide area networks (WANs). End nodes, such as personal computers or workstations, typically communicate over the network by exchanging discrete frames or packets of data according to predefined protocols. In this context, a protocol consists of a set of rules defining how the nodes interact with each other.
Computer networks may be further interconnected by an intermediate node, such as a switch or router, having a plurality of ports which may be coupled to the networks. For example, a switch may be utilized to provide a “switching” function for transferring information between a plurality of LANs at high speed. The switching function includes receiving a data packet at a source port that originated from a source node and transferring that packet to at least one destination port for forwarding to a destination node.
A router may be used to interconnect LANs executing different LAN standards and/or to provide higher-level functionality than a switch. If the LAN standards associated with the source and destination nodes are dissimilar (e.g., Ethernet and token ring), the router may also alter the format of the packet so that it may be received by the destination node. Routers typically operate at the network layer of a communications protocol stack used by the network, such as the internetwork layer of the Transmission Control Protocol/Internet Protocol (TCP/IP) communications architecture.
Routers often perform various functions associated with modifying data (packets) transferred along a path from a source to a destination. These functions may include modifying information contained in a packet's layer-3 header. For example, an IP packet contains a time-to-live (TTL) field in its IP header. The TTL field indicates the number of “hops” a packet may take along a path before it is discarded. As the packet traverses a path from a source to a destination, each router along the path typically decrements the TTL field by one, determines if the field contains a zero, and, if so, discards the packet.
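By way of illustration only, the TTL handling described above may be sketched as follows. The `IPHeader` class and the `decrement_ttl` function are hypothetical names introduced for this sketch, and the header is reduced to the single field of interest; an actual router would also update the header checksum, which is omitted here.

```python
from dataclasses import dataclass


@dataclass
class IPHeader:
    """Hypothetical, simplified stand-in for a layer-3 header."""
    ttl: int  # number of hops the packet may still take


def decrement_ttl(header: IPHeader) -> bool:
    """Decrement the TTL by one and report whether the packet may be forwarded.

    Returns False when the field contains zero after the decrement,
    meaning the packet is to be discarded.
    """
    # Clamp at zero so a packet that arrives with TTL 0 is also discarded.
    header.ttl = max(header.ttl - 1, 0)
    return header.ttl != 0


if __name__ == "__main__":
    header = IPHeader(ttl=2)
    for hop in ("first router", "second router"):
        action = "forward" if decrement_ttl(header) else "discard"
        print(hop, action, header.ttl)
```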
A router may employ one or more processors (CPUs) to modify packets processed by the router. Typically, a packet resides in a low-speed memory external to the processor (e.g., an external packet buffer) and the processor executes various instructions to access the packet and modify it accordingly. In systems where the processor is significantly faster than the external memory, a high-speed memory, such as a cache memory, may be employed to enable faster access to the packet data and increase the router's capacity to process packets.
In a typical arrangement, the cache memory resides between the processor and the external memory, and comprises high-speed memory devices and logic configured to process memory requests containing operations (e.g., read data, write data) and addresses issued by the processor. The cache memory typically processes a request by determining if the data associated with the request is in the cache memory, and if so, performing the requested operation on the data in the cache memory. If the data is not present in the cache memory, the cache may acquire the data from the lower-speed external memory before performing the requested operation.
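The request handling described above may be modeled, in a minimal sketch, as a lookup followed by an optional fill from the external memory. The `SimpleCache` class, its dictionary-based storage, and the hit and miss counters are assumptions made for illustration; a hardware cache operates on fixed-size lines with tags and a replacement policy, none of which is modeled here.

```python
class SimpleCache:
    """Hypothetical model of a cache placed in front of a slower external memory."""

    def __init__(self, external_memory: dict):
        self.external = external_memory  # low-speed external memory
        self.lines = {}                  # address -> data currently held in the cache
        self.hits = 0
        self.misses = 0

    def read(self, address):
        """Process a read request issued by the processor."""
        if address in self.lines:
            # The data is present in the cache: service the request directly.
            self.hits += 1
        else:
            # The data is absent: acquire it from the external memory first.
            self.misses += 1
            self.lines[address] = self.external[address]
        return self.lines[address]


if __name__ == "__main__":
    cache = SimpleCache({0x10: "packet header"})
    cache.read(0x10)  # first access is filled from the external memory
    cache.read(0x10)  # second access is serviced from the cache
    print(cache.hits, cache.misses)
```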
Requests involving read operations are typically performed by acquiring the data, as described above, and presenting the acquired data to the processor. The operation is usually considered complete when the data is presented to the processor. Requests that involve write operations may be processed differently depending on the configuration of the cache memory. For example, in a “write-back” cache memory configuration, the cache memory may simply acquire the data, as described above, and perform the write operation on the data in the cache memory. The operation is considered complete when the data in the cache memory has been modified. On the other hand, in a “write-through” cache configuration, the cache memory acquires the data, performs the operation on the data in the cache memory and writes the modified data back to the external memory. The operation does not complete until the data is actually written back to the external memory.
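The difference between the two write configurations may be illustrated with the following sketch. The `PolicyCache` class, its `write_through` flag, and the `dirty` set are hypothetical constructs chosen for this example; the sketch tracks data per address rather than per cache line and does not model eviction.

```python
class PolicyCache:
    """Hypothetical cache model supporting write-back or write-through operation."""

    def __init__(self, external_memory: dict, write_through: bool):
        self.external = external_memory
        self.write_through = write_through
        self.lines = {}     # address -> data held in the cache
        self.dirty = set()  # addresses modified in the cache but not yet written back

    def _acquire(self, address):
        # Bring the data into the cache if it is not already present.
        if address not in self.lines:
            self.lines[address] = self.external[address]

    def read(self, address):
        self._acquire(address)
        return self.lines[address]

    def write(self, address, value):
        self._acquire(address)
        self.lines[address] = value
        if self.write_through:
            # Write-through: the operation completes only after the
            # external memory has been updated as well.
            self.external[address] = value
        else:
            # Write-back: the operation completes once the cached copy is
            # modified; the external memory is stale until a later flush.
            self.dirty.add(address)


if __name__ == "__main__":
    memory = {0: "old"}
    PolicyCache(memory, write_through=False).write(0, "new")
    print(memory[0])  # external memory is unchanged in write-back mode
```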
In some systems, multiple devices may have access to the data contained in the external memory. These systems may include coherency logic configured to maintain coherency between the data in the cache memory and the external memory. By doing so, data written by a device, such as a processor, is “seen” by the other devices that have access to the data, and vice-versa. For example, in a write-through cache configuration, the coherency logic may be configured to determine if a device other than the processor has written data to a memory location that is held in the processor's cache, and if so, invalidate the data in the processor's cache. In a write-back cache configuration, the coherency logic may be configured to determine if data written by the processor and held in the cache is being accessed by another device, and if so, write (flush) the data back to the external memory before the other device's request is processed.
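The two coherency actions described above, invalidating a stale cached copy and flushing a modified one, may be sketched as follows. The `CoherentCache` class and the `device_write` and `device_read` functions are hypothetical names; for brevity the sketch combines both behaviors in a single write-back style model, whereas the text above describes them for separate cache configurations.

```python
class CoherentCache:
    """Hypothetical processor cache with invalidate and flush operations."""

    def __init__(self, external_memory: dict):
        self.external = external_memory
        self.lines = {}     # address -> data held in the cache
        self.dirty = set()  # addresses written by the processor, not yet flushed

    def cpu_read(self, address):
        if address not in self.lines:
            self.lines[address] = self.external[address]
        return self.lines[address]

    def cpu_write(self, address, value):
        # Write-back style: only the cached copy is modified.
        self.lines[address] = value
        self.dirty.add(address)

    def flush(self, address):
        # Write modified data back to the external memory.
        if address in self.dirty:
            self.external[address] = self.lines[address]
            self.dirty.discard(address)

    def invalidate(self, address):
        # Drop the cached copy so the next access reloads it.
        self.lines.pop(address, None)
        self.dirty.discard(address)


def device_write(cache: CoherentCache, address, value):
    """Another device writes a location; the processor's stale copy is invalidated."""
    cache.external[address] = value
    cache.invalidate(address)


def device_read(cache: CoherentCache, address):
    """Another device reads a location; modified cache data is flushed first."""
    cache.flush(address)
    return cache.external[address]
```

In this sketch, a processor read that follows `device_write` observes the value written by the device, and `device_read` observes the most recent processor write, which is the visibility property the coherency logic is intended to provide.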
Although a cache memory may increase a router's capacity to process packets, it may also act as a bottleneck to further increasing the router's packet processing capacity. For example, if the router's processor is faster than the cache, the processor may stall waiting for the cache memory to service its request, thus limiting the processor's capacity to process packets. The processor may execute special instructions to, inter alia, pre-fetch data into the cache memory by acquiring the data from the external memory and placing the data into the cache memory before the processor uses the data; however, doing so often impacts the processor's performance. It should be noted that, as used herein, “pre-fetching” refers to the technique of acquiring data from, e.g., an external memory before the processor needs the data.
Other problems associated with a cache memory may arise with having to maintain data coherency. For example, a router may employ data structures, such as transmit and receive descriptor rings, which enable the router's processor to send and receive packets to and from the network via a network interface. The descriptor rings often comprise one or more entries wherein each entry contains a pointer that references a memory location where a packet resides and an ownership bit that indicates ownership of the entry, i.e., whether the entry is “owned” by (exclusively accessible to) the network interface or the processor.
In a typical arrangement, when the network interface acquires a packet from the network, it places the packet in an external memory at a location specified by a receive descriptor ring entry owned by the network interface. The network interface then changes the ownership of the entry to indicate that the processor owns the entry, and notifies the processor that the packet has been placed in the external memory. The processor processes the packet, which may include, e.g., retrieving data associated with the packet from the external memory, placing the retrieved data into its cache, modifying the packet data in the cache, placing the modified packet data from the cache back into the external memory, placing the location of the packet in a transmit descriptor ring entry owned by the processor, and setting the ownership of the entry to indicate that the network interface owns the entry. The processor then typically invalidates the cache locations associated with the packet.
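The receive and transmit sequence described above may be sketched as follows. The `Owner` enumeration, the `Descriptor` class, and the `nic_receive` and `cpu_process` functions are hypothetical names introduced for this sketch; the external memory and the cache are modeled as dictionaries keyed by address, and the notification from the network interface to the processor is reduced to a returned ring index.

```python
from dataclasses import dataclass
from enum import Enum


class Owner(Enum):
    NIC = 0  # entry is owned by the network interface
    CPU = 1  # entry is owned by the processor


@dataclass
class Descriptor:
    pointer: int  # location in external memory where a packet resides
    owner: Owner  # models the ownership bit of the entry


def nic_receive(rx_ring, memory, packet):
    """Place a packet at the location named by the first entry the interface owns."""
    for index, entry in enumerate(rx_ring):
        if entry.owner is Owner.NIC:
            memory[entry.pointer] = packet
            entry.owner = Owner.CPU  # hand the entry to the processor
            return index             # stands in for notifying the processor
    return None                      # no receive entry is available


def cpu_process(rx_ring, tx_ring, memory, cache, rx_index, modify):
    """Modify a received packet through the cache and queue it for transmission."""
    entry = rx_ring[rx_index]
    if entry.owner is not Owner.CPU:
        raise RuntimeError("receive entry is not owned by the processor")
    cache[entry.pointer] = memory[entry.pointer]         # retrieve into the cache
    cache[entry.pointer] = modify(cache[entry.pointer])  # modify in the cache
    memory[entry.pointer] = cache[entry.pointer]         # place back in external memory
    for tx_entry in tx_ring:
        if tx_entry.owner is Owner.CPU:
            tx_entry.pointer = entry.pointer
            tx_entry.owner = Owner.NIC                   # hand the entry to the interface
            break
    else:
        raise RuntimeError("no transmit entry is owned by the processor")
    del cache[entry.pointer]                             # invalidate the cache location
```

A caller might, for instance, build a one-entry receive ring owned by the network interface and a one-entry transmit ring owned by the processor, call `nic_receive`, and pass the returned index to `cpu_process` together with a function that rewrites the packet.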
The processor may execute special pre-fetch instructions to retrieve the packet data from the external memory and place the data in the cache. However, this technique may be time consuming and may act to limit the processor's capacity to process packets and, hence, the intermediate node's overall packet processing performance. Additionally, the processor may invalidate cache locations associated with the packets by executing special instructions that direct the cache to invalidate various locations within the cache. Execution of these special instructions can likewise be taxing on the processor and may further act to limit the processor's capacity to process packets and, hence, the intermediate node's overall packet processing performance.