This invention relates generally to computer networks and, more specifically, to the utilization of caches within intermediate nodes of a computer network.
A computer network is a geographically distributed collection of interconnected subnetworks for transporting data between nodes, such as computers. A local area network (LAN) is an example of such a subnetwork; a plurality of LANs may be further interconnected by an intermediate node, called a router, to extend the effective "size" of the computer network and increase the number of communicating nodes. The nodes typically communicate by exchanging discrete frames or packets of data according to predefined protocols. In this context, a protocol consists of a set of rules defining how the nodes interact with each other.
Each node typically comprises a number of basic subsystems including a processor subsystem, a main memory subsystem and an input/output (I/O) subsystem. In particular, the main memory subsystem comprises storage locations typically composed of random access memory (RAM) devices which are addressable by the processor and I/O subsystems. In the case of a router, data such as non-transient data (i.e., instructions) and transient data (i.e., network data passing through the router) are generally stored in the addressable storage locations.
Data is transferred between the main memory and processor subsystems over a system bus that typically consists of control, address and data lines. The control lines carry control signals specifying the direction and type of transfer. For example, the processor issues a read request signal to transfer data over the bus from an addressed location in the main memory to the processor. The processor then processes the retrieved data in accordance with instructions obtained from the memory. The processor thereafter issues a write request signal to store the results in an addressed location in the main memory.
The data transferred between the processor and main memory subsystems must conform to certain timing relationships between the request signals and the data on the bus. Access time is defined as the time interval between the instant at which the main memory receives a request signal from the processor and the instant at which the data is available for use by the processor. If the processor operates at a fast rate and the access time of the main memory is slow as compared to the processor rate, the processor must enter a wait state until the request to memory is completed, thereby adversely affecting the processing rate of the processor. This problem is particularly significant when the memory request is a read request, since the processor is unable to operate, that is, process data, without the requested information.
A high-speed primary cache memory may be used to alleviate this situation. The primary cache is typically located on the processor and has an access speed that is closer to the operational speed of the processor; thus, use of the cache increases the speed of data processing by providing data to the processor at a rapid rate. The cache operates in accordance with the principle of locality; that is, if a memory location is addressed by the processor, it will probably be addressed again soon and nearby memory locations also will tend to be addressed soon. As a result, the cache is generally configured to temporarily store most-recently-used data. When the processor requires data, the cache is examined first. If the data is not located in the cache (a cache "miss"), the main memory is accessed. A block mode read request is then issued by the processor to transfer a block of data, including both the required data and data from nearby memory locations, from the main memory to the cache.
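The miss handling described above may be sketched as follows. This is a minimal illustration only: the block size, the class name SimpleCache and the list-based model of main memory are assumptions made for the example and are not taken from any particular processor.

```python
BLOCK_SIZE = 4  # words per block (assumed value, for illustration only)


class SimpleCache:
    """Toy cache that holds whole blocks keyed by block number."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # list of words standing in for RAM
        self.blocks = {}                # block number -> list of words
        self.hits = 0
        self.misses = 0

    def read(self, address):
        block_no, offset = divmod(address, BLOCK_SIZE)
        if block_no in self.blocks:
            self.hits += 1
        else:
            # Cache miss: a block-mode read copies the required word and
            # its neighbours from main memory into the cache.
            self.misses += 1
            start = block_no * BLOCK_SIZE
            self.blocks[block_no] = self.main_memory[start:start + BLOCK_SIZE]
        return self.blocks[block_no][offset]


memory = list(range(100, 116))          # 16 words holding 100..115
cache = SimpleCache(memory)
values = [cache.read(a) for a in (5, 6, 7, 5)]
```

In this sketch the first read of address 5 misses and fills one block; the subsequent reads of the nearby addresses 6 and 7, and the repeated read of 5, are then served from the cache, which is the behavior the principle of locality anticipates.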
A primary cache is faster and more expensive to implement than main memory and, because of its higher cost, smaller. To supplement such an expensive primary cache, a secondary cache may be employed. The secondary cache does not operate as fast as the primary cache, chiefly because the secondary cache is typically coupled to a processor bus within the processor subsystem; operations occurring over the processor bus generally execute at a different (slower) clock speed than that of the primary cache internal to the processor. Yet, data accesses to secondary cache occur faster than those to main memory.
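The benefit of such a hierarchy may be estimated with the conventional average-access-time relation. The symbols and the numeric values below are assumptions chosen for illustration: t_1 and t_2 denote the primary and secondary cache access times, t_mem the main memory access time, and m_1 and m_2 the respective miss rates.

```latex
T_{\text{avg}} = t_{1} + m_{1}\,\bigl(t_{2} + m_{2}\, t_{\text{mem}}\bigr)

% Worked example with assumed values (in processor cycles):
% t_1 = 1,\; m_1 = 0.10,\; t_2 = 10,\; m_2 = 0.20,\; t_{\text{mem}} = 100
T_{\text{avg}} = 1 + 0.10\,(10 + 0.20 \times 100) = 1 + 0.10 \times 30 = 4
```

Under these assumed figures, a higher secondary cache miss rate m_2 raises the average directly, which is why the contents of the secondary cache matter.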
Typically, a random access main memory is logically organized as a matrix of storage locations, wherein the address of each location comprises a first set of bits identifying the row of the location and a second set of bits identifying the column. A cache memory, such as the primary or secondary cache, holds a number of blocks of data, with each block containing data from one or more contiguous main memory locations. Each block is identified by a cache address. The cache address includes memory address bits that identify the corresponding memory locations. These bits are collectively called the index field. In addition to data from main memory, each block also contains the remainder of the memory address bits identifying the specific location in main memory from which the data in the cache block was obtained. These latter bits are collectively called a tag field.
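The division of a memory address into index and tag fields may be sketched as follows for a direct-mapped arrangement. The field widths are assumed values, and the byte-offset field within a block is an additional assumption that is commonly present but is not described above.

```python
OFFSET_BITS = 4   # 16-byte blocks (assumed)
INDEX_BITS = 8    # 256 cache blocks (assumed)


def split_address(addr):
    """Return (tag, index, offset) for a physical address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    # Index field: selects which cache block may hold the data.
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    # Tag field: the remaining high-order bits, stored with the block to
    # identify which main memory location the cached data came from.
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


tag, index, offset = split_address(0x12345)
```

With these assumed widths, address 0x12345 yields a tag of 0x12, an index of 0x34 and an offset of 0x5; a lookup compares the stored tag of block 0x34 against 0x12 to decide whether the block holds the requested data.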
Each node, including the router, is functionally organized by an operating system comprising a collection of software modules that control the execution of computer programs and manage the transfer of data among its subsystems. The processor subsystem executes the programs by fetching and interpreting instructions and processing network data in accordance with the instructions. Program-generated addresses are called virtual addresses because they refer to the contiguous logical, i.e., virtual, address space referenced by the computer program. In contrast, the physical address space consists of the actual locations where data is stored in main memory. A computer with a virtual memory allows programs to address more memory than is physically available. The operating system manages the virtual memory so that the program operates as if it is loaded into contiguous physical locations. A common process for managing virtual memory is to divide the program and main memory into equal-sized blocks or pages so that each program page fits into a memory page. A system disk participates in the implementation of virtual memory by storing pages of the program not currently in memory. The loading of pages from the disk to host memory is managed by the operating system.
When a program references an address in virtual memory, the processor calculates the corresponding main memory physical address in order to access the data. The processor typically includes memory management hardware to hasten the translation of the virtual address to a physical address. Specifically, for each program there is a page table containing a list of mapping entries, i.e., page table entries (PTEs), which, in turn, contain the physical address of each page of the program. Each PTE also indicates whether the program page is in main memory. If not, the PTE specifies where to find a copy of the page on the disk. Because of its large size, the page table is generally stored in main memory; accordingly, an additional memory access is required to obtain the physical address, which increases the time to perform the address translation.
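The page table lookup described above may be sketched as follows. The page size, the class names and the string used for the disk location are hypothetical and serve only to illustrate the two cases a PTE can indicate.

```python
PAGE_SIZE = 4096  # bytes per page (assumed)


class PageFault(Exception):
    """Raised when the referenced page is not resident in main memory."""


class PageTableEntry:
    def __init__(self, frame=None, disk_location=None):
        self.frame = frame                  # physical page number, if resident
        self.disk_location = disk_location  # where the page lives on disk

    @property
    def present(self):
        return self.frame is not None


def translate(page_table, vaddr):
    """Translate a virtual address using a per-program page table."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    pte = page_table[vpn]                   # the additional memory access
    if not pte.present:
        # The PTE tells the operating system where to find the page.
        raise PageFault(pte.disk_location)
    return pte.frame * PAGE_SIZE + offset


page_table = {
    0: PageTableEntry(frame=7),
    1: PageTableEntry(disk_location="disk-block-42"),  # hypothetical location
}
paddr = translate(page_table, 0x10)
```

Here the reference to virtual page 0 resolves to physical page 7, whereas a reference to virtual page 1 raises PageFault so that the operating system can load the page from disk.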
To reduce address translation time, another cache dedicated to address translations, called a translation-lookaside buffer (TLB), may be used. The TLB contains entries for storing translations of recently accessed virtual addresses. A TLB entry is similar to a cache entry in that the tag holds portions of the virtual address and the data portion holds a physical page-frame number. When used in conjunction with a cache, the TLB is accessed first with the program-generated virtual address before the resulting physical address is applied to the cache.
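A TLB of the kind described above may be sketched as a small cache in front of the page table. The capacity, the least-recently-used replacement policy and the simplified page table (a mapping from virtual page number to page-frame number) are assumptions made for the example.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes per page (assumed)


class TLB:
    """Toy translation-lookaside buffer with LRU replacement (assumed policy)."""

    def __init__(self, page_table, capacity=2):
        self.page_table = page_table   # virtual page number -> frame number
        self.capacity = capacity
        self.entries = OrderedDict()   # tag (virtual page number) -> frame
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)        # mark as most recently used
        else:
            self.misses += 1
            # TLB miss: consult the page table in main memory.
            self.entries[vpn] = self.page_table[vpn]
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        return self.entries[vpn] * PAGE_SIZE + offset


tlb = TLB({0: 5, 1: 9, 2: 3})
results = [tlb.translate(a) for a in (0x0004, 0x0008, PAGE_SIZE)]
```

In this sketch the second translation hits in the TLB because it falls within the same page as the first, so no page table access is needed for it.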
Accordingly, when the processor requires data, the virtual address is passed to the TLB where it is translated into a physical address which is used to access the caches. Specifically, the index field and tag field in the primary cache are initially examined to determine whether a block contains the requested data. If the data is not present in the primary cache, the secondary cache is examined. The secondary cache is generally arranged such that when data is forced out of primary cache it is stored in the secondary cache. If that data is needed before a corresponding block is forced out of secondary cache, then the primary cache is filled from secondary cache instead of from main memory.
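The interplay between the primary and secondary caches described above may be sketched as follows. The cache capacities, the least-recently-used eviction policy and the class name TwoLevelCache are assumptions; the sketch operates on whole block numbers and omits address translation for brevity.

```python
from collections import OrderedDict


class TwoLevelCache:
    """Toy model: blocks forced out of primary cache move to secondary."""

    def __init__(self, memory, l1_blocks=2, l2_blocks=4):
        self.memory = memory           # block number -> data
        self.l1 = OrderedDict()
        self.l2 = OrderedDict()
        self.l1_blocks = l1_blocks
        self.l2_blocks = l2_blocks
        self.log = []                  # where each read was satisfied

    @staticmethod
    def _insert(cache, capacity, block_no, data):
        """Insert a block; return the evicted (block_no, data) or None."""
        cache[block_no] = data
        cache.move_to_end(block_no)
        if len(cache) > capacity:
            return cache.popitem(last=False)   # least recently used
        return None

    def read(self, block_no):
        if block_no in self.l1:
            self.l1.move_to_end(block_no)
            self.log.append("primary")
            return self.l1[block_no]
        if block_no in self.l2:
            data = self.l2[block_no]           # re-fill from secondary
            self.log.append("secondary")
        else:
            data = self.memory[block_no]       # fill from main memory
            self.log.append("memory")
        victim = self._insert(self.l1, self.l1_blocks, block_no, data)
        if victim is not None:
            # A block forced out of primary cache is kept in secondary.
            self._insert(self.l2, self.l2_blocks, *victim)
        return data


memory = {n: f"block{n}" for n in range(8)}
cache = TwoLevelCache(memory)
for n in (0, 1, 2, 0):
    cache.read(n)
```

In this sketch block 0 is forced out of the two-block primary cache when block 2 is read, and the final read of block 0 is then satisfied from the secondary cache rather than from main memory.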
This arrangement works well for non-transient data that is likely to be referenced many times from primary cache over a non-contiguous period of time. However, if the data is transient and unlikely to be referenced more than once, storage of the data in secondary cache provides no performance advantage. It is thus desirable to keep transient data out of secondary cache so that the cache can be used to hold data that is more persistent, such as processor instructions and frequently-accessed data structures.
Although larger than primary cache, the secondary cache is substantially smaller than main memory. If the working set of memory (i.e., the range of addresses accessed within a unit of time) is large, the primary/secondary cache system arrangement is inefficient because it constantly thrashes between different areas of main memory. To reduce thrashing, the working set of memory is kept to a minimum. Some thrashing may be acceptable for memory that is continually referenced, assuming that there is an opportunity to fill and re-fill primary cache from secondary cache prior to a corresponding cache block being forced out of secondary cache. If there is no chance of re-filling primary cache, thrashing negates the value of the secondary cache by producing a cluttering effect that prevents cache blocks which could be used to re-fill primary cache from being maintained in secondary cache.
In many cases transient data are not easily manageable. Such data may comprise scattered data structures that are not referenced often enough to take advantage of being re-filled from secondary cache. In addition to gaining no benefit from storage in secondary cache, transient data prevent other data structures from realizing any benefit due to cluttering of secondary cache. It would therefore be advantageous to identify those memory addresses holding transient data and to prevent those addresses from being stored in secondary cache.
The invention comprises a cache blocking mechanism for ensuring that transient data is not stored in a secondary cache of a router by managing a designated address range of buffers in a main memory of the router. In general, the mechanism defines a window of virtual addresses that map to predetermined physical memory addresses associated with a set of buffers; in the illustrative embodiment described herein, only transient data may be stored in these buffers. The mechanism further blocks write requests directed to these memory buffers from propagating to the secondary cache, thereby precluding storage of transient data in the cache. This, in turn, eliminates cluttering of that cache, thereby increasing the processing performance of the router.
Specifically, a translation-lookaside buffer (TLB) is configured to transform virtual addresses falling within the window into the predetermined physical addresses, each of which is characterized by assertion of a particular address bit. In the illustrative embodiment, the particular address bit is typically asserted by the TLB during a block-mode write transaction to the memory. Assertion of this address bit invokes write-blocking circuitry that effectively inhibits copying of the transient data stored at the corresponding physical address in memory to the secondary cache.
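The behavior of the cache blocking mechanism may be modeled in software as follows. This is a hedged sketch only: the mechanism itself is implemented in the TLB and in write-blocking circuitry, and the bit position, window bounds, buffer base address, the identity mapping used for addresses outside the window, and the masking of the bit when addressing memory are all hypothetical choices made for the example.

```python
BLOCK_BIT = 1 << 30              # hypothetical "do not cache" address bit
WINDOW_BASE = 0x8000_0000        # hypothetical start of the virtual window
WINDOW_SIZE = 0x0010_0000        # hypothetical size of the window
BUFFER_PHYS_BASE = 0x0200_0000   # hypothetical base of the transient buffers


def tlb_translate(vaddr):
    """Map window addresses to buffer addresses with BLOCK_BIT asserted."""
    if WINDOW_BASE <= vaddr < WINDOW_BASE + WINDOW_SIZE:
        return BLOCK_BIT | (BUFFER_PHYS_BASE + (vaddr - WINDOW_BASE))
    # Simplification: all other addresses map to themselves, bit cleared.
    return vaddr & (BLOCK_BIT - 1)


def write(vaddr, data, main_memory, secondary_cache):
    """Model a write transaction as seen by memory and secondary cache."""
    paddr = tlb_translate(vaddr)
    location = paddr & ~BLOCK_BIT          # assumed: bit is not a RAM address bit
    main_memory[location] = data
    if not (paddr & BLOCK_BIT):
        # Write-blocking logic: only unmarked addresses reach secondary cache.
        secondary_cache[location] = data
    return paddr


main_memory, secondary_cache = {}, {}
write(0x1000, "instructions", main_memory, secondary_cache)
write(WINDOW_BASE + 0x40, "packet", main_memory, secondary_cache)
```

In this model the write directed into the window reaches main memory at the buffer address but leaves the secondary cache untouched, whereas the write outside the window is stored in both.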
One advantage of the invention is that the elimination of transient data in the secondary cache, together with the principle of locality, increases the likelihood that relevant non-transient data may be present in the cache during access requests to memory, thereby providing a substantial performance enhancement over existing routers. In addition, the inventive mechanism is flexible in that the designation of the data stored in the buffer set may be easily converted from primary-only cached (i.e., secondary cache-blocked) to fully-cached to enable efficient management of the secondary cache.