1. Field of the Invention
The present invention relates to the field of data communications networks. More particularly, the present invention relates to performance improvement of critical code execution using shared libraries and/or cache locking techniques.
2. Background
FIG. 1 is a block diagram illustrating a network connection between a user 10 and a particular web page 20. FIG. 1 is an example which may be consistent with any type of network known to those of ordinary skill in the art, including a local area network (“LAN”), a wide area network (“WAN”), or a combination of networks, such as the Internet.
When a user 10 connects to a particular destination, such as a requested web page 20, the connection from the user 10 to the web page 20 is typically routed through several routers 12A-12D. Routers are internetworking devices. They are typically used to connect similar and heterogeneous network segments into internetworks. For example, two LANs may be connected across a dial-up line, across an integrated services digital network (“ISDN”), or across a leased line via routers. Routers may also be found throughout the Internet. End users may connect to a local Internet service provider (“ISP”) (not shown).
FIG. 2 is a block diagram of a sample router 12 suitable for implementing an embodiment of the present invention. The router 12 is shown to include a master control processing unit (“CPU”) 210, low and medium speed interfaces 220, and high speed interfaces 230. The CPU 210 may be responsible for performing such router tasks as routing table computations and network management. It may include one or more microprocessor integrated circuits selected from complex instruction set computer (“CISC”) integrated circuits (such as the Motorola 68040 microprocessor), reduced instruction set computer (“RISC”) integrated circuits (such as the RM4000 or RM7000 RISC processors available from Quantum Effect Design, Inc. of Santa Clara, Calif.), or other available processor integrated circuits. Non-volatile RAM and/or ROM may also form a part of CPU 210. Those of ordinary skill in the art, having the benefit of this disclosure, will recognize that there are many alternative ways in which memory can be coupled to the system.
The interfaces 220 and 230 are typically provided as interface cards. Generally, they control the transmission and reception of data packets over the network, and sometimes support other peripherals used with the router 12. Examples of interfaces that may be included in the low and medium speed interfaces 220 are a multiport communications interface 240, a serial communications interface 250, and a token ring interface 260. Examples of interfaces that may be included in the high speed interfaces 230 include a fiber distributed data interface (“FDDI”) 270 and a multiport Ethernet interface 280. Each of these interfaces (low/medium and high speed) may include (1) a plurality of ports appropriate for communication with the appropriate media, and (2) an independent processor such as the 2901 bit slice processor (available from Advanced Micro Devices Corporation of Santa Clara, Calif.) or the RM-7000 RISC processor (available from Quantum Effect Design, Inc. of Santa Clara, Calif.), and in some instances (3) volatile RAM. The independent processors control such communication intensive tasks as packet switching and filtering, and media control and management. By providing separate processors for the communication intensive tasks, this architecture permits the master CPU 210 to efficiently perform routing computations, network diagnostics, security functions, and other similar functions.
The low and medium speed interfaces are shown to be coupled to the master CPU 210 through a data, control, and address bus 290. High speed interfaces 230 are shown to be connected to the bus 290 through a fast data, control, and address bus 292 which is in turn connected to a bus controller 294. The bus controller functions are provided by a processor such as the 2901 bit slice processor or the RM-7000 RISC processor.
Although the system shown in FIG. 2 is an example of a router suitable for implementing an embodiment of the present invention, it is by no means the only router architecture on which the present invention can be implemented. For example, an architecture having a single processor that handles communications as well as routing computations, etc. would also be acceptable. Further, other types of interfaces and media could also be used with the router. Moreover, the present invention is not limited to router applications, but may be used in any performance-sensitive application where the execution speed of critical code must be maximized.
In the past, it has not been possible to affect the cache locality of critical code, such as data forwarding or packet switching code in routers. Minor changes made to the code could affect the memory footprint, and hence the caching of critical software forwarding routines. As described herein, newer processors with cache locking functionality offer the ability to lock certain performance-critical routines in cache memory. However, in order to take advantage of cache locking features, a method is needed to guarantee cache locality of critical code.
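To illustrate why a minor code change can alter the caching of a critical routine, the following Python sketch maps a routine's address range to cache set indices. It is only an illustrative model: the line size, the number of sets, the example addresses, and the function name `cache_sets` are assumptions chosen for the example and do not describe any particular processor named in this disclosure.

```python
# Assumed cache geometry, for illustration only.
LINE_SIZE = 32   # bytes per cache line (assumption)
NUM_SETS = 256   # number of sets in the cache (assumption)


def cache_sets(start_addr, size):
    """Return the cache set index of every line touched by a routine
    occupying `size` bytes starting at `start_addr`."""
    first_line = start_addr // LINE_SIZE
    last_line = (start_addr + size - 1) // LINE_SIZE
    return [line % NUM_SETS for line in range(first_line, last_line + 1)]


if __name__ == "__main__":
    # A hypothetical 64-byte forwarding routine, aligned to a line boundary.
    print(cache_sets(0x1000, 64))
    # The same routine after 40 bytes of unrelated code are added before it.
    print(cache_sets(0x1000 + 40, 64))
```

Under these assumed parameters, the aligned routine occupies two cache lines, while the shifted copy straddles three lines that fall in different sets. The shifted routine may therefore compete with other code for cache space even though the routine itself is unchanged.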
Thus, the present invention provides consistently faster performance for critical code across software changes and version releases by guaranteeing the cache locality of critical code and by utilizing the cache-locking features of a processor providing such functionality when available. Techniques according to embodiments of the present invention improve the probability that critical code will be cached, and thus offer a significant performance improvement over known techniques. These and other features and advantages of the present invention will be presented in more detail in the following specification of the invention and in the associated figures.
Portions of code containing critical code routines are identified and labeled, then compiled into Dynamic Link Libraries (“DLLs”) and linked such that the critical code routines are optimally loaded into a reserved address space in the DLL memory space. If supported, cache locking may be enabled for the reserved address space. The portions of source code containing portions of critical code for which execution performance is to be improved are labeled, and the source code is scanned prior to compilation to locate the labeled portions of critical code. A linker is configured to store all the labeled portions of critical code into an Executable and Linking Format (“ELF”) section header, which is relocated at run-time into a memory space reserved for the portions of critical code. Alternatively, the critical code is compiled and linked into an executable file containing the critical code, and the executable file is optimized by scanning the instruction stream and in-lining the critical code. A prolog and an epilog that accommodate this in-lined critical code are generated, and a single optimized DLL containing the critical code is generated, which is then loaded into a reserved memory space.
Robust fault containment is facilitated through the use of code modules implemented as shared libraries that can be loaded and unloaded in a running system by individual processes. These code modules can be replaced individually as defects are found and fixed without requiring replacement of the entire system image or application image. What would normally be a monolithic application is modularized, and the sharing of common code among multiple applications is facilitated.
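The labeling and pre-compilation scan described above might be sketched as follows. This is a minimal Python illustration, not the disclosed implementation: the marker comments `/* CRITICAL_BEGIN */` and `/* CRITICAL_END */`, the function name `find_critical_regions`, and the sample C source are all hypothetical, since the disclosure does not specify a labeling syntax.

```python
from typing import List, Tuple

# Hypothetical label syntax; the disclosure does not define one.
BEGIN_MARK = "/* CRITICAL_BEGIN */"
END_MARK = "/* CRITICAL_END */"


def find_critical_regions(source: str) -> List[Tuple[int, int]]:
    """Return 1-indexed (first_line, last_line) pairs for the code lying
    strictly between each begin/end marker pair."""
    regions = []
    start = None  # first code line of the region currently open, if any
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if stripped == BEGIN_MARK:
            if start is not None:
                raise ValueError(f"nested begin marker at line {lineno}")
            start = lineno + 1
        elif stripped == END_MARK:
            if start is None:
                raise ValueError(f"unmatched end marker at line {lineno}")
            regions.append((start, lineno - 1))
            start = None
    if start is not None:
        raise ValueError("begin marker without matching end marker")
    return regions


# Hypothetical source file with one labeled critical routine.
SAMPLE = """\
int slow_path(int x) { return x * 2; }
/* CRITICAL_BEGIN */
int forward_packet(int port) { return port + 1; }
/* CRITICAL_END */
int main(void) { return forward_packet(slow_path(1)); }
"""

if __name__ == "__main__":
    print(find_critical_regions(SAMPLE))
```

A build step could pass the located line ranges to whatever mechanism places the labeled routines into the dedicated section; that mechanism is toolchain-specific and is not modeled here.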