1. The Field of the Invention
The present invention relates generally to methods for increasing the efficiency, speed and/or throughput of a computer system. More specifically, the invention relates to methods for offloading computing tasks that are typically performed by a host processor in software, to a specific hardware component, thereby freeing up host processor resources and increasing the overall efficiency of the computer system.
2. The State of the Art
A functional computer system generally consists of three fundamental components. The first component is the host computer and its associated peripheral hardware components. The host computer typically includes a central processing unit (CPU), which is interconnected via a bus with, for instance, system memory such as RAM or ROM. A system will also include a number of peripheral hardware devices, depending on the functionality needed, such as magnetic or optical disk storage devices, a keyboard or other input device, a display or other output device and communication equipment, such as a modem and/or a network interface card (NIC). Another fundamental computer component is the application software. Such software includes the familiar word processing applications, spreadsheet applications, database applications, communications and network applications and so forth.
The final component of a modern, functional computer system is an operating system. The computer operating system performs many functions, such as allowing a user to initiate execution of an application program. In addition, modern operating systems also provide an interface between application software and the host computer and its peripheral hardware. Thus, while it was once commonplace for an application program to directly access computer system hardware, modern operating systems provide standardized, consistent interfaces that allow user applications to interface with or access computer hardware peripherals in a standardized manner. To provide a consistent interface, operating system architectures are increasingly designed so that there may be several software layers between the actual hardware peripheral and the application program. For example, an application may make a call into the operating system. The operating system, in turn, may utilize the services provided by a hardware device driver layer. The device driver layer would then interface directly with the specific hardware peripheral. A primary advantage of such a layered approach is that layers may be added or replaced without impacting the other layers.
As will be appreciated, the complexity and sophistication of such operating systems, application software, and networking and communications continue to increase. This of course results in more functional and useful computer systems. However, this increased functionality is not without a cost. More feature-rich operating systems and software applications often result in an increase in processor overhead as a result of the additional duties that must be performed by a processor/CPU when executing such system functions and/or applications. This phenomenon is especially apparent in connection with particular types of applications, such as network communication-type software applications. With the high-bandwidth media that are increasingly prevalent, network speeds often match or exceed the CPU processor speed and memory bandwidth of the host computer. As such, to communicate efficiently over such networks, the CPU utilization and memory bandwidth usage of the network-connected host computer must be minimized.
In addition, network applications further burden the host processor due to the layered architecture that most of them use, such as the seven-layer OSI model or the layered model used by the Windows NT operating system. As is well known, such a model is used to describe the flow of data between the physical connection to the network and the end-user application. The most basic functions, such as putting data bits onto the network cable, are performed at the bottom layers, while functions attending to the details of applications are at the top layers. Essentially, the purpose of each layer is to provide services to the next higher layer, shielding the higher layer from the details of how services are actually implemented. The layers are abstracted in such a way that each layer believes it is communicating with the same layer on the other computer that is being communicated with via the network.
As will be appreciated, the various functions that are performed on a data packet as it proceeds between layers can be software intensive, and thus can demand a substantial amount of CPU processor and memory resources. For instance, in the Windows NT networking model, certain functions that are performed on the packet at various layers are extremely CPU intensive, such as packet checksum calculation and verification; encryption and decryption of data; message digest calculation; and TCP segmentation. As each of these functions is performed, the resulting demands on the CPU/memory can greatly affect the throughput and performance of the overall computer system.
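To give a sense of the per-packet work involved, the following is a minimal sketch of the 16-bit one's-complement checksum described in RFC 1071 and used by IP, TCP and UDP. It is offered only as an illustration of the kind of computation that the host CPU would otherwise repeat for every packet; the function name is illustrative and is not taken from Windows NT or from any particular driver.

```python
def internet_checksum(data: bytes) -> int:
    """Return the RFC 1071 16-bit one's-complement checksum of data."""
    # Pad odd-length input with a zero byte so it splits into 16-bit words.
    if len(data) % 2:
        data += b"\x00"

    total = 0
    for i in range(0, len(data), 2):
        # Combine two bytes into one big-endian 16-bit word and accumulate.
        total += (data[i] << 8) | data[i + 1]

    # Fold any carries above bit 15 back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)

    # The checksum is the one's complement of the folded sum.
    return ~total & 0xFFFF
```

Every byte of the packet must be read and summed, so the cost grows with packet size and packet rate, which is why this task is a natural candidate for hardware.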
Although software applications and operating system functions are placing greater demands on computer system resources, at the same time the capability, efficiency, and throughput of many computer hardware peripherals, such as network interface cards (NICs), are also increasing. These computer system peripherals are often equipped with a dedicated processor and memory, and typically are capable of performing very sophisticated and complex computing tasks that are otherwise performed by the computer system processor in software. For instance, many NICs are capable of independently performing tasks otherwise performed by the CPU in software at an appropriate network layer, such as checksum calculation/verification; data encryption/decryption; message digest calculation; TCP segmentation; and others. As such, there is an advantage in offloading such CPU intensive tasks to a peripheral hardware device. This would reduce processor utilization and memory bandwidth usage in the host computer, and thereby increase the efficiency, speed and throughput of the overall system.
However, the processing capabilities of different peripheral devices vary widely. Thus, there needs to be an efficient method by which a computer system/operating system can identify the processing capabilities of such peripheral devices, and then assign and offload specific processing tasks to the device when needed. Also, it would be desirable if the tasks could be identified and assigned dynamically, depending on the then-current needs of the processor. This would allow the computer system processor to take advantage of the capabilities of a hardware peripheral on an as-needed basis.
The foregoing problems in the prior state of the art have been successfully overcome by the present invention, which is directed to a system and method for offloading functions and tasks that were previously performed at a processor-software level, to an appropriate hardware peripheral connected to the computer system. The invention is particularly useful in connection with the offloading of tasks to network interface card (NIC) peripheral devices, which can often perform many of the tasks otherwise performed by the computer CPU in software.
In a preferred embodiment of the invention, a software implemented method and protocol is provided that allows, for instance, the operating system (OS) to "query" the device drivers (often referred to as "MAC" drivers) of any hardware peripherals (such as a NIC) that are connected to the computer system. The various device drivers each respond by identifying their respective hardware peripheral's processing capabilities, referred to herein as "task offload capabilities." In the preferred embodiment, once the task offload capabilities of each particular peripheral have been identified, the OS can then enable selected peripherals to perform certain tasks that could potentially be used by the OS. The OS can thereafter request that a peripheral perform the previously enabled task, or tasks, on a dynamic, as-needed basis, depending on the then-current processing needs of the computer system.
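The query-and-enable exchange described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual Windows NT interface: the names `Task`, `DeviceDriver`, `query_capabilities`, `enable` and `negotiate` are all hypothetical and chosen only for illustration.

```python
from enum import IntFlag


class Task(IntFlag):
    """Hypothetical bit flags for the task offload capabilities named in the text."""
    NONE = 0
    CHECKSUM = 1
    ENCRYPTION = 2
    MESSAGE_DIGEST = 4
    TCP_SEGMENTATION = 8


class DeviceDriver:
    """Hypothetical stand-in for the device (MAC) driver of one peripheral."""

    def __init__(self, name: str, supported: Task) -> None:
        self.name = name
        self._supported = supported
        self._enabled = Task.NONE

    def query_capabilities(self) -> Task:
        # Step 1: the driver reports what its hardware is able to do.
        return self._supported

    def enable(self, tasks: Task) -> None:
        # Step 2: the OS selects which reported capabilities to enable.
        unsupported = int(tasks) & ~int(self._supported)
        if unsupported:
            raise ValueError(f"{self.name} cannot perform tasks {unsupported:#x}")
        self._enabled = tasks

    def is_enabled(self, tasks: Task) -> bool:
        return (self._enabled & tasks) == tasks


def negotiate(driver: DeviceDriver, wanted: Task) -> Task:
    """Query the driver, then enable only the tasks that are both wanted and offered."""
    chosen = driver.query_capabilities() & wanted
    driver.enable(chosen)
    return chosen
```

For example, if a NIC reports checksum and TCP segmentation support and the OS wants checksum and encryption, only checksum is enabled, and encryption remains a software task on the host.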
While this general inventive concept would be applicable to other application or operating system environments, embodiments of the current invention are described herein as being implemented and utilized in connection with the layered networking model of Windows NT. Of course, the invention could be implemented in connection with essentially any similar type of architecture for managing and controlling network communications. Specifically, the invention provides the ability to offload tasks or functions that are typically performed on a network packet at, for instance, the various network layers, and which typically require dedicated CPU and memory resources. These offloaded tasks can instead be optionally performed by the hardware peripheral that provides the actual physical communications channel to the network, namely the NIC. For instance, rather than perform certain of the CPU intensive operations on the data packet as it passes through the respective network layers (e.g., checksum calculation/verification, encryption/decryption, message digest calculation and TCP segmentation), those tasks can instead be offloaded and performed at the NIC hardware.
In a preferred embodiment of the present invention, in the Windows NT layered networking architecture, a transport protocol driver, or transport, is implemented with an appropriate program method so as to be capable of querying each of the device driver(s) associated with the corresponding NIC(s) connected to the computer. Each queried device driver is similarly implemented so as to be capable of responding by identifying its specific processing, or "task offload," capabilities. In a preferred embodiment, once the task offload capabilities of each individual peripheral device have been identified, the transport sets which of those specific capabilities are to be enabled. This essentially informs the peripheral device what type of tasks it should expect to perform during subsequent transmissions and/or receptions of data packets. Thereafter, the transport is able to take advantage of the enabled capabilities of a peripheral device on an as-needed basis. Preferably, the enabled functions are invoked via appropriate data that is appended to the actual data packet destined for the network channel. In this way, tasks can be offloaded dynamically, and more than one task can be offloaded at a time.
Thus, before a network packet is to be sent to a particular lower level device driver (e.g., residing at the MAC sublayer in a Windows NT environment), the transport will first determine what the capabilities of the corresponding NIC are. If capable of a particular function or functions, the transport enables the desired functions. If during subsequent packet transmissions the transport desires that a particular task be offloaded to hardware, it can dynamically append information to the packet that signifies that the desired function(s) should be performed on that packet at the NIC hardware. For instance, the transport will set a data flag in the data packet, thereby notifying the corresponding device driver that the NIC should calculate and append a checksum to that outgoing packet. The hardware/software on the corresponding NIC will then handle this particular packet processing on its own, without any intervention or assistance from the system CPU. The system processor is thus freed up to perform other processing tasks, and the overall efficiency and throughput of the system is improved.
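The per-packet decision described above might look something like the following sketch. It assumes a hypothetical `CHECKSUM_FLAG` bit carried alongside each packet and uses a trivial byte sum as a stand-in for a real checksum; none of these names or values come from the Windows NT implementation.

```python
from dataclasses import dataclass

CHECKSUM_FLAG = 0x1  # hypothetical per-packet flag value


@dataclass
class Packet:
    payload: bytes
    offload_flags: int = 0  # offload information appended by the transport


def simple_checksum(data: bytes) -> bytes:
    # Stand-in checksum (16-bit byte sum), used only to keep the sketch short.
    return (sum(data) & 0xFFFF).to_bytes(2, "big")


class Nic:
    """Hypothetical NIC plus driver with a set of previously enabled tasks."""

    def __init__(self, enabled_flags: int) -> None:
        self.enabled = enabled_flags

    def send(self, packet: Packet) -> bytes:
        frame = packet.payload
        # Only perform tasks that were both enabled earlier and flagged now.
        if packet.offload_flags & self.enabled & CHECKSUM_FLAG:
            frame += simple_checksum(packet.payload)
        return frame


def transport_send(nic: Nic, payload: bytes, offload: bool) -> bytes:
    """Send one packet, offloading the checksum only when asked and possible."""
    if offload and nic.enabled & CHECKSUM_FLAG:
        # Hardware path: flag the packet and let the NIC append the checksum.
        return nic.send(Packet(payload, CHECKSUM_FLAG))
    # Software path: the host CPU computes the checksum before handing off.
    return nic.send(Packet(payload + simple_checksum(payload)))
```

Both paths put the same bytes on the wire; the only difference is whether the host CPU or the NIC did the work, and the choice can change from one packet to the next.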
As noted, in a preferred embodiment, tasks are offloaded dynamically. That is, the capability of the NIC can be selectively used on a per-packet basis, depending on the then-current needs of the computer system. Moreover, since tasks that were previously performed at various levels of the network stack are now performed at a single point, the NIC itself, the approach is more tightly integrated and efficient, further improving the throughput of the entire system. Preferably, embodiments of the current invention provide the transport with the ability to "batch" operations, i.e., offload multiple tasks to a single NIC. For instance, a single NIC can perform both checksumming and encryption on a packet, thereby eliminating multiple CPU cycles that would have otherwise been needed if the same functions were implemented at the respective network layers in software.
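Batching can be pictured as combining several task flags on one packet so that the NIC applies all of them in a single pass. The sketch below is illustrative only: the flag values, the ordering of tasks, the XOR "encryption" and the byte-sum "checksum" are toy stand-ins assumed for the example, not the operations a real NIC would perform.

```python
CHECKSUM = 0x1  # hypothetical flag values
ENCRYPT = 0x2


def xor_encrypt(data: bytes, key: int = 0x5A) -> bytes:
    # Toy stand-in for encryption; XOR with a fixed key is NOT secure.
    return bytes(b ^ key for b in data)


def append_checksum(data: bytes) -> bytes:
    # Toy stand-in for a checksum: 16-bit byte sum appended big-endian.
    return data + (sum(data) & 0xFFFF).to_bytes(2, "big")


# Order in which this hypothetical NIC applies the tasks it supports.
PIPELINE = [(ENCRYPT, xor_encrypt), (CHECKSUM, append_checksum)]


def nic_process(payload: bytes, flags: int) -> bytes:
    """Apply every task whose flag is set on this packet, in pipeline order."""
    for flag, handler in PIPELINE:
        if flags & flag:
            payload = handler(payload)
    return payload
```

A packet flagged with `CHECKSUM | ENCRYPT` is encrypted and then checksummed in one call, whereas a packet with no flags passes through unchanged.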
Accordingly, it is a primary object of this invention to provide a system and method for offloading computing tasks from a computer system processor to a hardware peripheral connected to the computer system. It is another object of this invention to provide a system and method for identifying the processing capabilities of individual peripherals. It is still another object of the present invention to provide a system and method for offloading tasks that can be effectively used in connection with a layered network architecture, whereby tasks that are typically performed at various layers in the network are instead offloaded to the appropriate network interface card (NIC). A related object of the invention is to provide a system and method in which computing tasks can be offloaded on a dynamic, as-needed basis, depending on the then-current processing state of the computer processor. Yet another object of the present invention is to provide a system and method in which multiple tasks can be batched together, and then offloaded to a single peripheral device, such as a NIC.
Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. These and other objects and features of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth hereinafter.