1. Field of the Invention
The present invention relates to computer systems, and deals more particularly with methods, systems, and computer program products for improving the efficiency of data transfer within interconnected components of a virtual network, and in particular components of a single physical computing device.
2. Description of the Related Art
Use of distributed computing environments such as e-commerce computing (which may alternatively be referred to as e-business computing) has skyrocketed in recent years, due in large part to the popularity of the public network known as the Internet and the subset thereof known as the World Wide Web, or simply “Web”. Other distributed computing environments include intranets and extranets, where intranets are typically designed as computing networks for internal use by a business and extranets are typically designed for use by a business's suppliers and/or customers. Large-scale distributed computing networks often have critical operational constraints that must be met, despite very high demands on the computing resources in the network, in order to maintain customer satisfaction. Examples of these operational constraints include high availability, secure access and secure transactions, and very fast turnaround time for responding to incoming messages.
Providing computing hardware and software to meet these requirements is an ongoing challenge. One prior art approach to optimizing response time for messages is directed toward minimizing the input/output (“I/O”) overhead for a host computer or server (referred to hereinafter as a “host” or “host computer”) that is sending outbound data packets. If the host computer has a large volume of packets to send over a given interface, the I/O overhead can be minimized by delaying the sending of the packets until a certain threshold number of packets has been accumulated. Typically, these packets are stored in a contiguous packing buffer until the threshold is reached, after which all the buffered packets may be transmitted to the interface at once using a single I/O operation. Another way to minimize the I/O overhead is to copy multiple relatively small packets into a packing buffer until reaching some threshold buffer size, and then to transmit the entire buffer in a single I/O operation.
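The two packing strategies just described (flushing after a threshold number of packets, or after a threshold buffer size) can be sketched as follows. This is a minimal illustration only, assuming a hypothetical `PackingBuffer` class whose `do_io` callback stands in for the single I/O operation; it does not model any particular host's I/O interface.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PackingBuffer:
    """Hypothetical host-side packing buffer (illustrative sketch only).

    Packets are appended to one contiguous bytearray. The whole buffer is
    handed to do_io (standing in for a single I/O operation) once either
    the packet-count threshold or the byte-size threshold is reached.
    """
    max_packets: int
    max_bytes: int
    do_io: Callable[[bytes], None]
    _buf: bytearray = field(default_factory=bytearray)
    _count: int = 0

    def send(self, packet: bytes) -> None:
        # If this packet would push the buffer past the size threshold,
        # first flush what is already buffered.
        if self._buf and len(self._buf) + len(packet) > self.max_bytes:
            self.flush()
        self._buf += packet
        self._count += 1
        # Flush as soon as either threshold is reached.
        if self._count >= self.max_packets or len(self._buf) >= self.max_bytes:
            self.flush()

    def flush(self) -> None:
        # One I/O operation carries every buffered packet.
        if self._buf:
            self.do_io(bytes(self._buf))
            self._buf = bytearray()
            self._count = 0
```

With `max_packets=3`, sending four packets triggers one I/O carrying the first three, while the fourth waits for the next threshold or an explicit `flush()`. Supporting both thresholds in one class is simply a convenience of the sketch; a host could use either one alone.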
When the host computer transmits packets on a local area network (“LAN”), it typically sends the buffered groups of packets to a protocol-specific interface, such as a Token Ring interface, an Ethernet interface, an FDDI (Fiber Distributed Data Interface) interface, etc.; this interface then transfers the packets to the corresponding hardware adapter for actual transmission onto the physical LAN medium. Assuming that IP addressing is used within the system, the host may perform Address Resolution Protocol (“ARP”) processing before sending the buffered packets to the interface: for each packet, the ARP processing locates the Media Access Control (“MAC”) address associated with the packet's next-hop Internet Protocol (“IP”) address and puts this MAC address into the outbound packet header, for use as the packet is routed through the network to its destination. Because the packets that the sending host has packed into a buffer may be intended for multiple destinations on the LAN, the adapter must locate each IP packet in the buffer (using the packet headers) and put each packet onto the LAN separately so that it arrives at the correct destination.
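To make the adapter's per-packet work concrete, the following sketch assumes that the host has already performed ARP and placed Ethernet frames (a 14-byte MAC header followed by an IPv4 packet) back to back in the packing buffer. The helper names `build_frame` and `split_packed_buffer` are hypothetical, and checksums and minimum-frame padding are omitted. The adapter relies on the IPv4 Total Length field to find where each packet ends.

```python
import ipaddress
import struct
from typing import Iterator, Tuple

ETH_HDR = struct.Struct("!6s6sH")  # destination MAC, source MAC, EtherType (14 bytes)
ETHERTYPE_IPV4 = 0x0800


def build_frame(dst_mac: bytes, src_mac: bytes, src_ip: str, dst_ip: str,
                payload: bytes) -> bytes:
    """Host side (hypothetical): a minimal IPv4 packet behind a MAC header whose
    destination MAC the host obtained through its own ARP processing."""
    ip_hdr = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,               # version 4, header length 5 words (20 bytes)
        0,                  # DSCP/ECN
        20 + len(payload),  # Total Length: IP header plus payload
        0, 0,               # identification, flags/fragment offset
        64,                 # TTL
        17,                 # protocol number (UDP; arbitrary for the sketch)
        0,                  # header checksum, not computed here
        ipaddress.IPv4Address(src_ip).packed,
        ipaddress.IPv4Address(dst_ip).packed,
    )
    return ETH_HDR.pack(dst_mac, src_mac, ETHERTYPE_IPV4) + ip_hdr + payload


def split_packed_buffer(buf: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Adapter side: yield (destination MAC, frame) for each frame in the buffer,
    so that each frame can be put onto the LAN separately."""
    offset = 0
    while offset < len(buf):
        dst_mac, _src_mac, ethertype = ETH_HDR.unpack_from(buf, offset)
        if ethertype != ETHERTYPE_IPV4:
            raise ValueError(f"unexpected EtherType {ethertype:#06x} at offset {offset}")
        # Total Length sits at bytes 2-3 of the IP header.
        (total_len,) = struct.unpack_from("!H", buf, offset + ETH_HDR.size + 2)
        frame_len = ETH_HDR.size + total_len
        if total_len < 20 or offset + frame_len > len(buf):
            raise ValueError(f"malformed packet at offset {offset}")
        yield dst_mac, buf[offset : offset + frame_len]
        offset += frame_len
```

The point the sketch illustrates is that the adapter cannot transmit the buffer as a unit: it has to read a header inside every packet just to learn where that packet stops and where it should go.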
Alternatively, a technique commonly known as “ARP offload” may be used, where the ARP processing is done by the adapter rather than by the sending host. In this case, the host provides the next-hop IP address in each packet header, and the adapter's processing of each packet includes using this next-hop IP address to locate the proper MAC address and inserting that MAC address into the packet header before placing the packet onto the physical LAN medium.
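ARP offload can be sketched in a similar way. Here the host prefixes each packet with a hypothetical 6-byte descriptor holding the next-hop IPv4 address and the packet length (real adapters define their own header layouts), and the adapter resolves each next hop through an ARP cache, modeled as a plain dictionary, before building the MAC header. The function names are illustrative assumptions, and packet contents are treated as opaque bytes.

```python
import ipaddress
import struct
from typing import Dict, Iterable, List, Tuple

# Hypothetical per-packet descriptor: 4-byte next-hop IPv4 address, 2-byte packet length.
DESC = struct.Struct("!4sH")
ETHERTYPE_IPV4 = 0x0800


def host_pack(entries: Iterable[Tuple[str, bytes]]) -> bytes:
    """Host side: pack (next-hop IP, packet) pairs; the host does no ARP processing."""
    out = bytearray()
    for next_hop, packet in entries:
        out += DESC.pack(ipaddress.IPv4Address(next_hop).packed, len(packet))
        out += packet
    return bytes(out)


def adapter_arp_offload(buf: bytes, arp_cache: Dict[str, bytes],
                        src_mac: bytes) -> List[bytes]:
    """Adapter side: resolve each next hop to a MAC address and build one frame per packet."""
    frames = []
    offset = 0
    while offset < len(buf):
        nh_raw, length = DESC.unpack_from(buf, offset)
        offset += DESC.size
        packet = buf[offset : offset + length]
        if len(packet) != length:
            raise ValueError("truncated packet in packing buffer")
        offset += length
        next_hop = str(ipaddress.IPv4Address(nh_raw))
        dst_mac = arp_cache.get(next_hop)
        if dst_mac is None:
            # A real adapter would send an ARP request here; this sketch simply fails.
            raise LookupError(f"no ARP entry for {next_hop}")
        # MAC header (destination, source, EtherType) followed by the original packet.
        frames.append(dst_mac + src_mac + struct.pack("!H", ETHERTYPE_IPV4) + packet)
    return frames
```

The only difference from host-side ARP is where the lookup happens; the adapter still examines every packet's header individually.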
However, systems have been developed in recent years in which the packet transmission process is optimized by memory-to-memory exchange rather than transmitting the packets onto an actual communications network. An example of such a system is the IBM® eServer zSeries 900, or z900, a mainframe computer designed specifically for the needs of e-business computing environments. The z900 allows thousands of virtual servers or hosts to operate within one physical device, enabling it to meet the large-scale computing needs of customers such as application service providers, Internet service providers, and technology hosting companies. The z900 uses an enhanced I/O subsystem for dealing with its large number of processors, thereby providing efficient host-to-host connectivity. A “HiperSockets™” feature of the z900 allows for interchanging data between the multiple operating system images within a z900 server (such as from one Linux™ image to another, or between a Linux image and a z/OS image, where “z/OS” is a new 64-bit operating system developed by IBM), without requiring any physical cables or an external network connection. Instead, the HiperSockets feature enables TCP/IP (“Transmission Control Protocol/Internet Protocol”) messages to be exchanged using memory-to-memory transfers for packet transmission, effectively putting a virtual network or virtual LAN within the z900 system. Because no external network transmission is required for these exchanges, significant performance improvements can be realized. (“IBM” is a registered trademark, and “HiperSockets” is a trademark, of the International Business Machines Corporation. “Linux” is a trademark of Linus Torvalds.)
The HiperSockets technology is described in commonly-assigned U.S. Pat. No. 6,854,021 entitled “Communications Between Partitions Within a Logically Partitioned Computer”, which is hereby incorporated herein by reference and is referred to herein as “the related invention”. The term “logical partition” refers to an area of memory or storage allocated for use by a single instance of the operating system, and is commonly known as an “LPAR”. An example computing system using LPARs is illustrated in FIG. 1, which may be a z900 computer. The shared physical memory 110 in this example is divided into a number of logical partitions 112a-112n, each partition having a respective discrete server 114a-114n, labeled in FIG. 1 as discrete server 1 to discrete server n. Each discrete server preferably has a TCP/IP layer 116a-116n, respectively, for handling the transmission protocols for transmitting data in I/O operations for networks. Under each TCP/IP layer 116a-116n is a device driver 118a-118n, respectively, for driving data transmissions between the discrete servers. As disclosed in the related invention, the device drivers 118 drive data exchanges (shown generally by send arrows 122a-122n and receive arrows 120a-120n) between the LPARs, rather than driving actual I/O devices. A common lookup table 124 in the hardware systems area (“HSA”) 125 of memory 110 defines the discrete servers, as disclosed in the related invention.
When sending data to a HiperSockets device driver for transmission to another server (i.e. another host) located on the virtual LAN, existing systems send all data to this single interface, much as described above for prior art LAN interfaces, even though the packets may be addressed to multiple destinations on the virtual LAN. Each packet has a packet header, and data in this header indicates the destination on the virtual LAN. HiperSockets therefore essentially provides a “virtual ARP offload” function, in that it takes a next-hop IP address from each packet header and uses that address to locate the appropriate destination on the virtual LAN to which to deliver the packet.
If the sending host packs multiple packets into a single packing buffer, as in the prior art LAN approach described above, then the HiperSockets driver on the virtual LAN must parse each packet and evaluate the contents of its packet header to determine the correct destination for delivering the packet. While this approach is satisfactory from a functional perspective, it is inefficient because there is actually no need for the adapter to build any MAC headers for outbound packets that are to be transmitted on the virtual LAN: no network devices will be routing these packets among different machines, and thus there is no MAC address to be used.
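The virtual ARP offload path, and the per-packet parsing it still requires when packets arrive in a packing buffer, might be sketched as below. The sketch assumes a hypothetical descriptor (next-hop IPv4 address plus packet length) ahead of each packet, uses a dictionary as a stand-in for a lookup table mapping a next-hop IP address to a destination partition, and returns per-partition lists as a stand-in for memory-to-memory delivery. It is not a model of the actual HiperSockets data structures; the function names are illustrative only.

```python
import ipaddress
import struct
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical per-packet descriptor: 4-byte next-hop IPv4 address, 2-byte packet length.
DESC = struct.Struct("!4sH")


def pack(entries: Iterable[Tuple[str, bytes]]) -> bytes:
    """Sending host: place (next-hop IP, packet) pairs into one packing buffer."""
    out = bytearray()
    for next_hop, packet in entries:
        out += DESC.pack(ipaddress.IPv4Address(next_hop).packed, len(packet))
        out += packet
    return bytes(out)


def virtual_lan_deliver(buf: bytes, lookup: Dict[str, str]) -> Dict[str, List[bytes]]:
    """Driver side: parse every packet header and deliver the packet to a partition.

    lookup maps a next-hop IP address to a partition name (a stand-in for a
    lookup table); the result maps each partition to the packets copied to it.
    """
    inbound: Dict[str, List[bytes]] = defaultdict(list)
    offset = 0
    while offset < len(buf):
        nh_raw, length = DESC.unpack_from(buf, offset)  # one header parse per packet
        offset += DESC.size
        packet = buf[offset : offset + length]
        if len(packet) != length:
            raise ValueError("truncated packet in packing buffer")
        offset += length
        target = lookup[str(ipaddress.IPv4Address(nh_raw))]  # KeyError if unknown
        # Copy the packet as-is: no MAC address is resolved and no MAC header is built.
        inbound[target].append(packet)
    return dict(inbound)
```

Even with no MAC headers involved, the loop still performs one header parse and one table lookup per packet, which is the per-packet cost the paragraph above identifies.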
Accordingly, what is needed is a technique whereby data transfer within a virtual communications network can be improved.