The current Internet computing environment includes numerous applications that transfer content between one or more input streams and one or more output streams, and that apply limited application-specific processing to the input content. In the area of Web serving, examples include HTTP Server and Proxy applications. HTTP Server applications, such as Apache, transfer dynamic content whose length is not known a priori, received from Application Servers, such as IBM WebSphere, or from CGI scripts, to persistent client connections; they process the input stream only to remove the markup specific to the input stream, if any, and to add the markup for chunked encoding. Similarly, HTTP Proxy applications, like Squid, transfer content in chunked encoding received from HTTP Servers to non-persistent client connections, and process the content to remove the chunked encoding markup before sending it on the client connection and loading it into the cache.
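The chunked-encoding markup mentioned above can be illustrated with a minimal sketch. It assumes the HTTP/1.1 chunk framing (hexadecimal length, CRLF, data, CRLF, with a zero-length chunk ending the body) and a body that is fully buffered and carries no trailers; the function names `encode_chunk` and `decode_chunked` are hypothetical and are not taken from Apache or Squid.

```python
def encode_chunk(payload: bytes) -> bytes:
    """Frame one block of content as a chunk: hex length, CRLF, data, CRLF."""
    return b"%x\r\n" % len(payload) + payload + b"\r\n"


# Zero-length chunk that terminates the body (assumes no trailer fields).
LAST_CHUNK = b"0\r\n\r\n"


def decode_chunked(body: bytes) -> bytes:
    """Strip the chunked-encoding markup from a complete body."""
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        # The size line may carry optional extensions after ';'; ignore them.
        size = int(body[pos:eol].split(b";", 1)[0], 16)
        if size == 0:
            break
        start = eol + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip the data and its trailing CRLF
    return bytes(out)
```

In these terms, the server-side processing described above corresponds to calling `encode_chunk` on each block as it arrives, and the proxy-side processing corresponds to `decode_chunked`; in both cases the payload bytes themselves are left unchanged.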
In the area of multimedia serving, Media Server products from Microsoft or RealNetworks combine multiple media streams into a single stream by adding encapsulation headers to each block of an input stream. For instance, RFC 2327, “Session Description Protocol”, by M. Handley and V. Jacobson, published by IETF Network Working Group, April 1998, describes a protocol used for this type of media stream transmission.
In the area of distributed interactive applications with implementations employing application-level multicast, content is transferred from one or more input streams to one or more output streams, such as described in “A Case for End System Multicast”, by Y. Chu, S. Rao, S. Seshan, H. Zhang, published in ACM SIGCOMM, 2000. Depending on the routing method, per-packet processing is minimal, involving at most the rewriting of the application-specific header.
Numerous studies on TCP and server performance, including “End-System Optimizations for High-Speed TCP”, by J. Chase, A. Gallatin, K. Yocum, published in IEEE Communications, 39(4), April 2001, demonstrate that the achievable transfer bandwidths are limited by the overhead of copying data between kernel- and user-space buffers.
Known methods and apparatuses for efficient content transfers between incoming and outgoing streams demonstrate that eliminating the data copy between kernel- and user-space buffers can produce significant performance benefits.
For instance, work described in “Exploiting In-Kernel Data Paths to Improve I/O Throughput and CPU Availability”, by K. Fall, J. Pasquale, published in USENIX Winter Conference, 1993, proposes in-kernel splicing mechanisms between data streams produced by devices/files and sockets. Namely, the mechanism allows an application to indicate one source and one destination file descriptor, and an amount of content for which a kernel-level service will perform the transfer, asynchronously. The proposal does not address data transfers between two TCP connections, which is a common scenario for Web servers and Media servers.
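The shape of such an interface (one source descriptor, one destination descriptor, and a byte count) can be sketched with `os.sendfile`, a later system-call wrapper available on Unix-like systems. This is only an analogy and not the splice mechanism of Fall and Pasquale: the call below is synchronous, whereas the mechanism described above performs the transfer asynchronously. The helper name `kernel_transfer` and the demonstration data are hypothetical.

```python
import os
import socket
import tempfile


def kernel_transfer(src_fd: int, dst_fd: int, nbytes: int) -> int:
    """Ask the kernel to move nbytes from src_fd to dst_fd with no user-space buffer."""
    sent = 0
    while sent < nbytes:
        # os.sendfile(out_fd, in_fd, offset, count) returns the bytes transferred.
        n = os.sendfile(dst_fd, src_fd, sent, nbytes - sent)
        if n == 0:  # source exhausted before nbytes were available
            break
        sent += n
    return sent


def demo():
    """Transfer the first 600 bytes of a temporary file to a socket."""
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * 1000)
        f.flush()
        a, b = socket.socketpair()
        with a, b:
            sent = kernel_transfer(f.fileno(), a.fileno(), 600)
            data = b""
            while len(data) < sent:
                data += b.recv(4096)
    return sent, data
```

Note that the source here is a file, matching the device/file-to-socket case described above; the sketch does not cover transfers between two TCP connections.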
In-kernel splicing of TCP connections has been proposed as well. For instance, “A Comparison of Mechanisms for Improving TCP Performance over Wireless Links”, by H. Balakrishnan, V. Padmanabhan, S. Seshan, R. Katz, published in ACM SIGCOMM Conference, 1996, describes a mechanism that transfers content between two TCP connections, but the service is not accessible to applications for offloading their transfers.
Also known are several proposals for mechanisms that are accessible to applications for offloading their transfers. The proposals differ in the extent to which applications can control the length, directionality, and payload caching of the transfers. For instance, the paper “MSOCKS: An Architecture for Transport Layer Mobility”, by D. Maltz, P. Bhagwat, published in INFOCOM, 1998, enables unidirectional transfers, only in destination streams without prior activity, terminated by the close of the source stream. US Patent Application 20020078135 “Method and apparatus for improving the operation of an application layer proxy” extends the service model to permit transfer offloading for destination streams with prior activity. Finally, the paper “Kernel Support for Faster Web Proxies”, by M. C. Rosu, D. Rosu, published in USENIX Annual Technical Conference, 2003, further extends the service to permit bidirectional transfers, with specified content length, decoupled connection termination, and payload caching. Applications like Web servers and Media servers can use these mechanisms to offload into the kernel all of their data transfers that do not require content modifications. As a result, they can achieve significant performance benefits. In experiments with Web Proxy Server workloads, kernel-level offloading can reduce CPU overhead by up to 50%.
However, in prior-art arrangements, applications cannot offload into the kernel the data transfers which require any degree of content transformation. Applications must handle these transfers by reading the content from input streams at user level, applying the transformation, and writing the content to output streams. In this process, applications perform a large number of system calls and data copy operations between application and kernel spaces, which incur a large CPU overhead. Due to the application-specific processing that has to be applied to each packet or group of packets, such applications cannot benefit from conventional mechanisms when it comes to offloading these transfers at kernel level.
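The user-level pattern described above can be sketched as a simple relay loop. The example assumes stream sockets and uses chunked framing as a stand-in for the application-specific transformation; the names `relay_user_level` and `demo` are hypothetical. Each iteration copies a block from kernel space to user space, transforms it, and copies it back.

```python
import socket


def relay_user_level(src, dst, transform, bufsize=4096):
    """Copy src to dst through user space, applying transform to each block.

    Returns (recv_calls, send_calls) to make the per-block call cost visible.
    """
    reads = writes = 0
    while True:
        block = src.recv(bufsize)      # copy: kernel buffer -> user buffer
        reads += 1
        if not block:                  # empty read: source stream closed
            break
        dst.sendall(transform(block))  # copy: user buffer -> kernel buffer
        writes += 1
    return reads, writes


def demo():
    """Relay three bytes between two local socket pairs, adding chunk framing."""
    producer, relay_in = socket.socketpair()
    relay_out, consumer = socket.socketpair()
    producer.sendall(b"abc")
    producer.close()
    counts = relay_user_level(
        relay_in, relay_out,
        lambda b: b"%x\r\n" % len(b) + b + b"\r\n",
    )
    relay_in.close()
    relay_out.close()
    received = b""
    while True:
        part = consumer.recv(4096)
        if not part:
            break
        received += part
    consumer.close()
    return received, counts
```

Even for this trivial transformation, every block costs at least one receive call and one send call, plus two copies across the user/kernel boundary, which is the overhead the paragraph above refers to.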
Conventional arrangements do include mechanisms for loading and executing application-specific procedures in kernel context. For instance, an infrastructure that “allows applications to specialize the underlying operating system in order to achieve a particular level of performance and function” is described in “Extensibility, Safety and Performance in the SPIN Operating System”, by B. Bershad, S. Savage, P. Pardyak, E. Sirer, D. Becker, M. Fiuczynski, C. Chambers, S. Eggers, published in the ACM Symposium on Operating Systems Principles, 1995. Similarly, modern operating systems, such as Linux, provide mechanisms for application-specific customization of various event handlers, including those related to data streams. However, prior art that specifically addresses data stream manipulation does not address the selective customization of the processing together with the coupling of input and output streams.
In view of the foregoing, a need has been recognized in connection with providing an apparatus that allows applications to offload to kernel space both content transfers and simple content processing.