Transmission Control Protocol Offload Engines (TOEs) and related hardware transport mechanisms are becoming more widely used, and are likely to ultimately displace conventional software transport stacks. Hardware transport can support very efficient and highly scalable network communications. More specifically, hardware transport avoids both the kernel intervention and intermediate buffer copying required by software transport. Kernel intervention and the associated processor context switching require significant computing resources and time. Thus, the avoidance of kernel intervention during transport makes for more efficient network communication. Intermediate buffer copying and its associated memory bandwidth consumption are also resource and time intensive, and eliminating them during transport results in additional efficiency.
The performance gain from the kernel-bypass and zero-copy mechanisms offered by hardware transport is estimated at approximately four times (4×). This is because a receive operation with hardware transport passes the data over the system memory interface only once, i.e., 1×, transferring data directly from the wire to the application buffer. With a conventional intermediate copy through kernel buffers, the data is transferred three times over the system memory interface, i.e., 3× (write, read, write). The elimination of the additional kernel mode context switches, cache faults, and cache coherence traffic accounts for the remainder of the estimated gain.
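The memory-interface portion of this estimate can be illustrated with a short calculation. The following Python sketch is illustrative only: the function name and the 64 KiB payload size are assumptions chosen for the example, and the sketch models only the 1× and 3× pass counts described above, not the context-switch or cache effects.

```python
def memory_interface_bytes(payload_bytes: int, passes: int) -> int:
    """Bytes moved across the system memory interface for one receive."""
    return payload_bytes * passes


# Pass counts taken from the description above.
HARDWARE_PASSES = 1  # wire -> application buffer (one write)
SOFTWARE_PASSES = 3  # write kernel buffer, read it back, write application buffer

payload = 64 * 1024  # hypothetical 64 KiB receive

hw = memory_interface_bytes(payload, HARDWARE_PASSES)
sw = memory_interface_bytes(payload, SOFTWARE_PASSES)
traffic_ratio = sw / hw

print(f"hardware transport: {hw} bytes over the memory interface")
print(f"software transport: {sw} bytes over the memory interface")
print(f"memory traffic ratio: {traffic_ratio:.1f}x")
```

Under these assumptions the copy path alone accounts for a threefold difference in memory traffic; the rest of the estimated 4× gain is attributed to the avoided context switches and cache effects, which this sketch does not model.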
Although far more efficient, hardware transport creates a problem for receive-side security scanning (e.g., malicious code scanning of received data). Accessing received data for security scanning before allowing the target application to read the data substantially diminishes or eliminates the 4× performance advantage of hardware transport over software transport. Because it is impossible to perform software-based security scans without at least reading the data, some performance penalty is unavoidable. However, it would be highly desirable to retain as much of the performance gain provided by hardware transport as possible, and still be able to scan received data.
A solution to a separate problem in the area of parallel processing is of interest. Parallel processing is a technique to gain a performance advantage by forking multiple processes to perform computational activities in parallel. A common related operation is the creation by a process of a child process that is a memory clone of the parent. In a naive implementation, this requires the complete memory space of the parent process to be copied to a new memory area for the child process. This is a very expensive operation that would negate much of the performance advantage of parallel processing. To minimize the amount of copying required, to delay the copying until absolutely necessary, and to stagger the copying operations in time (so that they do not monopolize the memory interface), a clever solution termed “copy-on-write” was devised.
Copy-on-write is a technique that allows the two processes to share the single memory space of the parent as much as possible, copying data to a separate memory area for the child process only when one of the processes writes to that data. Because typically the majority of the data is never updated by either process, the two processes can share a single copy. Two copies are needed only for the data modified by the parent or child.
In the customary implementation, copy-on-write alters the virtual memory system page table entries so that either the parent or child process attempting to write a page will fault to a handler that first copies the page so that the child process has its own local copy. After this local copy is created, the page table entries are reset to permit write operations on the page. The copy-on-write mechanism is known to those of skill in the art of systems programming, and is documented, for example, in Operating System Concepts by Silberschatz, Galvin and Gagne (John Wiley & Sons, 2003), pp. 328-29.
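The fault-and-copy behavior described above can be modeled in a few lines. The following Python sketch is a user-space simulation only, not an actual virtual memory implementation: the Frame and Process classes, their methods, and the reference-counting scheme are hypothetical names and simplifications introduced for illustration. In particular, the sketch gives the private copy to whichever process faults, which is one common variant of the mechanism.

```python
class Frame:
    """Hypothetical physical page frame, with a count of mappings sharing it."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.refs = 0


class Process:
    """Hypothetical process whose page table maps page number -> [frame, writable]."""

    def __init__(self, name: str):
        self.name = name
        self.page_table = {}
        self.copies = 0  # pages this process has had to copy

    def map_page(self, page_no: int, frame: Frame, writable: bool) -> None:
        frame.refs += 1
        self.page_table[page_no] = [frame, writable]

    def fork(self, child_name: str) -> "Process":
        """Create a child sharing every frame; write-protect both mappings."""
        child = Process(child_name)
        for page_no, entry in self.page_table.items():
            entry[1] = False  # parent loses write permission
            child.map_page(page_no, entry[0], writable=False)
        return child

    def write(self, page_no: int, offset: int, value: int) -> None:
        if not self.page_table[page_no][1]:
            self._write_fault(page_no)  # simulated page fault
        self.page_table[page_no][0].data[offset] = value

    def _write_fault(self, page_no: int) -> None:
        """Fault handler: copy the page if it is shared, then permit writes."""
        frame = self.page_table[page_no][0]
        if frame.refs > 1:
            frame.refs -= 1
            private = Frame(bytes(frame.data))
            private.refs = 1
            self.page_table[page_no] = [private, True]
            self.copies += 1
        else:
            # Sole remaining user of the frame: no copy is needed.
            self.page_table[page_no][1] = True


parent = Process("parent")
parent.map_page(0, Frame(b"aaaa"), writable=True)
parent.map_page(1, Frame(b"bbbb"), writable=True)

child = parent.fork("child")
shared_before = parent.page_table[0][0] is child.page_table[0][0]

child.write(0, 0, ord("X"))  # triggers the only copy in this example
```

In this simulation, only page 0 is copied when the child writes to it. Page 1, which neither process modifies, remains backed by a single shared frame.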
What is needed are computer-implemented methods, computer-readable media and computer systems for minimizing the security scanning performance penalty for hardware transport network interfaces by allowing as much data as possible to be copied directly into the target application buffer, while still allowing access to received data for scanning as desired.