1. Field of the Invention
The present invention relates to computer networks, client/server based computing, and data storage (or storage network). More particularly, this invention relates to network management, performance enhancement and reliability improvement for network data access through servers.
2. Prior Art
The following definitions will be useful in discussing the prior art in this field, and how the present invention overcomes the limitations of the prior art:
"Server": a computer system that controls data access and data flow to serve the requests from a user computer (client) connected through network(s).
"Server-oriented": refers to data that requires significant computation or processing, which is usually carried out by a server CPU. One example is a network user login process, which goes through authentication, authorization, and accounting (AAA).
"Storage-oriented": simple storage access, such as a disk read or write, is considered storage-oriented. Most such operations are data fetching and transport without the involvement of the CPU. JPEG and MPEG file transport are examples of storage-oriented data.
In the current server-based Internet infrastructure, the following sequence of events occurs when an end user accesses data from a remote web site. First, the request packets from the user computer travel over a wide area network (WAN) to a remote network access point, pass through the network gateway of the remote web system, and reach a server in that system. Second, the server processes the request and sends a command to a storage device to fetch the requested data. The data travels from the device back to the server and then traverses a path back to the user computer. In this end-to-end setup, a server sits between the data sources and the user and is often the limiting element of the entire data access operation. Such a configuration can make the server(s) a major bottleneck between the clients (network end users) and their requested data on storage devices. Both data and control traffic must pass through the server(s) twice.
Most current network systems are built on this architecture. The server system may be a server cluster or a load-balanced server farm, but the fundamental problems of delivering content through server(s) remain the same. The main advantages of current systems are their flexibility and security, because the server system controls all traffic flows. However, this architecture also has several disadvantages:
- server bus contention (in many cases, on a PCI bus);
- server OS inefficiency in high-speed context switching (e.g., costly interrupt handling);
- multiple data copying.
Server system bus contention causes two problems for networks. Each peripheral component must contend for the bus without any guarantee of bandwidth, latency, or time of usage. As a result, user data throughput varies, and the latency of data transfer cannot be bounded.
Server OS inefficiency takes a heavy toll on network throughput. In particular, each interrupt causes two context switches on a server. Context switching is an OS process in which the operating system suspends its current activity, saves the information it needs to resume that activity later, and begins executing a new process. When the new process is completed or suspended, a second context switch occurs, during which the OS restores its previous state and resumes processing. Each context switch is an undesirable loss of effective CPU utilization for the task and of network throughput. For example, a server may handle thousands of requests and data transfers and must switch between them at high speed. Heavy loading and extensive context switching can also cause a server to crash. A small loss of data can cause TCP to retransmit, and the retransmissions cause more interrupts, which in turn may cause more OS crashes. This interrupt-induced stability problem is especially acute in a web hosting system, where millions of hits can arrive within a short period of time.
Multiple data copying (also known as "double copy") is a problem in normal server operations. In the current architecture, data received from storage (or the network) must be copied to host memory before being forwarded to the network (or storage). Depending on the design of the storage/network interface and the OS, data may be copied more than twice between arrival and departure at the server. This happens even though the server CPU performs few meaningful functions on the data other than verifying its integrity. Multiple data copying is therefore a very wasteful use of CPU resources. Combined with the OS inefficiency, it also significantly degrades the QoS (Quality of Service) of the data transfer.
The current solutions to server bottlenecks have involved two different approaches: improving the network performance and improving the storage performance.
From the storage approach, two architectures are commonly used: Network Attached Storage (NAS) and Storage Area Network (SAN). An NAS is a storage device with an added thin network layer, so the storage can be connected to a network directly. Because it bypasses servers, server bottlenecks may not exist in NAS systems. (We do not consider a storage-dedicated server to be an NAS.) The major disadvantages are the lack of the flexibility that general servers have, the need to communicate with other servers, and the overhead of the network layer(s) if that layer is too thick. An NAS can be used in secured environments such as an internal LAN or SAN. An NAS is unlikely to perform authentication, authorization, and accounting (AAA) or firewall functions, because such complicated functions are not easily implemented in it and may be omitted for cost reasons. Furthermore, it is not easy to upgrade software or protocols given the specialized, limited interface design of an NAS.
SAN is a storage architecture with the advantages of flexibility and scalability. NAS is limited by its network interface. A SAN, in contrast, defines an environment dedicated to storage, free of security and other heterogeneous design concerns. Servers (which are more versatile) are still needed to connect the SAN to an outside network, so the server bottleneck remains. Furthermore, SAN systems do not specify access control or other server functions, so other components must be added for full functionality.
From the network approach, two techniques have been devised: Web Switching and Intelligent Network Interface. One goal of web switching is to balance the load across the servers in a web hosting system. Web switching exists on many platforms. The basic approach is to capture the IP packets and use the information they carry in layers 4 through 7 to switch the traffic to the most suitable servers, which keeps the load on the servers balanced. This approach does not address the problems of multiple data copying and server system bus contention, and it addresses the server OS inefficiency problem only indirectly.
In the Intelligent Network Interface approach, functions are added to the NIC (Network Interface Card) that reduce server interrupts through batch processing. This approach does not directly address the server system bus contention problem. As a result, the latency of data transfer is still unbounded, and data transfer throughput is still not guaranteed. In addition, this approach only reduces context-switching overhead and does not address the multiple data copying problem.
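The layer-7 switching decision described above can be sketched as follows. This is a minimal illustration, not any vendor's implementation. It assumes that only the HTTP request line (layer 7) is inspected, that server pools are keyed by a hypothetical URL-path prefix, and that "most suitable" means the server with the fewest active connections. The names `Server`, `WebSwitch`, and `select` are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    active_connections: int = 0


@dataclass
class WebSwitch:
    # Hypothetical configuration: server pools keyed by URL path prefix.
    pools: dict[str, list[Server]]
    default_pool: list[Server]

    def select(self, request: bytes) -> Server:
        """Pick a server for one HTTP request using layer-7 (URL) information."""
        # The request line looks like b"GET /video/a.mpg HTTP/1.1".
        request_line = request.split(b"\r\n", 1)[0].decode("ascii")
        _method, path, _version = request_line.split(" ", 2)
        # Content-aware switching: the URL path chooses a pool of servers.
        pool = self.default_pool
        for prefix, servers in self.pools.items():
            if path.startswith(prefix):
                pool = servers
                break
        # Load balancing within the pool: fewest active connections wins.
        server = min(pool, key=lambda s: s.active_connections)
        server.active_connections += 1
        return server
```

The chosen server still receives every packet of the request and fetches the data from storage itself. The switch only decides *which* server becomes the bottleneck, so bus contention and multiple copying on that server are unchanged.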
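The batch processing in an intelligent NIC is commonly called interrupt coalescing. A minimal model of the idea follows. It assumes a hypothetical NIC that interrupts the host once `batch_size` packets are queued, or once the oldest queued packet has waited `max_hold_us` microseconds. Both parameters and the class name `CoalescingNic` are illustrative assumptions, not a real driver API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CoalescingNic:
    """Hypothetical NIC model that interrupts the host once per batch of packets."""
    batch_size: int   # assumed: interrupt once this many packets are queued
    max_hold_us: int  # assumed: flush a partial batch after this many microseconds
    pending: list = field(default_factory=list)
    first_arrival_us: Optional[int] = None
    interrupts: int = 0
    delivered: list = field(default_factory=list)

    def receive(self, packet: bytes, now_us: int) -> None:
        if not self.pending:
            self.first_arrival_us = now_us
        self.pending.append(packet)
        if len(self.pending) >= self.batch_size:
            self._interrupt_host()

    def tick(self, now_us: int) -> None:
        # Timer check, so that a partial batch is not held indefinitely.
        if self.pending and now_us - self.first_arrival_us >= self.max_hold_us:
            self._interrupt_host()

    def _interrupt_host(self) -> None:
        # One interrupt (and its two host context switches) hands over the batch.
        # Every packet still crosses the server bus and lands in host memory.
        self.interrupts += 1
        self.delivered.extend(self.pending)
        self.pending.clear()
        self.first_arrival_us = None
```

Consider 20 packets arriving one per microsecond with `batch_size=8`. Two interrupts fire on arrival, and the timer flushes the remaining 4 packets, for 3 interrupts (6 context switches) instead of 20 (40 context switches). All 20 packets still traverse the server bus and are copied into host memory, which is the limitation noted above.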
Objects of the invention include the following:
1. To increase the network and storage access performance and throughput.
2. To reduce traffic delay and loss between network(s) and storage caused by server congestion, or to bound the latency of real-time streaming (QoS improvement).
3. To increase server and network system availability and reliability, and to reduce server system failures, by reducing the traffic going through the server bus, OS, and CPU.
4. To maintain the flexibility of a server-based system (vs. a network attached storage or NAS).
5. To be scalable and to reduce the total system cost.
In sum, the invention aims to provide the highest levels of server-based Reliability, Availability, and Scalability (RAS) for a network system and the highest levels of QoS for end users.
These and other objects of the invention are achieved through the following solution strategies:
1. Throughput improvement by the data-driven multi-processor pipelined model.
2. File system consistency between the bypass board and the host.
3. HTTP synchronization between the bypass board and the host.
4. Caching on the bypass board.
5. Storage-based TCP retransmission on the bypass board.
In a networked system, an apparatus is introduced that lets the majority of data bypass the server(s). This design improves the end-to-end performance of network access in two ways: it achieves higher throughput between the network and the storage system, and it improves system reliability. It does so while retaining the security, flexibility, and services that a server-based system provides. Logically, the apparatus consists of a network interface, a server computer interface, and a storage interface, together with a switching element and a high-layer-protocol decoding and control unit. Incoming traffic (from either the network or the storage system) is decoded and compared against a routing table. If there is a matching entry, the traffic is routed according to that entry to the network, the storage interface, or the server. Traffic without a matching entry goes to the server for further processing by default. The server sets up the routing table entries, based on the nature of the application, when an application or user request first arrives. After that, barring any changes or errors, no data is exchanged between the server and the apparatus, although control messages may still flow between them. The apparatus may also provide speed matching between the network and storage, load balancing for servers, and flow control for priority and QoS purposes. Because the majority of data traffic bypasses the bus and the operating system (OS) of the server(s), reliability and throughput can be significantly improved. As a result, a server of a given capacity can handle much more data traffic, which makes the system more scalable.
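The routing-table behavior described above can be sketched in a few lines. This is a hypothetical model of the lookup logic only, under stated assumptions; the text does not define the table format. The sketch assumes that decoded traffic is identified by its ingress interface plus an invented `flow_id` string (for example, a TCP connection). It also assumes that the server calls `install` after handling the initial request and `remove` when something changes. `Port`, `FlowKey`, and `BypassSwitch` are illustrative names.

```python
from dataclasses import dataclass
from enum import Enum


class Port(Enum):
    NETWORK = "network"
    STORAGE = "storage"
    SERVER = "server"


@dataclass(frozen=True)
class FlowKey:
    ingress: Port  # interface the traffic arrived on
    flow_id: str   # hypothetical decoded identifier, e.g. a TCP connection


class BypassSwitch:
    """Sketch of the routing decision only; the table format is assumed."""

    def __init__(self) -> None:
        self.routing_table: dict[FlowKey, Port] = {}
        self.bytes_bypassed = 0
        self.bytes_to_server = 0

    def install(self, key: FlowKey, egress: Port) -> None:
        # Called by the server after it has processed the initial request.
        self.routing_table[key] = egress

    def remove(self, key: FlowKey) -> None:
        # Called on changes or errors; the flow then falls back to the server.
        self.routing_table.pop(key, None)

    def forward(self, key: FlowKey, payload: bytes) -> Port:
        # Matching entry: switch directly. No entry: default to the server.
        egress = self.routing_table.get(key, Port.SERVER)
        if egress is Port.SERVER:
            self.bytes_to_server += len(payload)
        else:
            self.bytes_bypassed += len(payload)
        return egress
```

In a typical sequence, the first request on a hypothetical connection `"conn-42"` has no entry, so it goes to the server, which can perform AAA and other server-oriented work. The server then installs `FlowKey(Port.STORAGE, "conn-42") -> Port.NETWORK`. From that point, storage-oriented data for that connection is switched straight to the network and is counted in `bytes_bypassed`, never touching the server bus or OS.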