1. Field of the Invention
The present invention relates to a method, system, and program for retrieving a file from either a storage system or a server over a network.
2. Description of the Related Art
A person accesses the Internet or World Wide Web using an Internet Web browser, also known as a Hypertext Transfer Protocol (HTTP) client, that can process Web pages coded in the Hypertext Markup Language (HTML). The HTTP client sends a request message over the Internet to a server that runs HTTP server software to process the request. The request message typically includes a Uniform Resource Locator (URL), which specifies the path of the requested information on the Internet and the name of the file at that path. The HTTP server processes the request and, if the requested file is available at the URL, returns it to the requesting HTTP client as part of a response message.
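For illustration only, the following Python sketch shows how a URL decomposes into the host, the path, and the file name at that path, and what a minimal HTTP GET request message for it might look like. It uses only the standard `urllib.parse` and `posixpath` modules; the function names `split_url` and `build_get_request` are hypothetical, and real HTTP clients send additional headers.

```python
from __future__ import annotations

import posixpath
from urllib.parse import urlsplit


def split_url(url: str) -> tuple[str, str, str]:
    """Return (host, directory path, file name) for an HTTP URL."""
    parts = urlsplit(url)
    directory, file_name = posixpath.split(parts.path)
    return parts.netloc, directory, file_name


def build_get_request(url: str) -> str:
    """Build a minimal HTTP/1.1 GET request message for the URL (illustrative only)."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return (
        f"GET {target} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"
        "Connection: close\r\n"
        "\r\n"  # blank line ends the header section
    )


if __name__ == "__main__":
    url = "http://www.example.com/docs/index.html"
    print(split_url(url))  # ('www.example.com', '/docs', 'index.html')
    print(build_get_request(url))
```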
The popularity of the World Wide Web has led to long communication delays when retrieving information from servers in the World Wide Web client/server architecture. Two primary causes of server and Internet communication delay are the bandwidth constraints of the transmission medium between computer systems and the latency of servers in processing file requests. Bandwidth constraints are exacerbated as network traffic increases and reduces the bandwidth available for further transmissions. Current techniques for addressing communication delay include increasing bandwidth by developing more robust transmission media and conserving bandwidth by using proxy servers. Proxy servers cache files previously retrieved over a network, such as the Internet. A proxy server conserves network bandwidth by servicing requests for network files from its cache instead of consuming bandwidth to retrieve the requested file over the network.
A caching proxy server is a server that provides an interface between client computers and the World Wide Web. A client that accesses the World Wide Web through a proxy server requests web pages from the proxy server. The proxy server processes a request by first determining whether the requested web page is maintained in its local cache. If so, the proxy server returns the requested web page from the local cache. Otherwise, the proxy server retrieves the requested web page over the World Wide Web, using a communication protocol such as HTTP or the File Transfer Protocol (FTP). Upon retrieving the requested web page, the proxy server stores it in the local cache and returns it to the requesting client, so that subsequent requests for the page may be serviced from the cache. In this way, the proxy server conserves bandwidth by returning files from the cache instead of retrieving them over network transmission lines. Returning requested data from the local cache also improves retrieval speed whenever reading from the cache storage medium is faster than transmitting the data over the network.
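The cache-hit/cache-miss behavior described above can be sketched as follows. This is a minimal, hypothetical illustration: `CachingProxy` and the injected `fetch_from_network` callable are assumptions standing in for real network code, and the cache is an unbounded in-memory dictionary with no eviction or expiry.

```python
from typing import Callable, Dict


class CachingProxy:
    """Minimal caching proxy: serve from the local cache, else fetch and cache."""

    def __init__(self, fetch_from_network: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_network    # hypothetical stand-in for an HTTP/FTP fetch
        self._cache: Dict[str, bytes] = {}  # URL -> page contents (unbounded, illustrative)
        self.network_fetches = 0            # counts bandwidth-consuming retrievals

    def get(self, url: str) -> bytes:
        if url in self._cache:
            # Cache hit: return the page without any network traffic.
            return self._cache[url]
        # Cache miss: retrieve the page over the network ...
        page = self._fetch(url)
        self.network_fetches += 1
        # ... and store it so subsequent requests are served from the cache.
        self._cache[url] = page
        return page


if __name__ == "__main__":
    proxy = CachingProxy(lambda url: f"<html>{url}</html>".encode())
    proxy.get("http://example.com/a.html")
    proxy.get("http://example.com/a.html")
    print(proxy.network_fetches)  # 1
```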
One problem with caching proxy servers is the storage limitations of the local cache. The proxy server must continuously delete files from the local cache to make room for more recently retrieved web pages. This in turn reduces the likelihood that the proxy server can return a file from cache and conserve bandwidth resources. Thus, there is a need in the art to provide an improved system for returning requested files from a cache to improve bandwidth savings over current proxy server techniques.
To overcome the limitations in the art described above, preferred embodiments disclose a system, method, and program for accessing files that are maintained in a server and can be accessed over a network. A request is received for a file maintained in the server. A determination is then made as to whether a copy of the requested file is stored in a storage system. After determining that the storage system includes the copy of the requested file, the system determines a delay time associated with retrieving that copy from the storage system, and then determines whether the delay time exceeds a maximum delay time. If the delay time does not exceed the maximum delay time, the system retrieves the requested file from the storage system to return in response to the request. If the delay time exceeds the maximum delay time, the system instead retrieves the requested file from the server over the network to return in response to the request.
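A minimal sketch of this decision, assuming hypothetical callables `has_copy` and `estimate_delay` supplied by the caller; how the delay is estimated is left open here. On a miss in the storage system, the sketch assumes the file is fetched from the server, as an ordinary proxy would.

```python
from typing import Callable


def choose_source(
    url: str,
    has_copy: Callable[[str], bool],         # hypothetical: is a copy in the storage system?
    estimate_delay: Callable[[str], float],  # hypothetical: estimated retrieval delay (seconds)
    max_delay: float,
) -> str:
    """Return "storage" or "network" for a request, following the delay-time test."""
    if not has_copy(url):
        # Assumption: a miss is served from the server over the network.
        return "network"
    delay = estimate_delay(url)
    if delay > max_delay:
        # Waiting on the storage system would exceed the maximum delay time.
        return "network"
    # Delay does not exceed the maximum: serve the cached copy.
    return "storage"
```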
In further embodiments, the delay time is estimated as the time to process all file requests queued against the storage system. A delay time is determined for each queued file request by adding (1) a set-up time, estimating the time needed to ready the storage system for data transfer operations, and (2) a data transfer time, estimating the time needed to transfer the data of the requested file from the secondary storage. The delay times of all queued file requests are then summed to estimate the time to process all file requests queued against the storage system.
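The estimate can be illustrated with a short Python sketch. The text does not say how the data transfer time is computed; this sketch assumes it is the file size divided by a fixed transfer rate, and assumes each queued request carries its own set-up time (for example, the time to mount and position a tape). `QueuedRequest`, `request_delay`, and `queue_delay` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class QueuedRequest:
    file_size_bytes: int
    setup_time_s: float  # assumed known, e.g. tape mount and positioning time


def request_delay(req: QueuedRequest, rate_bytes_per_s: float) -> float:
    """Delay for one queued request: set-up time plus data transfer time."""
    return req.setup_time_s + req.file_size_bytes / rate_bytes_per_s


def queue_delay(queue: Iterable[QueuedRequest], rate_bytes_per_s: float) -> float:
    """Estimated time to process every request queued against the storage system."""
    return sum(request_delay(r, rate_bytes_per_s) for r in queue)


if __name__ == "__main__":
    queue = [QueuedRequest(100_000_000, 30.0), QueuedRequest(200_000_000, 30.0)]
    print(queue_delay(queue, 10_000_000))  # 90.0
```

Here two queued requests, each with a 30-second set-up time, for files of 100 MB and 200 MB at 10 MB/s give (30 + 10) + (30 + 20) = 90 seconds.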
Preferred embodiments provide a method for determining whether the time to access a requested file from local storage exceeds a maximum wait time. This method is particularly applicable where a storage system, such as a tape library, is used to cache files retrieved from a server over a network. If the delay time to retrieve the file from the storage system exceeds the maximum wait time, the file may instead be retrieved from the server over the network. In further embodiments, the network response time to retrieve the file over the network may also be determined. If the time to access the requested file from the storage system exceeds the maximum wait time but is still less than the network response time, the request is queued against the storage system. In this case, the storage system, although its delay exceeds the maximum wait time, is still faster than the network.
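Adding the network response time turns the test into a three-way comparison. The sketch below assumes all three times are already available in seconds; `route_request` is a hypothetical name, and a storage delay equal to the network response time is treated as not faster than the network.

```python
def route_request(storage_delay: float, max_wait: float, network_time: float) -> str:
    """Decide where to retrieve a cached file from (all times in seconds)."""
    if storage_delay <= max_wait:
        # Within the maximum wait time: use the storage system.
        return "storage"
    if storage_delay < network_time:
        # Slower than the maximum wait time, but still faster than the network:
        # queue the request against the storage system anyway.
        return "storage"
    # Storage is no faster than the network: retrieve from the server.
    return "network"
```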
This method and system for determining whether to access a requested file from storage is particularly applicable to preferred embodiments in which the storage, or cache, of files retrieved from servers over the network comprises a primary storage system and a slower secondary storage, such as a tape library. In preferred embodiments, the secondary storage substantially increases the cache size at a substantially lower cost per unit of storage than the primary storage. If so many files are migrated to the secondary storage that lengthy queues of file requests build up against it, retrieving files from the secondary storage may incur a significant delay. In such cases, the method of the preferred embodiments bypasses the secondary storage and instead retrieves the requested file over the network, thereby avoiding queue delays at the secondary storage.
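A hypothetical two-tier cache illustrates how the bypass might fit together. This sketch assumes the secondary-storage queue is represented as a list of per-request delay estimates (in the spirit of the queue-based estimate described earlier) and that primary-storage hits are served without a delay check; neither detail is specified in the text, and `TieredCache` is an illustrative name.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TieredCache:
    primary: Dict[str, bytes] = field(default_factory=dict)     # e.g. disk (assumed fast)
    secondary: Dict[str, bytes] = field(default_factory=dict)   # e.g. tape library
    secondary_queue: List[float] = field(default_factory=list)  # est. seconds per queued request

    def locate(self, url: str, max_wait: float) -> str:
        """Return "primary", "secondary", or "network" for the requested file."""
        if url in self.primary:
            # Assumption: primary storage is fast enough to need no delay check.
            return "primary"
        if url in self.secondary:
            backlog = sum(self.secondary_queue)  # time to drain the secondary queue
            if backlog <= max_wait:
                return "secondary"
        # Miss, or secondary queue too long: bypass it and use the network.
        return "network"
```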