1. Field of the Invention
The present invention relates to information storage and retrieval and, more particularly, to storage and accelerated retrieval of objects from an object storage device.
2. Description of the Related Art
The Internet is a resource for tremendous amounts of content information. As examples, the content information can include text, images or other types of data. A user normally accesses the Internet using a connection provided by a modem and a regular telephone line or ISDN line, or by a T1 line or leased line. The connection to the Internet may be made indirectly through an Internet Service Provider (ISP) or more directly to the Internet (or World Wide Web). The higher the bandwidth supported by the connection, the better the responsiveness of the Internet to the user's requests. For example, since a T1 line offers substantially greater bandwidth than does a 28.8 kbps modem and regular telephone line, the connection to the Internet provided by the T1 line will provide substantially faster responsiveness to a user's requests than would the 28.8 kbps modem.
A uniform resource locator (URL) is a unique identifier for a particular content location in the Internet. Every object located on the Internet has an associated URL. Each URL is divided into four pieces: protocol, host name and port, uniform resource identifier (URI), and query. Furthermore, the content actually served from a particular URL location can depend on associated factors, such as cookies and authorizations, which are sent between the requesting client and the content server using the Hypertext Transfer Protocol (HTTP). HTTP is one of several Internet protocols for transporting data between clients and servers. Other transport protocols include the File Transfer Protocol (FTP), gopher, file and WYSIWYG. An Internet host (e.g., a content server) is an entity on the Internet which has a domain name, an IP address or both, and which is capable of serving content to requesting clients. A URI is a particular location of data on the Internet host. A query is a text string passed from the client to the host which requests a particular piece of data from a particular URI location within the host. A valid URL request sent by a client must contain a protocol, a host and a URI; the query is optional.
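The four-piece decomposition of a URL can be illustrated with a minimal Python sketch built on the standard library's `urllib.parse.urlsplit`. The function name `split_url` and the dictionary keys are hypothetical labels chosen for this example; note that `urlsplit` calls the host-and-port piece the `netloc`.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def split_url(url: str) -> dict[str, str]:
    """Split a URL into protocol, host name and port, URI, and query."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        # A valid request must name at least a protocol and a host.
        raise ValueError(f"URL lacks a protocol or host: {url!r}")
    return {
        "protocol": parts.scheme,
        "host_port": parts.netloc,   # e.g. "www.example.com:8080"
        # An empty path is sent as "/" in an HTTP request line.
        "uri": parts.path or "/",
        "query": parts.query,        # optional; "" when absent
    }


print(split_url("http://www.example.com:8080/images/logo.gif?size=small"))
# {'protocol': 'http', 'host_port': 'www.example.com:8080', 'uri': '/images/logo.gif', 'query': 'size=small'}
```

Cookies and authorizations are not part of the URL itself; they travel in HTTP headers, which is why a cache keyed only on this string cannot distinguish content that depends on them.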
Internet proxy servers have been used to allow multiple users to share Internet access through a common high speed connection. Examples of such Internet proxy servers are (1) WinGate available from Deerfield Communications Company and (2) Microsoft Proxy Server available from Microsoft Corporation. Shared connections facilitate providing firewalls to prevent unauthorized access into the user's (e.g., corporate) internal computers. These shared connections can also provide HTTP caching to improve responsiveness. These Internet proxy servers can also provide site filtering to prevent users behind the Internet proxy server from accessing certain Internet sites.
HTTP caching operates to locally store frequently accessed Internet material so that it is quickly available when subsequently requested. HTTP caching is described in detail in the Hypertext Transfer Protocol (HTTP) document, version 1.1, which is hereby incorporated by reference. Such caching enables an organization to more efficiently share the bandwidth of the Internet connection.
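Under HTTP/1.1, a cache decides whether a stored response can be reused by comparing its age with its freshness lifetime, taken from the `Cache-Control: max-age` directive when present and otherwise from `Expires` minus `Date`. The following is a simplified Python sketch of that check only. It assumes header names arrive in canonical capitalization and deliberately ignores the `Age` header, clock-skew correction, `s-maxage` (which a shared proxy cache would honor), heuristic expiration and directives such as `no-store`. The names `freshness_lifetime` and `is_fresh` are hypothetical.

```python
from __future__ import annotations

import re
import time
from email.utils import mktime_tz, parsedate_tz


def freshness_lifetime(headers: dict[str, str]) -> float:
    """Seconds a cached response stays fresh; 0.0 means it must be revalidated."""
    cache_control = headers.get("Cache-Control", "")
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    if match:
        return float(match.group(1))  # max-age takes precedence over Expires
    expires = parsedate_tz(headers.get("Expires", ""))
    date = parsedate_tz(headers.get("Date", ""))
    if expires and date:
        # mktime_tz converts each parsed HTTP date to a UTC timestamp.
        return max(0.0, float(mktime_tz(expires) - mktime_tz(date)))
    return 0.0


def is_fresh(headers: dict[str, str], stored_at: float, now: float | None = None) -> bool:
    """True if a response stored at `stored_at` (epoch seconds) may be served from cache."""
    now = time.time() if now is None else now
    return (now - stored_at) < freshness_lifetime(headers)


print(is_fresh({"Cache-Control": "public, max-age=60"}, stored_at=1000.0, now=1030.0))  # True
```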
Conventionally, the delivery over the Internet of similar data in different formats was achieved in several different ways. One way is to present the user with a low resolution version of an image but then allow the user to click on the image to request a higher resolution version. A second way is to initially load a low resolution version of an image for the user while a higher resolution version loads in the background. A third way is content negotiation between the content server and the user, so that a client can request a high or low resolution version from the content server at the same URL. The content negotiation can be achieved by communicating cookies and authorizations between the user and the content server or by a POST request to a dynamic URL located on the content server. A POST request is an HTTP mechanism that allows a browser to send data to a content server.
Content negotiation is known and described in the HTTP document, version 1.1. Content negotiation is the process in which the HTTP response to an HTTP request is chosen to be the one that is most suitable, assuming of course that there is a choice. The content negotiation can be client-driven or server-driven. The content differences being negotiated can vary widely, but the variants are nevertheless stored on a content server on the Internet. As an example, the content differences could be different languages or different size images. In such a case, a client may negotiate on behalf of a user with a content server to receive smaller (e.g., less robust, lower quality, or smaller image area) images instead of the larger images the server commonly provides. If the content server can provide the smaller images, then the user is able to receive and display the information (e.g., images) faster than had there been no choice or no negotiation. Thus, in some cases, the negotiation facilitates improved bandwidth utilization and responsiveness of the Internet. One problem with the conventional approaches to negotiated content delivery over the Internet is that most content servers do not offer multiple versions of information content. As a result, content negotiation, even where the protocol permits it, is not supported by most content servers. There are also no standards on what types of versions or variations of information content a content server should make available. Consequently, content negotiation is difficult to obtain over the Internet.
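As an illustration of server-driven negotiation, the Python sketch below picks among stored language variants using the quality values (`q`) of an `Accept-Language` header, as defined in HTTP/1.1. The function names and the `variants` mapping are hypothetical, and the matching is deliberately simplified: it compares whole language tags only, so a request for `en-US` will not match a stored `en` variant.

```python
from __future__ import annotations


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-style header into (value, q) pairs, highest q first."""
    prefs = []
    for item in header.split(","):
        value, *params = [p.strip() for p in item.split(";")]
        if not value:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            name, _, raw = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(raw)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((value.lower(), q))
    prefs.sort(key=lambda pref: pref[1], reverse=True)  # stable: ties keep header order
    return prefs


def choose_variant(accept_language: str, variants: dict[str, str]) -> str | None:
    """Return the stored variant the client ranks highest, or None if none is
    acceptable (a server might then answer 406 Not Acceptable)."""
    available = {lang.lower(): body for lang, body in variants.items()}
    prefs = parse_accept(accept_language)
    refused = {lang for lang, q in prefs if q <= 0}
    for lang, q in prefs:
        if q <= 0:
            continue
        if lang == "*":
            # Wildcard: any stored variant the client did not explicitly refuse.
            for candidate, body in available.items():
                if candidate not in refused:
                    return body
        elif lang in available:
            return available[lang]
    return None


pages = {"en": "Hello", "fr": "Bonjour"}
print(choose_variant("fr;q=0.9, en;q=0.5", pages))  # Bonjour
```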
One problem with all of these conventional approaches is that the content server is required to support and maintain both a high resolution and a low resolution version of an image or object. While both versions occupy the same logical location on the Internet (e.g., URL), the content server must allocate separate physical space for each version. Also, content providers must expend scarce resources to provide multiple versions of the same content.
Other problems concern efficiently storing, using and managing the cache. Conventionally, a proxy server reserves or provides a single location in a cache for a single URL. In other words, images stored in the cache are identified only by URLs. However, such an approach ignores the fact that the content served for a URL is often modified by cookies or authorizations. See HTTP, version 1.1. This is a deficiency of the conventional approaches because cookies and authorizations affect which images are available. Further, images have conventionally been stored within the cache of a proxy server using a standard database (with either a relational database model or a hierarchical database model). The relational database model typically does not offer high performance, and developing a database system tends to require considerable time and cost. In a hierarchical database model, data is stored as nodes in a tree structure, and the edges of the tree are identified by particular keys which identify the location of the data within the database. The tree nodes store the data, and the tree edges which point to successive nodes facilitate retrieval of the data. While the hierarchical model is faster than the relational model, there is still significant overhead associated with the database engine being utilized. These problems with the conventional approaches typically increase the cost of the system and limit its performance.
Thus, there is a need for improved techniques for caching information content from a remote content server in a temporary storage device as well as for delivering information content from the temporary storage device to a user.
The invention pertains to techniques for storing objects (e.g., images) in and retrieving objects from a storage device (e.g., image store) in a rapid and efficient manner. The various aspects of the invention include: storage of an object in and retrieval of an object from the storage device with reference to an object locator together with state and permission information, use of a directory structure of a file system to efficiently provide database structure for storage of the objects, storage and retrieval of object states as attributes of associated files in the file system, storage and retrieval of multiple versions of objects, and multi-threaded management of the storage device.
The invention can be utilized in a variety of systems or apparatuses, but is particularly well suited for use in a proxy system for a network (e.g., the Internet) wherein the database storage stores an object identifier, state information and permission information. A proxy system operates to store (e.g., cache) object files in a storage device (e.g., image store) so that these object files can be supplied more quickly and efficiently to requesters coupled to the proxy system. A directory structure points to resulting directories in the storage device where one or more different versions of an object can be stored. When the storage device is used as a cache device that provides temporary storage, the management of the storage device operates to clean out the storage device to remove expired data as well as to prevent overflow.
The proxy system can also include an acceleration apparatus (e.g., an acceleration server). Such a proxy system is able to produce an accelerated version of content information from the network, cache the accelerated version (and possibly original versions) for subsequent requests for the same information content, and supply the accelerated version of the information content to a requesting user.
The invention can be implemented in numerous ways, including as a method, an apparatus, a system, or a computer readable medium. Several embodiments of the invention are summarized below.
As a method for storing objects in an object storage device, one embodiment of the invention includes the operations of: receiving a uniform resource locator, state information and authorization information associated with a particular object to be stored in the object storage device; combining the uniform resource locator, the state information and the authorization information to obtain an object identification string; dividing the object identification string into a plurality of individual directories, the individual directories forming a directory path to a resulting directory where the particular object is to be stored; and storing at least one version of the particular object in the resulting directory in the object storage device. Optionally, the invention can also include the operations of producing a second version of the particular object by reducing the size of the first version of the particular object; and subsequently storing the second version of the particular object in the resulting directory in the object storage device. As another option, the object storage device is a database, and at least a part of the database has a directory structure of a file system used with the object storage device.
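The storage method of this embodiment can be sketched in Python under several stated assumptions, none of which come from the embodiment itself: the three fields are joined with a NUL separator, the identification string is percent-encoded (which is reversible, so distinct strings never share a directory), it is cut into fixed-length segments of a hypothetical `SEGMENT_LEN` characters, and each version is stored as a file named after it with a hypothetical `.obj` suffix. The function names are likewise hypothetical, and a POSIX-like file system is assumed.

```python
import os
import tempfile
from urllib.parse import quote

SEGMENT_LEN = 16  # hypothetical: characters per directory level


def object_id_string(url: str, state: str, authorization: str) -> str:
    """Combine the URL, state (e.g. cookie) and authorization information."""
    # NUL cannot appear in these HTTP fields, so it is a safe separator.
    return "\0".join((url, state, authorization))


def directory_for(id_string: str, root: str) -> str:
    """Divide the identification string into individual directories that
    together form the path to the resulting directory."""
    # Percent-encode everything except letters, digits and "_-~"; also encode
    # "." so that no directory can be named "." or "..".
    encoded = quote(id_string, safe="").replace(".", "%2E")
    segments = [encoded[i:i + SEGMENT_LEN] for i in range(0, len(encoded), SEGMENT_LEN)]
    return os.path.join(root, *segments)


def store_version(root: str, url: str, state: str, authorization: str,
                  version: str, data: bytes) -> str:
    """Store one version (e.g. "original" or "reduced") in the resulting directory."""
    directory = directory_for(object_id_string(url, state, authorization), root)
    os.makedirs(directory, exist_ok=True)  # creates only the missing levels
    # "." never occurs in an encoded segment, so this file name cannot clash
    # with a subdirectory belonging to a longer identification string.
    path = os.path.join(directory, version + ".obj")
    with open(path, "wb") as f:
        f.write(data)
    return path


root = tempfile.mkdtemp()
first = store_version(root, "http://example.com/a.gif", "id=1", "", "original", b"full-size bytes")
second = store_version(root, "http://example.com/a.gif", "id=1", "", "reduced", b"small bytes")
other = store_version(root, "http://example.com/a.gif", "id=2", "", "original", b"other bytes")
print(os.path.relpath(first, root))
```

Both versions for the same URL, state and authorization land in one resulting directory, while a different cookie value yields a different directory. Very long URLs can exceed the file system's path-length limit, which this sketch does not handle.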
As a method for storing an image in an image storage device, one embodiment of the invention includes the operations of: receiving a URL and associated HTTP Request and HTTP Response information; parsing the HTTP Request information and the HTTP Response information to obtain cookies and authorizations contained therein; merging the cookies if related cookies are contained in the HTTP Request and the HTTP Response; forming an image identification string by combining the URL, the merged cookies and the authorizations; hashing the image identification string to produce a hash directory; replacing unpermitted characters in the image identification string with predetermined replacements; dividing the image identification string to form a directory path having a series of individual directories; forming the individual directories of the directory path in the image storage device to the extent not already present; and storing at least one file in a resulting directory identified by the directory path.
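The image-storage operations can be walked through in a Python sketch. Every concrete choice here is an assumption for illustration: a response cookie overrides a request cookie of the same name, merged cookies are sorted by name, the hash directory is the first two hex digits of a SHA-1 digest, the replacement table and `SEGMENT_LEN` are hypothetical, and files carry a hypothetical `.img` suffix. All function names are hypothetical as well, and a POSIX-like file system is assumed.

```python
from __future__ import annotations

import hashlib
import os
import tempfile

# Hypothetical table of unpermitted or unsafe characters and their predetermined
# replacements. "_" is escaped too, so every "_" in the output starts an escape
# and two different identification strings can never collide.
REPLACEMENTS = {
    "_": "_u", "/": "_s", "\\": "_b", ":": "_c", "*": "_a", "?": "_q", '"': "_d",
    "<": "_l", ">": "_g", "|": "_p", ".": "_o", "\n": "_n", "\0": "_z",
}
SEGMENT_LEN = 16  # hypothetical: characters per directory level


def merge_cookies(request_cookie: str, set_cookies: list[str]) -> str:
    """Merge the request's Cookie header with the response's Set-Cookie values."""
    merged = {}
    # Only the leading name=value of each Set-Cookie counts; Path etc. are ignored.
    pairs = request_cookie.split(";") + [h.split(";", 1)[0] for h in set_cookies]
    for pair in pairs:
        name, sep, value = pair.strip().partition("=")
        if sep:
            merged[name] = value  # later (response) values override request values
    # Sort by name so the same cookies always produce the same string.
    return ";".join(f"{name}={value}" for name, value in sorted(merged.items()))


def image_directory(root: str, url: str, request_cookie: str,
                    set_cookies: list[str], authorization: str) -> str:
    """Build root/<hash directory>/<segment>/<segment>/... for one image."""
    id_string = "\n".join((url, merge_cookies(request_cookie, set_cookies), authorization))
    # Two hex digits of a hash spread entries over 256 top-level directories.
    hash_dir = hashlib.sha1(id_string.encode("utf-8")).hexdigest()[:2]
    safe = "".join(REPLACEMENTS.get(ch, ch) for ch in id_string)
    segments = [safe[i:i + SEGMENT_LEN] for i in range(0, len(safe), SEGMENT_LEN)]
    return os.path.join(root, hash_dir, *segments)


def store_image(root: str, url: str, request_cookie: str, set_cookies: list[str],
                authorization: str, version: str, data: bytes) -> str:
    directory = image_directory(root, url, request_cookie, set_cookies, authorization)
    os.makedirs(directory, exist_ok=True)  # form only the directories not yet present
    # "." never survives replacement, so this file name cannot clash with a segment.
    path = os.path.join(directory, version + ".img")
    with open(path, "wb") as f:
        f.write(data)
    return path


root = tempfile.mkdtemp()
saved = store_image(root, "http://example.com/logo.gif", "session=42",
                    ["lang=en; Path=/"], "", "original", b"GIF89a")
print(os.path.relpath(saved, root))
```

Because the authorization value ends up verbatim in directory names, a deployment might prefer to hash it before combining it with the URL and cookies.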
As a proxy system for accelerated delivery of objects to a requester, one embodiment of the invention includes: an object storage device for storing objects, and a proxy server coupled between the requester's computer and a network of computers. The object storage device uses a file system that supports directories to store the objects in files at directory locations within said object storage device. The directory location where each object is to be stored is identified by an object locator associated with the network together with state and authorization information. The proxy server intercepts a request for an object from the requester's computer to the network of computers, and then satisfies the request by delivering the object requested from said object storage device to the requester's computer. The object requested from said object storage device is retrieved from said object storage device using a combination of an object locator obtained from the request together with state and authorization information associated with the request. Optionally, the proxy system can also include an acceleration unit to produce an accelerated version of certain of the objects stored in said object storage device. As another option, the directory structure within said object storage device can be used to implement a database. As still another option, the combination of the object locator together with state and authorization information identifies a unique slot in said object storage device, and each of the slots can store files pertaining to different versions of an associated object.
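The retrieval side of this embodiment can be sketched in Python, assuming a slot holds one file per version and the proxy prefers an accelerated version over the original when both exist. As a simplification, the slot directory is derived from a SHA-256 digest of the locator, state and authorization with two fan-out levels, rather than from the full identification string. The version names, `VERSION_PREFERENCE` and the function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import os
import tempfile

# Hypothetical version names, in the order the proxy prefers to deliver them.
VERSION_PREFERENCE = ("accelerated", "original")


def slot_directory(root: str, url: str, state: str, authorization: str) -> str:
    """Map (object locator, state, authorization) to one slot directory."""
    key = "\n".join((url, state, authorization)).encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return os.path.join(root, digest[:2], digest[2:4], digest[4:])


def store(root: str, url: str, state: str, authorization: str,
          version: str, data: bytes) -> None:
    slot = slot_directory(root, url, state, authorization)
    os.makedirs(slot, exist_ok=True)
    with open(os.path.join(slot, version), "wb") as f:
        f.write(data)


def lookup(root: str, url: str, state: str, authorization: str) -> tuple[str, bytes] | None:
    """Return (version, bytes) for the preferred stored version, or None on a
    miss, in which case the proxy would forward the request to the network."""
    slot = slot_directory(root, url, state, authorization)
    for version in VERSION_PREFERENCE:
        path = os.path.join(slot, version)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                return version, f.read()
    return None


root = tempfile.mkdtemp()
url = "http://example.com/photo.jpg"
miss = lookup(root, url, "session=1", "")
store(root, url, "session=1", "", "original", b"full-size image")
after_original = lookup(root, url, "session=1", "")
store(root, url, "session=1", "", "accelerated", b"reduced image")
after_accelerated = lookup(root, url, "session=1", "")
other_session = lookup(root, url, "session=2", "")
print(after_accelerated)  # ('accelerated', b'reduced image')
```

The same URL requested with a different cookie maps to a different slot and therefore misses, which is the behavior a URL-only cache cannot provide.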
As a proxy system for accelerated delivery of objects to a requester, another embodiment of the invention includes: an object storage device for storing objects, and a proxy server coupled between the requester's computer and a network of computers. The object storage device uses a file system that supports directories to store the objects in files at directory locations within said object storage device. The directory location where each object is to be stored is identified by at least an object locator. The proxy server intercepts a request for an object from the requester's computer to the network of computers, and then satisfies the request by delivering the object requested from said object storage device to the requester's computer. The object requested from said object storage device is retrieved from said object storage device using an object locator obtained from the request. Optionally, the object storage device also stores a current state for each of the files. As another option, the directory structure within said object storage device implements a database. As still another option, the proxy system can include an acceleration unit that produces an accelerated version of certain of the objects stored in said object storage device.
As a proxy system for temporarily storing objects previously requested by a requester, another embodiment of the invention includes: an object storage device for storing objects, a proxy server coupled between the requester's computer and a network of computers, and an object storage device cleaner. The proxy server intercepts a request for an object from the requester's computer to the network of computers, and then satisfies the request by delivering the object requested from the object storage device to the requester's computer. The object requested from the object storage device is retrieved from the object storage device using an object locator obtained from the request. The object storage device cleaner operates to clean out aged objects from the object storage device using a plurality of concurrent processes or threads that apply different criteria in cleaning out the object storage device.
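A cleaner with concurrent threads applying different criteria can be sketched in Python. Here one thread removes objects older than an age limit and another evicts the least recently modified objects until the store fits a size budget. Using file modification time as an object's age, the two criteria chosen and all function names are assumptions for illustration; the sketch also leaves emptied directories in place.

```python
import os
import tempfile
import threading
import time


def iter_files(root: str):
    """Yield the path of every stored file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def remove_expired(root: str, max_age_s: float) -> None:
    """Criterion 1: delete files not modified within the last max_age_s seconds."""
    cutoff = time.time() - max_age_s
    for path in list(iter_files(root)):
        try:
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
        except FileNotFoundError:
            pass  # the other cleaner thread removed it first


def enforce_capacity(root: str, max_bytes: int) -> None:
    """Criterion 2: evict least recently modified files until the store fits."""
    entries = []
    for path in iter_files(root):
        try:
            st = os.stat(path)
        except FileNotFoundError:
            continue  # removed concurrently by the other cleaner thread
        entries.append((st.st_mtime, st.st_size, path))
    total = sum(size for _mtime, size, _path in entries)
    for _mtime, size, path in sorted(entries):  # oldest first
        if total <= max_bytes:
            break
        try:
            os.remove(path)
        except FileNotFoundError:
            pass  # already gone; its space is freed either way
        total -= size


def clean(root: str, max_age_s: float, max_bytes: int) -> None:
    """Run each criterion in its own thread and wait for both to finish."""
    workers = [
        threading.Thread(target=remove_expired, args=(root, max_age_s)),
        threading.Thread(target=enforce_capacity, args=(root, max_bytes)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()


# Usage: three 10-byte objects, one of them two hours old.
root = tempfile.mkdtemp()
now = time.time()
for name, age_s in (("old", 7200), ("older", 20), ("newer", 10)):
    path = os.path.join(root, "slot", name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(b"x" * 10)
    os.utime(path, (now - age_s, now - age_s))
clean(root, max_age_s=3600, max_bytes=15)
remaining = sorted(os.listdir(os.path.join(root, "slot")))
print(remaining)  # ['newer']
```

Whichever thread reaches the expired object first, the outcome is the same: the expired file goes, and the capacity criterion then evicts the oldest remaining file to get under the 15-byte budget.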
Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.