This invention relates to efficiently storing and retrieving a large number of data objects, and more particularly to an efficient data object management scheme that reduces overhead associated with metadata of the data objects.
Many large-scale applications require storage and retrieval of a large number of data objects. As the number of data objects stored in a file server increases, the amount of metadata used by the file server for file access increases proportionally. Since the amount of metadata per file is generally constant, the metadata overhead is exacerbated when the data objects are relatively small. Thus, a large number of very small files require significantly more metadata than a few large files, even if the total size of the stored files is the same.
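The proportionality described above can be illustrated with a short calculation. This is only a sketch: the 256-byte per-object metadata size and the helper name metadata_bytes are assumptions chosen for illustration and do not come from any particular file system.

```python
def metadata_bytes(total_data_bytes: int, object_size_bytes: int,
                   metadata_per_object_bytes: int) -> int:
    """Total metadata needed when the data is split into equal-sized objects."""
    # Ceiling division: a partially filled last object still needs a full record.
    object_count = -(-total_data_bytes // object_size_bytes)
    return object_count * metadata_per_object_bytes


KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4

# Hypothetical constant metadata size per object (an assumption, not a measured value).
ASSUMED_METADATA_PER_OBJECT = 256

# The same 1 TiB of data, stored as many small objects versus a few large ones.
small_objects = metadata_bytes(TIB, 100 * KIB, ASSUMED_METADATA_PER_OBJECT)
large_objects = metadata_bytes(TIB, GIB, ASSUMED_METADATA_PER_OBJECT)

print(f"100 KiB objects: {small_objects} bytes of metadata")
print(f"1 GiB objects:   {large_objects} bytes of metadata")
```

Under these assumed figures, the small-object layout needs several thousand times more metadata than the large-object layout for the same total amount of stored data.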
As the number of data files in a file server increases, the metadata file typically becomes too large to be held in primary storage (e.g., memory). Such a large metadata file must, therefore, be stored in secondary storage (e.g., hard disks). As a result, to retrieve an arbitrary data object, multiple input/output (I/O) operations typically must be performed on the secondary storage to locate and retrieve first the metadata, and then the data object. The increased number of I/O operations on the secondary storage and relatively slow access speed of the secondary storage significantly increase the retrieval time of the data object.
An online photo storage application is an example of a large-scale application that involves a large number of relatively small data objects, typically less than 1 MB each, and frequently as small as a few hundred kilobytes. Users of a photo storage application often upload image files (e.g., photograph files) for sharing with other users over the Internet. The uploaded image files are seldom deleted. As a result, the number of photos steadily increases over time. In some photo sharing applications, the total number of stored image files can reach into the billions, with the total amount of stored data being in the petabytes.
Conventional file systems do not scale well to such a large number of data objects. For example, a POSIX-compliant file system requires the following metadata for each file: file length, ID, storage block pointers, file owner, group owner, access rights, change time, modification time, last access time and reference counts. The large number of fields in a POSIX-compliant file system makes it difficult to store the metadata associated with a very large number of files in primary storage. Hence, the metadata in conventional file systems are often stored in secondary storage.
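As a rough illustration of the per-file fields listed above, the following sketch reads them through Python's os.stat call. The helper name posix_metadata and the dictionary keys are chosen here for illustration only. Storage block pointers are internal to the file system and are not exposed by this call, and on non-POSIX platforms some of the fields are only approximations.

```python
import os
import tempfile


def posix_metadata(path: str) -> dict:
    """Collect the per-file fields that a stat call exposes for one file."""
    st = os.stat(path)
    return {
        "length": st.st_size,
        "id": st.st_ino,                            # inode number
        "owner": st.st_uid,
        "group": st.st_gid,
        "access_rights": oct(st.st_mode & 0o777),   # permission bits only
        "change_time": st.st_ctime,
        "modification_time": st.st_mtime,
        "last_access_time": st.st_atime,
        "reference_count": st.st_nlink,             # number of hard links
    }


# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"photo")
    sample_path = f.name

fields = posix_metadata(sample_path)
os.remove(sample_path)

for name, value in fields.items():
    print(f"{name}: {value}")
```

Every file carries all of these fields regardless of its size, which is why the per-file cost becomes significant when the files themselves are small.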
FIG. 1A is a functional block diagram illustrating the process of uploading image data objects in a conventional online photo sharing application. A photo upload server 108 receives image data objects embedded in HTTP messages from clients 104. The photo upload server 108 then stores the data objects in one or more of the storage servers 110 using, for example, the NFS (Network File System) protocol. The storage servers 110 receive requests based on the NFS protocol and store each image data object as a file using a conventional file system. Each storage server 110 uses a POSIX-compliant file system, and thus stores metadata for each image (e.g., an inode).
FIG. 1B is a functional block diagram illustrating the process of retrieving and sending image data objects to the clients 104. An HTTP request for an image file from one of the clients 104 is received at a content delivery network (CDN) 128 or a caching server 132. The request identifies the image file by its file name. If the requested image file is not cached in the CDN 128 or the caching server 132, the CDN 128 or the caching server 132 forwards the HTTP request to one of the content servers 116. After receiving the HTTP request, the content server 116 uses the file name to determine which of the storage servers 110 stores the requested image file, and then translates the HTTP request into an NFS command to that storage server 110. The storage server 110 typically accesses the stored metadata based on the file name to determine the disk location information for the image file. This access will typically be to secondary storage, rather than to primary storage. The storage server 110 then retrieves the image file from disk, and passes it back to the content server 116. The content server 116 sends the retrieved image file to the requesting client via the CDN 128 or the caching server 132.
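The retrieval path described above can be modeled with a toy simulation that counts accesses to secondary storage. This is a sketch under stated assumptions, not the actual system: the StorageServer class and the pick_server function are hypothetical, dictionaries stand in for the disk, and the CRC-based mapping from file name to storage server is an assumed placeholder, since the text does not specify how that mapping is done.

```python
import zlib


class StorageServer:
    """Toy stand-in for a storage server 110; dictionaries play the role of disk."""

    def __init__(self) -> None:
        self._metadata_on_disk = {}   # file name -> block id
        self._blocks_on_disk = {}     # block id -> file contents
        self.disk_reads = 0           # accesses to secondary storage

    def store(self, name: str, data: bytes) -> None:
        block_id = len(self._blocks_on_disk)
        self._blocks_on_disk[block_id] = data
        self._metadata_on_disk[name] = block_id

    def read(self, name: str) -> bytes:
        self.disk_reads += 1          # first access: look up the metadata
        block_id = self._metadata_on_disk[name]
        self.disk_reads += 1          # second access: read the file data
        return self._blocks_on_disk[block_id]


def pick_server(name: str, servers: list) -> StorageServer:
    """Hypothetical name-to-server mapping (an assumption, not from the text)."""
    return servers[zlib.crc32(name.encode("utf-8")) % len(servers)]


servers = [StorageServer() for _ in range(3)]
pick_server("photo_1.jpg", servers).store("photo_1.jpg", b"jpeg-bytes")

# A cache miss reaches the content server, which forwards to one storage server.
image = pick_server("photo_1.jpg", servers).read("photo_1.jpg")
total_disk_reads = sum(s.disk_reads for s in servers)
print(f"disk reads for one image: {total_disk_reads}")
```

The model charges one access for the metadata and one for the file data. A real file system may need further accesses, for example for directory lookups, so the count here is best read as a lower bound.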
As can be seen from these examples, there is significant overhead in the multiple disk accesses needed for both the metadata and the file data. There is also overhead from the use of two protocols, HTTP and NFS, and from the additional operations needed by the content server to translate between the protocols. Thus, it would be beneficial to have a system and method for efficiently storing and retrieving a large number of data objects, and more particularly an efficient data object management scheme that reduces the overhead associated with metadata of the data objects.