Storage appliances may be configured to concurrently provide multiple storage services. For example, a storage appliance may internally provide access to objects using a file-oriented application programming interface (API) to support network-attached storage (NAS) protocols such as network file system (NFS) and common internet file system (CIFS), as well as an Object Get/Put API (to support cloud storage protocols such as Amazon S3, OpenStack Swift or Hadoop Distributed File System).
Conventional solutions allow these services to co-exist, but with little or no consolidation of the underlying storage resources. File and object services will typically both map to the local file system, but to separate portions of it. Many conventional object services use databases, rather than files directly, to store account and/or container information about objects. However, since the databases are themselves implemented using files, the object service is still using the local file system.
In conventional solutions, when the same document is stored using both file and object services, the same content will typically be recorded twice. The same set of account and container related information will be encoded first for file access and will be encoded a second time for object access. Policies about which set of users are allowed to access which set of documents will typically have to be expressed in both file and object terms.
The conventional solution to having a single storage appliance provide file and object services is to simply enable both services, but to access different storage resources for each, and then to enable access independently (and redundantly). File services would export portions of the local file system with access control lists (ACLs) encoded directly in the file system, while object services would encode objects in non-exported portions of the local file system and typically encode ACLs in the local file system, in a database, or in both. Conventional operating systems provide file-oriented access to persistent data. Various libraries in user and/or kernel spaces provide cloud-oriented get/put access to persistent objects.
Conventional solutions are also poorly structured to provide for deduplication of objects. Block layer compression and deduplication features are conventionally limited to a single storage server because the block layer is unaware of payload stored by other storage servers. Further, in conventional solutions indexing to detect duplication is an additional burden placed on each storage server that may consume extra resources. These resources may have been better utilized for other storage related functions such as additional caching.
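The single-server limitation described above can be illustrated with a minimal sketch. The class name StorageServer and the use of SHA-256 fingerprints are assumptions made for illustration only; they are not drawn from any particular conventional product.

```python
import hashlib


class StorageServer:
    """Hypothetical storage server whose duplicate index is purely local."""

    def __init__(self):
        # fingerprint -> payload; this index also consumes server resources
        self.blocks = {}

    def write(self, payload: bytes) -> bool:
        """Store payload; return True only if new bytes were actually stored."""
        fingerprint = hashlib.sha256(payload).hexdigest()
        if fingerprint in self.blocks:
            return False  # duplicate detected within this server
        self.blocks[fingerprint] = payload
        return True


server_a = StorageServer()
server_b = StorageServer()
payload = b"same document content"

# The second write to server_a is deduplicated, but server_b cannot see
# server_a's index, so the same payload is stored again on server_b.
stored = [server_a.write(payload), server_a.write(payload), server_b.write(payload)]
```

In this sketch each server both maintains its own index and remains unaware of payload held elsewhere, which corresponds to the two shortcomings noted above.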
Objects are made accessible to file-oriented clients as conventional files, typically in full or nearly complete compliance with the portable operating system interface (POSIX) standards for file access. This enables access by the installed base of software that includes NAS clients but not cloud storage clients, and supports applications that require NAS semantics for a truly shared update of an object. Cloud storage clients can be used to access those same objects under cloud storage protocol semantics for the majority of applications that do not truly require concurrent file sharing semantics.
Conventional solutions have positioned cloud objects as second class entities. The cloud objects are ultimately encoded as conventional files on specific object servers after the cloud layer determines which server should hold each object. Traditional file systems were designed before the cloud access paradigm. Even when the local file system is capable of storing metadata for cloud objects the system is not optimized for cloud access patterns.
Many cloud storage systems organize storage of the payload of objects as chunks. This enables parallel transfer of payload and allows the payload of a single large object to be spread over multiple storage servers. Cloud storage APIs have been designed with more concern for supporting versioning of files. With many cloud storage APIs, a client can request the current version of an object and be directed to access chunks which are stored on multiple storage servers. Without a separation of metadata and chunk storage, all versions of any object would need to be stored on the same set of storage servers or older versions would have to be migrated from their initial location after they were no longer the current version.
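The benefit of separating metadata from chunk storage can be sketched as follows. The names ChunkRef, versions and chunks_for, and the server labels, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRef:
    """Hypothetical reference from object metadata to a stored chunk."""
    chunk_id: str
    server: str


# Metadata for one object: each version lists the chunks that compose it.
# Version 2 reuses chunk "c1" and replaces "c2" with "c3" on another server.
versions = {
    1: (ChunkRef("c1", "server-a"), ChunkRef("c2", "server-b")),
    2: (ChunkRef("c1", "server-a"), ChunkRef("c3", "server-c")),
}
current_version = 2


def chunks_for(version=None):
    """Return the chunk references for a version (default: current version)."""
    return versions[current_version if version is None else version]
```

Because each version merely references chunks, the chunks of version 1 remain where they were first written after version 2 becomes current, and the two versions need not share the same set of storage servers.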
Chunk servers are storage servers which store chunks. The chunks are referenced in metadata for objects managed by other storage servers known variously as metadata servers and namenodes. Chunk servers are a common element in existing pNFS and cloud storage systems. A distributed metadata system and/or a metadata server describes an object to a client as one or more chunks. The client then obtains the portions of the object by getting the chunks from chunk servers. The specific chunk server (or servers) may be specified by a hashing algorithm or explicitly by the metadata server. In conventional implementations, each chunk server typically stores the chunks assigned to it as files within a local file system, such as EXT4 or XFS.
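Selection of a chunk server by a hashing algorithm might look like the following sketch. The server names, the chunk identifier format, and the choice of SHA-256 with a simple modulo are all assumptions for illustration; conventional systems use a variety of placement schemes.

```python
import hashlib

# Hypothetical set of chunk servers known to every client.
CHUNK_SERVERS = ["chunk-server-0", "chunk-server-1", "chunk-server-2"]


def select_chunk_server(chunk_id: str, servers=CHUNK_SERVERS) -> str:
    """Map a chunk identifier to one server using a stable hash."""
    digest = hashlib.sha256(chunk_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def describe_object(name: str, chunk_count: int):
    """Describe an object as a list of (chunk identifier, chunk server) pairs."""
    chunk_ids = [f"{name}/chunk-{i}" for i in range(chunk_count)]
    return [(chunk_id, select_chunk_server(chunk_id)) for chunk_id in chunk_ids]


layout = describe_object("reports/q1.dat", 4)
```

A client holding this description would then get each chunk from the named chunk server. With explicit placement, the metadata server would instead record the server for each chunk directly.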
Conventionally, computer systems organize persistent storage resources either as file systems or as volumes. Neither of these solutions is well suited for the storage and retrieval of chunks. The encoding used for file systems allows for a large number of options that are simply irrelevant for stored chunks. These include:
The mapping of a file to a specific set of blocks is designed to accommodate changes in the total size of a file.
Mappings are not optimized for a specific size because the size of a file can be changed by any operation.
A mapping that was optimized for a total file size less than a predetermined size, for example, less than 1 MB, would have to be totally re-written as soon as the file exceeded 1 MB in size.
Chunks have a fixed size that is set when they are first created.
A chunk may be deleted, but it is never expanded or contracted.
Additionally, the need to provide transactional integrity forces conventional file system designs to support at least a limited form of versioning where the blocks referenced by an existing file handle opened for read must be preserved until the file handle is closed even after the blocks have been replaced as the current content of the file. Furthermore, files are conventionally identified with file names which are composed of printable characters and organized in hierarchical directories.
Conventional volumes support random writes. Many conventional designs also allow for thin provisioning. Both of these features require the data layout of a volume to support rewritable data and dynamic changes to the set of blocks that comprise the volume. By contrast, chunks do not dynamically change size and are never rewritten after creation.
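The chunk semantics contrasted above with files and volumes (created once at a fixed size, never rewritten, optionally deleted) can be expressed as a very small interface. The class ChunkStore and its in-memory dictionary are hypothetical and serve only to illustrate how few operations a chunk requires.

```python
class ChunkStore:
    """Hypothetical in-memory store with create-once chunk semantics."""

    def __init__(self):
        self._chunks = {}

    def put(self, chunk_id: str, payload: bytes) -> None:
        """Create a chunk; its size is fixed by the payload at creation."""
        if chunk_id in self._chunks:
            raise ValueError(f"chunk {chunk_id!r} already exists and cannot be rewritten")
        self._chunks[chunk_id] = bytes(payload)  # immutable copy

    def get(self, chunk_id: str) -> bytes:
        """Return the whole chunk payload."""
        return self._chunks[chunk_id]

    def delete(self, chunk_id: str) -> None:
        """Delete a chunk; there is no operation to expand or contract it."""
        del self._chunks[chunk_id]
```

No random write, truncate, or append operation is needed, which is why a layout designed for rewritable files or thinly provisioned volumes carries overhead that chunks never use.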
Further, conventional storage systems support a small number of volumes. Systems may have expanded towards supporting hundreds or thousands of volumes, but using a volume to encode chunks would not be optimal when the number of chunks will typically be measured in millions or billions.
Storage servers are frequently federated so that a group of storage servers collaborate to provide a unified namespace for a set of objects. This is typically the case for file-oriented storage servers, and nearly always the case for cloud storage servers. Once multiple storage servers are federated, two issues must be addressed: how new information is replicated through the set of storage servers, and how a client accessing this data can know that it is receiving the most recent information.
Conventional solutions allow for persistent files supporting an object to be updated at different times on different storage servers. This necessitates complex logic to ensure that the most current information is effective. Existing solutions that rely on metadata servers, such as Hadoop Distributed File System and pNFS, require the metadata server to track the status of distributed images, to coordinate updates of the images, and to steer clients only to copies that are current. Requiring centralized metadata limits the scalability of the storage system.
Solutions such as OpenStack object storage (codenamed Swift) avoid creating a central metadata server by requiring all access to be performed by a set of proxy servers. The set of proxy servers is relied upon to implement uniform update strategies where each transaction is completed only after a majority of the active physical copies of any object have been successfully updated. This will have a severe performance impact if the storage servers are not located within a single site, and precludes the retention of extra copies for opportunistic caching.
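A majority-based update strategy of the kind described above can be sketched in simplified form. The names Replica and proxy_put are hypothetical, this is not the Swift implementation, and for brevity the sketch counts a majority of all configured copies rather than only the active ones.

```python
class Replica:
    """Hypothetical storage server holding one physical copy of each object."""

    def __init__(self, reachable=True):
        self.reachable = reachable
        self.objects = {}

    def update(self, name, payload) -> bool:
        """Apply the update; return False if this server cannot be reached."""
        if not self.reachable:
            return False
        self.objects[name] = payload
        return True


def proxy_put(replicas, name, payload) -> bool:
    """Complete the transaction only if a majority of copies were updated."""
    acknowledgements = sum(1 for replica in replicas if replica.update(name, payload))
    return acknowledgements > len(replicas) // 2


healthy = [Replica(), Replica(), Replica(reachable=False)]
degraded = [Replica(), Replica(reachable=False), Replica(reachable=False)]
results = (proxy_put(healthy, "doc", b"v2"), proxy_put(degraded, "doc", b"v2"))
```

Because the proxy must wait for a majority of copies, every update pays the latency of the slower servers, and any additional cached copy would enlarge the set from which a majority must be obtained.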
What is needed is a system and method that addresses the above-identified issues. The present invention meets such a need.