Network-based storage, or simply “network storage”, is a common approach to backing up data, making large amounts of data accessible to multiple users, and serving other purposes. In a network storage environment, a storage server makes data available to client (host) systems by presenting or exporting to the clients one or more logical containers of data. There are various forms of network storage, including network attached storage (NAS) and storage area network (SAN). In a NAS context, a storage server services file-level requests from clients, whereas in a SAN context a storage server services block-level requests. Some storage servers are capable of servicing both file-level requests and block-level requests.
There are several recent trends in network storage technology. First, the amount of data being stored within a typical enterprise is approximately doubling from year to year. Second, there are now multiple classes of storage devices available on the market, each with its own performance characteristics. These two trends together have caused users to want storage systems that mix different kinds of storage in such a way that it is possible to seamlessly move data across storage tiers, for example, based on policies.
In addition, users often would like to apply policies to collections of data objects. For example, an online social networking site/service might want to replicate all of its original-size photos (e.g., photos of its members/users) three times, but not the thumbnail versions, since the thumbnails can be recreated from the originals. Yet today, setting policy within a storage system is a cumbersome process that has to be done out-of-band by a system administrator. Application writers and users cannot specify policies on groups of files/objects.
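The photo example above can be illustrated with a minimal sketch of a policy that selects a collection of data objects by their metadata rather than by their location. This is not an interface described in this document; the names DataObject, ReplicationPolicy, and replica_count, the metadata keys, and the default of a single copy are all hypothetical, and "three times" is assumed here to mean three replicas.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """Hypothetical stored object carrying descriptive metadata."""
    object_id: str
    metadata: dict = field(default_factory=dict)


@dataclass
class ReplicationPolicy:
    """Hypothetical policy: objects matching `match` get `copies` replicas."""
    match: dict   # metadata key/value pairs an object must carry
    copies: int   # number of replicas to keep (assumed meaning of "three times")

    def applies_to(self, obj: DataObject) -> bool:
        return all(obj.metadata.get(k) == v for k, v in self.match.items())


def replica_count(obj: DataObject, policies: list, default: int = 1) -> int:
    """Return the replica count of the first matching policy, else the default."""
    for policy in policies:
        if policy.applies_to(obj):
            return policy.copies
    return default


photos = [
    DataObject("p1-original", {"kind": "photo", "variant": "original"}),
    DataObject("p1-thumb", {"kind": "photo", "variant": "thumbnail"}),
]
policies = [
    ReplicationPolicy(match={"kind": "photo", "variant": "original"}, copies=3),
]

# Originals are replicated; thumbnails fall back to the default single copy.
counts = {p.object_id: replica_count(p, policies) for p in photos}
```

In this sketch the policy is stated once against a group of objects, which is the capability the paragraph above says application writers and users lack in conventional systems.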
A problem associated with conventional storage systems is that the use of path names, such as in a traditional filesystem, imposes a hierarchical organization on the data. Applications must conform to that organization and use it for different purposes, such as navigation and retrieval, access control, and data management. However, a hierarchical organization may not make sense for uses other than navigation and retrieval, and as a result, it can lead to inefficiencies such as duplication of content and consequent administrative overhead.
Furthermore, a hierarchical organization has proven to be ineffective even for navigation and retrieval. Consider a photo that is stored under a given path name, such as “/home/eng/myname/office.jpeg”. In a traditional storage system, this name maps to a specific server/controller, a specific volume, and a specific file location (e.g., inode number) within that volume. Thus, path names are tied to storage location.
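The coupling between path name and storage location can be sketched as follows. This is an illustrative model only, not the resolution logic of any particular filesystem; the tables MOUNTS and INODES, the server and volume names, and the inode numbers are hypothetical.

```python
from typing import NamedTuple


class Location(NamedTuple):
    server: str
    volume: str
    inode: int


# Hypothetical mount table: path prefix -> (server/controller, volume).
MOUNTS = {
    "/home/eng": ("server-a", "vol1"),
    "/archive": ("server-b", "vol7"),
}

# Hypothetical per-volume namespace: (volume, path within volume) -> inode number.
INODES = {("vol1", "myname/office.jpeg"): 4711}


def resolve(path: str) -> Location:
    """Map a path name to a server, a volume, and an inode within that volume."""
    # Try the longest mount prefix first so nested mounts resolve correctly.
    for prefix in sorted(MOUNTS, key=len, reverse=True):
        if path.startswith(prefix + "/"):
            server, volume = MOUNTS[prefix]
            relative = path[len(prefix) + 1:]
            inode = INODES.get((volume, relative))
            if inode is None:
                raise FileNotFoundError(path)
            return Location(server, volume, inode)
    raise FileNotFoundError(path)


before = resolve("/home/eng/myname/office.jpeg")

# Relocate the photo to another volume: it receives a new inode there,
# and the old path name no longer leads to it.
del INODES[("vol1", "myname/office.jpeg")]
INODES[("vol7", "myname/office.jpeg")] = 88

after = resolve("/archive/myname/office.jpeg")
```

In this model the same photo is reachable only under a different path name once it has been moved, which is one way to read the statement that path names are tied to storage location.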
These problems and others are addressed by a network storage system described in U.S. patent application Ser. No. 12/633,718 of G. Goodson et al., filed on Dec. 8, 2009 and entitled, “Content Repository Implemented in a Network Storage Server System” (hereinafter “Goodson”). The network storage system described in Goodson provides a content repository, which includes a distributed object store, a presentation layer, a metadata subsystem, and a policy-based management subsystem. The system can be implemented in a multi-node storage server cluster. The distributed object store described in Goodson stores data objects and is distributed across multiple interconnected storage server nodes, such as may exist in a clustered storage server system.
While such a system solves many problems, it also gives rise to various technical challenges. One of those challenges is how to allow efficient search and retrieval of data objects by users, particularly when the user does not know the name or identifier of the data object(s) of interest. To provide this and other functionality, an advanced metadata subsystem is needed that supports full-featured creation and management of metadata for stored data objects.