The demand for high-capacity data storage systems continues to rise. As the interconnection of data networks continues, there is an increasing demand to store very large numbers of files in an efficient fashion while at the same time enabling such a storage system to grow as the number of files increases.
While various conventional data storage systems are known, such approaches have not always been efficient, easy to scale, or cost effective. Conventionally, data storage systems have resided on a monolithic server. A monolithic server can be conceptualized as including a single, very powerful computing resource dedicated to accessing files that may be stored on a variety of media. Such a monolithic server can maintain a collection of metadata for the stored files.
Metadata can include assorted file information, including a filename, the directory in which the file is located, a physical location (offset), the size of the file, and the type of file. Conventionally, metadata can reside on a single partition accessed by a process to enable rapid lookups in, and/or access to, the metadata.
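As an illustration only, the metadata fields listed above and a conventional single-partition lookup can be sketched as follows. The names `FileMetadata` and `SinglePartitionIndex`, and the choice of an in-memory dictionary keyed by directory and filename, are assumptions made for this sketch and are not drawn from any particular conventional system.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class FileMetadata:
    """Hypothetical metadata record holding the fields named in the text."""
    filename: str
    directory: str
    offset: int      # physical location of the file data
    size: int        # size of the file in bytes
    file_type: str


class SinglePartitionIndex:
    """Hypothetical index: all metadata held in one place by one process."""

    def __init__(self) -> None:
        # Assumption: records are keyed by (directory, filename).
        self._records: Dict[Tuple[str, str], FileMetadata] = {}

    def put(self, record: FileMetadata) -> None:
        self._records[(record.directory, record.filename)] = record

    def lookup(self, directory: str, filename: str) -> Optional[FileMetadata]:
        # Returns None when no metadata exists for the requested file.
        return self._records.get((directory, filename))
```

Because every record lives in a single structure owned by a single process, lookups are fast, but the capacity of that one structure bounds how much metadata the system can hold.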
A drawback to the monolithic server approach can be the difficulty involved in adapting such systems to changing needs. For example, the number of stored files, and consequently the amount of metadata and the number of metadata accesses, may increase over time. To meet such needs, the monolithic server may be upgraded. While processing speed can be improved by increasing computing resources (such as the number of central processing units (CPUs) and associated random access memory (RAM)), such increased resources can be difficult to implement, as hardware upgrades may require the system to be non-operational for a certain period of time.
Monolithic server approaches may also be undesirable because the server may be outgrown by usage requirements. As just two examples, the amount of data stored or the number of requests serviced may grow to the point where an existing monolithic server responds too slowly or is otherwise not capable of meeting usage requirements.
One conventional approach to meet increasing requirements can be to add servers. A drawback to such an approach can be added complexity for a user. A user may have to keep track of the multiple servers, as such servers are typically visible as separate entities to user applications. Further, with multiple servers, load imbalance may occur as one server is accessed more often, or stores more data, than another. Consequently, a system administrator may have to manually shift files and/or set request routing as usage changes. This can be an extreme burden on a system administrator.
It is also noted that the management of multiple servers can be especially difficult for mission critical or Internet applications that may run twenty-four hours a day and 365 days a year, as such systems do not typically have a window of time available to reconfigure or upgrade the system.
Increases in metadata size can be difficult to accommodate as well. As the demands for larger capacity systems increase (e.g., petabyte or larger size systems), the amount of metadata can increase as well. However, if the metadata exceeds the monolithic server's storage capacity, changes to the system may have to be undertaken to enable larger storage capabilities. Further, the manipulation of metadata (as files are deleted, renamed, moved, etc.) may become more complex as the server must be capable of accessing more and more metadata in the management process.
One approach to addressing the storing of a large number of files has been to “migrate” stored files. Migration of stored files may include transferring files from one storage medium to another. Typically, “old” files (those that have not been accessed for a certain period of time) can be migrated from a first storage medium that may provide relatively fast access (and hence may be more expensive), to a second storage medium that may provide slower access (and hence may be less expensive).
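A minimal sketch of such an age-based migration policy is given below, for illustration only. The names `StoredFile` and `migrate_old_files`, the two tier labels, and the idle-time threshold parameter are all assumptions introduced for this sketch; an actual system would also move the file data itself rather than merely relabel a record.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoredFile:
    """Hypothetical record tracking where a file currently resides."""
    name: str
    last_access: float   # time of last access, in seconds
    tier: str = "fast"   # "fast" (more expensive) or "slow" (less expensive)


def migrate_old_files(files: List[StoredFile], now: float,
                      max_idle_seconds: float) -> List[str]:
    """Relabel files idle longer than the threshold as residing on the slow tier.

    Returns the names of the files that were migrated.
    """
    migrated = []
    for f in files:
        idle = now - f.last_access
        if f.tier == "fast" and idle > max_idle_seconds:
            f.tier = "slow"
            migrated.append(f.name)
    return migrated
```

Note that a policy of this kind relocates file data only; the metadata describing each file still has to be kept and updated, which is why migration alone does not address metadata growth.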
While migration of files may provide a solution for larger numbers of data files, there remains a need to address the increasing size of metadata. For data storage systems that store a large number of files, there is a need for a metadata storage approach that allows for a high degree of scaling, and/or ease in scaling, and/or flexibility in the arrangement of metadata, and/or more cost effective storage of metadata.