1. The Field of the Invention
The invention relates to networking and data storage. More particularly, the invention relates to a system and method for policy-based data management on a distributed storage system.
2. The Relevant Art
Networks have become instrumental in situations in which data is transferred from one computer to another, or from clients, such as independent workstations, to a centralized storage facility. It is common for storage applications to have very specialized needs. In response to these needs, distributed storage systems have been developed. One type of distributed storage system is a storage area network (SAN). A distributed storage system typically has a plurality of clients connected to a plurality of storage pools. The clients of the distributed storage system may, in some cases, be servers that transmit data between the distributed storage system and individual computers.
Unfortunately, a number of storage-related issues have not yet been successfully addressed by known distributed storage system configurations. A distributed storage system is often called upon to carry out several different operations simultaneously. Consequently, the resources of the distributed storage system, or of a server connected to the distributed storage system, can easily become saturated, particularly when many users wish to simultaneously store, retrieve, or move data on the distributed storage system.
Additionally, many known distributed storage systems have no method of prioritizing operations. Consequently, a low-importance, high-resource operation, such as a bulk file transfer, may preempt memory, caching space, input/output (I/O) bandwidth, processor capacity, or other resources that are needed for more important operations. Thus, performance of the more important operations is unnecessarily delayed.
Also, current distributed storage systems are not capable of storing data using prioritized operations across multiple platforms. Typically, all of the computers on a distributed storage system must have the same type of operating system. If data from multiple platforms are to be stored, the data must be routed through multiple distributed storage systems and stored in different locations.
Furthermore, known distributed storage systems generally do not permit a user to automatically select between multiple storage options when generating files. Nor do these systems account for the different requirements placed on these files. Specifically, different files may have different requirements for accessibility, disaster recoverability, retrieval speed, retrieval consistency, and storage format. Some files may need to be accessed by many people simultaneously, while others are used only rarely, by a single user. Some files are “mission critical,” and therefore must not be lost if hardware damage occurs; others are more expendable. Similarly, some files must be accessed rapidly and/or transferred at a consistent, rapid data transfer rate, while others do not require rapid access. Certain file types, such as database files, are advantageously stored in a “sparse” format that permits subsequent expansion, while other files can be densely packed together.
By the same token, great variation exists in the equipment available to store data. In general, greater capacity, greater access speed, higher throughput, and higher disaster recoverability equate to higher cost. Without a variety of options for data storage, some files are stored in a manner that provides insufficient performance, and others take up comparatively expensive storage capacity that provides an unnecessarily expensive level of performance.
Consequently, what is needed is a comparatively simple and versatile system, method, and apparatus for managing data in a network according to predetermined policies. What is particularly needed is a data management system, method, and apparatus that prioritize files within the network, with clients that operate based on a plurality of different operating platforms. Further, what is particularly needed is a data management system, method, and apparatus that intelligently store files in storage pools with a variety of performance levels, based on policies and the nature of the storage pools. Such a system, apparatus, and method would be particularly desirable if implemented for distributed storage systems that service clients operating under heterogeneous platforms.