1. Field of the Invention
The present invention relates to distributed digital data storage systems. More particularly, the invention concerns a file management technique for a distributed file system with multiple storage units called “aggregates.” A “recognition” module analyzes the overall storage contents of each aggregate, then performs a virtual synchronous data move operation to reconfigure the contents of that storage aggregate. Physical data moves are actually buffered for asynchronous performance by an “action module” in accordance with certain data movement rules.
2. Description of the Related Art
A distributed file system “cell” includes many file servers, each containing many storage areas that are called “aggregates.” An aggregate may correspond to one or more logical or physical devices (such as magnetic “hard” disk drives), or a portion thereof. Each aggregate contains many smaller units of user data, called “filesets.” For ease of reference, the term “file management” is used herein to include the management of data such as filesets and files.
Distributed file system cells can be quite large, including thousands of filesets and hundreds of aggregates. Consequently, data management workload can be substantial. The data management process, which occurs repeatedly and dynamically, depends upon many different factors, which frequently conflict. For example, care must be taken to avoid optimization strategies that repeatedly move filesets back and forth, known as “churning.” Another pitfall is over-allocating storage space for one or more of a file server's aggregates, causing the file server to run out of storage space. Another potential problem is storing too many filesets that receive too many simultaneous accesses in a file server's aggregates, potentially overloading the file server.
Data management, then, is a complicated process that must consider a number of competing factors. Often, data management is performed by a human administrator initially assigning aggregates or subparts to specific applications or clients, and then analyzing the cell contents and moving filesets as needed. Sometimes, administrators employ automation scripts to apply simple space management rules for the purpose of rearranging filesets. Even so, the data management process requires a substantial amount of manual supervision and command execution. Some may believe that file management is most accurately and effectively done with the human touch, since the automated file management tools are insufficiently comprehensive. Others, however, may encounter some frustration with the labor costs, time consumption, and possible mistakes of this type of file management. In any case, certain unsolved problems prevent known file management techniques from being completely adequate.
Broadly, the present invention concerns a file management technique for a distributed file system with multiple storage “aggregates.” For each aggregate, a “recognition” module analyzes the overall storage contents, then performs a virtual synchronous data move operation to reconfigure the contents of that storage aggregate according to certain prescribed “goals.” Physical data moves are actually buffered for asynchronous performance by an “action module” in accordance with certain data movement “rules.” Thus, the cell can be virtually manipulated towards its ideal state before the corresponding physical activity is completed.
These operations are described in greater detail as follows. For a first aggregate, the system starts by reviewing files contained on the first aggregate, along with storage statistics concerning all aggregates, and planning a set of physical data movement operations to configure data stored upon the first aggregate according to certain prescribed goals. Since the system actually buffers the planned physical data movement operations, the planned set of physical data movement operations may be considered as a “virtual” data move. The reviewing and planning operations may employ an expert system, for example. The reviewing, planning, and buffering operations are repeated for all remaining aggregates in the distributed file system. However, each repeated reviewing operation considers the file contents of the aggregates as if all previous virtual data moves had actually been performed.
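The review, plan, and buffer cycle described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the names `Aggregate`, `MoveRequest`, and `plan_virtual_moves`, the single 85% fill goal, and the "largest fileset first, emptiest target" heuristic are all assumptions chosen for illustration. The point it shows is that planning updates only a virtual copy of the cell, so each later aggregate is reviewed as if earlier virtual moves had already happened.

```python
from dataclasses import dataclass, field


@dataclass
class Aggregate:
    """Hypothetical model of an aggregate: a capacity plus its filesets."""
    name: str
    capacity: int
    filesets: dict = field(default_factory=dict)  # fileset name -> size

    def used(self):
        return sum(self.filesets.values())


@dataclass(frozen=True)
class MoveRequest:
    """A buffered (not yet performed) physical data move."""
    fileset: str
    source: str
    target: str


def plan_virtual_moves(aggregates, max_fill=0.85):
    """Plan moves aggregate by aggregate against a virtual copy of the cell.

    max_fill is an assumed goal; the real goals may be far richer.
    Returns the action buffer and the virtual (planned) cell state.
    """
    # Copy the cell so that planning never touches the real aggregates.
    virtual = {a.name: Aggregate(a.name, a.capacity, dict(a.filesets))
               for a in aggregates}
    action_buffer = []
    for src in virtual.values():
        # Assumed heuristic: consider the largest filesets first.
        by_size = sorted(src.filesets.items(), key=lambda kv: kv[1], reverse=True)
        for fileset, size in by_size:
            if src.used() <= max_fill * src.capacity:
                break  # this aggregate already meets the goal
            candidates = [a for a in virtual.values()
                          if a is not src
                          and a.used() + size <= max_fill * a.capacity]
            if not candidates:
                continue  # no target can take this fileset; try a smaller one
            dst = max(candidates, key=lambda a: a.capacity - a.used())
            # Apply the move virtually and buffer the physical move.
            del src.filesets[fileset]
            dst.filesets[fileset] = size
            action_buffer.append(MoveRequest(fileset, src.name, dst.name))
    return action_buffer, virtual
```

Because `virtual` is updated immediately, a target that was filled by an earlier planned move is no longer offered to later aggregates, even though no data has physically moved.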
Asynchronously with the foregoing storage analysis and reconfiguration planning, the system processes the contents of the action buffer. Namely, the system reads buffer contents according to a prescribed order, consults predetermined action rules, and then carries out the physical data moves in accordance with the action rules. The action rules may specify desirable hours to move data or avoid moving data, for example.
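As a rough illustration of the action-side processing, the following Python sketch drains a buffer in first-in, first-out order while an action rule permits data movement. The function names, the 22:00 to 06:00 window, and the injected `clock` and `perform_move` callables are hypothetical; the text says only that action rules may specify desirable hours to move data or to avoid moving it.

```python
from collections import deque
from datetime import datetime


def in_move_window(now, start_hour=22, end_hour=6):
    """Assumed action rule: moves are allowed from start_hour up to end_hour.

    Handles windows that wrap past midnight (for example 22:00 to 06:00).
    """
    if start_hour <= end_hour:
        return start_hour <= now.hour < end_hour
    return now.hour >= start_hour or now.hour < end_hour


def process_action_buffer(buffer, clock, perform_move, allowed=in_move_window):
    """Carry out buffered moves in order while the action rule allows it.

    buffer       -- deque of pending move requests (oldest on the left)
    clock        -- callable returning the current datetime
    perform_move -- callable that performs one physical move
    Returns the list of requests that were carried out.
    """
    completed = []
    while buffer and allowed(clock()):
        request = buffer.popleft()
        perform_move(request)
        completed.append(request)
    return completed
```

Requests left in the buffer when the window closes simply wait for the next call, which keeps the planning side independent of when the physical moves actually run.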
The foregoing features may be implemented in a number of different forms. For example, the invention may be implemented to provide a method of automatically managing files in a distributed file system by decoupled data analysis and movement operations. In another embodiment, the invention may be implemented to provide an apparatus, such as a distributed file system configured to perform file management as described herein. In still another embodiment, the invention may be implemented to provide a signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital data processing apparatus to perform file management tasks as explained herein. Another embodiment concerns logic circuitry having multiple interconnected electrically conductive elements configured to perform the subject file management operations.
The invention affords its users a number of distinct advantages. For example, computer-driven file management reduces or avoids the workload and costs incurred by human computer administrators. Moreover, the automated file manager can easily operate twenty-four hours each day without complaint. Furthermore, automated file management offers greater accuracy since it is not subject to the fatigue and calculation errors of human operators. In addition, the automated file manager can conduct more complicated and exhaustive analysis of storage contents, since it uses high speed computing machines.
Another advantage of this invention is that the recognition and action modules are decoupled. Thus, analysis and action can run at different times, even on different machines. Furthermore, even though data analysis may occur during normal hours, data movement can be performed after-hours without requiring a human administrator to work at such times. If desired, move requests generated by the recognition module can be analyzed or screened by an administrator before they are implemented. Moreover, with decoupled recognition/action, pending requests can be prioritized (and re-prioritized) after their entry into the action buffer by the recognition module. Finally, decoupling of recognition and action modules facilitates the manual generation of move requests unrelated to operation of the recognition module. Such requests are carried out by the action module automatically, right along with other requests generated by the recognition module.
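One way to picture a buffer that supports prioritization, re-prioritization, and manually generated requests is the small Python sketch below. The `ActionBuffer` class, its method names, and the convention that a lower number means higher priority are assumptions for illustration, not details taken from the description.

```python
import itertools


class ActionBuffer:
    """Hypothetical pending-request store with adjustable priorities."""

    def __init__(self):
        self._pending = {}             # request id -> (priority, seq, request)
        self._seq = itertools.count()  # preserves arrival order among ties

    def submit(self, request_id, request, priority=10):
        """Add a request, whether generated automatically or entered manually."""
        self._pending[request_id] = (priority, next(self._seq), request)

    def reprioritize(self, request_id, priority):
        """Change the priority of a request that is already pending."""
        _, seq, request = self._pending[request_id]
        self._pending[request_id] = (priority, seq, request)

    def pop_next(self):
        """Remove and return the request with the lowest (priority, seq)."""
        request_id = min(self._pending, key=lambda rid: self._pending[rid][:2])
        return self._pending.pop(request_id)[2]

    def __len__(self):
        return len(self._pending)
```

A linear scan in `pop_next` keeps the sketch short; a heap would be the usual choice if the buffer were large.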
In addition, the present invention offers concurrent, multi-factor analysis, providing more efficient allocation of filesets than possible with human administrators or simple single-factor automation scripts. For example, the invention considers varied and complex parameters such as ratios of free/used space, server CPU load, aggregate I/O load, fileset access history, etc.
Beneficially, the recognition module's goals may be chosen to allow the encoding of business rules for data management. Value parameters, for example, facilitate the setting of specific thresholds (e.g., 85% full as maximum aggregate capacity). Additionally, ratio parameters allow functional tradeoffs to be made (e.g., the ideal ratio of CPU load versus space used for a particular aggregate). Also, prioritization parameters allow priority tradeoffs to be made (e.g., the importance of maintaining an adequate amount of free space versus clearing off data from a particular server).
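The three kinds of goal parameters might be combined as in the following Python sketch. The `Goals` fields, the weighted-sum penalty, and the specific defaults (other than the 85% threshold used as an example in the text) are assumptions; the description does not state how the recognition module actually evaluates its goals.

```python
from dataclasses import dataclass


@dataclass
class Goals:
    """Hypothetical encoding of value, ratio, and prioritization parameters."""
    max_fill: float = 0.85            # value parameter: maximum fill fraction
    cpu_to_space_ratio: float = 1.0   # ratio parameter: ideal CPU load per unit fill
    weight_free_space: float = 2.0    # prioritization: importance of free space
    weight_balance: float = 1.0       # prioritization: importance of CPU/space balance


def goal_penalty(fill, cpu_load, goals):
    """Score how far an aggregate is from its goals (0.0 means all goals met).

    fill and cpu_load are fractions between 0 and 1.
    """
    over_fill = max(0.0, fill - goals.max_fill)
    imbalance = abs(cpu_load - goals.cpu_to_space_ratio * fill)
    return goals.weight_free_space * over_fill + goals.weight_balance * imbalance
```

Under these assumptions, changing a weight changes which violation dominates the score, which is one simple way to express a priority tradeoff.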
Unlike automation scripts, where parallelism is difficult to achieve and manage, the present invention conducts managed parallel data movement. This speeds the implementation of desired space management actions and maximizes the available data movement windows without overloading the servers. The invention also provides a number of other advantages and benefits, which should be apparent from the following description of the invention.
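Managed parallel data movement can be sketched in Python with a thread pool plus a per-server concurrency cap. This is an illustrative assumption about how "without overloading the servers" could be enforced: the `run_moves` function, the `(server, fileset)` request shape, and the limit of two concurrent moves per source server are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_moves(requests, perform_move, max_per_server=2, max_workers=8):
    """Run moves in parallel while capping concurrent moves per source server.

    requests     -- list of (server, fileset) tuples (assumed shape)
    perform_move -- callable that performs one physical move
    Returns the results in the same order as the requests.
    """
    # One semaphore per source server bounds that server's concurrent moves.
    limits = {}
    for server, _ in requests:
        limits.setdefault(server, threading.Semaphore(max_per_server))

    def guarded(request):
        with limits[request[0]]:
            return perform_move(request)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(guarded, requests))
```

The pool size bounds total parallelism across the cell, while the semaphores keep any single server from receiving more than its share of simultaneous moves.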