For many organizations, managing data storage resources is a time-consuming and error-prone process. Data storage systems can quickly fill with redundant or outdated files, which are occasionally purged from the system. Frequently, it is difficult to determine who owns the files being purged or whether those files are still needed. As a result, needed files are mistakenly removed, or outdated or redundant files are allowed to remain on the data storage array. Furthermore, it is time consuming for a person to examine these files and determine which should be purged and which should remain on the storage array.
For example, software development is frequently performed by teams of developers working in geographically separated locations. These development teams use a centralized data storage area to store the particular software components they are working on. Typically, these components resemble directories having hundreds of subdirectories and files. Other software development teams can access this data to ensure that their components work together, to import dependencies, and to perform the final integration of the software components.
During the development cycle, there are frequent updates to the components, patches of previous component versions, and/or multiple component versions maintained to satisfy the dependencies of other software components. Thus, in a short amount of time, the centralized data storage area can fill with redundant or outdated files which must be purged. Occasionally, older files are extremely stable components that are still considered the most current version. This complicates manually selecting and purging files because the age of a file is not an accurate indicator of whether the file should be purged. It likewise complicates identifying files which may be candidates for archival.
Often, a list of the files which are to be purged is distributed. This allows developers to identify files which should remain in the centralized data storage area. However, it is a time-consuming task for the developers to search the data component hierarchy to determine which files should be purged. This is increasingly true as directory structures become larger and more complex.
Thus, prior art methods for managing centralized data storage systems are time consuming and prone to error. Specifically, there is no method for automatically identifying files which are candidates for archival or removal from the data storage array.
Additionally, there is no method for consolidating files across multiple storage devices (e.g., multiple disk drives in a data storage array). Currently, compacting data in a data storage array involves compacting each data storage device in the array as a separate unit, resulting in a haphazard distribution of the available free space among the separate devices. This results in less-than-optimal utilization of the free space in the data storage array. For example, the free space in the data storage array may collectively be large enough to accommodate large files and directories. However, prior art methods for compacting data result in multiple smaller areas of free space which, individually, may not be large enough to accommodate the larger files.
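The free-space arithmetic described above can be illustrated with a minimal sketch. The device capacities, usage figures, file size, and the function names `free_space` and `consolidate` are all hypothetical and chosen only for illustration. The sketch also assumes, as a simplification, that stored data can be relocated freely between devices; it is not a description of any particular consolidation method.

```python
from typing import List, Tuple

# Hypothetical model: each device is (capacity, used) in arbitrary units, e.g. GB.
Device = Tuple[int, int]


def free_space(devices: List[Device]) -> List[int]:
    """Free space on each device, assuming each device is compacted on its own."""
    return [capacity - used for capacity, used in devices]


def consolidate(devices: List[Device]) -> List[Device]:
    """Pack all stored data onto the first devices in order.

    Simplifying assumption: data may be moved freely between devices.
    """
    remaining = sum(used for _, used in devices)
    packed = []
    for capacity, _ in devices:
        placed = min(capacity, remaining)  # fill this device as far as possible
        packed.append((capacity, placed))
        remaining -= placed
    return packed


# Illustrative figures only: four devices of 100 units each.
devices = [(100, 60), (100, 70), (100, 65), (100, 55)]
file_size = 80

per_device = free_space(devices)
consolidated = free_space(consolidate(devices))

# Total free space is identical in both cases; only its distribution differs.
print("per-device free:", per_device, "fits:", max(per_device) >= file_size)
print("consolidated free:", consolidated, "fits:", max(consolidated) >= file_size)
```

In this example the array holds 150 units of free space either way, but per-device compaction leaves no single region larger than 45 units, so the 80-unit file cannot be placed. After consolidating across devices, one device is entirely free and the file fits.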