1. Field of the Invention
The present invention relates generally to data storage, and more particularly, to a technique for storing files so as to improve system performance.
2. Description of the Related Art
File fragmentation is a well-known problem in computer data storage. In a typical file system, files are stored in units called clusters. When files are initially stored on a new storage device, it is generally possible to store each of them in a set of one or more contiguous clusters that start at a single physical location. However, as files are modified, added and removed, it may no longer be possible to store a particular file in one location, so the file is broken up and stored in multiple, logically linked locations. This is called file fragmentation. In some instances, the physical data within a single file may be stored in several hundred or more different locations. To process the entire file, the storage device must then generally stop at the end of each “fragment” and reposition its reading apparatus at the beginning of the next fragment, once for every fragment.
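The cost of fragmentation can be illustrated with a minimal sketch, assuming a file is represented simply as the ordered list of physical cluster numbers holding its data (real file systems track this through structures such as allocation tables or extent lists). The hypothetical helpers `count_fragments` and `repositions_to_read` count contiguous runs and the head repositions needed between them; the initial seek to the first cluster is not counted.

```python
from typing import List


def count_fragments(clusters: List[int]) -> int:
    """Count contiguous runs in a file's clusters, listed in logical order."""
    if not clusters:
        return 0
    fragments = 1
    for prev, cur in zip(clusters, clusters[1:]):
        # A new fragment begins whenever the next logical cluster is not
        # physically adjacent to the previous one.
        if cur != prev + 1:
            fragments += 1
    return fragments


def repositions_to_read(clusters: List[int]) -> int:
    """Head repositions needed after the first seek: one per fragment boundary."""
    return max(count_fragments(clusters) - 1, 0)


if __name__ == "__main__":
    # Hypothetical file stored in three fragments.
    layout = [100, 101, 102, 250, 251, 40, 41, 42, 43]
    print(count_fragments(layout))      # 3
    print(repositions_to_read(layout))  # 2
```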
As noted above, files are generally stored within units of physical space on a storage device known as clusters (or blocks in some file systems). The uniform data capacity of each cluster is determined by the file system. Typically, a cluster contains data from at most one file, even if the total size of that file's data is less than the capacity of a single cluster. Clusters that have never had data written to them, or whose data has been logically deleted, are known as “free space.” In addition, when two files are placed near each other on the disk but with one or more entire clusters between them, those empty clusters also form “free space.” For the purposes of this document, and as understood by the current state of the art in file systems, the unused fractions of a cluster (sometimes called “slack space”) are not considered free space.
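The cluster arithmetic described above can be sketched as follows, assuming a cluster size of 4,096 bytes chosen only for illustration (the actual size is set by the file system). The hypothetical functions `clusters_needed` and `slack_bytes` compute how many whole clusters a file occupies and how much slack space remains in its last cluster.

```python
def clusters_needed(file_size: int, cluster_size: int = 4096) -> int:
    """Whole clusters allocated to a file; a partly filled cluster still counts."""
    if file_size < 0 or cluster_size <= 0:
        raise ValueError("file_size must be >= 0 and cluster_size > 0")
    return -(-file_size // cluster_size)  # ceiling division


def slack_bytes(file_size: int, cluster_size: int = 4096) -> int:
    """Unused tail of the last cluster ("slack space"), which is not free space."""
    return clusters_needed(file_size, cluster_size) * cluster_size - file_size


if __name__ == "__main__":
    print(clusters_needed(10_000))  # 3
    print(slack_bytes(10_000))      # 2288
```

For a 10,000-byte file, three clusters (12,288 bytes) are allocated and 2,288 bytes of the last cluster are slack; under the definition above, those bytes are not counted as free space.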
When many empty clusters are available contiguously on the disk, large files may be written to them in a single contiguous sequence. However, as files are modified, added and removed, many of the physical locations containing free space on the storage device become too small to store files beyond a certain size contiguously, even though the total available free space may be sufficient. This is called free space fragmentation.
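The difference between total free space and usable contiguous free space can be shown with a short sketch, assuming a disk map written as a string in which `X` marks a used cluster and `.` a free one (a hypothetical notation). The hypothetical helper `fits_contiguously` asks whether any single free run is large enough to hold a file.

```python
from typing import List


def free_runs(layout: str) -> List[int]:
    """Lengths of contiguous free runs; 'X' marks a used cluster, '.' a free one."""
    runs: List[int] = []
    current = 0
    for cell in layout:
        if cell == ".":
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def fits_contiguously(layout: str, clusters_required: int) -> bool:
    """True if some single free run can hold the whole file."""
    return max(free_runs(layout), default=0) >= clusters_required


if __name__ == "__main__":
    disk = "X..X.X..X"
    print(sum(free_runs(disk)))        # 5 free clusters in total
    print(max(free_runs(disk)))        # but the largest run is only 2
    print(fits_contiguously(disk, 4))  # False
```

In the example, five clusters are free but no run exceeds two, so a four-cluster file cannot be written contiguously even though enough total space exists.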
File fragmentation can cause significant system performance degradation, especially on mechanical storage devices such as magnetic hard drives, since reading a single fragmented file requires numerous mechanical repositionings of the drive head. Moreover, free space fragmentation can cause significant performance degradation when writing to the storage device, as the file system must split large files across multiple slivers of available disk space. On mechanical storage devices such as magnetic hard drives, this necessitates moving the drive head to multiple locations while writing. A file larger than the largest contiguous region of free space will necessarily be fragmented when written, and this is a common root cause of file fragmentation.
To alleviate these problems, software programs have been developed to “de-fragment” a storage device. These programs generally operate to re-arrange the file clusters so that each file is stored contiguously in a single location, and to consolidate, to the extent possible, the small free space locations. However, this approach improves system performance only temporarily and to a limited degree, since the storage device will soon become fragmented again. Moreover, it improves performance only with respect to individual files, and not with respect to a plurality of related files, such as all the files that are read when starting a specific software application.
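The basic consolidation step performed by such de-fragmentation programs can be sketched as computing a target layout in which every file occupies a contiguous run packed from the start of the disk, leaving all free space in one run after the last file. This is a simplified sketch: the layout format (file name mapped to a list of cluster numbers), the hypothetical `compact` function, and its ordering rule (files keep their relative order by lowest cluster number) are assumptions, and actually moving the data is not shown.

```python
from typing import Dict, List


def compact(layout: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Plan a layout in which every file is contiguous and packed from cluster 0.

    Files keep their relative order by lowest current cluster number, so all
    free space ends up consolidated after the last file.
    """
    order = sorted(layout, key=lambda name: min(layout[name], default=0))
    new_layout: Dict[str, List[int]] = {}
    next_cluster = 0
    for name in order:
        count = len(layout[name])
        new_layout[name] = list(range(next_cluster, next_cluster + count))
        next_cluster += count
    return new_layout


if __name__ == "__main__":
    before = {"a.txt": [5, 9], "b.bin": [1, 2, 7]}
    print(compact(before))  # {'b.bin': [0, 1, 2], 'a.txt': [3, 4]}
```

The plan considers each file in isolation; nothing in it places files that are used together next to each other, which is the limitation described above.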
For example, as described in U.S. Pat. No. 5,398,142:
Many commercial products attempt to address and overcome the problem of file fragmentation. One such product is PC Tools Deluxe™ by Central Point Software, Inc. The software package includes a feature which arranges the files on a hard disk or diskette such that each file is contained in one contiguous area. Another feature unfragments files and moves free space to the back of the disk. The software also permits files to be arranged in a predetermined manner on the hard disk. For example, all files for a given subdirectory may be kept together to keep data and program overlay files adjacent one another. This reduces the amount of disk head movement needed. A directory sort feature permits the files within directories to be sorted by file name, file time, file extension, or file size. Information regarding these features may be found in PC Tools Deluxe, Version 5, December 1988. However, this and similar products arrange free space at the back of the disk. As described above, as files are created and deleted, contiguous free space deteriorates and the fragmentation problems return.
To further improve system performance, specifically during boot time, the Microsoft Windows™ operating system includes a component called Prefetcher. As described in U.S. Pat. No. 6,633,968, the Prefetcher works by monitoring the executable code and data that are accessed during the boot process and recording this activity in a log file. The Prefetcher then applies a predictive, probabilistic algorithm to the information recorded in the log file in order to load the code and data more efficiently. In addition, a defragmenter is used to place the particular executable files loaded by the Prefetcher at a particular location and in a particular order on the storage device. For example, the files may be stored in a reserved, higher-speed access area of the storage device. One disadvantage of the Prefetcher algorithm is that certain files may continue to be loaded during the boot process even after the user no longer uses them.
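The general idea of logging boot-time accesses and then choosing what to load can be illustrated with a deliberately simplified stand-in. This sketch does not reproduce the predictive, probabilistic algorithm of U.S. Pat. No. 6,633,968. It assumes each logged boot is a list of file names in access order and uses a plain frequency threshold to select files that appeared in at least a given fraction of the logs. The function `plan_prefetch`, the threshold rule, the ordering rule, and the file names in the example are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def plan_prefetch(boot_logs: Sequence[List[str]], threshold: float = 0.5) -> List[str]:
    """Choose files seen in at least `threshold` of the logged boots.

    Files are ordered by the earliest position at which any log saw them,
    a hypothetical ordering rule used only for illustration.
    """
    seen_in: Counter = Counter()
    earliest: Dict[str, int] = {}
    for log in boot_logs:
        # dict.fromkeys drops repeat accesses within one boot, keeping order.
        for position, name in enumerate(dict.fromkeys(log)):
            seen_in[name] += 1
            earliest[name] = min(earliest.get(name, position), position)
    chosen = [n for n in seen_in if seen_in[n] / len(boot_logs) >= threshold]
    return sorted(chosen, key=lambda n: (earliest[n], n))


if __name__ == "__main__":
    logs = [
        ["kernel", "driver_a", "stale_driver", "shell"],
        ["kernel", "driver_a", "shell"],
        ["kernel", "driver_a", "shell"],
    ]
    print(plan_prefetch(logs))  # ['kernel', 'driver_a', 'shell']
```

Because the counts cover every retained log, a file seen in many older boots keeps clearing the threshold until enough newer logs omit it, which mirrors the disadvantage noted above of continuing to load files that are no longer used.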
Another technique for improving system performance is disclosed in U.S. Patent Application Publication No. 2008/0027905. Files to be stored on one or more storage devices are classified into different sets, or “rankings.” Differences in the retrieval value of different regions are exploited by selecting which files to store in which regions. For example, files having a higher classification are stored in regions with faster retrieval times. This and similar techniques are generally referred to as “disk optimization.” Some disk optimization techniques track frequently used files to assign priority rankings: higher-priority files are written to the fastest region, and lower-priority files are written to slower regions.
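The ranking-based placement described above can be sketched as a greedy assignment, assuming that access counts serve as the priority ranking and that storage regions are listed from fastest to slowest, each with a capacity in clusters. The function `place_by_priority`, its parameters, and the example data are hypothetical and do not come from Pub. No. 2008/0027905.

```python
from typing import Dict, List, Tuple


def place_by_priority(access_counts: Dict[str, int],
                      sizes: Dict[str, int],
                      regions: List[Tuple[str, int]]) -> Dict[str, str]:
    """Greedily assign the most frequently used files to the fastest regions.

    `regions` lists (name, capacity in clusters) from fastest to slowest.
    Files that fit in no region are left out of the result.
    """
    remaining = dict(regions)
    placement: Dict[str, str] = {}
    # Highest access count first; ties broken by name for a stable result.
    for name in sorted(access_counts, key=lambda f: (-access_counts[f], f)):
        for region, _capacity in regions:
            if remaining[region] >= sizes[name]:
                remaining[region] -= sizes[name]
                placement[name] = region
                break
    return placement


if __name__ == "__main__":
    counts = {"a": 50, "b": 5, "c": 20}
    sizes = {"a": 4, "b": 4, "c": 4}
    print(place_by_priority(counts, sizes, [("fast", 8), ("slow", 100)]))
    # {'a': 'fast', 'c': 'fast', 'b': 'slow'}
```

Like the techniques described above, this assigns each file a region on its own merits; it does not consider which files are read together.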
As discussed above, optimizing a storage device merely by moving certain files to a specific location on the device may improve performance in some cases. For example, if a group of document files were all relocated to a specific physical location on a storage device, a performance increase may be realized when the files are viewed one after another. However, if the user wants to edit the same files with a document editing application, the application's program files, their dependencies, and associated tools, all of which must be physically read before the application can be used, may be scattered around the storage device in a disorganized and unpredictable manner (for the purposes of this document, this is referred to as “data entropy”). The storage device is then required to stop and reposition its reading apparatus an exorbitant number of times, each time it reaches the end of one file and must locate the next file to be processed within the overall execution sequence. This data entropy may partially or entirely negate any performance improvement gained by grouping together only the document files to be edited, and it can cause a significant level of performance loss whether or not file or free space fragmentation exists.
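The effect of data entropy can be illustrated by counting head repositions over an entire execution sequence rather than within a single file. This minimal sketch assumes each file is represented as an ordered list of physical cluster numbers and that the application's files and the documents are read in a known order; the function `repositions`, the file names, and both layouts are hypothetical. In both layouts every file is contiguous, so there is no file fragmentation, yet the scattered layout still requires repeated repositioning.

```python
from typing import Dict, List


def repositions(read_order: List[str], layout: Dict[str, List[int]]) -> int:
    """Count head repositions when files are read one after another.

    A reposition is counted whenever the next cluster read is not physically
    adjacent to the last one, whether inside a file or between two files.
    """
    moves = 0
    last = None
    for name in read_order:
        for cluster in layout[name]:
            if last is not None and cluster != last + 1:
                moves += 1
            last = cluster
    return moves


if __name__ == "__main__":
    order = ["editor", "lib", "tool", "doc1", "doc2"]
    # Documents grouped together, application files scattered.
    scattered = {"editor": [500, 501], "lib": [20], "tool": [900],
                 "doc1": [100], "doc2": [101]}
    # Every file in the sequence placed back to back.
    grouped = {"editor": [0, 1], "lib": [2], "tool": [3],
               "doc1": [4], "doc2": [5]}
    print(repositions(order, scattered))  # 3
    print(repositions(order, grouped))    # 0
```

In the scattered layout the two documents are adjacent, so reading them back to back needs no reposition, yet the full sequence still needs three repositions because the application files lie elsewhere on the device.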
As further described in U.S. Patent Pub. No. 2008/0027905, data may also be grouped:
Classifying Files Based on Data Grouping
Grouping of files or parts of files together on a certain area of the storage medium can also significantly increase performance. For example if you launch a word processor, several different files and/or portions of files are loaded, additionally data may also be loaded at launch. If the computer has to go to several different areas of the disk to load the necessary files or parts of files it will take substantially longer than if all these files were in the same area on this disk. Further grouping folders (directories) with the files they contain will also speed up the computer system under certain circumstances. In order to determine what files to group we merely kept track of what files are read from the disk in sequence. When we can confirm a pattern then those files or portions of files are grouped together.
However, it has not been known to organize all files or data across an entire storage device, in a systematic fashion to improve system performance, without monitoring usage beforehand.
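The sequence-tracking approach quoted from Pub. No. 2008/0027905 can be sketched as follows, assuming each monitoring session is recorded as the list of files read in order, and assuming a pattern counts as "confirmed" once the same two files have been read back-to-back a minimum number of times. That confirmation rule, the union-find grouping, the function name `group_by_sequence`, and the example file names are hypothetical choices for illustration, not the method of that publication. Like the quoted technique, the sketch depends on monitoring usage beforehand.

```python
from collections import Counter
from typing import Dict, List, Sequence


def group_by_sequence(sessions: Sequence[List[str]], min_support: int = 2) -> List[List[str]]:
    """Group files that are repeatedly read back-to-back across sessions.

    A pair counts as a confirmed pattern once it has been observed in
    sequence at least `min_support` times (a hypothetical confirmation rule).
    """
    pair_counts: Counter = Counter()
    for reads in sessions:
        for a, b in zip(reads, reads[1:]):
            if a != b:
                pair_counts[(a, b)] += 1

    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of every confirmed pair.
    for (a, b), count in pair_counts.items():
        if count >= min_support:
            root_a, root_b = find(a), find(b)
            parent[root_a] = root_b

    groups: Dict[str, List[str]] = {}
    for name in list(parent):
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


if __name__ == "__main__":
    sessions = [
        ["editor.exe", "spell.dll", "fonts.dat", "report.doc"],
        ["editor.exe", "spell.dll", "fonts.dat", "notes.txt"],
        ["backup.exe", "archive.dll"],
    ]
    print(group_by_sequence(sessions))
    # [['editor.exe', 'fonts.dat', 'spell.dll']]
```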