1. Field of Invention
The present invention relates to searching for files that match a specified set of attribute criteria in a data storage system. Exemplary attribute criteria could include file modification time, size, ownership, and permissions.
2. Description of Related Art
Accessing files in large data storage systems can be very time consuming. For example, consider a ubiquitous data storage system that includes a magnetic disk drive. This disk drive can include a rotating magnetic platter that has a read/write head suspended above the platter. Thus, two latencies are associated with reading this disk drive, i.e. a first latency (rotational latency) to spin the platter to the correct location and a second latency (seek latency) to reposition the head over a different track.
For this reason, the read time associated with a particular file is dependent on the location of the last read file on the disk drive. That is, if the two files are closely located on the disk drive, then the read time of the second file is relatively short. On the other hand, if the two files are remotely located on the disk drive, then the read time of the second file is relatively long.
In a typical disk drive, files are initially placed based on their location in the naming hierarchy. However, as files are modified or deleted, the order of the files is increasingly dependent on the available space on the disk drive. Moreover, large files may need to be divided into multiple pieces that are stored in different locations.
A data storage system must store references to all files to facilitate writing to and reading from each file. These references include meta data structures called identification nodes (inodes). Notably, each file has an associated inode.
FIG. 1 illustrates a simplified inode 100 that identifies its inode number 101 and includes meta data 102 as well as a disk address 103. Meta data 102 can include a plurality of file attributes, e.g. last modification time, file size, ownership, and permissions. Disk address 103 identifies the physical location of a data block in disk drive 105. This data block includes the file corresponding to inode 100. In this example, disk address 103 identifies a data block 104.
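The inode of FIG. 1 can be pictured as a small record. The following is a minimal sketch only; the class name, field names, and example values are hypothetical and chosen for illustration, and an actual inode layout depends on the filesystem.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    """Simplified inode modeled on FIG. 1 (all field names are illustrative)."""
    inode_number: int   # corresponds to inode number 101
    mtime: float        # meta data 102: last modification time (seconds since epoch)
    size: int           # meta data 102: file size in bytes
    uid: int            # meta data 102: ownership
    mode: int           # meta data 102: permission bits
    disk_address: int   # corresponds to disk address 103 (location of the data block)


# Arbitrary example values, not taken from any real system.
example_inode = Inode(inode_number=20, mtime=1_000_000.0, size=4096,
                      uid=501, mode=0o644, disk_address=8192)
```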
Logically, once these inodes are created, references to them must also be generated. A conventional data storage system uses a user directory to map file names to inode numbers. FIG. 2 illustrates a simplified user directory 201 including a plurality of file records 202-204. Note that a directory is a special type of file that uses the same structure described in FIG. 1 to store the list of records it contains. An exemplary file record 202 includes a file name 205 and an inode number 206. Note that an inode number refers to only one inode.
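The name-to-inode mapping of FIG. 2 can be sketched as a two-step lookup: the directory yields an inode number, and the inode number yields the inode. The dictionaries, inode numbers, and the `lookup` helper below are hypothetical, in-memory stand-ins for on-disk structures.

```python
# Hypothetical user directory: each file record pairs a file name with an inode number.
directory = {"a.": 31, "b.": 32, "c.": 33}

# Hypothetical inode table: each inode number refers to exactly one inode.
inode_table = {
    31: {"size": 120},
    32: {"size": 2048},
    33: {"size": 77},
}


def lookup(directory, inode_table, name):
    """Resolve a file name to its inode number, then to the inode itself."""
    inode_number = directory[name]
    return inode_number, inode_table[inode_number]
```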
Although each file has only one inode number (and thus only one inode), it is possible that a file can have multiple names. For example, consider a data storage system for email, wherein each received email generates a file associated with a recipient. In this data storage system, if an email is sent to multiple users, e.g. Bob and Susan, then that email may be named “1.” for Bob, thereby indicating that this is the first email for Bob during a predetermined time period. In contrast, that same email may be labeled “6.” for Susan, thereby indicating that this is the sixth email for Susan during the same predetermined time period. However, because the email sent to both Bob and Susan is identical, only one inode number (and thus only one inode) need be saved in the data storage system.
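On filesystems that support hard links, the sharing of one inode by several names can be observed directly. The sketch below assumes such a filesystem and uses a temporary directory with the illustrative names A/a. and B/6.; the function name is hypothetical.

```python
import os
import tempfile


def demo_shared_inode():
    """Create one file, give it a second name, and compare the inode numbers."""
    with tempfile.TemporaryDirectory() as root:
        dir_a = os.path.join(root, "A")   # stands in for Bob's directory
        dir_b = os.path.join(root, "B")   # stands in for Susan's directory
        os.mkdir(dir_a)
        os.mkdir(dir_b)

        first_name = os.path.join(dir_a, "a.")
        with open(first_name, "w") as f:
            f.write("one email, two recipients")

        # A hard link adds a second name for the same inode; no data is copied.
        second_name = os.path.join(dir_b, "6.")
        os.link(first_name, second_name)

        stat_a = os.stat(first_name)
        stat_b = os.stat(second_name)
        return stat_a.st_ino == stat_b.st_ino, stat_a.st_nlink
```

The function returns whether both names resolve to the same inode number, together with the link count recorded for that inode.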
A conventional data storage system uses a “file tree” to organize user directories. FIG. 3 illustrates a file tree 300 (also called a naming hierarchy in the data storage industry). A first level of file tree 300, i.e. level 301, includes a high-level directory (i.e. “/”). Level 301 is also called a “root” in the tree hierarchy. In a typical embodiment of a UNIX filesystem, this high-level directory has an inode number “2” (i.e. “0” and “1” are not used).
A second level of file tree 300, i.e. level 302, includes user directories. In this case, three user directories are shown: Bob's directory (i.e. “A”), Susan's directory (i.e. “B”), and Pat's directory (i.e. “C”). Note that each user directory also has an inode number that was generated when that user directory was created. Thus, user directory “A” could have an inode number “20” whereas user directory “B” could have an inode number “120”. Each user directory in level 302 is called a branch in the naming hierarchy.
A third level of file tree 300, i.e. level 303, includes files within a user directory. In this embodiment, user directory A includes file names a., b., and c.; user directory B includes file names 1., 2., 3., 4., 5., and 6.; and user directory C includes file names a. and b. Note that the names of files and a user directory may be specified by the user or an application using any supported character set. In either case, the naming convention is consistent for each user directory. As indicated above, files with different names (e.g. /A/a. for Bob and /B/6. for Susan) may have the same inode number. The file names used in file tree 300 are typical in an email storage system that uses system-generated names (as opposed to user-assigned names). Each file in level 303 is called a “leaf” in the tree hierarchy.
Note that file tree 300 is representative only. The actual format of file tree 300 in a data storage system would typically conform to that shown in FIG. 2. For example, a directory at the root level can be similarly shown with user directories being shown as records. Thus, each record in file tree 300 (i.e. at levels 301, 302, and 303) would also include an inode number, which is not shown for simplicity in FIG. 3.
One typical access in a data storage system is a request to see all files that were created since a certain time/date. In the context of an email storage system, the request might be for all email that arrived since yesterday. In UNIX filesystems, the “find” utility can implement this request.
In a conventional implementation, the search begins at level 301 and then walks down each branch at level 302 to the leaves at level 303. Specifically, a scan can be performed based on the order of records in the high level and user level directories. For example, assuming that the order of records in the high level directory is “A”, “B”, and “C”, then “A” and each of its constituent files (e.g. “a.”, “b.”, and “c.”) would be scanned first, then “B” and each of its constituent files (e.g. “1.”, “2.”, “3.”, “4.”, “5.”, and “6.”) would be scanned second, and “C” and each of its constituent files (e.g. “a.” and “b.”) would be scanned third. The above-referenced scanning includes looking at the inode of each entry to determine if the modification time (stored as meta data in the inode) is later than yesterday's date.
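The conventional walk described above can be sketched as follows. This is an illustration only, not the implementation of the “find” utility; the function name is hypothetical, and entries are visited in sorted name order as an assumption made for reproducibility, whereas an actual scan follows the order of records in each directory.

```python
import os


def find_modified_since(root, cutoff):
    """Walk the tree under root and return files modified after cutoff.

    cutoff is a timestamp in seconds since the epoch. Every file costs one
    inode lookup (os.lstat), which is the source of the scattered disk reads.
    """
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # assumption: visit branches in name order ("A", "B", "C")
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_mtime > cutoff:  # attribute read from the inode
                matches.append(path)
    return matches
```

A request for everything modified in the last day would then be `find_modified_since(root, time.time() - 86400)`, with `time` imported by the caller.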
Both system size and inode locations can adversely impact the access time of inodes. Specifically, many data storage systems are increasingly storing huge amounts of data. For example, a typical email storage system could include 100 million inodes and even more names.
Unfortunately, the conventional technique of walking this huge file tree, fetching the appropriate attribute for each entry in the file tree, and then comparing that attribute with the specified criteria results in an essentially random access of inodes on the disk drive. Note that inodes are typically scattered in chunks throughout a disk drive (e.g. a typical organization could have 8,000 inodes in a 64 MB chunk). Thus, a walk of the file tree in a large data storage system does not scale. For example, the above-described email storage system could run for hours to implement a simple modification time request.
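The scale of the problem can be made concrete with a rough calculation. The inode count and chunk organization below come from the examples above; the average cost of 1 ms per inode access is purely an assumption chosen for illustration, and actual costs vary with the drive and with caching.

```python
INODES = 100_000_000          # example email storage system
INODES_PER_CHUNK = 8_000      # example chunk organization
CHUNK_MB = 64                 # example chunk size
ASSUMED_MS_PER_INODE = 1      # ASSUMPTION: average cost of one scattered inode read

chunks = INODES // INODES_PER_CHUNK                      # number of chunks holding the inodes
span_gb = chunks * CHUNK_MB / 1000                       # disk area those chunks span (decimal GB)
walk_hours = INODES * ASSUMED_MS_PER_INODE / 1000 / 3600 # estimated duration of a full walk

print(f"{chunks} chunks spanning {span_gb} GB; about {walk_hours:.1f} hours per walk")
```

Under these figures the inodes are spread over 12,500 chunks covering 800 GB, and the walk time grows linearly with the number of inodes regardless of how few files actually match the criteria.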
Therefore, a need arises for a file search technique that scales well to large data storage systems.