1. Field of the Invention
The present invention relates generally to data storage, and more particularly to accessing file systems in data storage. Specifically, the present invention relates to searching a file system directory.
2. Background Art
Files are typically stored on disk in a hierarchical data structure called a file system. The file system includes a root directory and objects such as files, links, and subdirectories. The hierarchical arrangement was popularized in the UNIX operating system, and was also adopted in the Microsoft MS-DOS operating system for personal computers. The hierarchical arrangement survives today in various UNIX-based file systems, the Microsoft Windows operating system, and many other operating systems. Popular UNIX-based file systems are the UNIX file system (ufs), which is a version of Berkeley Fast File System (FFS) integrated with a vnode/vfs structure, and the System V file system (s5fs). The implementation of the ufs and s5fs file systems is described in Chapter 9, pp. 261-289 of Uresh Vahalia, Unix Internals: The New Frontiers, 1996, Prentice Hall, Inc., Simon and Schuster, Upper Saddle River, N.J. 07458. The implementation of the MS-DOS file system is described in Chapter 5, pp. 99-123 of Peter Norton and Richard Wilton, The New Peter Norton Programmer's Guide to The IBM PC and PS/2, 1988, Microsoft Press, Redmond, Wash. 98073.
A file is normally accessed by a command that specifies a file name. At least the directory or subdirectory containing the file must be searched to find the directory entry containing the specified file name. This directory entry includes information pointing to the location where the file data are stored on disk. The search of the directory entries for the specified file name is a scanning process that suffers from a linear growth in access time as a function of the size of the directory.
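By way of illustration only, the scanning process described above can be sketched as follows. The `DirEntry` record, its `inode` field, and the `linear_lookup` function are hypothetical names introduced for this sketch; they are not taken from any particular file system. The sketch counts name comparisons to show that the cost of a search grows with the number of directory entries.

```python
from typing import List, NamedTuple, Optional, Tuple


class DirEntry(NamedTuple):
    """Hypothetical directory entry: a file name plus a pointer to the file data."""
    name: str
    inode: int  # assumed stand-in for "information pointing to the location on disk"


def linear_lookup(entries: List[DirEntry], name: str) -> Tuple[Optional[DirEntry], int]:
    """Scan the entries in order; return the match (or None) and the comparisons made."""
    comparisons = 0
    for entry in entries:
        comparisons += 1
        if entry.name == name:
            return entry, comparisons
    return None, comparisons


if __name__ == "__main__":
    directory = [DirEntry(f"user{i}", i) for i in range(1000)]
    found, cost = linear_lookup(directory, "user999")
    print(found, cost)
```

In this sketch, a search for the last entry, or for a name that is absent, compares against every entry in the directory, which is the linear growth referred to above.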
The explosive growth of the Internet has led to a great demand for very large information repositories; for example, Internet service providers (ISPs) need to store e-mail for several hundreds of thousands of customers. Especially in the Internet domain, there is often a need to store information in large, flat directories. Invariably, high-speed access to these files is crucial, often dictating the survival of such businesses.
By way of example, consider a typical e-mail provider business. One way to organize the e-mail storage is by user name. The user names can be partitioned across several file systems akin to the way an encyclopedia is split into several alphabetically sorted volumes. In this case, the number of user directories in a file system is clearly not bounded and can change dynamically. Typically the number of user directories will grow, but it may fluctuate and will shrink if the e-mail provider becomes unsuccessful. Based on the distribution of names chosen by users, some file systems could very well have hundreds of thousands of entries.
The conventional technique for accessing a directory in a traditional file system was not designed for such large directories. The conventional technique results in very long response times when dealing with large directories. Further, the response time tends to be a function of the size of the directory itself; typically, the response time grows in linear proportion to the size of the directory. This is clearly unacceptable in the example of an e-mail service provider. Instead, it is desired to have approximately constant-time access to the contents of such directories.
In accordance with a basic aspect of the invention, a digital computing system has random access memory and data storage. An operating system program is executed to manage at least one file system in the data storage. The file system includes at least one directory. The invention provides a computer-implemented method, which includes accessing the directory and compiling and storing in the random access memory hashing information for searching the directory. The method further includes, when a need arises for searching the directory, accessing the hashing information in the random access memory and using the hashing information for searching the directory.
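A minimal sketch of this aspect is given below under stated assumptions. The directory is modeled as an in-memory list of (name, inode) pairs standing in for the directory in data storage. The bucket count `NUM_BUCKETS`, the use of CRC-32 as the hash function, and the function names `name_hash`, `compile_hash_info`, and `hashed_lookup` are all assumptions made for illustration; the text above does not prescribe a particular hash function or layout for the hashing information.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

NUM_BUCKETS = 1024  # assumed table size, chosen only for this sketch

Entry = Tuple[str, int]  # (file name, hypothetical inode number)


def name_hash(name: str) -> int:
    """Map a file name to a bucket number (CRC-32 is an illustrative choice)."""
    return zlib.crc32(name.encode("utf-8")) % NUM_BUCKETS


def compile_hash_info(entries: List[Entry]) -> Dict[int, List[int]]:
    """Access the directory once and build, in memory, bucket -> directory slots."""
    buckets: Dict[int, List[int]] = defaultdict(list)
    for slot, (name, _inode) in enumerate(entries):
        buckets[name_hash(name)].append(slot)
    return buckets


def hashed_lookup(entries: List[Entry], buckets: Dict[int, List[int]],
                  name: str) -> Optional[int]:
    """Use the hashing information to examine only the slots in one bucket."""
    for slot in buckets.get(name_hash(name), ()):
        if entries[slot][0] == name:  # confirm the name, since buckets can collide
            return entries[slot][1]
    return None


if __name__ == "__main__":
    directory = [("alice", 11), ("bob", 22), ("carol", 33)]
    info = compile_hash_info(directory)       # compiled before any search is needed
    print(hashed_lookup(directory, info, "bob"))
```

In this sketch the hashing information holds slot numbers rather than copies of the entries, so a search still confirms the name against the directory entry itself; only the candidate slots in one bucket are examined, rather than the whole directory.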
In accordance with another aspect of the invention, a digital computing system has random access memory and data storage. An operating system program is executed to manage at least one file system in the data storage. The file system includes at least one directory of files in the data storage. An application program is also executed to access at least one file in the directory of files. The invention provides a computer-implemented method of accelerating a search of the directory of files for access of the application program to a specified file in the directory. Prior to a need for access to the directory to satisfy a file access request from the application program, the directory is accessed, and hashing information is compiled and stored in the random access memory for searching the directory to satisfy the file access request. When the need arises for access to the directory to satisfy the file access request from the application program, the hashing information is used for searching the directory to satisfy the file access request.
In accordance with another aspect, a digital computing system has random access memory and data storage. An operating system program is executed to manage at least one file system in the data storage. The data storage contains a plurality of directories, including at least one directory in the file system. The directories include a group of directories for which an accelerated search is desirable, and a group of directories for which an accelerated search is not desirable. The invention provides a computer-implemented method of accelerating a search of a specified directory in the file system. The method includes accessing the data storage to find the group of directories for which an accelerated search is desirable, accessing each directory in that group, and compiling and storing in the random access memory respective hashing information for searching each directory in that group. When a need arises for searching a specified directory, the method further includes determining whether the specified directory has in the random access memory respective hashing information for searching the specified directory, and when it is determined that the specified directory has in the random access memory respective hashing information for searching the specified directory, using the respective hashing information for searching the specified directory.
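The selective behavior of this aspect can be sketched as follows. The text above does not state how the group of directories for which an accelerated search is desirable is identified; the size threshold `ACCEL_THRESHOLD` used here is purely an assumption for illustration, as are the function names `build_indexes` and `search`. A Python dictionary, which is itself a hash table, stands in for the respective hashing information of each directory.

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[str, int]  # (file name, hypothetical inode number)

ACCEL_THRESHOLD = 4  # assumed criterion: index only directories with this many entries


def build_indexes(directories: Dict[str, List[Entry]]) -> Dict[str, Dict[str, int]]:
    """Find the directories worth accelerating and compile an index for each one."""
    indexes: Dict[str, Dict[str, int]] = {}
    for path, entries in directories.items():
        if len(entries) >= ACCEL_THRESHOLD:
            # name -> slot; the dict serves as the in-memory hashing information
            indexes[path] = {name: slot for slot, (name, _inode) in enumerate(entries)}
    return indexes


def search(directories: Dict[str, List[Entry]], indexes: Dict[str, Dict[str, int]],
           path: str, name: str) -> Tuple[Optional[int], str]:
    """Search one directory, using its hashing information only if it has any."""
    entries = directories[path]
    index = indexes.get(path)
    if index is not None:
        slot = index.get(name)
        return (entries[slot][1] if slot is not None else None), "hashed"
    for entry_name, inode in entries:  # no hashing information: conventional scan
        if entry_name == name:
            return inode, "linear"
    return None, "linear"


if __name__ == "__main__":
    dirs = {"/mail": [(f"u{i}", i) for i in range(10)], "/etc": [("hosts", 1)]}
    idx = build_indexes(dirs)
    print(search(dirs, idx, "/mail", "u7"), search(dirs, idx, "/etc", "hosts"))
```

The second element of the returned pair merely reports which path was taken, so that the determination step (whether the specified directory has hashing information in memory) is visible in the result.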
In accordance with yet another aspect, the invention provides a program storage device containing a program for execution by a digital computing system having random access memory and data storage, and in which an operating system program is executed to manage at least one file system in the data storage. The file system includes at least one directory of files. The program is executable for compiling and storing in the random access memory hashing information for searching the directory. When a need arises for searching the directory, the program is executable for using the hashing information in the random access memory for searching the directory.