The present invention relates to computer system components that perform file system operations.
Modern computers are made up of several different components. Some of these components are physical devices, that is, hardware like the CPU (a central processing unit, such as a microprocessor), main memory (high-speed random access memory), disk drives, keyboard, and so on, and some are software like the applications or programs that the computer executes. One such software component is the operating system, which manages the interaction between the applications that it executes and the physical devices that together make up the computer. Included within virtually every operating system is the concept of a file system. This is the combination of the structure of the data stored on the physical disk drive and the file system driver, a software component that coordinates access to the data.
File systems are generally structured as an inverted tree in which each node is given a sequence of characters describing it, as well as access restrictions, date and time of creation or access, and many other features and information that are specific to the operating system. Each non-terminal node is generally called a directory or folder. Each terminal node is generally called a file. Locating a file to be opened requires parsing a path, which is a string composed of a hierarchical name for the file with each named component separated by some delimiter. For example, on a computer running a Microsoft® Windows operating system (95 or NT), the path "\\Work\Project\Month\Document" indicates the hard disk drive partition (volume) named Work, the directory Project within the root directory of that volume, the directory Month within the Project directory, and the file Document within the directory Month.
The contents of a file may be called file data to distinguish it from meta data. Meta data is "data about data". Meta data is the file system overhead that is used to keep track of everything about all of the files on a volume. For example, meta data tells what allocation units make up the file data for a given file, what allocation units are free, what allocation units contain bad sectors, and so on.
The data that is managed by the file system is generally stored on a mechanical magnetic storage device called a disk drive. For an application program to access a particular file on the disk, a directory lookup must usually be performed. A directory lookup can require: (i) accessing the sectors for each of the directories that are components of the file's path, (ii) retaining the information necessary to access each directory's physical data from disk, and (iii) computing the number of the sector where the file is located on the disk. It is at this point that a request for the operating system to open a file is satisfied. The application can then use operating system functions to read data from the opened file and finally to close the file, releasing any operating system resources being maintained for the file. Because the act of opening a file and reading its contents by performing this directory lookup requires several steps, file systems typically use a variety of techniques to minimize the adverse performance effects of repeating these steps over and over again. The caching of frequently used disk data in memory is one popular technique for minimizing adverse performance effects. Another technique is the indexing of directory data.
File systems commonly use an internal identifier to refer to a directory or file. For some file systems these internal identifiers are long-lived (i.e., persistent) and validly refer to the same file for the life of the file. When accessing a file, the directory lookup determines the internal identifier of each directory in the path, reads each directory's data from the disk, and ultimately locates the internal identifier for the file. The data associated with the file identifier is then read. This data generally includes such attributes as access permissions, file size, file name, and where on the disk the file data is located. Finally, using the cluster information, the file data (the data that a user regards as the contents of the file) is read from the disk.
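The directory lookup described above can be illustrated with a minimal in-memory sketch. It is not the file system's actual on-disk logic: the `Node` and `Volume` classes, the use of identifier 0 for the root directory, and the `directory_reads` counter (standing in for one directory read from disk) are all assumptions made for illustration, and the volume component of the path is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One file system object, keyed by a persistent internal identifier."""
    ident: int
    name: str
    is_dir: bool
    children: dict = field(default_factory=dict)  # component name -> child identifier
    data: bytes = b""


class Volume:
    """Hypothetical volume: a table of nodes indexed by internal identifier."""

    def __init__(self):
        self.nodes = {0: Node(0, "", True)}  # assumption: identifier 0 is the root
        self.directory_reads = 0             # stands in for directory reads from disk
        self._next_ident = 1

    def add(self, parent_id, name, is_dir, data=b""):
        ident = self._next_ident
        self._next_ident += 1
        self.nodes[ident] = Node(ident, name, is_dir, data=data)
        self.nodes[parent_id].children[name] = ident
        return ident

    def lookup(self, path, delimiter="\\"):
        """Resolve a path to a file identifier, one directory at a time."""
        current = 0
        for component in (c for c in path.split(delimiter) if c):
            directory = self.nodes[current]
            if not directory.is_dir:
                raise NotADirectoryError(component)
            self.directory_reads += 1  # each path component costs one directory read
            if component not in directory.children:
                raise FileNotFoundError(path)
            current = directory.children[component]
        return current


vol = Volume()
project = vol.add(0, "Project", True)
month = vol.add(project, "Month", True)
document = vol.add(month, "Document", False, b"contents")

found = vol.lookup("\\Project\\Month\\Document")
print(found == document, vol.directory_reads)
```

In this sketch the cost of an open grows with the depth of the path, which is the overhead the remainder of the description is concerned with.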
The overhead associated with directory lookup is both necessary and useful in the general case. However, when an application provides its own mechanism for referring to files, using both the application's lookup mechanism and that of the file system results in duplication of effort. The applications that perform their own mapping to files, and consequently cause this redundancy, are many and diverse. Applications that experience this performance degradation can benefit from the present invention.
The invention provides methods and apparatus that enhance the performance of computer file systems, and in particular the performance of read-only operations in such file systems. In the principal embodiment that will be described, the methods and apparatus of the invention are implemented in a suite of computer program modules that together make up a performance enhancement product.
In general, in one aspect, the invention provides a product that transparently exists in an operating system after an initial setup is completed. The initial setup involves identifying what directories or files are to be monitored in order to intercept access requests for those files and to respond to those requests with enhanced performance.
Advantageous embodiments of the invention include one or more of the following features. A system administrator can specify what directories or files are to be monitored by the product. The product automatically creates and maintains a high-performance index of monitored directories or files. It transparently and automatically begins enhancing requests for monitored files whenever the application suite starts running.
Most operations are simply forwarded to the underlying file system driver. However, when a monitored file is opened, the open is performed using the file identifier, bypassing access to any directory meta data.
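The forwarding and bypass behavior can be sketched as follows. This is only an illustration under stated assumptions, not the product's driver code: `MonitoredIndex`, `slow_lookup`, and `open_by_id` are hypothetical names, the index is a plain in-memory dictionary, and paths are assumed to be already normalized.

```python
class MonitoredIndex:
    """Hypothetical index mapping monitored paths to persistent file identifiers."""

    def __init__(self, slow_lookup, open_by_id):
        self._slow_lookup = slow_lookup  # full directory lookup: path -> identifier
        self._open_by_id = open_by_id    # open a file directly from its identifier
        self._ids = {}
        self.bypassed = 0                # opens served from the index
        self.forwarded = 0               # opens that took the ordinary lookup path

    def monitor(self, path):
        """Resolve the path once, at setup time, and remember its identifier."""
        self._ids[path] = self._slow_lookup(path)

    def open(self, path):
        ident = self._ids.get(path)
        if ident is None:
            # Not monitored: fall through to the ordinary directory lookup.
            self.forwarded += 1
            ident = self._slow_lookup(path)
        else:
            # Monitored: skip directory meta data entirely.
            self.bypassed += 1
        return self._open_by_id(ident)


# Stand-ins for the underlying file system (illustrative data only).
ids_by_path = {"/site/index.html": 7, "/site/logo.png": 9}
data_by_id = {7: b"<html>", 9: b"PNG"}
lookups = []


def slow_lookup(path):
    lookups.append(path)
    return ids_by_path[path]


index = MonitoredIndex(slow_lookup, data_by_id.__getitem__)
index.monitor("/site/index.html")
index.open("/site/index.html")  # served by identifier, no directory lookup
index.open("/site/logo.png")    # forwarded to the ordinary lookup
print(index.bypassed, index.forwarded, len(lookups))
```

The one lookup performed for the monitored file happens at setup, so repeated opens of that file add no further directory accesses in this sketch.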
A further enhancement of the invention is the elimination of access time updates for monitored files, thereby eliminating write updates to directory contents, file system meta data, and the operating system log file.
In the Windows NT file system NTFS, access to monitored files is enhanced by "pinning" files in the data cache maintained by the NTFS cache manager. Pinning forces the NTFS cache manager to retain file data in memory by leaving an outstanding operation in place (CcMdlRead) until such time as the invention calls the complete operation (CcMdlReadComplete). Maintaining access to the file data in memory for as long as possible increases the likelihood of the file being in memory when the next access request for the file is received.
Replacement of pinned files when available memory is exceeded is performed using a least-recently-used (LRU) selection process. As memory usage for the cache increases, adverse impacts of aggressive memory utilization are mitigated through monitoring of memory usage for other applications and adjusting memory as required.
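A least-recently-used replacement policy of this kind can be sketched in a few lines. The sketch below only tracks which file identifiers are pinned and how many bytes they occupy; it does not call the cache manager, and the `PinnedFileCache` class, its byte budget, and the `shrink` method (a stand-in for adjusting memory when other applications need it) are assumptions made for illustration.

```python
from collections import OrderedDict


class PinnedFileCache:
    """Hypothetical bookkeeping for pinned files under a byte budget (LRU order)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._pinned = OrderedDict()  # file identifier -> size, least recent first

    def touch(self, file_id, size):
        """Record an access; return the identifiers unpinned to make room."""
        if file_id in self._pinned:
            self._pinned.move_to_end(file_id)  # now the most recently used
            return []
        if size > self.capacity_bytes:
            return []  # too large to pin at all; leave the cache unchanged
        evicted = []
        while self.used_bytes + size > self.capacity_bytes:
            evicted.append(self._evict_oldest())
        self._pinned[file_id] = size
        self.used_bytes += size
        return evicted

    def shrink(self, new_capacity_bytes):
        """Lower the budget, e.g. when other applications need memory."""
        self.capacity_bytes = new_capacity_bytes
        evicted = []
        while self._pinned and self.used_bytes > self.capacity_bytes:
            evicted.append(self._evict_oldest())
        return evicted

    def _evict_oldest(self):
        old_id, old_size = self._pinned.popitem(last=False)
        self.used_bytes -= old_size
        return old_id

    def pinned_ids(self):
        return list(self._pinned)


cache = PinnedFileCache(capacity_bytes=100)
cache.touch("a", 40)
cache.touch("b", 40)
cache.touch("a", 40)         # refreshes "a", so "b" is now least recently used
print(cache.touch("c", 40))  # "b" is unpinned to make room
print(cache.shrink(50))      # budget reduced; "a" is unpinned next
```

In a real implementation each eviction would correspond to completing the outstanding read on that file so the cache manager is free to release its data; that step is deliberately not shown here.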
Additionally, for the NTFS implementation, data operations are processed more efficiently in a number of ways explained in more detail later in the document. The product can be configured, and its runtime behavior can be controlled, through the use of configuration parameters stored either in operating system-provided locations (e.g., the Windows Registry) or in configuration files read at startup. It provides a mechanism that allows a system administrator to cause all file operation requests to be directed to the standard file system driver without enhancement.
Among the advantages of the invention are the following. It improves the performance of applications that rely on high volumes of file accesses without resorting to a custom implementation of the file system. It improves the performance of applications that perform large numbers of file opens. An example of such a file-open intensive application is a Web server. It reduces the amount of time and resources required to locate and access files stored in a file system. In particular, it reduces the amount of system time and resources spent to obtain access to a file in the file system. It can transparently and automatically allow the standard file system to service access requests for monitored files whenever its operation is stopped. It does not affect requests for access to non-monitored directories and files. It allows a system administrator to direct all file system requests to the standard file system manually. A system administrator can configure and control its runtime behavior. Because it does not replace the native file system, it provides a portable, seamless, and relatively inconspicuous solution. Generally, the invention provides a performance improvement over standard operating system file opens as the number of files increases. Access time is reduced with increasing gains as the number of files being accessed increases.
Computer systems that are used as networked file servers can benefit from the invention. When a request to open a file is received, the path to the file is typically provided. If a computer is acting as a file server connected to a network, the path that is received is most likely a remote representation of the actual file path on the file server. In this case, the file server's network redirector or daemon will perform the file open using a translated version of the supplied path. For some operating systems, keeping the file open indefinitely poses a severe resource drain. To avoid this, older files are generally closed and only reopened as needed. When a large number of files are being served, this reopen and reprocessing of the directory data structures can add significant overhead. The invention reduces this overhead.
World Wide Web (Web) servers can also benefit from the invention. Web servers open files requested by a client on the network, read the file data, and transmit the information to the client. By not accessing or altering directory meta data, the invention reduces the work done for an individual request. Therefore each individual request can be handled more quickly (i.e., request latency is reduced). The result is that the Web server can process more requests in the same period of time.
Computers running document management systems can also benefit from the invention. Such systems typically manage objects by tracking several attributes, not just file names. These attributes are usually indexed and provide a high performance access path to some physical storage identifier for the managed files. In some implementations, the file data managed by the application is simply stored in the file system, which results in a redundant directory lookup to access the file data after the desired file name has been located in the document management system's index. Other implementations avoid this problem by storing file data in a private data store, providing security and high performance access by avoiding the directory lookup and file open overhead associated with storing objects as separate files. Unfortunately, this may then preclude the use of many of the support utilities implemented by operating system and other software vendors. By running the document management application on a computer implementing the invention, the overhead of file system directory lookup is reduced without introducing any of the adverse effects of the private data store.
Mail servers are another application that can benefit from the invention, since they often access and open a large number of smaller files. Generally, any application that performs a large number of file open operations on smaller files will benefit from the invention, since for such files the overhead of processing file and directory meta data is a more significant portion of the overall file access operation.
Other advantages and features will become apparent from the following description and from the claims.