1. Field of the Invention
The present invention relates to computing systems and, more particularly, to storage and management of data in computing systems.
2. Description of the Related Art
Among other things, the operating system of a computer provides facilities for persistent storage and management of data. The facilities provided by the operating system typically insulate users (applications) from the implementation details used to store and manage data in a computer. For example, Unix operating systems provide abstract concepts such as files, directories, and file systems. A file can be viewed as a logical container for data. The term "file", as used herein, refers to an abstract encapsulation of data.
Although the Unix operating system considers a file as a sequence of bytes, through use of files and directories data can be organized logically and presented to the users in a logical file system. Accordingly, users of Unix operating systems can organize, manipulate, and access different files by interacting with the file system through an interface provided by the operating system. From a user's perspective, Unix files are organized in a hierarchical tree structure. However, it should be noted that Unix operating systems typically store file entries sequentially within a persistent storage device (e.g., sectors of a disk).
FIG. 1A depicts a user's perspective of a portion of a Unix file system. A user (application) can access a file by referencing its full pathname in the logical file system. For example, a file can be referenced by its full pathname "/etc/passwd", where "/" denotes the root of the logical file system and "etc" is a parent file (or directory) of the file "passwd". It should be noted that in Unix a file may serve as a logical directory containing one or more files. For example, FIG. 1A illustrates various directories /, bin, etc, dev, usr, lib and local, and various files TSP, bin, passwd and passwd. For example, file "usr" is a directory containing another file "local", which is itself another directory, and so on. It should be noted that the terms "file" and "directory" are used herein interchangeably. It should also be noted that in Unix operating systems, file names need only be unique within a directory. Accordingly, another file named "passwd" can exist as "/bin/passwd". Thus, typically, in a Unix operating system a file is identified by its full pathname.
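By way of illustration only, the pathname-based identification described above can be sketched as follows. The nested dictionary, the placeholder file contents, and the function name resolve are hypothetical and are introduced solely for this example; they do not represent how any particular Unix operating system stores directories.

```python
# Hypothetical in-memory stand-in for part of the logical file system of FIG. 1A.
# Directories are modeled as dicts; regular files as placeholder strings.
FILE_SYSTEM = {
    "etc": {"passwd": "account database"},
    "bin": {"passwd": "password-changing program"},
    "usr": {"local": {}},
}


def resolve(pathname, root=FILE_SYSTEM):
    """Walk a full pathname one component at a time, starting at the root."""
    if not pathname.startswith("/"):
        raise ValueError("a full pathname must start at the root '/'")
    node = root
    for component in pathname.strip("/").split("/"):
        if component == "":
            continue  # "/" by itself refers to the root
        if not isinstance(node, dict) or component not in node:
            raise FileNotFoundError(pathname)
        node = node[component]
    return node


# The same name "passwd" identifies two different files, because a name
# only has to be unique within its own directory.
etc_passwd = resolve("/etc/passwd")
bin_passwd = resolve("/bin/passwd")
```

In this sketch the two lookups return different objects even though the final component is the same, which is why the full pathname, rather than the bare file name, serves as the identifier.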
In order to provide a user (application) with access to a file, the operating system typically first performs a linear search to traverse the full pathname to locate the file within a file system. This search typically requires several expensive and time-consuming read operations on a persistent storage device (e.g., a disk) where the data, and the information as to how data segments relate to each other, are stored. To minimize read operations on the persistent storage device, some operating systems have employed a directory name look-up cache (DNLC). As a central (global) resource, the directory name look-up cache provides information which can be used to locate most recently used files without having to perform read operations on the persistent storage device (e.g., disk). FIG. 1B depicts a computing environment 100 including a directory name look-up cache 102 suitable for storing filenames and references which provide access to files identified by the filenames. In order to provide access to a file, the operating system 104 first checks the directory name look-up cache 102 to determine whether the desired filename can be found. If the filename is not found in the directory name look-up cache 102, the operating system 104 can initiate a search of a disk 106 to locate the file. Once the desired file is found, information about how to access the file can be cached into the directory name look-up cache 102 for future use.
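The conventional look-up sequence described above (check the cache, fall back to a disk search on a miss, then cache the result) can be sketched as follows. This is a simplified illustration under stated assumptions: the class names, the capacity of two entries, the least-recently-used eviction policy, and the convention of counting one read operation per directory entry examined are all hypothetical and are not taken from any particular operating system.

```python
from collections import OrderedDict


class DirectoryNameLookupCache:
    """Small cache mapping filenames to file references (assumed LRU eviction)."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, name):
        if name not in self.entries:
            return None
        self.entries.move_to_end(name)  # mark as most recently used
        return self.entries[name]

    def put(self, name, reference):
        self.entries[name] = reference
        self.entries.move_to_end(name)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


class Disk:
    """Simulated disk holding directory entries sequentially."""

    def __init__(self, entries):
        self.entries = entries  # list of (filename, reference) pairs
        self.reads = 0          # simplification: one read per entry examined

    def search(self, name):
        for entry_name, reference in self.entries:
            self.reads += 1
            if entry_name == name:
                return reference
        return None


def locate(name, cache, disk):
    """Check the cache first; search the disk only on a cache miss."""
    reference = cache.get(name)
    if reference is not None:
        return reference
    reference = disk.search(name)
    if reference is not None:
        cache.put(name, reference)  # remember for future look-ups
    return reference


disk = Disk([("/etc/passwd", 11), ("/bin/passwd", 12), ("/usr/local", 13)])
cache = DirectoryNameLookupCache()
first = locate("/bin/passwd", cache, disk)   # miss: scans the disk
reads_after_first = disk.reads
second = locate("/bin/passwd", cache, disk)  # hit: no further disk reads
```

In this sketch the second look-up is served from the cache, so the read counter does not advance. Note that the entries passed over during the first scan are discarded rather than cached.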
One problem with conventional usage of directory name look-up caches is that information obtained during search operations on disk is not utilized. Accordingly, relatively expensive and time-consuming read operations on disks are often repeated. The conventional usage of directory name look-up caches is especially inefficient for file systems which store directory entries sequentially within disk sectors (e.g., Fast File System (FFS), Unix file system (UFS)). In such file systems, since directory entries are stored sequentially, several relatively expensive and time-consuming read operations on disk have to be performed whenever a filename cannot be found in the conventional directory name look-up cache. In view of the foregoing, there is a need for improved methods for providing efficient access to data stored in computing systems.
Broadly speaking, the invention relates to techniques for providing users and application programs with efficient access to data stored in computer systems. The invention is particularly well suited for use in computer systems where data can be logically organized in a file system. In one aspect of the present invention, a multilevel caching system suitable for storing information relating to files in the file system is provided. The stored information can include file references suitable for locating files in the file system as well as other useful information about the file system. The multilevel caching system provides the ability to implement various caching strategies at different levels and increases the probability of cache hits when seeking to locate files in a file system. Accordingly, relatively expensive read operations to persistent storage devices can be minimized when locating files in the file system.
The invention can be implemented in numerous ways, including a system, an apparatus, a method, or a computer readable medium. Several embodiments of the invention are discussed below.
As a method for locating data in a computer, the data being logically organized as one or more files in a file system, one embodiment of the invention includes the acts of: determining whether information associated with a file can be found in a primary cache; determining whether the information can be found in a secondary cache when the information cannot be found in the primary cache; and searching a storage device to locate at least a portion of data represented by the file on the storage device when the information associated with the file cannot be found in the secondary cache.
As a multilevel caching system for locating data in a computer, the data being logically organized as files in a file system, one embodiment of the invention includes: a primary cache operating to provide storage for storing information relating to one or more files in the file system; a secondary cache operating to provide storage for storing information relating to another one or more files in the file system which is not provided in the primary cache; and a file locator manager operating to search the primary and secondary caches for information relating to a file in the file system.
As a computer readable medium including computer program code for locating data in a computer, the data being logically organized as one or more files in a file system, one embodiment of the invention includes: computer program code for determining whether a filename associated with a file can be found in a primary cache; computer program code for determining whether the filename can be found in a secondary cache when it is determined that the filename cannot be found in the primary cache; computer program code for initiating a search of a storage device to locate at least a portion of data represented by the file on the storage device when it is determined that the filename associated with the file cannot be found in the secondary cache; and computer program code for storing information in the secondary cache when it is determined that the filename cannot be found in the secondary cache.
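The look-up order recited in the foregoing embodiments (primary cache, then secondary cache, then the storage device) can be sketched as follows. This is a minimal illustration, not a definitive implementation. The class name FileLocator, the use of plain dictionaries as caches, and the read counter are hypothetical. In particular, the choice of what is stored in each cache is an assumption made for this example: entries passed over during a device search are placed in the secondary cache, and the located file is placed in the primary cache.

```python
class FileLocator:
    """Illustrative two-level look-up over a simulated storage device."""

    def __init__(self, directory_entries):
        self.primary = {}     # filename -> reference
        self.secondary = {}   # filename -> reference
        self.directory_entries = directory_entries  # (filename, reference) pairs
        self.device_reads = 0  # simplification: one read per entry examined

    def locate(self, filename):
        # Step 1: determine whether the filename is in the primary cache.
        if filename in self.primary:
            return self.primary[filename]
        # Step 2: on a primary miss, check the secondary cache.
        if filename in self.secondary:
            return self.secondary[filename]
        # Step 3: on a miss in both caches, search the storage device.
        for name, reference in self.directory_entries:
            self.device_reads += 1
            if name == filename:
                # Assumption: the located file goes into the primary cache.
                self.primary[filename] = reference
                return reference
            # Assumption: entries seen along the way go into the secondary cache.
            self.secondary[name] = reference
        return None


locator = FileLocator([("bin", 1), ("etc", 2), ("usr", 3)])
usr_ref = locator.locate("usr")   # misses both caches, scans three entries
etc_ref = locator.locate("etc")   # served from the secondary cache
reads_so_far = locator.device_reads
```

Under these assumptions, the second look-up requires no additional device reads, because the entry for "etc" was retained in the secondary cache while the device was being searched for "usr".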
The advantages of the invention are numerous. Different embodiments or implementations may have one or more of the following advantages. One advantage of the invention is that access to data can be achieved more efficiently. Another advantage of the invention is that useful information can be gathered at a nominal processing cost. Still another advantage is that the invention allows various caching strategies to be employed at different caching levels.