A parallel, shared disk file environment includes a set of computer nodes, disk storage devices, a communications network, and a parallel file system running on each computer node. A parallel file system differs from a traditional distributed file system, like the Network File System (NFS) or the Distributed File System (DFS), in that with a parallel file system, data belonging to the same file is distributed or “striped” across disks that are attached to different nodes in the environment or directly attached to the network. A parallel file system allows data to be transferred between disks and computer nodes without requiring all the data to pass through a single server node.
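As an illustration of the striping described above, the following minimal sketch maps the logical blocks of one file onto a set of disks. The round-robin placement policy and the names `stripe_location` and `layout` are assumptions chosen for illustration; the text does not specify how a particular parallel file system places blocks.

```python
def stripe_location(block_index: int, num_disks: int) -> tuple[int, int]:
    """Map a logical file block to (disk number, block slot on that disk).

    Assumes simple round-robin striping, which is only one possible policy.
    """
    if num_disks <= 0:
        raise ValueError("num_disks must be positive")
    if block_index < 0:
        raise ValueError("block_index must be non-negative")
    return block_index % num_disks, block_index // num_disks


def layout(num_blocks: int, num_disks: int) -> dict[int, list[int]]:
    """Return, for each disk, the logical file blocks it would hold."""
    disks: dict[int, list[int]] = {d: [] for d in range(num_disks)}
    for block in range(num_blocks):
        disk, _slot = stripe_location(block, num_disks)
        disks[disk].append(block)
    return disks


if __name__ == "__main__":
    # A six-block file spread over three disks.
    print(layout(6, 3))
```

Because consecutive blocks land on different disks, several nodes can read or write different parts of the same file without all traffic passing through one server or one disk.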
The meta data of files, which includes file attributes such as file size, last-modified time, and file owner, is also striped across the disks in a parallel file system. That is, the data structures that hold the meta data (referred to as inodes) are stored on different disks.
Applications executing in a computing environment, regardless of whether the environment employs a traditional or parallel file system, often request a directory listing of the files of a directory including the file attributes. In order to provide this listing, the file system reads all of the inodes of the files of the requested directory. However, for a large directory, reading inodes one at a time can be very time consuming.
In traditional file systems, the problem of reading inodes efficiently has been addressed by clustering inodes, that is, by arranging for the inodes of files of the same directory to be close together on disk (e.g., grouped together in inode blocks). Thus, instead of reading individual inodes, a whole block of inodes is read in a single I/O. Since inodes are typically small, the cost of reading a block of inodes is not much higher than the cost of reading a single inode, and reading a whole block of inodes is significantly faster than reading each inode in the block individually.
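The saving from clustering can be made concrete with a rough I/O count. The sketch below assumes that every inode of the directory is packed densely into inode blocks and that each read costs one I/O; the function names and the sizes used in the example are hypothetical, not taken from any particular file system.

```python
import math


def ios_without_clustering(num_files: int) -> int:
    """One I/O per inode when inodes are read individually."""
    return num_files


def ios_with_clustering(num_files: int, inode_size: int, block_size: int) -> int:
    """One I/O per inode block, assuming the directory's inodes are densely packed."""
    inodes_per_block = block_size // inode_size
    if inodes_per_block < 1:
        raise ValueError("block_size must hold at least one inode")
    return math.ceil(num_files / inodes_per_block)


if __name__ == "__main__":
    # Hypothetical directory: 10,000 files, 512-byte inodes, 4096-byte blocks.
    print(ios_without_clustering(10_000))
    print(ios_with_clustering(10_000, 512, 4096))
```

With these illustrative sizes, eight inodes fit in a block, so the directory listing needs roughly one eighth as many reads.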
However, this solution is not well-suited for a parallel file system for at least the following reasons:
1. Applications running on a parallel file system may concurrently access different inodes of the same directory (for example, a parallel mail server). If all of these inodes are clustered within the same inode block, then all I/Os to read or write these inodes will go to the same disk, causing access to these inodes to become a bottleneck.
2. A parallel file system requires distributed locking to synchronize access to file data and meta data from multiple nodes in the network. To read a whole block of inodes would require getting a lock on each of the inodes in the block, requiring messages to a lock coordinator. Therefore, in a parallel file system, the cost of reading a whole block of inodes is significantly higher than the cost of reading a single inode. Hence, an approach that always caches only whole inode blocks would speed up inode access only if the locking granularity were increased, so that each lock pertains to a whole block of inodes instead of an individual inode. However, this would significantly increase the number of lock conflicts due to “false sharing” between nodes: If two nodes were concurrently updating different inodes within the same inode block, then each inode update would require messages and possibly I/O to revoke the lock on the inode block from the other node.
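The false-sharing effect in the second reason can be sketched with a toy model that counts lock revocations. The model is a deliberate simplification and rests on assumptions not stated in the text: every update takes an exclusive lock, a node keeps a lock until another node needs it, and the function name `count_revocations` is hypothetical.

```python
def count_revocations(updates, inodes_per_block, per_block_lock):
    """Count how often a lock must be revoked from another node.

    updates: sequence of (node, inode_number) pairs, in the order they occur.
    per_block_lock: True for one lock per inode block, False for one lock per inode.
    """
    holder = {}  # lock id -> node that currently holds the lock
    revocations = 0
    for node, inode in updates:
        # With block-granularity locks, all inodes in a block share one lock id.
        lock_id = inode // inodes_per_block if per_block_lock else inode
        current = holder.get(lock_id)
        if current is not None and current != node:
            revocations += 1  # the lock has to be taken away from the other node
        holder[lock_id] = node
    return revocations


if __name__ == "__main__":
    # Two nodes alternately update two different inodes in the same inode block.
    updates = [("A", 0), ("B", 1)] * 4
    print(count_revocations(updates, inodes_per_block=8, per_block_lock=False))
    print(count_revocations(updates, inodes_per_block=8, per_block_lock=True))
```

In this model, per-inode locks cause no revocations because the two nodes never touch the same inode, while a per-block lock is revoked on every update after the first.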
Thus, a need still exists for an efficient technique for reading inodes of a parallel file system. In particular, a need exists for a facility that manages when and how to prefetch inodes.