Computer systems rely on the ability to access data. Certain data is stored within the computer itself in memory. Other data is stored externally on one or more disk storage systems. The computer's processor will access input data, perform a set of calculations, and provide output data. Because the computer typically cannot hold all the data it needs in its internal memory to complete the set of calculations, it creates temporary “scratch” data that is written to disk and recalled by the processor when it is again needed to complete another calculation. Large quantities of data may be swapped between the internal memory and the disk storage system multiple times, depending on the complexity of the calculations being performed. For complex calculations involving large input and output data sets, a tremendous amount of scratch data can be generated.
Disk storage systems such as a Redundant Array of Independent Disks (“RAID”) exemplify a data transfer optimization technique with both advantages and disadvantages. RAID gangs multiple disk drives together, creating a “fat” data pipe that provides multiple channels for accessing the ganged drives simultaneously (that is, in parallel). Because the drives all work together, 10 ganged drives can ideally provide up to 10 times the bandwidth of a single drive when accessing data. RAID thus provides a large-bandwidth data pipe that allows large volumes of data to be transferred quickly. One problem with RAID systems is that, because multiple disk drives are involved in every transfer, relatively small chunks of data require more disk activity, so access times can be longer than with a single disk drive. For example, to update a single item of data that is smaller than the minimum block size accessed by a 10-wide RAID system, the computer must read in the entire block across all 10 drives, update the single item of data, and then write the entire block back out to all 10 drives. RAID is a good technique for transferring large amounts of contiguous data, but if an application requests many small, non-sequential updates to its files, the RAID system will actually provide slower input/output (I/O) access than a single disk would. RAID, therefore, illustrates an I/O performance optimization that is either helpful or harmful, depending on the type of I/O to be done.
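The bandwidth gain and the read-modify-write penalty described above can be made concrete with a toy cost model. The following Python sketch is illustrative only: the `StripedArray` class, its parameters, and the 100 MB/s per-drive figure are hypothetical, and the model ignores seek and rotational latency, caching, and parity, counting only how many drives take part in each access.

```python
from dataclasses import dataclass


@dataclass
class StripedArray:
    """Toy model of an N-wide striped array (hypothetical parameters)."""
    width: int             # number of ganged drives (1 = a single disk)
    drive_mb_per_s: float  # assumed sustained bandwidth of one drive

    def sequential_seconds(self, megabytes: float) -> float:
        # A large contiguous transfer is spread across every drive in
        # parallel, so ideal throughput scales with the stripe width.
        return megabytes / (self.width * self.drive_mb_per_s)

    def small_update_ops(self) -> int:
        # Updating an item smaller than one stripe block: read the block
        # from every drive, patch it in memory, write it back to every drive.
        return 2 * self.width


single = StripedArray(width=1, drive_mb_per_s=100.0)
raid = StripedArray(width=10, drive_mb_per_s=100.0)

print(single.sequential_seconds(1000.0), raid.sequential_seconds(1000.0))  # 10.0 1.0
print(single.small_update_ops(), raid.small_update_ops())                  # 2 20
```

Under these assumptions the 10-wide array moves 1,000 MB ten times faster than one drive, yet a single small update touches 20 drive operations instead of 2, which is the trade-off the paragraph describes.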
Moving data between the computer's memory and the disk storage system is the work of the computer's file system, which is an integral part of the computer's operating system kernel. File systems cache data and generate metadata associated with data files stored on disk storage systems. A file's metadata includes information such as its creation time/date, modification time/date, last access time/date, user permissions, and ownership. Some systems are sensitive to lost data and need the file system to keep working even when components fail. Such systems will typically update a file's metadata every time an application accesses that file. For example, the file system would instruct the disk storage system to update a file and its metadata, and wait for confirmation of completion before proceeding with its next task. In contrast, some systems value speed and allow the file system to keep metadata in memory and write metadata updates back out to the disk storage system using “lazy” update methods. For example, for a big file on a UNIX system utilizing a 10-wide RAID, every time an application writes to the file, the file system is supposed to update that file's modification time. The file system reads the sector containing the metadata into memory, patches in the metadata update, and writes the sector back out to disk. The file system thus has two jobs associated with each file access: after it streams data to the 10-wide stripe, it must then update the metadata by reading in the sector, patching the metadata, and writing it back out. While the metadata is being updated, all other disk access is suspended. By using lazy updates, the file system can instead wait and write all pending metadata for a batch of files at once (for example, saving metadata only on a periodic schedule).
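The difference between synchronous and “lazy” metadata updates can be sketched with a simplified model. This Python example is not any particular file system's implementation: `ToyDisk`, `SynchronousMetadata`, `LazyMetadata`, and the `flush()` schedule are hypothetical names, and each metadata update is counted as one read-patch-write cycle on a sector.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDisk:
    """Hypothetical disk that only records metadata sector updates."""
    metadata_writes: int = 0
    on_disk_mtime: dict[int, float] = field(default_factory=dict)

    def update_metadata_sector(self, inode: int, mtime: float) -> None:
        # Stands in for: read the sector, patch the mtime, write it back.
        self.on_disk_mtime[inode] = mtime
        self.metadata_writes += 1


class SynchronousMetadata:
    """Write metadata to disk on every file write before continuing."""

    def __init__(self, disk: ToyDisk) -> None:
        self.disk = disk

    def on_file_write(self, inode: int, now: float) -> None:
        self.disk.update_metadata_sector(inode, now)


class LazyMetadata:
    """Keep metadata in memory and write it out in batches on flush()."""

    def __init__(self, disk: ToyDisk) -> None:
        self.disk = disk
        self.pending: dict[int, float] = {}

    def on_file_write(self, inode: int, now: float) -> None:
        # Only the newest mtime per file matters, so repeated writes coalesce.
        self.pending[inode] = now

    def flush(self) -> None:
        # Assumed to run on a periodic schedule; anything still pending
        # when the system fails is lost.
        for inode, mtime in self.pending.items():
            self.disk.update_metadata_sector(inode, mtime)
        self.pending.clear()


sync_disk, lazy_disk = ToyDisk(), ToyDisk()
sync_fs, lazy_fs = SynchronousMetadata(sync_disk), LazyMetadata(lazy_disk)
for i in range(1000):                # 1,000 writes spread over three files
    sync_fs.on_file_write(i % 3, float(i))
    lazy_fs.on_file_write(i % 3, float(i))
lazy_fs.flush()
print(sync_disk.metadata_writes, lazy_disk.metadata_writes)  # 1000 3
```

In this model both policies end with the same modification times on disk, but the lazy policy coalesces 1,000 metadata updates into three sector writes, at the cost that updates still pending in memory are lost if the system fails before the next flush.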
Unfortunately for application developers, file systems available today are designed with a “one-size-fits-all” approach and optimized for a broad range of conditions, which minimizes instances of extremely poor performance but seldom, if ever, achieves optimal performance. Further, there is a dearth of good tools for analyzing the I/O activity generated by applications and for providing application designers with detailed information about how I/O access affects their applications' performance.
For the reasons stated above and for other reasons stated below which will become apparent to those skilled in the art upon reading and understanding the specification, there is a need in the art for systems and methods for managing I/O throughput for large scale computing systems.