1. Field of the Invention
This invention relates generally to methods and apparatus for improving the performance of a computer and more particularly to providing a hybrid data caching mechanism for enhancing system performance.
2. Description of the Related Art
Users of the various versions of the WINDOWS(trademark) operating systems, as well as any other available operating systems, often wait long periods of time for programs to load, system boot time and data to be read/written due to large amounts of disk input/output (I/O) used to service their request. Especially when running more than one software program simultaneously, disk I/O can take significant amounts of time while searching for and loading programs and data. Much of this wasted time or overhead is caused by disk seeks. Another reason for this wait time is the overhead in issuing commands to read/write the many small pieces of data.
Disk seeks occur for one of several reasons, but mainly because the data that makes up a file or program is placed in one area of the disk, while the directory information, the data that keeps track of where the files and programs are kept, is often kept in a totally different area. Thus, accessing a file requires accessing the data found in the directory and then accessing the file or program itself, causing the disk to seek to different locations. Multiple seeks are typically required to locate a file in the directory, especially if the file is not in the first part of the root directory. Some file-systems have a third area that must be referred to also, the file allocation table (FAT), which requires a third seek to access the file or program. Running more than one software program simultaneously can compound this problem due to the two programs keeping their data in different areas on the disk.
Small reads/writes occur because the file system driver that manages the files/data on the disk doesn""t know what any given program will next request. Hence, the file system driver reads just the amount of data requested. When more than one program is requesting data, the file system driver can end up reading a small amount of data for a first program, then seek to a different area on the disk to read another small amount for a second program, then seek back to the original area to read the next small amount of data for the first program, and so forth. Also, because of use of a FAT file-system, the operating system (OS) may need to do additional small reads of the FAT in between the reads of the data to find where the data resides. In addition, the cache becomes fragmented from the small reads resulting in the lack of large areas in memory. This lack of large areas in memory makes it impossible to store large reads without discarding some of the smaller reads which may require further seeks further exacerbating the problem.
Another major reason contributing to both seeks and small reads/writes to the disk is the use of virtual memory. When more programs/data are loaded than can fit into the computers memory, the OS moves some of the programs/data onto an area of the hard disk known as the swap file or paging file. The amount moved is in xe2x80x9cpagexe2x80x9d sized increments (with each page being 4 KB). Since there is no way that the OS knows ahead of time which piece of which program/data will be needed, it moves them on an as needed basis, which tends to cause many small reads/writes to different areas of the swap file.
To circumvent these problems, the file system drivers in the various versions of operating systems, such as WINDOWS(trademark) and its variants, do a certain amount of caching. The operating systems tend to keep in memory (cache) the data that makes up the directory information. However, often the initial requests that cause the file system drivers to read and cache the directory information cause it to be read in small pieces interspersed with seeks to the files that the directory information represents. After this, accessing that same directory information is extremely fast for awhile. Eventually the file system drivers need the memory to cache some other directory information or program data, thus making way for a repeat of the initial situation when that directory is needed again. Furthermore, the caching programs strictly use the MRU-LRU (most recently used-least recently used) mechanism as their sole means of deciding what data to keep and what data to discard. Accordingly, an important file that has not been used recently will be discarded when the cache reaches a target capacity and is forced to free up additional capacity. Strictly adhering to the MRU-LRU mechanism fails to ensure that files which have been used often, but may not have been used recently, are maintained in the cache.
As a result, there is a need to solve the problems of the prior art to provide for an improved caching mechanism to minimize overhead and seeks, thereby improving system performance.
Broadly speaking, the present invention fills these needs by providing a method and apparatus that minimizes seeks and reads to a hard drive and which keeps data in the cache based upon currency of use and the number of times the data is used. It should be appreciated that the present invention can be implemented in numerous ways, including as a process, an apparatus, a system, or a device. Several inventive embodiments of the present invention are described below.
In one embodiment, a caching method for enhancing system performance of a computer is provided. The method initiates with reading files in response to a request from an operating system.. Then, copies of the read files are stored in a cache where the cache is located within a random access memory of the computer. Next, frequency factors are assigned to each of the files stored in the cache, where the frequency factors indicate how often each of the corresponding files is accessed by the operating system. Then, the frequency factors are scanned in response to a target capacity of the cache being attained. Next, a least frequently and least recently used file is identified. Then, the least frequently and least recently used file is eliminated to liberate capacity of the cache.
In another embodiment, a method for caching files in a cache of a random access memory (RAM) of a computer is provided. The method initiates with the files being requested. Here the files are stored on a storage medium. Next, the files are read and then the files are copied to the cache. Then, a frequency factor is assigned to each of the copied files, where the frequency factor reflects how often each of the copied files is accessed. Next, the frequency factor for each of the files is scanned in response to a target capacity of the cache becoming full. The scanning is configured to find a least frequently and least recently used file that has been least recently used. Then, the least frequently and least recently used file is discarded to free-up additional capacity of the cache.
In yet another embodiment, a computer readable media having program instructions for a caching method which enhances the system performance of a computer is provided. The computer readable media includes program instructions for storing files in a cache where the files are associated with a request from an operating system. The cache is located in a random access memory of the computer. The computer readable media includes program instructions for assigning frequency factors to each of the files stored in the cache, where the frequency factors indicate how often each of the corresponding files are requested by the operating system. Program instructions for scanning the frequency factors are included where the scanning is performed in response to a target capacity of the cache being attained. Also included are program instructions for identifying a least frequently and least recently used file. Program instructions for eliminating the least frequently and least recently used file in order to liberate capacity in the cache are included.
In still another embodiment, an apparatus for caching files in a computer system, where the computer system includes a microprocessor, a random access memory and an operating system is provided. The apparatus includes a disk drive for storing data, where the data includes a directory structure, file allocation table (FAT) and file data. The apparatus includes a cache located within a random access memory (RAM). The RAM is configured to cache files requested by the operating system according to a caching mechanism. The caching mechanism assigns frequency factors to each of the files in the cache. Upon reaching a target capacity of the RAM, the frequency factors for each of the files are scanned to identify a least frequently and least recently used file that is discarded to free cache capacity.
In another embodiment, a method for caching data in a computer is provided. The method initiates with files being stored in a cache where the files are associated with read operations and the cache is located in a RAM of the computer. Next, a frequency factor is associated with each of the files in the cache. The frequency factor of each of the files reflects how frequent each of the files is accessed. Then a least frequently and least recently used file is identified. The least frequently and least recently used file has a frequency factor lower than the frequency factor for other files in the cache. Next, the least frequently and least recently used file is eliminated from the cache when a target capacity of the cache is reached.
The advantages of the present invention are numerous. Most notably, system performance is enhanced through the hybrid caching mechanism where a file that has been used often but not recently is maintained in the cache. In addition, by performing large reads, the data will be present in the cache in order to minimize seeks and reads from the hard drive.
Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.