1. Field of the Invention
The present invention relates, in general, to data processing systems and more particularly to adaptive data caching in data processing systems to reduce transfer latency or increase transfer bandwidth of data movement within these systems.
2. Description of the Related Art
In modern data processing systems, the continual increase in processor speeds has outpaced the rate of increase of data transfer rates from peripheral persistent data storage devices and sub-systems. In systems such as enterprise scale server systems in which substantial volumes of volatile, or persistent data are manipulated, the speed at which data can be transferred may be the limiting factor in system efficiency. Commercial client/server database environments are emblematic of such systems. These environments are usually constructed to accommodate a large number of users performing a large number of sophisticated database queries and operations to a large distributed database. These compute, memory and I/O intensive environments put great demands on database servers. If a database client or server is not properly balanced, then the number of database transactions per second that it can process can drop dramatically. A system is considered balanced for a particular application when the CPU(s) tends to saturate about the same time as the I/O subsystem.
Continual improvements in processor technology have been able to keep pace with ever-increasing performance demands, but the physical limitations imposed on retrieving data from disk has caused I/O transfer rates to become an inevitable bottleneck. Bypassing these physical limitations has been an obstacle to overcome in the quest for better overall system performance.
In the computer industry, this bottleneck, known as a latency gap because of the speed differential, has been addressed in several ways. Caching the data in memory is known to be an effective way to diminish the time taken to access the data from a rotating disk. Unfortunately, memory resources are in high demand on many systems, and traditional cache designs have not made the best use of memory devoted to them. For instance, many conventional caches simply cache data existing ahead of the last host request. Implementations such as these, known as Read Ahead caching, can work in unique situations, but for non-sequential read requests, data is fruitlessly brought into the cache memory. This blunt approach to caching however has become quite common due to simplicity of the design. In fact, this approach has been put in use as read buffers within the persistent data storage systems such as disks and disk controllers.
Encoding or compressing cached data in operating system, system caches increase the logical effective cache size and cache hit rate, and thus improves system response time. On the other hand, compressed data requires variable-length record management, free space search and garbage collection. This overhead may negate performance improvements achieved by increasing effective cache size. Thus, there is a need for a new operating system, system file, data and buffer cache data managing method with low overhead, transparent to the operating systems in conventional data managing methods. With such an improved method, it is expected that the effective, logically accessible, memory available for file and data buffer cache size will increase by 30% to 400%, effectively improving system-cost performance.
Ideally, a client should not notice any substantial degradation in response time for a given transaction even as the number of transactions requested per second by other clients to the database server increases. The availability of main memory plays a critical role in a database server's ability to scale for this application. In general, a database server will continue to scale up until the point that the application data no longer fits in main memory. Beyond this point, the buffer manager resorts to swapping pages between main memory and storage sub-systems. The amount of this paging increases exponentially as a function of the fraction of main memory available, causing application performance and response time to degrade exponentially as well. At this point, the application is said to be I/O bound.
When a user performs a sophisticated data query, thousands of pages may be needed from the database, which is typically distributed across many storage devices, and possibly distributed across many systems. To minimize the overall response time of the query, access times must be as small as possible to any database pages that are referenced more than once. Access time is also negatively impacted by the enormous amount of temporary data that is generated by the database server, which normally cannot fit into main memory, such as the temporary files generated for sorting. If the buffer cache is not large enough, then many of those pages will have to be repeatedly fetched to and from the storage sub-system.
Independent studies have shown that when 70% to 90% of the working data fits in main memory, most applications will run several times slower. When only 50% fits, most run 5 to 20 times slower. Typical relational database operations run 4 to 8 times slower when only 66% of the working data fits in main memory. The need to reduce or eliminate application page faults, data or file system I/O is compelling. Unfortunately for system designers, the demand for more main memory by database applications will continue to far exceed the rate of advances in memory density. Coupled with this demand from the application area comes competing demands from the operating system, as well as associated I/O controllers and peripheral devices. Cost-effective methods are needed to increase the, apparent, effective size of system memory.
It is difficult for I/O bound applications to take advantage of recent advances in CPU, processor cache, Front Side Bus (FSB) speeds, >100 Mbit network controllers, and system memory performance improvements (e.g., DDR2) since they are constrained by the high latency and low bandwidth of volatile or persistent data storage subsystems. The most common way to reduce data transfer latency is to add memory. Adding memory to database servers may be expensive since these applications demand a lot of memory, or may even be impossible, due to physical system constraints such as slot limitations. Alternatively, adding more disks and disk caches with associated controllers, or Network Attached Storage (NAS) and network controllers or even Storage Aware Network (SAN) devices with Host Bus Adapters (HBA's) can increase storage sub-system request and data bandwidth. It may be even necessary to move to a larger server with multiple, higher performance I/O buses. Memory and disks are added until the database server becomes balanced.
First, the memory data encoding/compression increases the effective size of system wide file and/or buffer cache by encoding and storing a large block of data into a smaller space. The effective available reach of these caches is typically doubled, where reach is defined as the total immediately accessible data requested by the system, without recourse to out-of-core (not in main memory) storage. This allows client/server applications, which typically work on data sets much larger than main memory, to execute more efficiently due to the decreased number of volatile, or persistent, storage data requests. The numbers of data requests to the storage sub-systems are reduced because pages or disk blocks that have been accessed before are statistically more likely to still be in main memory when accessed again due to the increased capacity of cache memory. A secondary effect of such compression or encoding is reduced latency in data movement due to the reduced size of the data. Basically, the average compression ratio tradeoff against the original data block size as well as the internal cache hash bucket size must be balanced in order to reap the greatest benefit from this tradeoff. The Applicant of the present invention believes that an original uncompressed block size of 4096 bytes with an average compression ratio of 2:1 stored internally in the cache, in a data structure known as an open hash, in blocks of 256 bytes results in the greatest benefit towards reducing data transfer latency for data movement across the north and south bridge devices as well as to and from the processors across the Front-Side-Bus. The cache must be able to modify these values in order to reap the greatest benefits from this second order effect.
There is a need to improve caching techniques, so as to realize greater hit rates within the available memory of modern systems. Current hit rates, from methods such as LRU (Least Recently Used), LFU (Least Frequently Used), GCLOCK and others, have increased very slowly in the past decade and many of these techniques do not scale well with the availability of the large amounts of memory that modern computer systems have available today. To help meet this need, the present invention utilizes a entropy signature from the compressed data blocks to supply a bias to pre-fetching operations. This signature is produced from the entropy estimation function described herein, and stored in the tag structure of the cache. This signature provides a unique way to group previously seen data; this grouping is then used to bias or alter the pre-fetching gaps produced by the prefetching function described below. Empirical evidence shows that this entropy signature improve pre-fetching operations over large data sets (greater than 4 GBytes of addressable space) by approximately 11% over current techniques that do not have this feature available.
There is also a need for user applications to be able to access the capabilities for reducing transfer latency or increasing transfer bandwidth of data movement within these systems. There is a further need to supply these capabilities to these applications in a transparent way, allowing an end-user application to access these capabilities without requiring any recoding or alteration of the application. The Applicant of the present invention believes this may be accomplished through an in-core file-tracking database maintained by the invention. Such a core file-tracking data base would offer seamless access to the capabilities of the invention by monitoring file open and close requests from the user-application/operating system interface, decoding the file access flags, while maintaining an internal list of the original file object name and flags, and offering the capabilities of the invention to appropriate file access. The in-core file-tracking database would also allow the end-user to over-ride an application's caching request and either allow or deny write-through or write-back or non-conservative or no-caching to an application on a file by file basis, through the use of manual file tracking or, on a system wide basis, through the use of dynamic file tracking. This capability could also be offered in a more global, system-wide way by allowing caching of file system metadata; this caching technique (the caching of file system metadata specifically) is referred to throughout this document as “non-conservative caching.”
There is a further need to allow an end-user application to seamlessly access PAE (Physical Address Extension) memory for use in file caching/data buffering, without the need to re-code or modify the application in any way. The PAE memory addressing mode is limited to the Intel, Inc. x86 architecture. There is a need for replacement of the underlying memory allocator to allow a PAE memory addressing mode to function on other processor architectures. This would allow end-user applications to utilize the modern memory addressing capabilities without the need to re-code or modify the end-user application in any way. This allows transparent seamless access to PAE memory, for use by the buffer and data cache, without user intervention or system modification.
Today, large numbers of storage sub-systems are added to a server system to satisfy the high I/O request rates generated by client/server applications. As a result, it is common that only a fraction of the storage space on each storage device is utilized. By effectively reducing the I/O request rate, fewer storage sub-system caches and disk spindles are needed to queue the requests, and fewer disk drives are needed to serve these requests. The reason that the storage sub-system space is not efficiently utilized is that, on today's hard-disk, storage systems, access latency increases as the data written to the storage sub-system moves further inward from the edge of the magnetic platter, in order to keep access latency at a minimum system designers over-design storage sub-systems to take advantage of this phenomenon. This results in under-utilization of available storage. There is a need to reduce average latency to the point that this trade-off is not needed, resulting in storage space associated with each disk that can be more fully utilized at an equivalent or reduced latency penalty.
In addition, by reducing the size of data to be transferred between local and remote persistent storage and system memory, the I/O and Front Side Buses (FSB) are utilized less. This reduced bandwidth requirement can be used to scale system performance beyond its original capabilities, or allow the I/O subsystem to be cost reduced due to reduced component requirements based on the increased effective bandwidth available.
Thus, there is a need in the art for mechanisms to balance the increases in clock cycles of the CPU and data movement latency gap without the need for adding additional volatile or persistent storage and memory sub-systems or increasing the clock cycle frequency of internal system and I/O buses. Furthermore, there is a need to supply this capability transparently to end user applications so that they can take advantage of this capability in both a dynamic and a directed way.