Users demand increased performance from the applications running on their computers. Computer hardware, including central processing units (CPUs), is becoming increasingly fast. However, CPU performance is limited by the speed at which data is available to be processed. In a typical computer, Level 1 (L1) and Level 2 (L2) cache memories are located physically close to the processor to provide data at a very high rate. Cache memory is typically divided into 32-byte cache lines, a cache line being the common unit of data retrieved from memory. When required data is not available in the L1 cache, a cache line fault occurs and the data must be loaded from the slower L2 cache or from relatively slow RAM. The application is often effectively stalled until the data becomes available to the CPU. Therefore, decreasing the number of cache faults allows an application to run faster. Furthermore, data elements within an application are not accessed randomly. Rather, data elements, especially those within the same structure, union, or class, are typically accessed within a short period of other data elements within the same structure, union, or class.
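To illustrate why the placement of data elements accessed together affects the number of cache faults, the following minimal sketch counts the distinct 32-byte cache lines touched by a short sequence of field accesses under two layouts of the same structure. The field names, byte offsets, and the function `lines_touched` are hypothetical and chosen only for illustration. The model is deliberately simplified and ignores cache capacity, associativity, and eviction.

```python
CACHE_LINE_BYTES = 32  # cache line size stated in the text

def lines_touched(layout, accesses, line_bytes=CACHE_LINE_BYTES):
    """Return the set of cache line indices covered by the accessed fields.

    layout maps a field name to (byte offset, size in bytes).
    """
    touched = set()
    for name in accesses:
        offset, size = layout[name]
        first = offset // line_bytes
        last = (offset + size - 1) // line_bytes
        touched.update(range(first, last + 1))
    return touched

# Hypothetical layouts of one structure; fields a, b, and c are used together.
scattered = {"a": (0, 4), "pad1": (4, 60), "b": (64, 4),
             "pad2": (68, 60), "c": (128, 4)}
grouped = {"a": (0, 4), "b": (4, 4), "c": (8, 4),
           "pad1": (12, 60), "pad2": (72, 60)}

hot_accesses = ["a", "b", "c"]
scattered_lines = lines_touched(scattered, hot_accesses)
grouped_lines = lines_touched(grouped, hot_accesses)
print(len(scattered_lines), len(grouped_lines))  # prints "3 1"
```

In this sketch the scattered layout spreads the three fields across three cache lines, while the grouped layout places them in a single line, so fewer lines must be loaded to serve the same accesses.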
The first step in optimizing an application is to model how the application uses its data elements. To accomplish this, the application being optimized is executed and used in a typical manner while data is recorded that tracks the order in which the data elements are accessed. Doing so generates a stream of data at a rate of 30-40 megabytes per second on typical hardware. Traditional disk writing methods cannot keep up with this volume of data. Hence, if all of this data is to be collected to disk, either the disk logging process must be optimized, or the execution of the application must be slowed down or modified, which degrades the accuracy of the data usage model. Therefore, the preferred approach is to optimize the data logging so that the application being modeled is not hindered by the data logging method used.
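As a rough illustration of what such a trace involves, the sketch below packs one fixed-size record per data access, preserving access order, and estimates how many records per second the stated 30-40 megabytes per second would imply. The 8-byte record layout (element identifier plus sequence number), the function names, and the reading of a megabyte as 1,000,000 bytes are all assumptions made for this example; the source does not specify a record format.

```python
import struct

# Hypothetical record layout: 4-byte element id, 4-byte sequence number.
RECORD = struct.Struct("<II")

def record_accesses(element_ids):
    """Pack one fixed-size record per access, preserving access order."""
    trace = bytearray()
    for seq, element_id in enumerate(element_ids):
        trace += RECORD.pack(element_id, seq)
    return bytes(trace)

def records_per_second(rate_bytes_per_s, record_size=RECORD.size):
    """Number of whole records per second implied by a trace data rate."""
    return rate_bytes_per_s // record_size

# Four accesses to hypothetical element ids, in the order they occurred.
trace = record_accesses([7, 3, 7, 9])
first_id, first_seq = RECORD.unpack_from(trace, 0)

low = records_per_second(30_000_000)   # 30 MB/s, assuming 1 MB = 10**6 bytes
high = records_per_second(40_000_000)  # 40 MB/s
print(len(trace), low, high)  # prints "32 3750000 5000000"
```

Under these assumptions the logger would have to absorb several million records per second, which indicates why the cost of each individual write operation matters.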
Traditional data logging methods operate in a linear fashion: a first record of data is generated and written to disk, a second record of data is generated and written to disk, and so on. While this approach is simple, it is poorly suited to writing a large amount of data to disk, such as the voluminous data stream generated when modeling the application. In fact, the processing overhead is so high that the linear approach cannot write data at the fastest rate allowed by the hardware. Rather, the data logging rate is limited by the software processing of individual write operations. The same problem arises when reading records of data one record at a time. What is needed is a solution for writing and reading a large amount of linear, order-dependent data at rates that approach the physical limits of the hardware device to which the data is being logged.
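The per-operation overhead described above can be made concrete by counting write operations. The sketch below compares the linear approach, which issues one write per record, with a generic buffering approach that accumulates records in memory and writes them in large blocks. Buffering is shown only as a common, well-known mitigation and is not presented as the solution this document goes on to describe. `CountingSink`, the function names, the 8-byte record size, and the 64 KB buffer size are hypothetical.

```python
class CountingSink:
    """Stand-in for a disk file that counts write operations (hypothetical)."""

    def __init__(self):
        self.writes = 0
        self.data = bytearray()

    def write(self, chunk):
        self.writes += 1
        self.data += chunk
        return len(chunk)

def log_per_record(sink, records):
    """Linear logging: one write operation for every record."""
    for rec in records:
        sink.write(rec)

def log_batched(sink, records, buffer_bytes=64 * 1024):
    """Buffered logging: accumulate records, write one large block at a time."""
    buf = bytearray()
    for rec in records:
        buf += rec
        if len(buf) >= buffer_bytes:
            sink.write(bytes(buf))
            buf.clear()
    if buf:  # flush whatever remains after the last full block
        sink.write(bytes(buf))

# 100,000 hypothetical 8-byte records, in order.
records = [i.to_bytes(8, "little") for i in range(100_000)]
linear, batched = CountingSink(), CountingSink()
log_per_record(linear, records)
log_batched(batched, records)
print(linear.writes, batched.writes)  # prints "100000 13"
```

Both sinks receive identical bytes in identical order, but the buffered version issues 13 write operations instead of 100,000, so far less time is spent on the software processing of individual writes.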