Field of the Invention
The present invention relates generally to data storage and more specifically but not exclusively to a database management system including file pre-allocation of data storage devices.
Description of Related Art
The slow mechanical nature of input/output (I/O) devices such as disks compared to the speed of electronic processing has made I/O devices a major bottleneck in computer systems. As the improvement in processor performance continues to far exceed the improvement in disk access time, the I/O bottleneck is becoming more significant.
Mass storage devices are commonly viewed as providing a series of addressable locations, also described as blocks, in which data can be stored.
Different programs require different amounts of memory. If a program requires, for instance, 2 MB but the operating system cannot find a contiguous block of 2 MB, the operating system will allocate more than one block until it reaches 2 MB, which leads to a fragmentation problem.
Fragmentation is a problem that affects both volatile memory and mass storage devices. When a host program asks the operating system to store a file, if the file size is larger than the available contiguous memory, different disk sections will be used to store the blocks and fragmentation will occur. Thus, fragmentation can be defined as the condition in which the data blocks that compose a data object are not all physically adjacent to each other. Fragmented data causes the overall system to slow down since it requires more computations to access the data in its entirety.
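The allocation behavior described above can be illustrated with a minimal sketch. It assumes a simplified model that is not part of any cited system: storage is a list of equally sized blocks tracked by a hypothetical free map, and the helper names `allocate` and `is_fragmented` are illustrative only.

```python
def allocate(free, n):
    """Allocate n blocks from a free map, preferring one contiguous run.

    free: list of bool, True where a block is available (hypothetical model).
    Returns the list of block indices handed out.
    """
    run_start, run_len = None, 0
    for i, is_free in enumerate(free):
        if is_free:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == n:
                # Found a contiguous run of n free blocks.
                blocks = list(range(run_start, run_start + n))
                break
        else:
            run_len = 0
    else:
        # No contiguous run exists: fall back to scattered free blocks.
        blocks = [i for i, is_free in enumerate(free) if is_free][:n]
        if len(blocks) < n:
            raise MemoryError("not enough free blocks")
    for b in blocks:
        free[b] = False
    return blocks


def is_fragmented(blocks):
    """True if the allocated blocks are not all physically adjacent."""
    return any(b - a != 1 for a, b in zip(blocks, blocks[1:]))
```

In this model, a request that cannot be served from a single run is satisfied from whatever free blocks remain, and the resulting block list is fragmented in the sense defined above.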
Data storage operation speed is also related to the hierarchical directory structure that file systems use to organize the files on the storage devices. In fact, the logic and procedures used to keep the file system within the storage provided by an underlying mass storage device have a profound effect on speed and efficiency.
A file system is defined herein as a set of interrelated data structures that are used to divide the available storage between a plurality of data files.
Currently, the host operating system is responsible for allocating working memory for guest applications. These guest applications request memory from the underlying operating system and return it when it is no longer in use.
The host operating system is also responsible for making sure there is available memory for all running programs. When some guest applications require more random access memory (RAM) than is currently available, the operating system might choose to use virtual memory to mitigate the shortage and still allow the system to run.
When a guest application requests a file from disk, this request is forwarded to the operating system, which then loads the entire file into memory and makes the list of memory addresses available to the guest application.
To increase effective disk performance, processes have been described in which the reorganization of data blocks on a disk is based on anticipating which data blocks are likely to be accessed together by the users. United States patent application publication number 2004/0088504 A1 discloses an autonomic storage system that continually analyzes I/O request patterns to optimize the layout of data on a storage device, such as a disk drive, to increase the system performance.
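As a generic illustration of anticipating which blocks are accessed together, and not the method of the cited publication, the following sketch counts how often two block identifiers appear next to each other in an access trace. The function name `co_access_counts`, the `window` parameter, and the trace format are assumptions made for this example.

```python
from collections import Counter


def co_access_counts(trace, window=2):
    """Count how often two distinct blocks appear within `window` accesses.

    trace: sequence of block identifiers in access order (hypothetical input).
    Returns a Counter keyed by unordered block pairs.
    """
    counts = Counter()
    for i in range(len(trace)):
        # Compare each access with the next (window - 1) accesses.
        for j in range(i + 1, min(i + window, len(trace))):
            if trace[i] != trace[j]:
                counts[frozenset((trace[i], trace[j]))] += 1
    return counts
```

Pairs with the highest counts would be the natural candidates for placement in adjacent disk locations under such a scheme.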
Similar reallocation approaches have been widely discussed in the literature. In particular, Intel Corporation has devoted a great deal of effort to optimizing the performance of mass storage devices, mainly based on data block relocation methodologies and applications, as disclosed in U.S. Pat. Nos. 6,105,117; 6,137,875; 6,434,663; and 6,742,080, and on cache optimization techniques, as disclosed in U.S. Pat. Nos. 6,941,423; 7,275,135; and 7,308,531.
Analyzing the operation of the disk array storage device before loading a particular data set, and then determining an appropriate location for that data, is also a trend in addressing the performance optimization problem of mass storage devices. Load balancing techniques aim to optimize resource use, maximize throughput, minimize response time, and avoid overload of any single resource.
U.S. Pat. No. 4,633,387, entitled “Load Balancing in a Multiunit System”, discloses load balancing in a multi-unit data processing system in which a guest application operates with multiple disk storage units through plural storage directors. Also, U.S. Pat. No. 5,239,649, entitled “Channel Path Load Balancing, Through Selection of Storage Volume to be Processed, For Long Running Applications”, discloses a system for balancing the load on channel paths during long running applications.
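The general load balancing goal stated above, avoiding overload of any single resource, can be sketched with a simple least-loaded placement rule. This is a generic illustration and not the method of either cited patent; the function name `place`, the device names, and the load units are hypothetical.

```python
def place(loads, size):
    """Assign a data set of the given size to the least-loaded device.

    loads: dict mapping a device name to its current load (hypothetical units).
    Updates the dict in place and returns the chosen device name.
    """
    target = min(loads, key=loads.get)
    loads[target] += size
    return target
```

For example, with loads of 5, 2, and 7 on three devices, a new data set is placed on the device whose load is 2.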
It is commonly accepted that data allocation has a profound impact on the operational efficiency of the computer system. However, most of these data allocation approaches have been proposed prior to the design of a database. The problem with these approaches is that they are sub-optimal solutions, since the demand for computer system availability, autonomy, and dynamic networking causes the access probabilities of nodes to data blocks to change over time, thereby degrading database performance.
Accordingly, there is a need in the art for a more effective data allocation process in I/O storage devices, as well as for improved database logic and design, which render disk access times much faster than in prior art systems.