1. Field of the Invention
The present invention relates to computer storage methods and systems, and more particularly to methods and systems for robust dynamic storage allocation.
2. Description of the Related Art
Many computer systems need to allocate storage dynamically. Dynamic storage allocation is used by operating systems to allocate storage for executing programs. Other examples of dynamic storage allocation may include Web servers which store Web data. In many cases, sizes of memory being requested are unknown until the time of the request. The lifetime for a dynamically allocated block may also be unknown.
A considerable amount of work has been performed in developing efficient dynamic storage allocation algorithms for main memory. Considerably less work has been done in developing efficient dynamic storage allocation algorithms for disks.
Dynamic storage allocation on disk is important for a number of reasons. In many cases, it is essential to have data which persists over time, even after the system is shut down. Disk memory provides persistent storage. Disk memory also provides fault-tolerant storage; information stored on disk can often be preserved after a system crash in situations where the contents of main memory are lost. Another advantage of disk memory includes the possibility that more of disk memory can be made available at a more reasonable price than main memory. It can thus be used for storing information which cannot fit in main memory.
Referring to FIG. 1, a first fit system allocates (all or part of) the first memory block located which is large enough to satisfy a memory request. For a memory request of xe2x80x9c7xe2x80x9d, a first fit returns B1 since this is the first block encountered which can satisfy the request. A best fit system allocates (all or part of) a smallest block which is large enough to satisfy a request. In FIG. 1, block B3 would be returned since xe2x80x9c7xe2x80x9d fits best in B3 (which has a capacity of 8).
Referring to FIG. 2, in a binary buddy system, block sizes are in powers of 2 (e.g., 4 and 4, 8 and 8, etc.). Many dynamic storage allocators (DSA""s) maintain one or more lists of free blocks. Such lists are known as free lists, e.g., lists of free blocks. Separate free lists exist for blocks of different sizes. Buddy system allocating blocks of other sizes also exist as well. A good overview of prior art in dynamic storage allocation is described in a paper by Arun Iyengar titled xe2x80x9cScalability of Dynamic Storage Allocation Algorithmsxe2x80x9d published in Proceedings of IEEE Frontiers ""96, October 1996, as well as the bibliographic references in this paper.
Dynamic storage allocators (DSAs) can use different methods for coalescing adjacent free blocks. One approach is to use immediate coalescing, in which a deallocated block is combined with neighboring free blocks at the time the block is deallocated as shown in FIG. 3. In FIG. 3, the block sizes are indicated in each block. A positive size indicates a free block, while a negative size indicates an allocated block.
Referring to FIG. 4, another approach includes deferred coalescing. When deferred coalescing is used, adjacent free blocks are not automatically combined after a deallocation. Instead, at some point (such as when a large enough block to satisfy a request cannot be located), the DSA will scan through blocks in memory and combine adjacent ones as shown in FIG. 4.
Fragmentation is memory wasted by a DSA. Internal fragmentation is memory lost by satisfying a request with a block larger than the request size (e.g., satisfying a request for a block of size 25 with a block of size 32). External fragmentation occurs when free blocks become interspersed with allocated blocks. In these situations, an allocation request for b bytes may be unsatisfiable even if  greater than b bytes are free because the largest contiguous block of free storage is smaller than b bytes.
Multiple free list fit I (MFLF I) as described in xe2x80x9cScalability of Dynamic Storage Allocation Algorithmsxe2x80x9d cited above uses multiple free lists, organized by size. Free lists for small blocks are known as quick lists. Free lists for large blocks are known as misc lists. When a single misc list is maintained, MFLF I degenerates into a storage allocation system known as quick fit.
Referring to FIG. 5, a quick fit technique is shown. In this system, quick lists exist for blocks up to size 16; the number of quick lists can be varied to optimize performance for different request distributions. In this example, allocations for a block of size s where 2xe2x89xa6sxe2x89xa616 (2 is the minimum block size) is done by examining the quick list for size s. If this list is not empty, the first block on the list is used to satisfy the request. Note that it is possible to have quick lists for block sizes corresponding to multiples of grain sizes. For example, in FIG. 2, the grain size is 1. If the grain size is 1000, quick lists for blocks of size 1000, 2000, 3000, . . . , 16000, (a total of 16 quick lists) may be used. MFLF I uses deferred coalescing. Memory is divided into working storage 12 and a tail 14 as shown in FIG. 5. Working storage 12 includes allocated blocks and blocks on a free list. Tail 14 includes a contiguous block of unallocated storage at one end of the memory. Initially, before anything is allocated, tail 14 includes all allocatable memory, and free lists are empty. free lists include quick lists and misc lists, where misc lists are employed for larger memory blocks. Blocks 13 include a size (indicated by the numbers in blocks 13). When a request cannot be satisfied by examining one or more free lists, the request is satisfied by allocating from tail 14. A tail pointer (tail ptr.) indicates where tail 14 begins. Free lists are populated when allocated blocks are deallocated.
To satisfy a request for a block which is too large for a quick list, quick fit does a first fit search of the misc list. Searches for large blocks may require many instructions. To reduce this overhead, MFLF I can use multiple misc lists, as indicated in FIG. 6, instead of a single list as in quick fit. In FIG. 6, a misc list exists for blocks 13 of size 17-25, another one exists for blocks 13 of size 26-40, and yet another one exists for blocks of size 41-60. Various strategies can be used for satisfying a request, including the one shown in FIGS. 7A and 7B to allocate xe2x80x9c84xe2x80x9d using MFLF I. FIG. 7A shows a xe2x80x9cbeforexe2x80x9d snapshot of memory while FIG. 7B shows an xe2x80x9cafterxe2x80x9d snapshot when the request to allocate 84 is satisfied. In FIGS. 7A and 7B, the system allocates a first block on list L2 to satisfy the request by splitting a block of size xe2x80x9c90xe2x80x9d and returning the excess of size xe2x80x9c6xe2x80x9d to a free list. The system examines L2 instead of L1 because a smallest block allowed on L2 is of size 89. Therefore, it is not necessary to search beyond the first block in L2 to satisfy a request of size less than or equal to 89.
Although the techniques described above are sufficient for many applications, straightforward adaptations of main-memory dynamic storage allocation algorithms to disk systems often result in poor performance because the latency for accessing and writing to disks is much higher than for main memory.
Therefore, a need exists for dynamic storage methods for disk memory which reduces a number of accesses and a number of writes to a disk. A further need exists for memory allocation and deallocation methods which provide for more efficient storage and faster access times.
A method for managing computer memory, in accordance with the present invention, includes maintaining multiple sets of free blocks of memory wherein a free block is added to a set based on its size. In response to a request for a block of a request size, a set of blocks is searched for a free block which is at least as large as the request size but smaller than the request size plus a threshold. If such a block is found, the block is allocated in its entirety. In other methods, the step of maintaining multiple sets of free blocks may include the steps of maintaining the multiple sets of free blocks on a disk or in main memory. The steps of, if the free block is not found but at least one block at least as large as the request size is found, splitting the at least one block at least as large as the request size and satisfying the request with a second block resulting from the splitting step may be included. The at least one block being split may be a smallest found block larger than the request size. The multiple sets may include a set of largest blocks and the first threshold may be larger than a smallest permitted block size for the largest set of blocks, and the step of searching the largest set of blocks to satisfy the request may be included. The step of searching may include the step of searching the set which includes a list, the list including memory block allocation information for the set. The method may further include the step of dynamically adjusting at least one of the first and the second thresholds to tune performance.
In still other methods, the method may include the steps of maintaining a contiguous block of memory, selecting the set of free blocks based on the request size, in response to the set of free blocks being non-empty, allocating at least part of a first block located in the set and in response to the set being empty, satisfying the request from the contiguous block of memory. The method may include the steps of maintaining a pointer to the contiguous block for a first pointer location, in response to a request for storage of a given size, incrementing the pointer by the given size to point to a second pointer location and satisfying the request using a part of the contiguous block beginning at the first pointer location. The method may also include deallocating an allocated block to a new set of free blocks based on the size of the allocated block.
A method for managing a block of contiguous persistent memory includes the steps of maintaining a first pointer in main memory for determining a first free byte in a first block of memory, maintaining headers in the persistent storage for a plurality of blocks wherein the headers include at least one code byte and allocating a second block from the first block by setting the at least one code byte for the second block to indicate that the second block is no longer part of the first block and updating the first pointer in the main memory.
In other methods, the step of adding a block, b, to the first block may be included, the step including adding the steps of modifying a code byte corresponding to b to indicate that b is part of the first block, and updating the first pointer in the main memory. The at least one code byte may serve other functions in addition to indicating that the second block is no longer part of the first block. The method may include the step of periodically updating a second pointer in the persistent memory based on the value of the first pointer in the main memory. The method may further include the step of, in response to a system failure, updating the first and second pointers by examining the at least one code byte. The headers may be initialized using a single block write. of the first pointer in the main memory. The method may further include the step of, in response to a system failure, updating the first and second pointers by examining the at least one code byte. The headers may be initialized using a single block write.
A method for coalescing adjacent free blocks in persistent storage includes the steps of maintaining a header for each block indicating a size and an allocation status of the block, coalescing the free blocks by examining header information to combine adjacent free blocks and periodically storing information in the persistent storage indicating a level of coalescing achieved.
The method for coalescing may include the steps of, in response to a system failure during coalescing, reading information in the persistent storage to determine the level of coalescing achieved before the system failure and resuming coalescing at a point based on the information in the persistent storage.
The methods and method steps may be implemented with a program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform the method steps.