As software developers write software for 64-bit processors, data alignment in data storage, such as a disk, will become extremely important. “Alignment” in the data storage context refers to limitations placed on the address of a data object in the data storage. For example, a common “alignment requirement” is that the address of a data object must be a multiple of a power of 2, i.e. 2^N, which means that the least significant N bits of the address must be zero.
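The power-of-two requirement above can be illustrated with a minimal Python sketch. The helper names `is_aligned` and `align_up` are hypothetical and are introduced here only for illustration.

```python
def is_aligned(address: int, n: int) -> bool:
    """Return True if address is a multiple of 2**n."""
    mask = (1 << n) - 1          # n low-order one bits
    return address & mask == 0   # aligned iff the low n bits are all zero


def align_up(address: int, n: int) -> int:
    """Round address up to the next multiple of 2**n (hypothetical helper)."""
    mask = (1 << n) - 1
    return (address + mask) & ~mask


print(is_aligned(16, 3))  # 16 is a multiple of 8
print(is_aligned(12, 3))  # 12 is not a multiple of 8
print(align_up(13, 3))    # next 8-byte boundary at or after 13
```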
Alignment is important for functional reasons because an unaligned data access may cause a bus error resulting in a system crash. Alignment is also important for performance reasons because unaligned data access, which can be handled with hardware or software alignment correction tools, will likely become more expensive as processor speeds continue to increase.
Data stored in a data storage is typically heterogeneous, in the sense that it consists of elements with varying alignment requirements. In the absence of alignment requirements, it is easy to optimize the storage space allocated for the data by simply packing the elements one after another. Imposing alignment requirements on the data elements, however, may force the introduction of padding to fill holes in storage caused by those requirements, and this padding may increase the amount of storage required to store the data elements. The amount of storage required may also depend on the order in which the data elements are arranged in storage, because the padding necessary to accommodate the alignment requirements may differ depending on the order in which the data elements are stored.
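The dependence of padding on element order can be seen in a small sketch. It assumes each element is described by a hypothetical (length, alignment) pair and that storage starts at offset 0; the function name `packed_size` is likewise an illustration only.

```python
def packed_size(elements):
    """Total storage used when elements are laid out in the given order.

    elements: list of (length, alignment) pairs; this layout is an assumption.
    """
    offset = 0
    for length, alignment in elements:
        offset += -offset % alignment   # padding up to the next boundary
        offset += length                # the element itself
    return offset


mixed_order = [(1, 1), (8, 8), (2, 2), (4, 4)]
sorted_order = [(8, 8), (4, 4), (2, 2), (1, 1)]
print(packed_size(mixed_order))   # 24: 9 units of padding are introduced
print(packed_size(sorted_order))  # 15: no padding is needed
```

The same four elements occupy 24 units in one order and 15 in the other, which is the effect the paragraph above describes.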
In general, in one aspect, the invention features a method for allocating storage for a header and one or more data elements in a data storage facility. The data storage facility is divided into words. Each word includes one or more incidents of one or more types of alignment boundaries. Each incident of a type of alignment boundary falls on a multiple of a base amount of units from the beginning of the word. The base amount is unique for each type of alignment boundary. Each data element is required to satisfy an alignment requirement, which is the size of alignment boundary with which it is required to align. Each data element has a length that is a multiple of its alignment requirement. The method includes computing a hole size, B, that is a portion of a word that would be unallocated if storage were allocated to the header and to the data elements in a preferred order. The method further includes finding a subset of data elements S = {F_{i1}, F_{i2}, ..., F_{in}} that satisfy the following equation:
(SizeModN(F_{i1}) + SizeModN(F_{i2}) + ... + SizeModN(F_{in})) mod N = B
where SizeModN(F) is the size of F modulo N, and N is the largest alignment requirement associated with any data element. The method further includes allocating storage to data elements in S first and allocating storage to the remaining data elements in the preferred order.
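As a rough illustration of the equation above, the following sketch searches for a subset S whose sizes modulo N sum to B modulo N. Exhaustive enumeration is used here purely for clarity and is an assumption of this sketch, not the search strategy of the invention; the names `size_mod_n` and `find_filler_subset` are hypothetical.

```python
from itertools import combinations


def size_mod_n(size, n):
    """SizeModN(F): the size of an element modulo N."""
    return size % n


def find_filler_subset(sizes, n, b):
    """Return indices of a smallest subset S with sum(SizeModN) mod N == B.

    Brute-force enumeration (an assumption for illustration); returns None
    when no subset satisfies the equation.
    """
    for r in range(1, len(sizes) + 1):
        for idx in combinations(range(len(sizes)), r):
            if sum(size_mod_n(sizes[i], n) for i in idx) % n == b:
                return list(idx)
    return None


sizes = [8, 4, 2, 1, 16]                  # element sizes in units
print(find_filler_subset(sizes, 8, 7))    # elements of size 4, 2 and 1
print(find_filler_subset(sizes, 8, 3))    # elements of size 2 and 1
```

Enumeration of this kind grows exponentially with the number of data elements, so it is suitable only for small illustrations.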
Implementations of the invention may include one or more of the following. Finding the subset of data elements S may include accessing a lookup table using N and B as indexes to retrieve a first set of one or more modulo values M_{1,1}, ..., M_{1,R} and a frequency F_{1,1}, ..., F_{1,R} for each modulo value. Finding the subset of data elements S may further include searching for a subset T = {A_{i1}, A_{i2}, ..., A_{in}} of data elements, such that for every p from 1 to R there is a set of F_{1,p} data elements such that the size of those data elements modulo N is M_{1,p}, and setting S = T. Finding the subset of data elements S may further include, if the search for a subset T of data elements fails using the first set of one or more modulo values M_{1,1}, ..., M_{1,R} and the frequency F_{1,1}, ..., F_{1,R} for each modulo value, accessing the lookup table again using N and B as indexes to retrieve a second set of one or more modulo values M_{2,1}, ..., M_{2,R} and a frequency F_{2,1}, ..., F_{2,R} for each modulo value and searching for a subset T using M_{2,1}, ..., M_{2,R} and the frequencies F_{2,1}, ..., F_{2,R}. The method may then set S = T. Finding the subset of data elements S may further include, if the search for a subset T of data elements fails, setting B = B − 1 and repeating the search.
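The table-driven search with its two fallbacks (trying the next set of modulo values, then reducing B by one) might be sketched as follows. The `TABLE` contents are a hypothetical fragment written by hand for N = 8, and the function name and return convention are assumptions of this sketch.

```python
from collections import defaultdict

# Hypothetical table fragment: TABLE[(N, B)] lists candidate solutions, each a
# list of (modulo value M, frequency F) pairs with sum(M * F) mod N == B.
TABLE = {
    (8, 3): [[(3, 1)], [(1, 1), (2, 1)], [(1, 3)]],
    (8, 2): [[(2, 1)], [(1, 2)]],
    (8, 1): [[(1, 1)]],
}


def find_subset(sizes, n, b):
    """Return (hole size actually filled, indices of the chosen elements)."""
    by_mod = defaultdict(list)            # modulo value -> element indices
    for i, size in enumerate(sizes):
        by_mod[size % n].append(i)
    while b > 0:
        for solution in TABLE.get((n, b), []):
            # A solution is usable if enough elements exist for every modulo.
            if all(len(by_mod[m]) >= f for m, f in solution):
                return b, [i for m, f in solution for i in by_mod[m][:f]]
        b -= 1                            # search failed: set B = B - 1
    return 0, []


print(find_subset([8, 2, 16, 1, 4], 8, 3))  # second solution for B = 3 is used
print(find_subset([8, 2, 16], 8, 3))        # falls back to B = 2
```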
In general, in another aspect, the invention features a lookup table useful in allocating storage for a header and one or more data elements in a data storage facility. The data storage facility is divided into words. Each word includes one or more incidents of one or more types of alignment boundaries. Each incident of a type of alignment boundary falls on a multiple of a base amount of units from the beginning of the word. The base amount is unique for each type of alignment boundary. Each data element is required to satisfy an alignment requirement, which is the size of alignment boundary with which it is required to align. Each data element has a length that is a multiple of its alignment requirement. The lookup table includes a plurality of solution data objects, each of which includes one or more data element modulus objects D_1, ..., D_n, each data item modulus object containing a data item size modulo the largest alignment requirement associated with any data element. Each of the plurality of solution data objects further includes a frequency object F_1, ..., F_n for each respective data item modulus object. Each frequency object contains the number of respective data elements required, in combination with the other data elements identified by the data element modulus objects, to satisfy the following equation:
(F_1·D_1 + F_2·D_2 + ... + F_n·D_n) mod N = B
where B is the size of a portion of a word that would be unallocated if storage were allocated to the header and to the data elements in a preferred order, and N is the largest alignment requirement associated with a data element.
Implementations of the invention may include one or more of the following. Each solution data object may include a count data object that contains the number of data element modulus objects in the respective solution data object. The solution data objects may be grouped into hole size data objects, where each hole size object is associated with a selected value of B. The hole size objects may be grouped into alignment data objects, where each alignment object is associated with a selected largest alignment requirement.
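One possible in-memory shape for such a table is sketched below using Python dataclasses. The class and field names are hypothetical; only the nesting (alignment objects containing hole size objects containing solution objects, each with a count, moduli, and frequencies) follows the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Solution:
    moduli: list[int]        # D_1 .. D_n: element sizes modulo N
    frequencies: list[int]   # F_1 .. F_n: how many elements of each modulus

    @property
    def count(self) -> int:
        """Number of data element modulus objects in this solution."""
        return len(self.moduli)

    def satisfies(self, n: int, b: int) -> bool:
        """Check (F_1*D_1 + ... + F_n*D_n) mod N == B."""
        total = sum(f * d for f, d in zip(self.frequencies, self.moduli))
        return total % n == b


@dataclass
class HoleSize:
    b: int                                           # selected value of B
    solutions: list[Solution] = field(default_factory=list)


@dataclass
class Alignment:
    n: int                                           # largest alignment requirement
    hole_sizes: dict[int, HoleSize] = field(default_factory=dict)


table = Alignment(n=8)
table.hole_sizes[3] = HoleSize(b=3, solutions=[Solution([1, 2], [1, 1])])
print(table.hole_sizes[3].solutions[0].satisfies(table.n, 3))
```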
In general, in another aspect, the invention features a method for building a lookup table for allocating storage for a header and one or more data elements in a data storage facility. The data storage facility is divided into words. Each word includes one or more incidents of one or more types of alignment boundaries. Each incident of a type of alignment boundary falls on a multiple of a base amount of units from the beginning of the word. The base amount is unique for each type of alignment boundary. Each data element is required to satisfy an alignment requirement, which is the size of alignment boundary with which it is required to align. Each data element has a length that is a multiple of its alignment requirement. The method includes, for s from 1 to t, performing the following:
for selected s-tuples (x_1, x_2, ..., x_s) and for B ranging from 1 to p, find the smallest frequencies (f_1, f_2, ..., f_s) such that
(x_1·f_1 + x_2·f_2 + ... + x_s·f_s) mod N = B,
where N is the largest alignment requirement associated with a data element, t is an upper limit on the number of distinct elements in a valid solution, and p is N − 1.
Implementations of the invention may include one or more of the following. t may be 3 and p may be 7. The method may further include storing the s-tuples and respective frequencies in the lookup table indexed by B. The sum of two or more elements of each selected s-tuple may not equal 2^t or a multiple of 2^t. The selected 3-tuples when t is 3 may include: (1,2,3), (1,2,4), (1,3,6), (1,4,5), (1,4,6), (1,5,6), (2,3,4), (2,3,7), (2,4,5), (2,4,7), (2,5,7), (3,4,6), (3,4,7), (4,5,6), (4,6,7), and (5,6,7). The selected 2-tuples when t is 3 may include: (1,2), (1,3), (1,4), (1,5), (1,6), (2,3), (2,4), (2,5), (2,7), (3,4), (3,6), (3,7), (4,5), (4,6), (4,7), (5,6), (5,7), and (6,7). The selected 1-tuples when t is 3 may include: 1, 2, 3, 4, 5, 6, and 7.
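The tuple selection and frequency search can be sketched in Python for the case t = 3, p = 7, where N = 8 coincides with 2^t. Two points are assumptions of this sketch rather than statements of the method: "smallest frequencies" is read as the frequencies with the smallest total, and each frequency is taken to be at least 1. All function names are hypothetical.

```python
from itertools import combinations, product


def selected_tuples(n, s):
    """s-tuples of distinct values in 1..n-1 in which no sum of two or more
    elements is a multiple of n (here n = 2**t = 8)."""
    result = []
    for tup in combinations(range(1, n), s):
        bad = any(sum(sub) % n == 0
                  for r in range(2, s + 1)
                  for sub in combinations(tup, r))
        if not bad:
            result.append(tup)
    return result


def smallest_frequencies(tup, n, b):
    """Frequencies (each in 1..n-1) with the smallest total such that
    sum(x_i * f_i) mod n == b, or None if none exist (assumed reading)."""
    best = None
    for freqs in product(range(1, n), repeat=len(tup)):
        if sum(x * f for x, f in zip(tup, freqs)) % n == b:
            if best is None or sum(freqs) < sum(best):
                best = freqs
    return best


def build_table(n=8, t=3):
    """Map each B in 1..n-1 to a list of (s-tuple, frequencies) solutions."""
    table = {b: [] for b in range(1, n)}
    for s in range(1, t + 1):
        for tup in selected_tuples(n, s):
            for b in range(1, n):
                freqs = smallest_frequencies(tup, n, b)
                if freqs is not None:
                    table[b].append((tup, freqs))
    return table


print(len(selected_tuples(8, 3)), len(selected_tuples(8, 2)))
print(smallest_frequencies((1, 2), 8, 1))
```

Under these assumptions, `selected_tuples(8, 3)` and `selected_tuples(8, 2)` reproduce the 3-tuples and 2-tuples listed above, which supports reading the exclusion rule as applying to sums equal to a multiple of 8.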
In general, in another aspect, the invention features a computer program, stored on a tangible storage medium, for use in allocating storage for a header and one or more data elements in a data storage facility. The data storage facility is divided into words. Each word includes one or more incidents of one or more types of alignment boundaries. Each incident of a type of alignment boundary falls on a multiple of a base amount of units from the beginning of the word. The base amount is unique for each type of alignment boundary. Each data element is required to satisfy an alignment requirement, which is the size of alignment boundary with which it is required to align. Each data element has a length that is a multiple of its alignment requirement. The program includes executable instructions that cause a computer to compute a hole size, B, that is a portion of a word that would be unallocated if storage were allocated to the header and to the data elements in a preferred order. The program further includes executable instructions that cause the computer to find a subset of data elements S = {F_{i1}, F_{i2}, ..., F_{in}} that satisfy the following equation:
(SizeModN(F_{i1}) + SizeModN(F_{i2}) + ... + SizeModN(F_{in})) mod N = B
where SizeModN(F) is the size of F modulo N, and N is the largest alignment requirement associated with a data element. The program further includes executable instructions that cause the computer to allocate storage to data elements in S first and to allocate storage to the remaining data elements in the preferred order.
Implementations of the invention may include one or more of the following. In finding the subset of data elements S, the computer may access a lookup table using N and B as indexes to retrieve a first set of one or more modulo values M_{1,1}, ..., M_{1,R} and a frequency F_{1,1}, ..., F_{1,R} for each modulo value, search for a subset T = {A_{i1}, A_{i2}, ..., A_{in}} of data elements such that for every p from 1 to R there is a set of F_{1,p} data elements such that the size of those data elements modulo N is M_{1,p}, and set S = T. In finding the subset of data elements S, if the search for a subset T of data elements fails using the first set of one or more modulo values M_{1,1}, ..., M_{1,R} and the frequency F_{1,1}, ..., F_{1,R} for each modulo value, the computer may access the lookup table again using N and B as indexes to retrieve a second set of one or more modulo values M_{2,1}, ..., M_{2,R} and a frequency F_{2,1}, ..., F_{2,R} for each modulo value, search for a subset T using M_{2,1}, ..., M_{2,R} and the frequencies F_{2,1}, ..., F_{2,R}, and set S = T. If the search for a subset T of data elements fails, the computer may set B = B − 1 and repeat the search.