1. Field of the Invention
The present invention relates to a file management method, particularly to an improved file management method aiming at efficient reading of only required data from mass data storage, typically applicable to data retrieval from databases.
2. Description of the Related Art
When retrieving desired data from a file into which data has been stored in units of records, it has been necessary to write an entire record including that data into an input-output buffer to locate and read a field or fields in which the data exists. If, for example, one record constitutes 512 bytes, it has always been necessary to read one record of 512 bytes from the file, even if only 4 bytes of data are needed. Using a corporate employee database as an example, when creating an address book of the members by retrieving their names and addresses from the database, with the above data reading method, it is necessary to read the records for all members from the database to the input-output buffer and retrieve their names and addresses from the records. Because unwanted data will also be read from the database to the input-output buffer, this method is inefficient and unnecessarily increases the processing load.
The present inventor previously proposed a file management method that enables data reading in units of fields of all records rather than in units of records (Japanese Patent Application No. Hei 09-319527, hereinafter referred to as "the previous reference").
This file management method is described with reference to FIG. 9. When an original file 1 is assumed to hold a plurality of records 3 consisting of a plurality of fields 2, the records 3 are divided into groups, each group consisting of a predetermined number of records, for example, N records. For each group of records, the sequential fields 2 from the beginning of the records are then sorted according to their relative positions into blocks 4. The sequentially lined blocks 4 within each record group are transferred to the corresponding positions in a row so as to be reorganized into a group 5. The groups 5, thus reorganized from all records 3, are sequentially assembled into a transposed file 6.
By generating the transposed file 6 reorganized in this way, efficient data reading with reduced input/output data amount can be achieved. In the above example of a corporate employee database, in order to retrieve, for example, only the member names from the records 3, only sequential reading of only the block 4 including the corresponding fields 2 is required. There is no need to read other data such as member ID number, age, etc. from this database.
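By way of illustration only, the reorganization described with reference to FIG. 9 can be sketched in Python as follows. The function name `transpose_records`, the in-memory list representation, and the sample employee data are assumptions made for this sketch and do not appear in the previous reference.

```python
from typing import List, Sequence


def transpose_records(
    records: Sequence[Sequence[bytes]], group_size: int
) -> List[List[List[bytes]]]:
    """Divide records into groups of `group_size` and, within each group,
    collect the fields at the same relative position into one block."""
    groups = []
    for start in range(0, len(records), group_size):
        record_group = records[start:start + group_size]
        # zip(*record_group) gathers the k-th field of every record into block k.
        blocks = [list(column) for column in zip(*record_group)]
        groups.append(blocks)
    return groups


# Hypothetical employee records: (ID number, name, city).
records = [
    (b"0001", b"Sato", b"Tokyo"),
    (b"0002", b"Kato", b"Osaka"),
    (b"0003", b"Mori", b"Kyoto"),
]
transposed = transpose_records(records, group_size=2)

# Reading only the names touches only block 1 of each group.
names = [name for group in transposed for name in group[1]]
```

In this sketch the last group holds fewer than N records; how an incomplete group is stored on disk is not modeled here.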
According to the previous reference, all fields 2 constituting blocks 4 (hereinafter referred to as "internal fields") had a fixed length for rapid processing purposes. In particular, the internal fields were aligned with constant boundaries, such as word boundaries, in order to avoid excessively frequent physical input/output access to the disk unit of the file system. All fields constituting the original file 1 (hereinafter referred to as "logical fields"), if variable in length, were converted into one or more fixed-length internal fields so as to coincide with the constant boundaries, as shown in FIG. 10. An internal field not completely filled with the data of the corresponding logical field was padded.
Although the transposed file generating process illustrated by FIG. 9 was simplified such that blocks 4 appear to be generated directly from the fields 2 constituting the original file 1, this process in fact comprised a two-stage procedure: one subprocess of converting logical fields into fixed-length internal fields, and another subprocess of sorting the internal fields into blocks 4.
In the method according to the previous reference, however, a logical field larger than the internal field size was divided into two or more pieces to fit into internal fields because the internal fields were fixed length. As a result, additional reading must occur for the internal fields generated as the divisions of the logical field.
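The splitting behavior described above can be sketched as follows. This is a minimal illustration only: the function name `split_fixed`, the 8-byte internal field width, and the zero-byte padding are assumptions chosen for the example, not details taken from the previous reference.

```python
from typing import List


def split_fixed(logical_field: bytes, width: int, pad: bytes = b"\x00") -> List[bytes]:
    """Cut a logical field into fixed-length internal fields of `width` bytes,
    padding the last piece when the data does not fill it."""
    pieces = [
        logical_field[i:i + width] for i in range(0, len(logical_field), width)
    ] or [b""]  # an empty logical field still occupies one internal field
    return [piece.ljust(width, pad) for piece in pieces]


# A 12-byte logical field does not fit into one 8-byte internal field,
# so it becomes two internal fields and reading it requires two accesses.
pieces = split_fixed(b"Yokohama-shi", width=8)
```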
Furthermore, when internal fields were sorted into blocks, a constant number of fields were assembled into one group. With a constant group size, data retrieval was not always efficient once physical input/output processing was taken into account, because that processing depends on the timing of writes and on the number of records registered in the original file.
An object of the present invention is to provide an improved file management method that enables more flexible and efficient file management, addressing the drawbacks described above.
A file management method developed to attain this object, according to the present invention, is intended to manage an original file that holds a plurality of records, each of which includes at least one variable-length field. This file management method comprises a field converting step for converting all fields that constitute the records stored in the original file to variable-length internal fields with field-to-field correspondence being maintained; a record group generating step for generating record groups by dividing all records consisting of the internal fields into a plurality of groups; a block generating step for generating blocks by sorting the internal fields of all records within a record group into blocks according to their relative positions so that the records' fields in the corresponding position will be assembled into the same group; and a transposed file generating step for generating a transposed file by setting in a row all blocks generated from each record group to reorganize a new group and then setting the thus reorganized groups in a row. The transposed file is accessed in response to a request for data reading from the original file.
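The four steps recited above can be sketched, purely for illustration, as follows. The summary does not state how a variable-length internal field is delimited, so the 2-byte length prefix used here is an assumption, as are the function names, the fixed `group_size`, and the in-memory representation of the transposed file.

```python
import struct
from typing import List, Sequence

LENGTH_PREFIX = ">H"  # assumption: 2-byte big-endian length before each internal field


def to_internal_field(logical_field: bytes) -> bytes:
    """Field converting step: one logical field becomes exactly one
    variable-length internal field, so it is never split."""
    return struct.pack(LENGTH_PREFIX, len(logical_field)) + logical_field


def build_transposed_file(
    records: Sequence[Sequence[bytes]], group_size: int
) -> List[List[bytes]]:
    """Record group, block, and transposed file generating steps."""
    transposed = []
    for start in range(0, len(records), group_size):
        record_group = [
            [to_internal_field(field) for field in record]
            for record in records[start:start + group_size]
        ]
        # Block k concatenates internal field k of every record in the group.
        blocks = [b"".join(column) for column in zip(*record_group)]
        transposed.append(blocks)
    return transposed


def read_block(block: bytes) -> List[bytes]:
    """Recover the logical field values stored in one block."""
    fields, offset = [], 0
    prefix_size = struct.calcsize(LENGTH_PREFIX)
    while offset < len(block):
        (length,) = struct.unpack_from(LENGTH_PREFIX, block, offset)
        offset += prefix_size
        fields.append(block[offset:offset + length])
        offset += length
    return fields


# Hypothetical records: (ID number, name, address), with addresses of different lengths.
records = [
    (b"0001", b"Sato", b"1-2-3 Marunouchi, Chiyoda-ku, Tokyo"),
    (b"0002", b"Kobayashi", b"Osaka"),
]
transposed = build_transposed_file(records, group_size=2)
addresses = read_block(transposed[0][2])
```

Each address is recovered from a single internal field regardless of its length, which is the property the variable-length conversion is meant to provide.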
The field converting step may make all internal fields terminate, aligning with boundaries of physical units for access by adding space area to the internal fields.
The record group generating step may generate record groups comprising a variable number of records, in which case, when the transposed file is accessed, the storage location of target field data is determined by referring to the previously set number of records constituting each individual record group.
The block generating step may generate all groups of the same size by adding space area to undersized blocks.
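One possible way of determining the location from the per-group record counts is sketched below. The function name `locate_record` and the idea of keeping the counts in a simple list are assumptions for illustration; the summary only states that the previously set number of records per group is referred to.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


def locate_record(record_index: int, group_sizes: List[int]) -> Tuple[int, int]:
    """Map a record number in the original file to
    (record group index, position of the record within that group)."""
    # ends[g] is the total number of records held by groups 0..g.
    ends = list(accumulate(group_sizes))
    if not ends or not 0 <= record_index < ends[-1]:
        raise IndexError("record index out of range")
    group = bisect_right(ends, record_index)
    group_start = ends[group - 1] if group else 0
    return group, record_index - group_start


# Hypothetical transposed file whose three record groups hold 3, 1, and 4 records.
group_sizes = [3, 1, 4]
location = locate_record(7, group_sizes)
```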
The block generating step may also generate blocks comprising a plurality of fields in the corresponding positions in all records in a record group.
Furthermore, the block generating step may assemble non-adjacent fields into one block.
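As a minimal sketch of blocks that hold more than one field position, including non-adjacent ones, consider the following. The `layout` parameter, the function name `build_blocks`, and the sample data are hypothetical and serve only to illustrate the grouping.

```python
from typing import List, Sequence, Tuple


def build_blocks(
    record_group: Sequence[Sequence[bytes]], layout: Sequence[Tuple[int, ...]]
) -> List[List[Tuple[bytes, ...]]]:
    """Build one block per entry of `layout`; each entry lists the field
    positions whose values are stored together in that block."""
    return [
        [tuple(record[pos] for pos in positions) for record in record_group]
        for positions in layout
    ]


# Hypothetical records: (ID number, name, age, address).
record_group = [
    (b"0001", b"Sato", b"34", b"Tokyo"),
    (b"0002", b"Kato", b"41", b"Osaka"),
]
# Name (position 1) and address (position 3) are not adjacent in the record,
# but are placed in the same block because they are expected to be read together.
blocks = build_blocks(record_group, layout=[(0,), (1, 3), (2,)])
address_book = blocks[1]
```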
In addition, if the size of a generated block is not equal to an integer multiple of minimum physical input-output units, the block generating step may add space area to the block to adjust the block size to an integer multiple of the minimum input-output units.
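The size adjustment can be sketched as follows, assuming for illustration a minimum input-output unit of 512 bytes and zero-byte fill; neither value is specified in the text, and the function name `pad_to_unit` is hypothetical.

```python
def pad_to_unit(block: bytes, unit: int = 512, fill: bytes = b"\x00") -> bytes:
    """Append space area so that the block size becomes an integer
    multiple of the minimum input-output unit."""
    remainder = len(block) % unit
    if remainder == 0:
        return block  # already aligned; nothing to add
    return block + fill * (unit - remainder)


# A 700-byte block is extended to the next multiple of 512 bytes.
padded = pad_to_unit(b"x" * 700)
```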
According to the present invention, because of variable-length internal fields, logical fields constituting the records in an original file are converted to internal fields without being split and field-to-field correspondence is maintained. This can eliminate additional access for physical input/output when reading data of one field, and data reading can thus be rapidly accomplished.
By adding space area to internal fields, all internal fields can terminate, aligned with boundaries of physical units for access, thus achieving efficient reading.
When record groups consist of a fixed number of records, as in the previous method, and a record group is completed while some of its predetermined number of records are still empty, all records in the group must later be read from the disk unit and loaded into a buffer. Data is then written into the empty records and the contents of the buffer are written back to the disk unit. Consequently, even complete records already filled with data, which would ideally be left untouched, must be read together during this process. The present invention eliminates this bottleneck by allowing each record group to comprise a variable number of records, all of which are full of data.
Furthermore, the present invention enables all blocks to be the same size and can enhance the processing efficiency.
The present invention also allows a plurality of internal fields of a record to be assembled from all records of a record group into a block. Because access for input/output is less frequent, the data of a plurality of fields can be read efficiently.
In particular, if simultaneous use of a plurality of fields positioned separately in a record of an original file is anticipated, these fields of all records in a record group are sorted into one block, so that even more efficient reading can be performed.
Furthermore, after a block is generated from internal fields, if its size is not equal to an integer multiple of minimum physical input/output units, padding is used to adjust the block size to the input/output units. The present invention can thus achieve efficient input/output processing.