1. Field of the Invention
The present invention relates to a data management apparatus and method that enable efficient storage of a large amount of data as well as efficient extraction of necessary data in an apparatus that stores a large amount of data, particularly in such an apparatus as a database server.
2. Description of the Related Art
Among data storage apparatuses is a data management apparatus that deals with files (so-called blocked transposed files) in which the file format as viewed from an application is such that fields of the same kind are collected into a group.
FIG. 15 is a conceptual diagram showing a data management concept of a blocked transposed file in a conventional data management apparatus (Japanese Unexamined Patent Publication No. Hei. 11-154155). In FIG. 15, reference numeral 1000 denotes an internal file whose format is defined to show a process of format conversion. The internal file 1000 consists of a plurality of records 1002 each constituted of a plurality of (first to Nth) internal fields 1001. Reference numeral 1010 denotes a logical file whose format is defined for interface with an application program. The logical file 1010 consists of a plurality of records 1012 each constituted of a plurality of (first to Nth) logical fields 1011. Reference numeral 1020 denotes a blocked transposed file in a state that the file has been subjected to conversion from the internal file format to the blocked transposed file format and is ready for storage in a disk or the like. In the blocked transposed file 1020, a plurality of fields 1021 of the same kind constitute a unit called a block 1022 and a plurality of blocks 1022 constitute a unit called a group 1023.
FIG. 16 shows conversion from the internal file format to the blocked transposed file format.
The conversion is performed in the following manner. First, internal fields 1001 of the same kind, for example, first internal fields 1001, of first to Lth records (one processing unit) of an internal file 1000 are cut out and stored as a block 1022 that is part of a blocked transposed file 1020. Then, second internal fields 1001 are cut out and stored as part of the blocked transposed file 1020 in the same manner. This operation is repeated until Nth fields of the internal file 1000 are stored. Then, the same operation is performed for (L+1)th to 2Lth records (one processing unit) of the internal file 1000.
The conversion into the blocked transposed file 1020 is performed by repeating the above operation.
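The conversion described above can be illustrated with a minimal sketch. It assumes that each record is a sequence of N equally sized fields held in memory; the function name `to_blocked_transposed` and the parameter `unit` (the L records of one processing unit) are hypothetical and do not appear in the cited publication. Handling of a final, shorter processing unit is also an assumption.

```python
from typing import List, Sequence


def to_blocked_transposed(
    records: Sequence[Sequence[bytes]], unit: int
) -> List[List[List[bytes]]]:
    """Return a list of groups; each group holds one block per field kind.

    `unit` plays the role of L, the number of records in one processing unit.
    """
    if unit <= 0:
        raise ValueError("unit must be positive")
    groups = []
    for start in range(0, len(records), unit):
        chunk = records[start:start + unit]
        # zip(*chunk) gathers the k-th field of every record in the chunk,
        # so each resulting column is one block of same-kind fields.
        blocks = [list(column) for column in zip(*chunk)]
        groups.append(blocks)
    return groups
```

With three two-field records and `unit=2`, the first group contains one block of the two first fields and one block of the two second fields; the remaining record forms a second, shorter group.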
FIG. 17 shows an example of the correspondence between a logical record 1012, which is a processing unit in an application program, and an internal record 1002.
As shown in FIG. 17, in a record of the internal file format, the length of fields 1001a-1001f is set at a certain fixed value. The record of the internal file format is obtained by modifying logical fields 1011a-1011d of the logical record 1012 so that they conform to the fixed boundaries.
The logical record 1012 that is handled by an application or the like is converted into the internal file format. In this conversion, first, the logical field 1011a is used as the internal field 1001a without change because it has the same length as the internal field length. However, since the logical field 1011b is shorter than the internal field length, it is made into the internal field 1001b through padding such as insertion of null data. Since the logical field 1011c is longer than the internal field length, it is decomposed into a plurality of internal fields 1001c-1001e.
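The three cases above (equal length, shorter, longer) can be sketched as follows. The function name `to_internal_fields` and its parameters are hypothetical. Padding the last piece of a decomposed long field is an assumption, since the description mentions padding only for a field shorter than the internal field length.

```python
from typing import List


def to_internal_fields(
    logical_field: bytes, width: int, pad: bytes = b"\x00"
) -> List[bytes]:
    """Map one logical field onto one or more internal fields of `width` bytes."""
    if width <= 0:
        raise ValueError("width must be positive")
    if len(pad) != 1:
        raise ValueError("pad must be a single byte")
    if not logical_field:
        # An empty logical field still occupies one fully padded internal field.
        return [pad * width]
    # Cut the logical field at the fixed boundaries.
    pieces = [
        logical_field[i:i + width] for i in range(0, len(logical_field), width)
    ]
    # Assumption: the last piece is padded with null data up to the boundary.
    pieces[-1] = pieces[-1].ljust(width, pad)
    return pieces
```

For a width of 4 bytes, a 4-byte field maps to one internal field as it is, a 2-byte field maps to one padded internal field, and a 10-byte field maps to three internal fields.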
In general, the number of logical fields that are actually needed in an individual process is limited, and in many cases not all logical fields are needed. After conversion into the blocked transposed file format, it is sufficient to read out only the blocks of the relevant logical fields. The efficiency of processing can be increased as a result of the reduction in the amount of input/output information. For example, assume an employee information blocked transposed file shown in FIG. 18 in which the first, second, third, fourth, . . . , 99th fields are assigned to the name, section number, section name, employee number, . . . , telephone number, respectively. An employee telephone number list can be generated by storing only the first, fourth, and 99th blocks in an input/output buffer and performing proper processing. It is not necessary to read out the other fields.
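The selective readout can be sketched as below, assuming that one group is held as a list of blocks indexed by field kind (zero-based, so the first and fourth fields are indices 0 and 3). The function name `project` is hypothetical, and in a real apparatus the blocks would be read from the storage device rather than from memory.

```python
from typing import List, Sequence, Tuple


def project(
    group: Sequence[Sequence[bytes]], field_indices: Sequence[int]
) -> List[Tuple[bytes, ...]]:
    """Rebuild partial records from only the blocks named in `field_indices`."""
    # Only the selected blocks are touched; all other blocks are never read.
    selected = [group[i] for i in field_indices]
    # zip(*selected) lines up the n-th entry of every selected block,
    # giving one partial record per original record.
    return list(zip(*selected))
```

Because every block in a group covers the same records in the same order, the n-th entries of the selected blocks always belong to the same record.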
Further, since the blocking is performed so that each block includes the same number of records, the file reading direction can be kept the same by performing a readout in units of that number of records. Where files are stored in a magnetic disk apparatus or the like, the head movement distance can be minimized and hence the processing speed can be increased.
Incidentally, in recent years, a plurality of data conversion processing methods have been proposed in which, with attention paid to redundancy of data, a file is subjected to data compression before being stored in a disk apparatus or the like and is decompressed into the original data when necessary. Performing such data compression provides the advantages that the required capacity of the storage device can be reduced and that the processing speed can be increased by increasing the efficiency of input/output processing on the storage device.
In general, in data conversion, the ratio of the post-conversion data length to the pre-conversion data length varies depending on the properties of the data. However, in the conventional data management method using blocked transposed files, a file cannot be processed unless the number of records belonging to the same group of a blocked transposed file is fixed and the data length is fixed in all blocks belonging to the same group. This causes a problem that such a data management method cannot obtain both of the above advantages of data compression, namely the reduction in storage capacity and the increase in processing speed.
Although it is possible to compress the entire blocked transposed file, the compressed file can then be read only after the entire blocked transposed file has been decompressed. This results in a problem of deterioration in performance.
Further, in this case, the entire blocked transposed file must be compressed according to one kind of data conversion method. There is thus a problem that the conventional data management method using blocked transposed files cannot provide operations that are closely adapted to the respective kinds of data.
The present invention has been made to solve the above problems in the art, and an object of the invention is therefore to make it possible to increase the input/output efficiency and reduce the storage capacity by storing a blocked and transposed result after subjecting it to data conversion on a block-by-block basis.
In accordance with one aspect of the present invention, there is provided a data management apparatus comprising first conversion means for generating a first block by dividing at least one record consisting of a plurality of fields into the fields and combining fields of the same kind; and second conversion means for converting the first block into a second block by using a data conversion method stored in advance, and for storing the second block in a storing means.
In accordance with another aspect of the present invention, there is provided a data management method comprising a virtual conversion step of repeatedly executing a process of reading at least one record from an input file having records each consisting of a plurality of fields, adding the at least one record to a buffer, and converting the records in the buffer into post-conversion blocks on a field-by-field basis until a data size of the post-conversion blocks of all field kinds of records in the buffer exceeds a predetermined threshold value; a number-of-records calculation step of storing the number of records in the buffer at the time of an immediately preceding process when the data size has exceeded the predetermined threshold value; and a conversion step of reading out records of the stored number from the input file, converting the read-out records into post-conversion blocks on a field-by-field basis, and storing the post-conversion blocks in a storing means.
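The virtual conversion step and the number-of-records calculation step can be sketched as follows. This is a naive illustration under stated assumptions: `zlib.compress` stands in for the unspecified data conversion method, records are read one at a time, and the names `converted_size` and `records_per_group` are hypothetical. The whole buffer is reconverted after every added record, which is simple but not efficient.

```python
import zlib
from typing import Callable, Iterable, List, Sequence

Converter = Callable[[bytes], bytes]


def converted_size(
    buffer: Sequence[Sequence[bytes]], convert: Converter = zlib.compress
) -> int:
    """Total size of the post-conversion blocks of all field kinds."""
    # zip(*buffer) yields one column (block of same-kind fields) per field kind.
    return sum(len(convert(b"".join(column))) for column in zip(*buffer))


def records_per_group(
    records: Iterable[Sequence[bytes]],
    threshold: int,
    convert: Converter = zlib.compress,
) -> int:
    """Number of records buffered just before the threshold was exceeded."""
    buffer: List[Sequence[bytes]] = []
    for record in records:
        buffer.append(record)
        if converted_size(buffer, convert) > threshold:
            # Count at the immediately preceding process. This is 0 when a
            # single record already exceeds the threshold; how to handle that
            # case is not stated in the text and is left to the caller.
            return len(buffer) - 1
    # The input ended before the threshold was exceeded.
    return len(buffer)
```

The returned count would then be used by the conversion step to read that many records from the input file and convert them block by block.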
In accordance with a further aspect of the present invention, there is provided another data management method comprising a first conversion step of generating first blocks by reading out records of a prescribed amount from an input file having records each consisting of a plurality of fields, converting the read-out records into a fixed-length field format, dividing the converted records into fields, and combining fields of the same kind; and a second conversion step of converting the first blocks into second blocks by using a data conversion method stored in advance.
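The second conversion step can be sketched as a lookup of a conversion method per field kind, applied block by block. The table `METHODS`, the function `second_conversion`, and the choice of `zlib.compress` are all assumptions made for illustration; the text says only that a data conversion method is stored in advance.

```python
import zlib
from typing import Callable, Dict, List, Sequence

Converter = Callable[[bytes], bytes]

# Hypothetical table of conversion methods stored in advance, keyed by the
# zero-based field kind. Field kinds without an entry are stored unconverted.
METHODS: Dict[int, Converter] = {
    0: zlib.compress,
}


def second_conversion(
    first_blocks: Sequence[Sequence[bytes]], methods: Dict[int, Converter]
) -> List[bytes]:
    """Convert each first block into a second block with its own method."""
    second_blocks = []
    for kind, block in enumerate(first_blocks):
        convert = methods.get(kind, lambda raw: raw)
        # Fields in a first block have a fixed length, so joining them is
        # reversible once the block has been converted back.
        second_blocks.append(convert(b"".join(block)))
    return second_blocks
```

Because each block is converted independently, a reader needs to convert back only the blocks of the field kinds it actually uses.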