The present invention relates to a data processor with an apparatus capable of converting data elements at a higher speed and to a method for the conversion, and in particular, to a vector data processor having a vector processing function for a relational data base and capable of sequentially converting data elements.
Data base technology for efficiently managing large amounts of data has developed rapidly. Data base processing mainly handles three types of data bases, namely, hierarchical, network and relational data bases. The former two are designed to process a large volume of data at a high speed and require complex operations suited to expert users versed in data base operations. The relational data base, on the other hand, handles a relatively small amount of data, is operated by simple operations, and is expected to be developed considerably in the future; however, it has the problem that the processing speed is reduced when the data capacity increases.
FIG. 1 is a schematic representation of a logical data structure of a relational data base containing information about parts as an example. The information is stored in a table format in a disk unit 110 (FIG. 2). A table 100 comprises a line 101 referred to as a record, row, or tuple and a column 102. As shown in FIG. 2, the data is physically stored in record units in fixed-length blocks (120a, 120b, etc.), each called a page, on the disk 110. A plurality of records are ordinarily stored in a page. An address of a record on a disk (referred to as a record address) comprises a page number and a displacement address from the head of a page. Such an example has been disclosed by C. J. Date in "An Introduction to Database Systems" (3rd ed., pp. 173-174). In that book, the record address is referred to as a tuple identifier (TID).
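The record address layout described above can be illustrated with a minimal Python sketch that packs a page number and a displacement into one integer and unpacks them again. The 16-bit displacement width and the names `make_tid` and `split_tid` are assumptions made for this example only; the text does not specify an encoding.

```python
from typing import Tuple

# Hypothetical encoding: the low DISP_BITS bits hold the displacement from the
# head of the page, and the remaining high bits hold the page number.
DISP_BITS = 16  # assumed width, not taken from the text
DISP_MASK = (1 << DISP_BITS) - 1


def make_tid(page_number: int, displacement: int) -> int:
    """Combine a page number and an in-page displacement into a record address."""
    if page_number < 0:
        raise ValueError("page number must be non-negative")
    if not 0 <= displacement <= DISP_MASK:
        raise ValueError("displacement does not fit in the assumed field width")
    return (page_number << DISP_BITS) | displacement


def split_tid(tid: int) -> Tuple[int, int]:
    """Recover (page_number, displacement) from a record address."""
    return tid >> DISP_BITS, tid & DISP_MASK
```

With this assumed layout, `split_tid(make_tid(3, 40))` gives back the pair `(3, 40)`.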
A relational data base control program 240 stored in a main storage 200 reads pages containing the necessary records from the disk and stores the pages in an input/output buffer area 201 in the main storage. The pages and records thus obtained are indicated by reference numerals 220a, 220b and 221a, 221b, respectively, in FIG. 2. Unlike a page and a record on the disk, which are addressed by use of a page number and a record address, respectively, a page and a record in the main storage are addressed by a main storage address; consequently, the relational data base control program must include processing for converting between these addresses.
When the main storage has a small capacity for storing data, a page must be read from the disk each time a record is to be obtained; hence most of the processing time is consumed by the input/output time needed to access the disk 110. Recent progress in semiconductor technology, however, has realized a large capacity main storage, for example, 32 megabytes (MB), which makes it possible to store the overall table, or all of its pages, in the buffer area 201 in the main storage, thereby greatly reducing the input/output time needed to access the disk 110. The problem of reducing the processor execution time remains.
Various vector processors have been proposed to improve the processing speed, for example, the CRAY-1, which was developed as the first machine of this type.
In the past, however, the processing was conducted by using a record as the processing unit as depicted in FIG. 1. This method can be advantageously applied to a main storage having a small capacity, for example, when the buffer area is equivalent to only several pages. On the other hand, when the data elements in the record direction are assumed to constitute a vector, the data type and the arithmetic operation each vary between these data elements; thus such a data structure has been considered unsuitable for vector processing. Conversely, if the main storage has a large capacity, the following processing system in accordance with the novel ideas of the present invention becomes implementable.
That is, if the overall table fetched from the disk is stored beforehand in the main storage, each set of data elements obtained along the column direction can be regarded as vector data. In this situation, vector processing can be favorably applied because:
(1) Each vector element is of the same data type, and
(2) Each vector element undergoes the same arithmetic operation, for example, move, comparison, or selection.
In the following paragraphs, the new operations of a concrete relational data base program will be described by referring to each step of a data retrieval example for obtaining the part names for which the manufacturer's name is HITACHI in the parts table 100 of FIG. 1.
(1) A list of record addresses of the records belonging to the table is created in the vector work area 18 of the main storage 200 (indicated by reference numeral 7 in FIG. 2). The table ordinarily has an index for each of several columns. For each record, the index contains a pair of the column value and a record address (not shown in FIG. 2). Consequently, if the index is constructed in the vector format, the record address list can be created by use of a vector move instruction. For example, assume that an index is provided for the part codes of the parts table 100 in FIG. 1. In this case, however, the obtained record addresses are not necessarily ordered in the page number sequence, that is, they are obtained at random with respect to the page number.
(2) A main storage address list of the records belonging to the table is created in a vector work area 31 of the main storage 200 (indicated by reference numeral 6 in FIG. 2). This provision implements the address conversion described before so as to improve the processing speed in accordance with the present invention. For each record, it has been conventionally required to extract a page number based on a record address and to search for the page having the same page number in the pages (221a, 221b, etc.) of the buffer area 201 of FIG. 2. The time required for this processing is increased as the buffer area capacity becomes greater. In addition, the processing speed improvement by use of the vector arithmetic operation was not attempted in the prior art because the vector processing was not applicable to this processing; consequently, this processing has been a bottleneck in the system performance if it is utilized in a relational data base system in a large capacity main storage.
(3) Based on a main storage address of the record obtained in step (2) above, the item name address of the manufacturer's name is calculated for each record, and the record whose manufacturer's name is HITACHI is selected by comparing the manufacturer's name. This processing is accomplished by use of instructions such as a vector add instruction and a vector compare instruction.
(4) The records satisfying the retrieval processing conditions are transferred. Only the part name items of the records selected in step (3) above are transferred to an area, for example, a display area for storing the retrieval results. This processing is carried out by use of instructions such as a vector move instruction having an indirect address function.
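Steps (1) to (4) can be sketched in Python as follows, with list comprehensions standing in for the vector instructions. This is only an illustration under stated assumptions: the record address layout, the field offsets, the toy `memory` dictionary, and the sample data are all hypothetical, and the dictionary lookup used in step (2) merely stands in for the address conversion; it is neither the conventional page search nor the mechanism of the invention.

```python
from typing import Dict, List, Tuple

DISP_BITS = 16                      # assumed record address layout
DISP_MASK = (1 << DISP_BITS) - 1
PART_NAME_OFFSET = 1                # assumed field offsets within a record
MAKER_OFFSET = 2


def retrieve_part_names(
    index: List[Tuple[str, int]],   # (part code, record address) pairs
    page_base: Dict[int, int],      # page number -> main storage address of page
    memory: Dict[int, str],         # toy main storage: address -> field value
    maker: str,
) -> List[str]:
    # Step (1): build the record address list from the index (vector move).
    record_addrs = [tid for _code, tid in index]

    # Step (2): convert each record address to a main storage address.
    # A dictionary lookup stands in for the conversion here.
    storage_addrs = [
        page_base[tid >> DISP_BITS] + (tid & DISP_MASK) for tid in record_addrs
    ]

    # Step (3): compute each manufacturer field address (vector add), then
    # compare the field with the search value (vector compare).
    maker_addrs = [addr + MAKER_OFFSET for addr in storage_addrs]
    selected = [memory[addr] == maker for addr in maker_addrs]

    # Step (4): move only the part name fields of the selected records
    # (vector move with indirect addressing).
    return [
        memory[addr + PART_NAME_OFFSET]
        for addr, keep in zip(storage_addrs, selected)
        if keep
    ]


# Made-up sample data: two buffered pages holding three records.
page_base = {0: 1000, 1: 5000}
memory = {
    1000: "P2", 1001: "NUT", 1002: "OTHER",      # page 0, displacement 0
    1008: "P3", 1009: "GEAR", 1010: "HITACHI",   # page 0, displacement 8
    5000: "P1", 5001: "BOLT", 5002: "HITACHI",   # page 1, displacement 0
}
index = [("P1", (1 << DISP_BITS) | 0), ("P2", 0), ("P3", 8)]

part_names = retrieve_part_names(index, page_base, memory, "HITACHI")
```

Because the index is ordered by part code, the record addresses in this sample arrive out of page number order (page 1 first, then page 0), which matches the remark in step (1).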
Among the processing of steps (1) to (4) described above, the address conversion of step (2) is the most difficult to accelerate by applying vector processing on a vector processor, and it will thus become a bottleneck of the system performance. This is a new problem that appears in this attempt to improve the relational data base processing speed by use of the vector operation.
An address conversion mechanism in a data base has been disclosed, for example, in U.S. Pat. No. 4,024,508 of C. W. Bachman et al., assigned to Honeywell Information Systems, Inc. In this system, a descriptor is prepared for each of the pages which contain at least one of the records in a data base and which have already been read out to a main storage. Each descriptor includes a corresponding page number. When a main storage address of a record is to be determined, the descriptor which includes the same page number as the one assigned to the record must be searched for. The descriptors are provided in the main storage and are sequentially chained. Therefore, the processing time for the search increases with the total number of descriptors.
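The cost of searching sequentially chained descriptors can be illustrated with a minimal Python sketch. The `Descriptor` class, its `page_address` field, and the function `find_page_address` are hypothetical names chosen for this example; the actual descriptor layout of the cited patent is not reproduced here.

```python
from typing import Optional, Tuple


class Descriptor:
    """One descriptor per buffered page, chained to the next descriptor."""

    def __init__(self, page_number: int, page_address: int,
                 next_descriptor: Optional["Descriptor"] = None) -> None:
        self.page_number = page_number
        self.page_address = page_address      # assumed field: where the page sits
        self.next_descriptor = next_descriptor


def find_page_address(head: Optional[Descriptor],
                      page_number: int) -> Tuple[Optional[int], int]:
    """Walk the chain from the head until the page number matches.

    Returns (page address or None, number of descriptors visited).
    """
    visited = 0
    node = head
    while node is not None:
        visited += 1
        if node.page_number == page_number:
            return node.page_address, visited
        node = node.next_descriptor
    return None, visited


# Build a chain for pages 0..4, with page 0 at the head of the chain.
head: Optional[Descriptor] = None
for page in reversed(range(5)):
    head = Descriptor(page, 0x1000 * (page + 1), head)
```

In this sketch, a page at the head of the chain is found after visiting one descriptor, while the last page or a missing page requires visiting all of them, so the search time grows with the number of descriptors.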