1. Field of the Invention
The present invention relates to database processing and data processing performed in a computer system, such as reordering and sorting a large quantity of data.
2. Description of the Related Art
In general, sorting is rarely performed on its own; it is commonly used in combination with summing. The importance of combining sorting and summing in data processing is described below using an example. Table 1 shows sales data of a company.
TABLE 1
PRODUCT  BRANCH  QUANTITY  SALES AMOUNT  DATE
TV       TOKYO   2         200           3/15
TV       OSAKA   1         100           4/21
RADIO    OSAKA   4         100           4/21
TV       OSAKA   1         100           4/28
RADIO    TOKYO   1         25            5/10
RADIO    TOKYO   3         75            5/15
In this example, the data are used for summing such as product quantity, sales amount per branch, and monthly total sales amount. The results of the summing are used to analyze the company's activities and markets, to control inventory, and to prepare stock. In Table 1, the data are entered in order of occurrence, in other words, in date order. Because the data are in date order, each of these summing operations first requires the data to be sorted on the relevant field. The sorted result is then used to obtain the result of the summing.
For instance, when summing the sales amount per branch, sorting is performed on the branch field. In the present specification, a field used for sorting is called a "sort key". The result of sorting on the branch field is shown in Table 2.
TABLE 2 ##STR1##
In the sorted data of Table 2, values belonging to the same branch field value are summed together. To sum the sales amount, the data of Table 2 are scanned from top to bottom, and the sales amount field is accumulated until a different branch field value, in other words, a different sort key value, is detected. The result of this summing is shown in Table 3.
TABLE 3
BRANCH  SALES AMOUNT
TOKYO   300
OSAKA   300
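The sort-then-sum procedure described above can be sketched as follows. This is a minimal illustration only, not the apparatus itself: the tuple layout, the function name `sum_by_branch`, and the `BRANCH_ORDER` mapping (chosen merely so that Tokyo precedes Osaka, as in Table 3) are assumptions introduced here.

```python
# Rows of Table 1 as (product, branch, quantity, sales_amount, date).
TABLE_1 = [
    ("TV", "TOKYO", 2, 200, "3/15"),
    ("TV", "OSAKA", 1, 100, "4/21"),
    ("RADIO", "OSAKA", 4, 100, "4/21"),
    ("TV", "OSAKA", 1, 100, "4/28"),
    ("RADIO", "TOKYO", 1, 25, "5/10"),
    ("RADIO", "TOKYO", 3, 75, "5/15"),
]

# Assumed branch ordering, used only to reproduce the row order of Table 3.
BRANCH_ORDER = {"TOKYO": 0, "OSAKA": 1}


def sum_by_branch(rows):
    """Sort on the branch field (the sort key), then add sales amounts in one scan."""
    ordered = sorted(rows, key=lambda r: BRANCH_ORDER[r[1]])
    totals = []
    for _product, branch, _quantity, amount, _date in ordered:
        if totals and totals[-1][0] == branch:
            # Same sort key value as the previous row: keep adding.
            totals[-1][1] += amount
        else:
            # A different sort key value was detected: start a new sum.
            totals.append([branch, amount])
    return [tuple(t) for t in totals]


print(sum_by_branch(TABLE_1))  # [('TOKYO', 300), ('OSAKA', 300)]
```

Because the rows are sorted first, a single top-to-bottom scan is enough; no lookup table of branches is needed.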
Further, certain cases of summing involve a plurality of sort keys, for instance, summing the sales amount of each product for every branch. In such a case, the data are sorted using the branch field as a first sort key and the product field as a second sort key. The result of sorting the original data of Table 1 using the two sort keys is shown in Table 4.
TABLE 4 ##STR2##
Using the sorted result of Table 4, a subtotal of each product for each branch and a total of all the products in each branch are calculated. In this example, the following branch and product combinations are obtained: Tokyo & Radio; Tokyo & TV; Osaka & Radio; and Osaka & TV. In addition to the four combinations, the two subtotals for Tokyo and Osaka and the total of the two subtotals are calculated. The sorted data are scanned from top to bottom for each combination of branch field and product field in order, and the accompanying sales amount field of each combination is added until a different combination is detected. Further, each time the branch field value changes, the process outputs the subtotal for that branch. The result is shown in Table 5.
TABLE 5
BRANCH  PRODUCT   SALES AMOUNT
TOKYO   RADIO     100
TOKYO   TV        200
TOKYO   SUBTOTAL  300
OSAKA   RADIO     100
OSAKA   TV        200
OSAKA   SUBTOTAL  300
TOTAL             600
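The summing with two sort keys described above can be sketched in the same way. The following is an illustrative sketch under assumptions: the function name `layered_sum`, the tuple layout, and the `BRANCH_ORDER` mapping (used only to reproduce the row order of Table 5) are not part of the described apparatus.

```python
# Rows of Table 1 as (product, branch, quantity, sales_amount, date).
TABLE_1 = [
    ("TV", "TOKYO", 2, 200, "3/15"),
    ("TV", "OSAKA", 1, 100, "4/21"),
    ("RADIO", "OSAKA", 4, 100, "4/21"),
    ("TV", "OSAKA", 1, 100, "4/28"),
    ("RADIO", "TOKYO", 1, 25, "5/10"),
    ("RADIO", "TOKYO", 3, 75, "5/15"),
]

# Assumed branch ordering, used only to reproduce the row order of Table 5.
BRANCH_ORDER = {"TOKYO": 0, "OSAKA": 1}


def layered_sum(rows):
    """Sort on (branch, product), then emit product sums, branch subtotals and a total."""
    ordered = sorted(rows, key=lambda r: (BRANCH_ORDER[r[1]], r[0]))
    out = []
    prev_branch = prev_product = None
    product_sum = branch_sum = grand_total = 0
    for product, branch, _quantity, amount, _date in ordered:
        if prev_branch is not None and (branch, product) != (prev_branch, prev_product):
            # The (branch, product) combination changed: output the product sum.
            out.append((prev_branch, prev_product, product_sum))
            product_sum = 0
            if branch != prev_branch:
                # The first sort key changed as well: output the branch subtotal.
                out.append((prev_branch, "SUBTOTAL", branch_sum))
                branch_sum = 0
        product_sum += amount
        branch_sum += amount
        grand_total += amount
        prev_branch, prev_product = branch, product
    if prev_branch is not None:
        # Flush the last combination and the last branch.
        out.append((prev_branch, prev_product, product_sum))
        out.append((prev_branch, "SUBTOTAL", branch_sum))
    out.append(("TOTAL", "", grand_total))
    return out


for row in layered_sum(TABLE_1):
    print(row)
```

Applied to the data of Table 1, the sketch yields the seven rows of Table 5. Note that it needs one accumulator per layer, which is exactly what a flow with a single combined key does not provide.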
Summing using a plurality of sort keys is hereinafter called "layered summing". As mentioned previously, in data processing it is indispensable to combine sorting on each field with summing, especially when dealing with a large quantity of accumulated data.
FIG. 13 illustrates a conventional data processing apparatus described in "Information Processing", Vol. 33, No. 12, pp. 1416-1423. The numbered components in FIG. 13 are: a data processing apparatus 1; a sort processing unit 2; a sum processing unit 3; a control unit 6; a merge processing unit 7; and a host computer 8.
The operation of the conventional data processing apparatus is described next using FIG. 13. When a request for data processing occurs at the host computer 8, the host computer 8 sequentially sends the data to be processed to the data processing apparatus 1.
The amount of data that the sort processing unit 2 can sort depends on the memory capacity inside the sort processing unit 2. Processing falls into two cases, depending on whether the amount of data sent from the host computer 8 is under or over this sorting capacity.
(Case 1) The data sent from the host computer 8 is under the sorting capacity of the sort processing unit 2.
Case 1 is illustrated in (a) of FIG. 14. When the data is inputted to the data processing apparatus 1, sorting is performed in the sort processing unit 2, summing is then performed in the sum processing unit 3 using the result from the sort processing unit 2, and the result from the sum processing unit 3 is sent back to the host computer 8.
(Case 2) The data sent from the host computer 8 is over the sorting capacity of the sort processing unit 2. The data are processed in the following two phases.
(Phase 1)
Phase 1 of case 2 is illustrated in (b) of FIG. 14. The sort processing unit 2 creates data sequences, each sorted within the sorting capacity of the sort processing unit 2, and the data processing apparatus 1 returns the result of the sorting from the sort processing unit 2 to the host computer 8. In phase 1, the sum processing unit 3 does not yet operate.
(Phase 2)
Phase 2 of case 2 is illustrated in (c) of FIG. 14. The data partially sorted in phase 1 is resent from the host computer 8 to the data processing apparatus 1. The sorted data is sent to the merge processing unit 7 to be merged, the data resulting from the merging is sent to the sum processing unit 3 for summing, and the result of the summing is returned to the host computer 8.
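The two phases of case 2 can be sketched as follows. This is a simplified software illustration under assumptions, not the hardware of FIG. 13: the `capacity` parameter, the function names, and the (key, value) record layout are introduced here for illustration, and the standard library functions `heapq.merge` and `itertools.groupby` stand in for the merge processing unit 7 and the sum processing unit 3.

```python
from heapq import merge
from itertools import groupby


def phase1_sorted_runs(records, capacity):
    """Phase 1: sort the input in pieces no larger than the assumed sorting capacity."""
    return [sorted(records[i:i + capacity]) for i in range(0, len(records), capacity)]


def phase2_merge_and_sum(runs):
    """Phase 2: merge the sorted pieces, then sum the values of equal keys."""
    merged = merge(*runs)
    return [
        (key, sum(value for _key, value in group))
        for key, group in groupby(merged, key=lambda record: record[0])
    ]


# (branch, sales_amount) pairs taken from Table 1, in date order.
records = [
    ("TOKYO", 200), ("OSAKA", 100), ("OSAKA", 100),
    ("OSAKA", 100), ("TOKYO", 25), ("TOKYO", 75),
]

runs = phase1_sorted_runs(records, capacity=4)   # two sorted pieces: 4 and 2 records
print(phase2_merge_and_sum(runs))                # [('OSAKA', 300), ('TOKYO', 300)]
```

The sketch sorts in ascending key order, so Osaka precedes Tokyo here. In the apparatus the sorted pieces travel back to the host computer 8 between the two phases, which is the extra transfer that makes case 2 more costly than case 1.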
In both cases 1 and 2, the series of controls is performed by the control unit 6. The sort processing unit 2 comprises a plurality of sort processors P1, P2, P3 and P4, as shown in FIG. 15. The sort processor P1 takes two input data at a time, reorders (sorts) them, and sends them to the next stage. In the next stage, the sort processor P2 takes two sorted sequences of two data each and merges them into a sorted sequence of four data, which is sent to the next stage, the sort processor P3. A similar operation is repeated onwards.
Using a plurality of sort processors, each sort processor can start its processing before the previous sort processor has completed its processing. In this way, when data are inputted sequentially, a sorted result is outputted in parallel with the data input, though with some delay.
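The doubling of sorted sequence lengths from one sort processor to the next can be sketched as follows. This is an assumption-laden software model: the function names are hypothetical, ascending order is assumed, and the stages run one after another here, so the sketch shows only the data flow and not the overlap in time between processors.

```python
from heapq import merge


def sort_stage(runs):
    """One sort processor: merge neighbouring sorted runs in pairs, doubling their length."""
    merged = []
    for i in range(0, len(runs), 2):
        pair = runs[i:i + 2]          # a trailing unpaired run passes through unchanged
        merged.append(list(merge(*pair)))
    return merged


def pipeline_sort(values, stages):
    """Pass the input through `stages` sort processors (P1, P2, ...)."""
    runs = [[v] for v in values]      # each input datum is a sorted run of length one
    for _ in range(stages):
        runs = sort_stage(runs)
    return runs


data = [5, 3, 8, 1, 9, 2, 7, 4]
print(pipeline_sort(data, stages=2))  # [[1, 3, 5, 8], [2, 4, 7, 9]]
print(pipeline_sort(data, stages=3))  # [[1, 2, 3, 4, 5, 7, 8, 9]]
```

In this sketch, an input longer than what the available stages can combine leaves several sorted runs rather than one, which corresponds to the situation that case 2 resolves by merging.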
The merge processing unit 7 is described next. In general, a merge processing unit is configured from a general-purpose processing unit such as a microprocessor and is controlled by a program. The flow of the process in the merge processing unit 7 is shown in FIG. 16. The flow presumes sorting in descending order and merges M sorted data sequences into one sorted data sequence.
In step S101 of FIG. 16, the number M of sorted data sequences to be merged is loaded into a variable m. In step S102, the top data of each of the m sorted data sequences is read. Table 6 is an example with M=2 data sequences.
TABLE 6 ##STR3##
As the top data, 6 is read from the data sequence 1, and 8 is read from the data sequence 2.
Next, in step S103, the maximum value is searched for among the data read in step S102. In this example, the maximum value is 8. In step S104, d is set to 8, and because 8 belongs to the data sequence 2, i is set to 2. In step S105, the value of d is outputted. In step S106, the next data is read from the data sequence whose number is held in i. In this example, i=2 and the next data in the data sequence 2 is 7, so the data 7 is read. Step S107 determines whether all data of that data sequence have been consumed. When data still remain, the process returns to step S103. When no more data remain in the data sequence, the number m of data sequences to be processed is reduced by 1 in step S108, and the process returns to and continues from step S103. When m=0 in step S109, the process completes.
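The merging flow of steps S101 to S109 can be sketched as follows. This is an illustrative sketch only: the function name and the list-based bookkeeping are assumptions, and because the contents of Table 6 are not reproduced here, the example sequences are hypothetical apart from their leading values 6, 8 and 7, which are taken from the description above.

```python
def merge_descending(sequences):
    """Merge M sequences, each sorted in descending order, into one descending sequence."""
    positions = [0] * len(sequences)                        # read position per sequence
    live = [i for i, seq in enumerate(sequences) if seq]    # sequences with a top datum (S102)
    m = len(live)                                           # S101 (empty sequences excluded)
    output = []
    while m > 0:                                            # S109: finished when m == 0
        # S103/S104: find the maximum among the current top data; i is its sequence.
        i = max(live, key=lambda j: sequences[j][positions[j]])
        d = sequences[i][positions[i]]
        output.append(d)                                    # S105: output d
        positions[i] += 1                                   # S106: read next datum of sequence i
        if positions[i] == len(sequences[i]):               # S107: sequence i exhausted?
            live.remove(i)
            m -= 1                                          # S108
    return output


# Hypothetical data: only the leading 6, 8 and 7 come from the description.
sequence_1 = [6, 3, 1]
sequence_2 = [8, 7, 2]
print(merge_descending([sequence_1, sequence_2]))  # [8, 7, 6, 3, 2, 1]
```

Each output datum costs one search over the m current top data, so the work per datum grows with the number of sequences being merged.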
The sum processing unit 3 is described next. In general, a sum processing unit is configured from a general-purpose processing unit such as a microprocessor and is controlled by a program. The flow of processing in the sum processing unit 3 is illustrated in FIG. 17.
In step S121 of FIG. 17, an invalid key value is loaded into a variable PK, and a variable AC is initialized to 0. The variable PK holds the sort key value of the data processed immediately before the data currently being processed. Likewise, the variable AC holds the sum value of the field accumulated up to this point. Then, the next data is read in step S122, its sort key value is held in a variable K, and its value to be summed is held in a variable V. If the sorting result has been read through completely, the process completes at step S123. If not, the key value held in the variable K is compared with the previous sort key value held in the variable PK in step S124.
When the compared sort key values do not coincide, the sum value held in AC is outputted (step S125) and AC is set to V (step S126). On output, the sum value is written into the data currently being read, and that data is outputted.
If, however, the sort key values coincide, the data need not be outputted and is therefore deleted in step S127. In step S128, V is added to AC.
In both cases, PK is updated to K in step S129, and the process returns to step S122.
In this way, as long as data with coinciding sort key values are inputted, the values continue to be added, and when data with a different sort key value is detected, the sum value is outputted at that point. Note that the invalid key value set to PK in step S121 is assumed to coincide with any other sort key value.
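The summing flow of steps S121 to S129 can be sketched as follows. This is a sketch under stated assumptions rather than the program of the sum processing unit 3: the `INVALID` sentinel, the function name, and the choice to output each sum as a (key, sum) pair are illustrative, and the final flush of the last sum at the end of input is an assumption, since the flow as described only states that the process completes.

```python
INVALID = object()   # stands in for the invalid key value loaded into PK


def sum_sorted(records):
    """Sum the values of consecutive records that share a sort key.

    `records` is an iterable of (K, V) pairs already sorted on K.
    """
    pk = INVALID                              # S121: PK <- invalid key value
    ac = 0                                    # S121: AC <- 0
    out = []
    for k, v in records:                      # S122; the loop ends at end of input (S123)
        # S124: compare K with PK; the invalid key coincides with any key.
        if pk is not INVALID and k != pk:
            out.append((pk, ac))              # S125: output the sum held in AC
            ac = v                            # S126: AC <- V
        else:
            ac += v                           # S127/S128: record absorbed, V added to AC
        pk = k                                # S129: PK <- K
    if pk is not INVALID:
        out.append((pk, ac))                  # assumed: output the last sum at end of input
    return out


sorted_by_branch = [
    ("TOKYO", 200), ("TOKYO", 25), ("TOKYO", 75),
    ("OSAKA", 100), ("OSAKA", 100), ("OSAKA", 100),
]
print(sum_sorted(sorted_by_branch))  # [('TOKYO', 300), ('OSAKA', 300)]
```

The flow holds only one previous key and one accumulator, so it can report a sum only at the granularity of the single key it compares.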
When there are a plurality of sort keys, the sort keys are combined into one key and processed according to the flow described above. Also, when a plurality of sum fields (S fields) are present, for example when totals of both the sales amount and the quantity are calculated, S instances of AC are prepared and each of steps S125, S126 and S128 is repeated S times.
The conventional data processing apparatus as previously described has the following problems.
Step S124 in the previously described sum processing unit compares the sort key value of the previous data with the sort key value of the current data. Because the key length varies from one sorting to another, the comparison is generally executed 1 byte at a time. This lowers the processing performance.
When a plurality of keys are present in the previously described sum processing, the keys are treated together as a single key. Layered summing is therefore difficult to perform. For example, when the data of Table 7 are sorted on the branch and the product and then summed, the desired result is as shown in Table 8.
TABLE 7
PRODUCT  BRANCH  QUANTITY  SALES AMOUNT  DATE
TV       TOKYO   2         200           3/15
TV       OSAKA   1         100           4/21
RADIO    OSAKA   4         100           4/21
TV       OSAKA   1         100           4/28
RADIO    TOKYO   1         25            5/10
RADIO    TOKYO   3         75            5/15
TABLE 8
BRANCH  PRODUCT   SALES AMOUNT
TOKYO   RADIO     100
TOKYO   TV        200
TOKYO   SUBTOTAL  300*
OSAKA   RADIO     100
OSAKA   TV        200
OSAKA   SUBTOTAL  300*
TOTAL             600*
The sum values marked with an asterisk (*) in Table 8 need to be calculated by alternative means. The whole system therefore becomes complex, which leads to a decline in performance.
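The limitation can be illustrated with a short sketch in which the branch and product keys are combined into a single key before summing. The function name, the tuple layout, and the `BRANCH_ORDER` mapping are assumptions made for illustration, and `itertools.groupby` is used here as a stand-in for the comparison of consecutive keys.

```python
from itertools import groupby

# Rows of Table 7 as (product, branch, quantity, sales_amount, date).
TABLE_7 = [
    ("TV", "TOKYO", 2, 200, "3/15"),
    ("TV", "OSAKA", 1, 100, "4/21"),
    ("RADIO", "OSAKA", 4, 100, "4/21"),
    ("TV", "OSAKA", 1, 100, "4/28"),
    ("RADIO", "TOKYO", 1, 25, "5/10"),
    ("RADIO", "TOKYO", 3, 75, "5/15"),
]

# Assumed branch ordering, used only to match the row order of Table 8.
BRANCH_ORDER = {"TOKYO": 0, "OSAKA": 1}


def sum_on_combined_key(rows):
    """Treat (branch, product) as one key and sum the sales amount per key value."""
    ordered = sorted(rows, key=lambda r: (BRANCH_ORDER[r[1]], r[0]))
    result = []
    for (branch, product), group in groupby(ordered, key=lambda r: (r[1], r[0])):
        result.append((branch, product, sum(r[3] for r in group)))
    return result


for row in sum_on_combined_key(TABLE_7):
    print(row)
# ('TOKYO', 'RADIO', 100)
# ('TOKYO', 'TV', 200)
# ('OSAKA', 'RADIO', 100)
# ('OSAKA', 'TV', 200)
```

Only the four rows without an asterisk are produced. The two branch subtotals and the total do not correspond to any value of the combined key, so they have to come from a separate pass.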
A difficulty in conventional layered summing is that the number of output data is not known before the sum values for the different sort keys are outputted. For example, when the data shown in Table 9 are inputted, the sum values are as shown in Table 10, and the result in Table 10 contains a greater number of data than the input data in Table 9.
TABLE 9
PRODUCT  BRANCH  QUANTITY  SALES AMOUNT  DATE
TV       TOKYO   2         200           3/15
TV       OSAKA   1         100           4/21
RADIO    OSAKA   4         100           4/21
RADIO    TOKYO   3         75            5/15
TABLE 10
BRANCH  PRODUCT   SALES AMOUNT
TOKYO   RADIO     75
TOKYO   TV        200
TOKYO   SUBTOTAL  275
OSAKA   RADIO     100
OSAKA   TV        100
OSAKA   SUBTOTAL  200
TOTAL             475
In a case where the number of data after summing is greater than the number of input data, the original data space can be used to output, for example, the totals of radio and TV for the Tokyo branch, but there is no place in the original data space to which the total for the Tokyo branch as a whole can be outputted. With the conventional method, therefore, sum processing using a plurality of keys is difficult. In addition, when there happen to be no sales on a particular day, another problem arises in the summing. That is, if no TV was sold on March 15th, for example, the sum value for TV is not outputted under the previously described process, even though a sum value indicating 0 for TV sales is required.
When a plurality of sum fields are present, as in the previously described summing, the summing needs to be repeated several times, which lowers the processing speed.
When the input data to the data processing apparatus exceeds the capacity of the sort processing unit, the summing and the layered summing are difficult to perform. In this case, the data for the summing and the layered summing are returned to the host computer, which lowers the performance.
For example, sums for branches such as Tokyo and Osaka are performed as they are. However, there are cases in which the vicinity of Tokyo is to be summed as "Others". Such a change of the summing level is difficult.