1. Field of the Invention
The invention relates to a data processing method and data processing apparatus for processing large amounts of data using a computer or other information processing apparatus, and particularly to a method and apparatus for searching for, tabulating and sorting table-format data.
2. Description of the Prior Art
Conventionally, large amounts of data are accumulated, and searching, tabulating and other types of data processing are performed on the accumulated data. This data processing may be done using, for example, a known computer system including a CPU, memory, peripheral interface, a hard disk or other auxiliary storage device, a display, a printer or other output device, a keyboard, a mouse or other input device, and a power supply unit connected via a bus, and particularly as software that can be run on a readily available commercial computer system. In order to perform the aforementioned searching, tabulating or other types of data processing, various types of databases that store large amounts of data are known. Among the various types of large amounts of data, there is a particularly strong demand to process data that can be expressed in a table format. FIG. 1 is a diagram showing an example of expressing the data to be processed in a table format. FIG. 1 shows an example wherein the sex, age and occupation data for a large number of people, e.g. 1 million, are stored in a table. In FIG. 1, the horizontal rows in the table, namely the so-called records, consist of the record number, and the sex, age and occupation fields corresponding to the record number. The vertical columns in the table consist of the record number, sex field, age field and occupation field. The table indicates that the person with the record number of “0” has a sex of female, age of 18 and occupation of programmer. In the following explanation, the data such as “Female,” “18” and “Programmer” set in the various fields are called field values. In addition, in the following explanation, unless otherwise indicated, the table-format data consisting of 1 million records shown in FIG. 1 is used as a specific example of a large amount of data.
Whether or not large amounts of data can be searched for or tabulated efficiently depends on the format in which the large amount of data is stored. Conventionally, typical known storage techniques include the so-called “record-sequential” and “field-sequential” storage techniques shown in FIGS. 2A and 2B, respectively.
FIG. 2A and FIG. 2B show a representation of the data storage format on a storage device, e.g. a hard disk. In the case of the record-sequential storage technique in FIG. 2A, a set of the field values of sex, age and occupation for each record number is stored on disk in the order of increasing logical addresses sequentially for each record number. On the other hand, in the case of the field-sequential storage technique in FIG. 2B, for each field, the field values are stored in record number order grouped by field in the direction of increasing logical addresses. To wit, in the example of FIG. 2B, the field values for the sex field corresponding to record numbers “0” through “999999” are arranged in order, and next, the field values for the age field are arranged in record number order, and then the field values for the occupation field are arranged in record number order.
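The two storage orders can be illustrated with a minimal sketch. It assumes a three-record sample in which only record “0” (Female, 18, Programmer) is taken from FIG. 1; the other two records and all function names are hypothetical and serve only to show where a given field value lands in each layout.

```python
# Hypothetical sample: only record 0 comes from FIG. 1; records 1 and 2 are invented.
records = [
    ("Female", 18, "Programmer"),
    ("Male", 24, "Student"),
    ("Female", 31, "Teacher"),
]


def record_sequential(rows):
    """Lay out values record by record, as in FIG. 2A."""
    return [value for row in rows for value in row]


def field_sequential(rows):
    """Lay out values field by field, as in FIG. 2B."""
    return [value for column in zip(*rows) for value in column]


def record_sequential_offset(record_no, field_no, n_fields):
    """Position of a field value in the record-sequential layout."""
    return record_no * n_fields + field_no


def field_sequential_offset(record_no, field_no, n_records):
    """Position of a field value in the field-sequential layout."""
    return field_no * n_records + record_no


by_record = record_sequential(records)
by_field = field_sequential(records)

# The occupation (field 2) of record 1 sits at different offsets in each layout.
print(by_record[record_sequential_offset(1, 2, n_fields=3)])
print(by_field[field_sequential_offset(1, 2, n_records=3)])
```

In both layouts every field value of every record is still stored once per record; only the order of the logical addresses differs.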
In the case of the aforementioned prior art, field values corresponding to all fields for all record numbers are stored as is in a two-dimensional data structure (with the record number as one dimension and the other field values as one dimension). Hereinafter, such a data structure in particular shall be referred to as a “data table.” In the prior art, stored data is searched for and tabulated by accessing such a data table.
In addition to the method of storing the value of the fields as field values as is, there is also a known method of converting the values to codes and storing the codes as field values. For example, with respect to the sex field, the value “Male” may be converted to “0” while the value “Female” is converted to “1” and then the values “0” or “1” are stored as the field values instead of “Male” or “Female.” Even in this case, there is no change to the point that the converted codes are stored in a data table as field values.
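As a minimal sketch of this code conversion, the mapping of “Male” to “0” and “Female” to “1” below follows the example in the text; the sample column and the helper names are hypothetical.

```python
# Code assignment taken from the example in the text.
SEX_CODES = {"Male": 0, "Female": 1}
# Reverse mapping, used to recover the original values.
SEX_NAMES = {code: name for name, code in SEX_CODES.items()}


def encode_column(values, codes):
    """Replace each field value with its code."""
    return [codes[value] for value in values]


def decode_column(coded, names):
    """Recover the original field values from the stored codes."""
    return [names[code] for code in coded]


# Hypothetical sex column in record number order.
sex_column = ["Female", "Male", "Female"]
coded_sex = encode_column(sex_column, SEX_CODES)
print(coded_sex)
```

The coded column still holds one entry per record in record number order, which is why the conversion by itself does not change the data table structure.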
In the case of searching for and tabulating large amounts of data stored using a data structure of the data table type in the aforementioned prior art, there is a problem in that the processing time for searching and tabulating becomes longer due to the access time required to access such data tables.
In addition, data tables have at least the following intrinsic drawbacks.
    (1) The data tables easily become enormous in size and cannot be easily separated (physically) into individual fields. For example, when extracting records in which the sex is “Male,” the age and occupation information is unnecessary, so efficiency could be improved if the table could be separated into a table containing only the sex field. In the case of the field-sequential storage technique shown in FIG. 2B, while separation into individual fields is simple, when large amounts of data are handled, the size of the data table still becomes enormous, so the actual expansion of a data table into memory or other fast storage device for the purpose of tabulating or searching is difficult.
    (2) Data tables cannot be kept in a form with multiple field values sorted simultaneously. For example, in the case of the prior art illustrated in FIG. 2A and FIG. 2B, the field values for the sex field are arranged in record number order in the manner “Female, Male, Female, . . . , Female.” When performing searching and tabulating processes, however, it is typically convenient for them to be arranged in the manner “Female, Female, Female, Male, . . . , Male.” In table data, the field values are arranged in a specific matrix order, namely record number order, so sorting the field values on a specific field is not permitted. For this reason, in the case of the prior art, it is not possible to select an arrangement of field values that is convenient for searching and tabulating.
    (3) In a data table, identical values appear over and over. For example, in the case of the conventional data table given in FIG. 2A and FIG. 2B, when extracting the records (that is, the record numbers) wherein the sex is “‘Male’ or ‘Man’,” the field value “Male” appears many times, so the comparison condition “‘Male’ or ‘Man’” must be matched against the field value “Male” many times. A single comparison should be sufficient to determine whether identical values match.
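Drawback (3) can be made concrete by counting how often the comparison condition is evaluated. The sketch below is only an illustration under assumed data: the five-value sex column and both function names are hypothetical, and the second function is a generic remembered-verdict approach, not the data control mechanism of the invention.

```python
def scan_data_table(column, accepted):
    """Evaluate the condition once per record, as a plain data table scan does."""
    hits, comparisons = [], 0
    for record_no, value in enumerate(column):
        comparisons += 1
        if value in accepted:
            hits.append(record_no)
    return hits, comparisons


def scan_once_per_distinct_value(column, accepted):
    """Evaluate the condition once per distinct value and reuse the verdict."""
    verdict = {}
    hits, comparisons = [], 0
    for record_no, value in enumerate(column):
        if value not in verdict:
            comparisons += 1
            verdict[value] = value in accepted
        if verdict[value]:
            hits.append(record_no)
    return hits, comparisons


# Hypothetical sex column in record number order.
sex_column = ["Female", "Male", "Female", "Male", "Male"]
condition = {"Male", "Man"}

print(scan_data_table(sex_column, condition))
print(scan_once_per_distinct_value(sex_column, condition))
```

Both scans return the same record numbers; the difference is that the number of condition evaluations in the second follows the number of distinct field values rather than the number of records. The verdict lookup is itself a per-record operation, so the sketch shows the counting argument only and makes no claim about overall speed.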
In order to increase greatly the speed of searching for and tabulating large amounts of data, the object of the present invention is to provide a method of searching for, tabulating and sorting table-format data and an apparatus for implementing said method by providing a data control mechanism that both has the functions of the conventional data table and solves the aforementioned problems with the data structure based on the data table.