1. Field of the Invention
The invention relates to a data processing method and data processing apparatus for processing large amounts of data using a computer or other information processing apparatus, and particularly to a method and apparatus for concatenating a plurality of tables of table-format data in a relational database, and for searching for, tabulating and sorting field values of the desired records or the like.
2. Description of the Prior Art
Databases are used for various applications, but the use of a relational database (RDB), which is able to eliminate logical contradictions, has become the mainstream in medium- to large-scale systems. For example, an RDB may be used in an airline seat reservation system or the like. In this case, a key field may be specified to perform a quick search for targets (often a single target), or to confirm, cancel or change reservations. In addition, because each flight has several hundred seats at most, it is also possible to find the number of empty seats on a specific airline flight.
However, when one attempts to use this RDB to perform specific calculations (e.g., calculation of the seat vacancy rate) for each fiscal year, each day of the week, each month, each route, each time zone or each type of airplane, this is known to take an extremely large amount of time. To wit, while the RDB is superior at performing processing without contradictions, on the other hand, it has poor performance in searching, tabulating or sorting on a considerable number of records.
For this reason, in recent years it has become typical to construct in the system a type of database called a data warehouse (DWH) in addition to the RDB, for the purpose of searching and tabulating. To wit, an extremely large-scale database equipped with specific data formats and data field names to match the specific purpose of the end user is constructed, and then the end user can use this to perform specific types of searches and tabulation.
However, providing a DWH in addition to the RDB, namely providing a plurality of databases, departs from the ideal form of a database used for centralized control of data, and particularly from the ideal form of the RDB concept. This may give rise to various problems, for example the following.
(1) The DWH is fixed in format, so searching and tabulation on fields other than those provided in advance in the DWH are difficult.
(2) By providing a fixed-format DWH in addition to the RDB, the data size becomes extremely large, so the DWH cannot readily keep up with RDB updates and the like.
The present invention has as its object to be able to perform quickly the joining of a plurality of tables of table-format data as desired, and also to provide a structure for table-format data with a small data size, a concatenation method therefor, and a method for performing the extremely rapid display of concatenated table-format data.
The object of the present invention is achieved by providing a method of concatenating a plurality of tables of table-format data where each table is represented by an array of records containing a field and the field values contained therein, wherein said method is characterized in comprising the steps of: constructing each table of table-format data in a manner such that each table is divided into one or more information blocks consisting of: a value list in which the field values are stored in the order of a field value number corresponding to the field value belonging to a specified field, and a pointer array in which pointer values for pointing to said field value numbers are stored in a unique record order, finding equivalent fields among a plurality of tables of table-format data, identifying the information blocks for said equivalent fields, in each of said plurality of tables of table-format data, comparing the value lists contained in said identified information blocks, and setting both value lists to the same values, at the time of setting said value lists to the same values, adding pointer values to associated pointer arrays in the information block to which that field value is added, and by making the value lists contained in the information blocks for specific fields in said plurality of tables of table-format data equivalent, concatenating the table-format data.
By means of the present invention, value lists containing actual field values and pointer arrays that contain pointer values for specifying the field values of said value lists are used to constitute an information block regarding a certain field, so table-format data is represented as a set of information blocks pertaining to various fields. Accordingly, when concatenating (namely, joining) a plurality of tables of table-format data, the value lists within the information blocks of the equivalent fields are found, the field values of those value lists are set to the same values, and the associated pointer arrays are changed in response. Accordingly, it is possible to add the field values of a value list and add the accompanying pointer values (namely, share the value lists) without requiring any complicated processing, and thus two tables of table-format data can be concatenated.
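The information block and the step of making two value lists equivalent may be illustrated by the following minimal sketch. It is not the implementation of the embodiments; the function names `build_block` and `make_equivalent`, the sample team names, and the choice of keeping each value list in sorted order are assumptions made only for illustration.

```python
from bisect import bisect_left

def build_block(column):
    """Build one information block: (value list, pointer array).

    Assumption: the value list holds the distinct field values in sorted
    order, and each pointer is the index (field value number) of the
    record's value in that list.
    """
    value_list = sorted(set(column))
    pointer_array = [bisect_left(value_list, v) for v in column]
    return value_list, pointer_array

def make_equivalent(block_a, block_b):
    """Give both blocks the same value list and renumber their pointers."""
    (vl_a, pa_a), (vl_b, pa_b) = block_a, block_b
    shared = sorted(set(vl_a) | set(vl_b))
    # Old field value number -> new field value number, per table.
    remap_a = [bisect_left(shared, v) for v in vl_a]
    remap_b = [bisect_left(shared, v) for v in vl_b]
    return shared, [remap_a[p] for p in pa_a], [remap_b[p] for p in pa_b]

# Hypothetical tables that both contain a "team" field.
fans = build_block(["Tigers", "Giants", "Tigers", "Lions"])
games = build_block(["Giants", "Hawks", "Tigers"])
shared, fans_ptr, games_ptr = make_equivalent(fans, games)

print(shared)     # one value list now serves both tables
print(fans_ptr)   # pointer array of the first table, renumbered
print(games_ptr)  # pointer array of the second table, renumbered
```

Only the pointer arrays are renumbered; no record of either table is copied or moved, which is the property the paragraph above relies on.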
In a preferred embodiment of the present invention, regarding information blocks containing value lists that have been made equivalent, only a single value list is actually saved. Namely, regarding the shared value lists, it is sufficient to save only one. Thereby, it is possible to reduce the memory size required. In addition, there is no massive processing required for joining, so the concatenation (joining) of table-format data can be implemented at very high speed.
The object of the present invention can also be achieved by a method of presenting concatenated table-format data characterized in comprising the steps of: preparing a plurality of tables of table-format data in which the value lists contained in information blocks for specific fields were made equivalent by means of the aforementioned concatenation method, regarding said plurality of tables of table-format data, among said information blocks for specific fields, identifying information blocks related to key fields in which the pointer values of the pointer array are not duplicated, and determining the table-format data containing said information blocks to be sub table-format data, in one of the information blocks, generating a second pointer array that identifies the record numbers of said sub table-format data in the order of the field values of the field list, among the information blocks contained in said plurality of tables of table-format data, identifying the information blocks related to the fields to be presented, among said information blocks related to fields to be presented, regarding information blocks that constitute the main table-format data which is the table-format data other than said sub table-format data, looking up pointer values within the pointer array corresponding to a stipulated record number and obtaining a stipulated field value, among said information blocks related to fields to be presented, regarding information blocks that constitute said sub table-format data, looking up record numbers corresponding to a stipulated record number and obtaining a record number regarding the sub table-format data within the corresponding said second pointer array, in the information block constituting said sub table-format data, looking up a pointer value within the pointer array corresponding to the record number regarding said sub table-format data, and obtaining a stipulated field value, and presenting the field value thus obtained.
By means of the present invention, regarding a plurality of tables of table-format data, when a user selects a specific field and requests its presentation, the plurality of tables of table-format data are concatenated and, in the sub table-format data, a second pointer array is generated that can identify the record numbers in the sub table-format data from the record numbers of the main table-format data (namely, reverse lookup is possible). Regarding the main table-format data, pointer values within the pointer arrays can be identified from the record numbers of the main table-format data, and moreover, the field values specified by said pointer values can be identified, so the desired field value can be found. On the other hand, regarding the sub table-format data, the record number of the sub table-format data can be identified from the record numbers of the main table-format data, and next, the pointer value within the pointer array and the field value specified by said pointer value can be identified in sequence, so the desired field value can be found. Accordingly, it is possible to select the desired field from the plurality of tables of table-format data and generate a joined table (view) at a high speed.
In order to identify the record number of said sub table-format data, in the information block related to said key field, it is sufficient to generate a second pointer array containing pointer values for specifying record numbers in the order of the field values of the field list contained in said information block. In this case, among the information blocks related to the fields to be presented, in information blocks that constitute the sub table-format data, looking up a pointer value within the pointer array corresponding to said stipulated record number, identifying the record number regarding the sub table-format data within the corresponding second pointer array, and in information blocks that constitute said sub table-format data, looking up a pointer value within the pointer array corresponding to said record number within said second pointer array, is sufficient to obtain a stipulated field value. This technique is described more specifically in Embodiment 1.
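The lookup chain described above may be sketched as follows. This is only an illustration under stated assumptions: the sample values, the field "city", and the names `build_second_pointer_array` and `sub_field_for` are hypothetical, and the key field value list is assumed to have already been made equivalent between the main and sub table-format data.

```python
# Shared (equivalent) value list of the key field.
shared_values = ["Giants", "Hawks", "Tigers"]
# Main table: record number -> field value number (duplicates allowed).
main_key_ptr = [2, 0, 2, 1]
# Sub table: record number -> field value number (no duplicates: key field).
sub_key_ptr = [1, 2, 0]

# Another field of the sub table, held as its own information block.
city_values = ["Fukuoka", "Osaka", "Tokyo"]
sub_city_ptr = [0, 1, 2]

def build_second_pointer_array(key_ptr, n_values):
    """Field value number -> sub table record number (reverse lookup)."""
    second = [None] * n_values
    for rec, value_no in enumerate(key_ptr):
        second[value_no] = rec
    return second

def sub_field_for(main_rec, second):
    """Follow main record -> value number -> sub record -> field value."""
    value_no = main_key_ptr[main_rec]
    sub_rec = second[value_no]
    if sub_rec is None:          # no matching record in the sub table
        return None
    return city_values[sub_city_ptr[sub_rec]]

second = build_second_pointer_array(sub_key_ptr, len(shared_values))
view = [(shared_values[main_key_ptr[r]], sub_field_for(r, second))
        for r in range(len(main_key_ptr))]
print(view)
```

Each row of the view costs a fixed number of array lookups, and no joined table is materialized beforehand.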
Alternately, in an information block that constitutes said main table-format data and that is an information block wherein its value list was made equivalent, generating a second pointer array containing pointer values for specifying record numbers of said sub table-format data in the order of the field values of the field list, identifying a record number regarding sub table-format data within said second pointer array corresponding to said stipulated record number, among said information blocks related to the fields to be presented, in information blocks that constitute said sub table-format data, looking up a pointer value within the pointer array corresponding to the record number regarding said sub table-format data, may be performed to obtain a stipulated field value (see Embodiment 2), or among the information blocks that constitute said sub table-format data, in at least the information block related to the field to be presented, generating a second pointer array containing pointer values for specifying record numbers of said sub table-format data in the order of the field values of the field list, and among the information blocks related to the fields to be presented, in information blocks that constitute the sub table-format data, looking up a pointer value within the pointer array corresponding to said stipulated record number, identifying the record number regarding the sub table-format data within the corresponding second pointer array, and in information blocks that constitute said sub table-format data, looking up a pointer value within the pointer array corresponding to said record number within said second pointer array may be performed to obtain a stipulated field value (see Embodiment 3).
Another embodiment of the present invention further comprises: in information blocks in which the field values are to be sorted according to a stipulated order, generating a count that indicates the number of records related to the main table-format data in a count array corresponding to the field value, generating a position indicating array that indicates the initial value of the position at which the record numbers regarding said main table-format data are stored according to said count array, placing the record numbers of said main table-format data according to the position indicating array at the position indicated by the corresponding pointer value, and also, incrementing the value corresponding to said position indicating array, thereby generating a sort array in which the record numbers of the main table-format data are sorted and stored, and obtaining the required field values in the order of record numbers stored in said sort array, and presenting the field values sorted based on said key field.
For example, in the case in which the field on which to sort is the key field, it is sufficient to, in the information block regarding the key field, generate a count array that stores a count that indicates the number of pointer values within a pointer array of an information block that constitutes said main table-format data and that is an information block wherein its value list was made equivalent to said information block, in the order of the value list within the information block for said key field. In other cases, it is sufficient to, in information blocks in which said field values are to be sorted using a pointer array within the information block that constitutes the main table-format data equivalent to the information block regarding the key field, and said second pointer array, generate a count array that stores a count that indicates the number of records regarding main table-format data.
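The count array, position indicating array and sort array amount to a counting sort over field value numbers. A minimal sketch follows; the function name `sort_records` and the sample pointer array are assumptions for illustration, and the value list is assumed to be held in the stipulated sort order.

```python
def sort_records(pointer_array, n_values):
    """Return record numbers ordered by field value number."""
    # Count array: number of records per field value number.
    count = [0] * n_values
    for value_no in pointer_array:
        count[value_no] += 1

    # Position indicating array: first slot for each field value number
    # (exclusive running total of the count array).
    position = [0] * n_values
    running = 0
    for value_no in range(n_values):
        position[value_no] = running
        running += count[value_no]

    # Place each record number at its slot, then advance that slot.
    sort_array = [0] * len(pointer_array)
    for rec, value_no in enumerate(pointer_array):
        sort_array[position[value_no]] = rec
        position[value_no] += 1
    return sort_array

print(sort_records([2, 0, 2, 1], 3))
```

Records that share a field value keep their original relative order, and the work is proportional to the number of records plus the number of field values.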
In another embodiment of the present invention, preparing a plurality of tables of table-format data in which the value lists contained in information blocks for specific fields were made equivalent by means of the method of presenting concatenated table-format data, and regarding said plurality of tables of table-format data, among said information blocks for specific fields, determining the table-format data in which the default sort order at the time of presentation is reflected to be master table-format data, and determining all other table-format data to be slave table-format data, in an information block that constitutes said slave table-format data and that is an information block wherein its value list was made equivalent, generating a first count array that stores a count that indicates the number of records regarding the slave table-format data corresponding to the field value, according to said first count array, generating a first position indicating array that determines the initial position for placement of said slave table-format data in the state when the record numbers are sorted, placing the record numbers of said slave table-format data according to the first position indicating array at the position indicated by the corresponding pointer value, and also, incrementing the value corresponding to said position indicating array, thereby generating a first sort array in which the record numbers of the slave table-format data are sorted and stored, and looking up the initial value and final value of said position indicating array, and the pointer array within the information block wherein its value list was made equivalent regarding said master table-format data, detecting the degree of duplication of the pointer array of the other information block regarding said master table-format data, and expanding the pointer array according to said degree of duplication, looking up the initial value and final value of said position indicating array, and said sort array, detecting the degree of duplication of the pointer array of the information block regarding said master table-format data, and expanding the pointer array according to said degree of duplication, can be performed to obtain and present the required field value based on said expanded pointer array.
This embodiment can be applied to the case in which a key field cannot be found regarding the table-format data. In this case, the table-format data in which the default sort order at the time of presentation is reflected is determined to be master table-format data, and all other table-format data is determined to be slave table-format data. By means of this embodiment, the pointer array is expanded according to its degree of duplication, and the field value is identified according to the expanded pointer array. Accordingly, even in the case of joining table-format data in which a certain field value is used in duplicate, it is possible to manipulate only the sort array and pointer array to create appropriate tables (views) without requiring any complicated processing.
In order to reduce the memory size of the main table-format data, it is sufficient to generate a first conversion array wherein the record numbers of the master table-format data are duplicated based on said degree of duplication and placed, and regarding said master table-format data, look up the array of pointers to the value list of the information block according to said first conversion array, and fetch the field value of the list. In addition, in order to reduce the memory size of the slave table-format data, it is sufficient to generate a second conversion array wherein the record numbers of said master table-format data, and the record numbers of the slave table-format data are duplicated based on the associated degree of duplication and placed, and regarding said slave table-format data, look up the array of pointers to the value list of the information block according to said second conversion array, and fetch the field value of the list (see Embodiment 5).
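The expansion according to the degree of duplication, expressed through the two conversion arrays, may be sketched as follows. This is one possible reading of the passage and not the text of Embodiment 5: the function name `join_with_duplicates` and the sample pointer arrays are hypothetical, and master records whose field value does not occur in the slave table-format data are assumed to be dropped from the view.

```python
def join_with_duplicates(master_ptr, slave_ptr, n_values):
    """Return (first conversion array, second conversion array).

    Row k of the joined view pairs master record master_conv[k]
    with slave record slave_conv[k].
    """
    # First count array over the slave table.
    count = [0] * n_values
    for value_no in slave_ptr:
        count[value_no] += 1

    # First position indicating array (initial positions).
    start = [0] * n_values
    running = 0
    for value_no in range(n_values):
        start[value_no] = running
        running += count[value_no]

    # First sort array: slave record numbers grouped by value number.
    position = list(start)
    slave_sorted = [0] * len(slave_ptr)
    for rec, value_no in enumerate(slave_ptr):
        slave_sorted[position[value_no]] = rec
        position[value_no] += 1

    # Repeat each master record once per matching slave record.
    master_conv, slave_conv = [], []
    for m_rec, value_no in enumerate(master_ptr):
        for k in range(start[value_no], start[value_no] + count[value_no]):
            master_conv.append(m_rec)
            slave_conv.append(slave_sorted[k])
    return master_conv, slave_conv

master_conv, slave_conv = join_with_duplicates([1, 0, 1], [0, 1, 1, 2, 0], 3)
print(master_conv)
print(slave_conv)
```

Field values are then fetched by indexing each table's own pointer arrays with the record numbers held in the conversion arrays, so the value lists and pointer arrays themselves are never duplicated.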
Moreover, in another embodiment of the present invention, a plurality of tables of table-format data are prepared in which the value lists contained in information blocks for two or more specific fields were made equivalent by means of the method of presenting concatenated table-format data, and regarding said plurality of tables of table-format data, among said information blocks for specific fields, by determining the table-format data in which the default sort order at the time of presentation is reflected to be master table-format data, and determining all other table-format data to be slave table-format data, regarding said master table-format data, generating an array of pointers to a virtual value list which is a product set of the two or more value lists that were made equivalent, regarding said slave table-format data, generating a second array of pointers to said virtual value list, generating a third pointer array that identifies the record number of said slave table-format data in the order of the field values of said virtual value list, among the information blocks contained in said plurality of tables of table-format data, identifying those information blocks regarding fields to be presented, among said information blocks regarding fields to be presented, regarding the information blocks that constitute table-format data, looking up the pointer value within the pointer array corresponding to a stipulated record number, obtaining a stipulated field value, among said information blocks regarding fields to be presented, regarding the information blocks that constitute said sub table-format data, looking up the record number corresponding to a stipulated record number, and identifying the record number of said slave table-format data within said third pointer array based on the corresponding pointer value within the array of pointers to said virtual value list, in said information block constituting said slave table-format data, looking up the pointer value within the pointer array corresponding to the record number regarding said slave table-format data, and obtaining a stipulated field value, it is possible to present the field value thus obtained.
This embodiment is applicable to the case that requires joining a plurality of fields in a plurality of tables of table-format data and finding a stipulated table (view). By means of this embodiment, the pointer array is created with respect to the value list which is a product set of the field values of a plurality of fields. Accordingly, there is no need to actually create a value list which is a product set expected to occupy an enormous capacity; rather, it is possible to obtain a table (view) in the state with a plurality of fields joined at extremely high speed by merely generating a pointer array.
In the event that there are two of said information blocks that have value lists that were made equivalent, and p is the number of field values in the value list that was made equivalent regarding one information block, while q is the number of field values in the value list that was made equivalent regarding the other information block,
the pointer value Pmi (0≤i≤p−1) to said virtual value list regarding said master table-format data is expressed as follows:
Pmi=Pm1i*q+Pm2i
(where Pm1i is the field value of the value list regarding one of the information blocks, and Pm2i is the field value of the value list regarding the other information block), and
the pointer value Psj (0≤j≤p−1) to said virtual value list regarding said slave table-format data is expressed as follows:
Psj=Ps1j*q+Ps2j
(where Ps1j is the field value of the value list regarding one of the information blocks, and Ps2j is the field value of the value list regarding the other information block).
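The two expressions above may be illustrated by the following sketch. It assumes that Pm1i, Pm2i, Ps1j and Ps2j are field value numbers (indices into the two equivalent value lists) and that each pair of values occurs at most once in the slave table-format data. The function names and sample arrays are hypothetical, and a dictionary is substituted here for the third pointer array so that the p*q virtual value list is never allocated.

```python
def composite_pointers(ptr1, ptr2, q):
    """Pointer into the virtual value list: P = P1 * q + P2."""
    return [p1 * q + p2 for p1, p2 in zip(ptr1, ptr2)]

def build_third_pointer_array(slave_composite):
    """Virtual value number -> slave record number.

    Assumption: a dict stands in for the third pointer array.
    """
    return {code: rec for rec, code in enumerate(slave_composite)}

q = 3  # number of field values in the second equivalent value list

# Master table: value numbers of the two joined fields, per record.
Pm = composite_pointers([0, 1, 1], [2, 0, 2], q)
# Slave table: value numbers of the same two fields, per record.
Ps = composite_pointers([1, 0, 1], [0, 2, 2], q)

third = build_third_pointer_array(Ps)
# For each master record, the matching slave record (None if absent).
matches = [third.get(code) for code in Pm]

print(Pm)       # composite pointers of the master records
print(Ps)       # composite pointers of the slave records
print(matches)  # slave record number for each master record
```

Because P2 is always less than q, distinct pairs (P1, P2) give distinct values of P, so comparing composite pointers is the same as comparing both fields at once.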
Still another method of joining a plurality of fields in a plurality of tables of table-format data to obtain the field values of the desired field is by preparing a plurality of tables of table-format data in which the value lists contained in information blocks for two or more specific fields were made equivalent, and regarding said plurality of tables of table-format data, among said information blocks for specific fields, determining the table-format data in which the default sort order at the time of presentation is reflected to be master table-format data, and determining all other table-format data to be slave table-format data, regarding said master table-format data and master table-format records respectively, generating a first sort array by sorting said record numbers on a field other than the field in which the default sort order is reflected, and finally sorting said record numbers on the field in which said sort order is reflected, looking up the record numbers within said first sort array, and fetching the respective corresponding field values of the two or more value lists regarding said two or more fields, storing the field values thus fetched in a multidimensional array at positions corresponding to a multidimensional list containing field values consisting of multidimensional arrays of two or more field values, storing said record numbers in positions corresponding to said record numbers in the pointer arrays for identifying the multidimensional arrays of said multidimensional value list, in one of the information blocks, generating a second pointer array that identifies the record numbers of said slave table-format data in the order of the field value of the value list, among said information blocks regarding fields to be presented, regarding the information blocks that constitute said master table-format data, looking up the pointer value of a pointer array for identifying multidimensional value lists corresponding to a stipulated record number and/or pointer values of other pointer arrays, and obtaining a stipulated field value, among said information blocks regarding fields to be presented, regarding the information blocks that constitute said slave table-format data, looking up the record number corresponding to said stipulated record number, and identifying the record number regarding the slave table-format data within said corresponding second pointer array, in said information blocks that constitute said sub table-format data, looking up the pointer value of a pointer array for identifying multidimensional value lists corresponding to a record number regarding said slave table-format data, and/or pointer values within pointer arrays, and obtaining a stipulated field value, and thus presenting the field value thus obtained.
By means of this embodiment, there is no need to provide the pointer array to the virtual value list, so it is possible to reduce the required memory size even further.
In addition, the object of the present invention may also be achieved by a recording medium recorded with a program that can implement the aforementioned methods, or a table-format data concatenation apparatus or a table-format data presentation apparatus consisting of means that implement the steps of the aforementioned methods.