Now that computers are used in every corner of society and networks such as the Internet are in widespread use, the storing and processing of large-scale data has become a common occurrence.
Data structures suitable for high-speed processing of large-scale data have previously been proposed. For example, the inventor has proposed a data management mechanism suitable for searching, summarizing, and sorting large-scale data, particularly table data, at high speed (see Patent Document 1). This data management mechanism uses an information block to represent the individual field values of a field included in the table data. In this information block, the field values belonging to a field of the table data are represented by field value sequence numbers assigned to the respective field values and by an array of the actual field values arranged in the order of the field value sequence numbers. A further array is provided in which the field value sequence numbers corresponding to the field values of the respective records are arranged in the order of the record numbers. The field value of a given record is identified by looking up, in the array of field values, the value corresponding to the field value sequence number of that record. In addition, the records to be processed in the table data are identified by an array in which record numbers are arranged in sequence.
The information block is a table that, for each field of the table data, stores the field values in the order of their field value sequence numbers, where the field value sequence numbers represent a sequencing of the field values belonging to that field (i.e., integers assigned to the field values). The field values may be any type of data, such as numerical values (integer, fixed point, floating point, or the like) or character strings. This data management mechanism has the advantage that a value of any data type can be treated as an integer, namely its field value sequence number. For example, when character-string data are sorted according to this data management mechanism, the character strings themselves are not the objects actually sorted; rather, the field value sequence numbers corresponding to the character-string values are sorted. The result of the sort is represented by an array in which record numbers are arranged in sequence.
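The information block described above can be illustrated with a minimal sketch. It is not the implementation of Patent Document 1. The names `build_information_block`, `field_value`, `sort_records`, `value_list`, and `pointer_array` are hypothetical, and the sketch assumes that field value sequence numbers are assigned in ascending order of the field values, so that sorting the integers is equivalent to sorting the values.

```python
from typing import List, Sequence, Tuple


def build_information_block(column: Sequence) -> Tuple[list, List[int]]:
    """Build a (value_list, pointer_array) pair for one field.

    value_list    : distinct field values, indexed by field value sequence number
                    (assumed here to be in ascending order of the values).
    pointer_array : for each record number, the sequence number of its field value.
    """
    value_list = sorted(set(column))
    seq_no = {value: i for i, value in enumerate(value_list)}
    pointer_array = [seq_no[value] for value in column]
    return value_list, pointer_array


def field_value(value_list: list, pointer_array: List[int], record_no: int):
    """Identify the field value of a record via its sequence number."""
    return value_list[pointer_array[record_no]]


def sort_records(pointer_array: List[int], record_numbers) -> List[int]:
    """Sort records by comparing integer sequence numbers, not the values.

    The result is an array of record numbers arranged in sorted order.
    """
    return sorted(record_numbers, key=lambda r: pointer_array[r])


# Example: a character-string field with five records (record numbers 0..4).
names = ["Smith", "Adams", "Jones", "Adams", "Smith"]
value_list, pointer_array = build_information_block(names)
ordered = sort_records(pointer_array, range(len(names)))
```

In this sketch the string comparisons happen once, when the information block is built; the sort itself compares only integers and returns record numbers, which matches the description that the sort result is an array of record numbers.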
In order to execute at high speed the huge amount of calculation required for processing large-scale data, the introduction of parallel processing has been attempted. In general, parallel processing architectures are roughly classified into “distributed memory type” and “shared memory type”.
In the distributed memory type, each processor has its own local memory, and a system is constructed by connecting the processors together. In this way, a hardware system incorporating several hundred to several tens of thousands of processors can theoretically be designed. However, the distributed memory type has technical problems such as the complexity of distributed data management and the low efficiency of communication between processors. The inventor has proposed a distributed-memory type computer architecture that solves these technical problems and is capable of parallel processing of large-scale table data (see Patent Document 2). The information processing system described in Patent Document 2 is provided with a plurality of memory modules, each having a memory and a controller, and a data transmission path that connects the memory modules to each other and transmits a value contained in one of the memory modules to another memory module. The memory of each memory module is configured to hold a list of values sequenced in ascending or descending order without duplication. The controller of each memory module includes means for transmitting a value included in its value list to another memory module and means for receiving a value included in a value list from another memory module. With this arrangement, the controller of each memory module can obtain the global order of the values in its own value list, taking into account the values in the value lists of the other memory modules, by comparing the values of its own value list with those of the value lists of the other memory modules.
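The idea of obtaining a global order from per-module value lists can be sketched as follows. This is a single-process illustration only, not the architecture of Patent Document 2: the data transmission path is modeled simply by passing the other modules' lists as an argument, the function name `global_order` is hypothetical, and the sketch assumes ascending order and that a value held by several modules receives the same global position in each of them.

```python
from bisect import bisect_left
from typing import List, Sequence


def global_order(own: Sequence[int], others: Sequence[Sequence[int]]) -> List[int]:
    """Return the global position of each value in `own`.

    `own` is this module's value list (ascending, no duplicates).
    `others` are the value lists received from the other modules.
    The global position of a value is its index in the duplicate-free,
    ascending union of all value lists (an assumption of this sketch).
    """
    merged = sorted(set(own).union(*others))
    return [bisect_left(merged, value) for value in own]


# Three hypothetical memory modules, each holding a sorted, duplicate-free list.
modules = [
    [10, 40, 70],
    [20, 40, 90],
    [10, 30],
]

# Each module computes the global order of its own values independently.
orders = [
    global_order(own, modules[:i] + modules[i + 1:])
    for i, own in enumerate(modules)
]
```

Each module works only from its own list and the lists it has received, so the computation can proceed in every module at once; the communication cost that a real system would incur is not modeled here.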
The shared memory type, on the other hand, is a scheme in which a single large memory space is shared by a plurality of processors. In this scheme, traffic between the processor group and the shared memory becomes a bottleneck, and it is considered difficult in practice to construct a system using more than a hundred processors. Nevertheless, personal computers configured as shared-memory multiprocessor systems using a plurality of CPUs have become available in recent years. A standard CPU used in this type of personal computer operates with an internal clock 5 to 6 times faster than that of the memory bus and is internally provided with an automatic parallel execution function or a pipeline processing function, so that approximately one data item can be processed per memory bus clock cycle.
It is therefore desirable to process large-scale data at a speed higher than that of a single-processor system by using a personal computer configured as a shared-memory multiprocessor system.
Patent Document 1: International Publication No. WO00/10103
Patent Document 2: International Publication No. WO2005/041066