Enterprises and local governments hold large amounts of data, such as business logs and sensor data, and there is a need for techniques for extracting useful information from such data. As information technology has advanced, they have been collecting larger and larger amounts of data. Data actually collected is hierarchical, like XML (Extensible Markup Language) data. Hence, techniques which make it possible to perform complicated data analysis processing on a large amount of hierarchical data at high speed are in demand.
JP-A-2003-162545 discloses a technique for searching for and extracting required data at high speed from a tree-structured CSV (Comma Separated Values) file using an index file which stores information on the head positions of data. In the technique, however, even when only the data in several specific types of fields is required, it is necessary to read the data in all fields.
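The general idea of an index of head positions, and the limitation noted above, can be illustrated with a minimal sketch. It is not the method of JP-A-2003-162545: it assumes a flat (not tree-structured) CSV held in memory, and the function names `build_offset_index` and `read_field` are hypothetical.

```python
import io

def build_offset_index(f):
    """Return the byte offset at which each record (line) starts."""
    offsets = []
    f.seek(0)
    while True:
        pos = f.tell()
        line = f.readline()
        if not line:
            break
        offsets.append(pos)
    return offsets

def read_field(f, offsets, record_no, field_no):
    """Seek directly to a record, then read one field from it."""
    f.seek(offsets[record_no])
    # The whole record is read and split even though one field is wanted.
    line = f.readline().decode("utf-8").rstrip("\r\n")
    return line.split(",")[field_no]

# Illustrative data only (assumed, not taken from the cited document).
data = io.BytesIO(b"id,name,temp\n1,sensorA,20.5\n2,sensorB,21.0\n")
offsets = build_offset_index(data)
value = read_field(data, offsets, 2, 2)
```

In this sketch the index makes locating a record fast, but extracting a single field still requires reading every field of that record.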
Techniques for high-speed cross-tabulation in an arbitrary dimension, which use a special data storing method for cross-tabulation, are disclosed in JP-A-2002-197099 and JP-A-2001-22766. These techniques are, however, specialized for cross-tabulation, so they cannot be used to perform, at high speed, complicated data analysis processing other than cross-tabulation. Nor can they be used to process hierarchical data at high speed.
JP-A-2001-43237 discloses a technique in which records with a specific field having a specific value are retrieved at high speed using an index indicating where in the file the different attribute values are located. The technique, however, enables high-speed processing only for record retrieval, and only in cases where the number of values which the fields of each record can hold is small. In particular, the technique does not enable high-speed processing for retrieving the values of a specific field of all records included in the file. The technique cannot be used to perform high-speed processing with hierarchical data, either.
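A value-to-position index of this general kind can be sketched as follows. This is only an illustration under assumed data, not the method of JP-A-2001-43237; the records, field names, and the function `build_value_index` are hypothetical.

```python
from collections import defaultdict

# Illustrative records (assumed); "region" takes only a few distinct values.
records = [
    {"id": 1, "region": "east", "temp": 20.5},
    {"id": 2, "region": "west", "temp": 21.0},
    {"id": 3, "region": "east", "temp": 19.5},
]

def build_value_index(rows, field):
    """Map each distinct value of `field` to the positions of matching records."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[field]].append(pos)
    return dict(index)

region_index = build_value_index(records, "region")

# Retrieving records whose field has a specific value uses the index.
east_ids = [records[pos]["id"] for pos in region_index["east"]]

# Retrieving one field of all records gains nothing from the index:
# every record must still be visited.
all_temps = [row["temp"] for row in records]
```

The index helps when selecting records by value, but reading the values of one field across all records remains a full scan in this sketch.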
JP-A-Hei11(1999)-154155 and JP-A-2001-22617 disclose techniques for storing field data sequentially in a file and performing high-speed processing for retrieving specific field data only. These techniques, however, do not enable high-speed processing with hierarchical data.
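Storing the data of each field sequentially can be illustrated with a minimal sketch. It is not the method of either cited document: it assumes fixed-width binary encoding, in-memory buffers standing in for files, and a hypothetical helper `read_column`.

```python
import io
import struct

# Illustrative flat records (assumed): (id, temperature).
records = [(1, 20.5), (2, 21.0), (3, 19.5)]

# One buffer per field; values of the same field are stored contiguously.
id_col = io.BytesIO()
temp_col = io.BytesIO()
for rec_id, temp in records:
    id_col.write(struct.pack("<i", rec_id))   # 4-byte little-endian int
    temp_col.write(struct.pack("<d", temp))   # 8-byte little-endian double

def read_column(buf, fmt):
    """Decode every value of one field without touching any other field."""
    size = struct.calcsize(fmt)
    raw = buf.getvalue()
    return [struct.unpack_from(fmt, raw, off)[0]
            for off in range(0, len(raw), size)]

temps = read_column(temp_col, "<d")
```

Only the buffer of the requested field is read here. The layout assumes that every record has exactly one value per field, so it has no place to record nesting or repeated child elements, which is the limitation with hierarchical data noted above.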