The invention relates to a structured data processor that processes tree-structured data.
In one method of representing structured data that constitutes a tree structure, data elements are arranged in the order of depth and the structural position information of each data element is represented as a numeric string. In a generally known method, the numeric string assigns a single numeral to each depth level, i.e., to each group of sibling data elements, and the numerals are arranged in the order of depth. Each numeral in the numeric string represents the sequence of the data element among its siblings at the corresponding depth. In this method of representation, the position information of a data element at a greater depth has a larger number of numerals in its numeric string. A specific example is the "object identifier" defined in ISO 8613 ODA (Open Document Architecture).
FIGS. 11(a) and 11(b) are diagrams illustrative of a conventional method of representing structured data. FIG. 11(b) shows an example of data of a tree structure. A data element of the tree structure is indicated by "○". This tree structure has three child-level data elements below the root-level data element, two grandchild-level data elements below the leftmost child-level data element, and three grandchild-level data elements below the rightmost child-level data element. The sequential data shown in FIG. 11(a) is an example of a method of representing such a tree structure. In this example, the data enclosed in brackets "[ ]" within the sequential data represents a single data element. Within the brackets, the leading numeric string, whose numerals are partitioned by the symbol "/", indicates structural position information; the data following the next symbol "@" indicates attribute information such as an identifier appended to the data element in the structured data. In the case of a structured document, the data following "@" includes the corresponding sentence data and the like. The last character string indicates a name of the data element, which may be omitted. The data representing the data element may have either a fixed length or a variable length.
A numeric string represents structural position information in such a manner that each numeral represents position information on one depth level. The place of a numeral within the numeric string corresponds to a depth with respect to the root. For example, the first numeral corresponds to the depth of the root, and the second numeral corresponds to the depth of a child of the root. That is, the number of numerals in a numeric string represents the depth. Numbering is assumed to start with 0. Each numeral in the numeric string represents the sequence of a data element among its sibling data elements. This is how a structural position is determined from position information. For example, a numeric string "0/2/1" that is the position information of "[0/2/1@GRANDCHILD1]" in such sequential data as shown in FIG. 11(a) indicates that a data element "GRANDCHILD 1" is located at position 1 on the grandchild level, below the child-level data element "CHILD 2" at position 2, which in turn is below the root-level data element "ROOT 0" at position 0.
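The reading of a numeric string described above can be sketched as follows. This is only an illustrative sketch, not the processor's actual implementation; the function names `parse_position` and `depth_of` are hypothetical, and the position information is assumed to be available as a text string with "/" between numerals.

```python
def parse_position(position: str) -> list[int]:
    """Split a position string such as "0/2/1" into one sibling index per depth level."""
    return [int(numeral) for numeral in position.split("/")]


def depth_of(position: str) -> int:
    """Depth relative to the root; the root "0" has one numeral and depth 0."""
    return len(parse_position(position)) - 1


indices = parse_position("0/2/1")
print(indices)            # [0, 2, 1]
print(depth_of("0/2/1"))  # 2
```

Each list entry is the sibling order at one depth, so "0/2/1" reads as root 0, then child 2, then grandchild 1.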
By apparently handling the data of the tree structure shown in FIG. 11(b) as such sequential data as shown in FIG. 11(a), such data can be shared with an external device or devices that cannot handle structured data of a tree structure as it is. For example, as disclosed in Japanese Patent Unexamined Publication No. 4-84342, data of a tree structure can be stored using an ordinary data management system that cannot handle data of a tree structure.
FIG. 10 is a block diagram of a conventional structured data processor. In FIG. 10, reference numeral 1 designates a structured data processor; 2, a data input unit; 3, a structured data treating unit; 4, an editing unit; 5, a retrieving unit; 6, a display unit; 7, an input unit; 8, a data output unit; and 10, an external device. The structured data processor 1 includes the data input unit 2, the structured data treating unit 3, the display unit 6, the input unit 7, and the data output unit 8. Sequential data as shown in FIG. 11(a) is inputted as structured data from the external device 10 such as a data storage device or a data transmission device; the inputted data is treated into structured data of such a tree structure as shown in FIG. 11(b); and the treated data is outputted as sequential data as shown in FIG. 11(a). By apparently inputting and outputting the data in the form of sequential data in this way, the structured data can be utilized by external devices not capable of directly handling structured data.
The data input unit 2 receives such sequential data as shown in FIG. 11(a) from the external device 10, converts the received data into structured data of such tree structure as shown in FIG. 11(b), and outputs the converted data to the structured data treating unit 3. Similarly, the data output unit 8 receives such structured data as shown in FIG. 11(b) from the structured data treating unit 3, converts the received data into such sequential data as shown in FIG. 11(a), and outputs the converted data to the external device 10.
The structured data treating unit 3 includes the editing unit 4, the retrieving unit 5, and other structured data processing units, displays information on the display unit 6, and processes input data from the input unit 7, etc.
The operation of the structured data processor 1 can be divided roughly into three parts: a data input operation, a data treating operation, and a data output operation. These three operations will be described in turn.
In the data input operation, data elements of structured data arranged in the order of depth are inputted to the data input unit 2 in the form of sequential data from the external device 10 that is not capable of handling the structured data as it is, the external device 10 being, e.g., a data transmission device or a data storage device. For each data element, the structural position is uniquely determined from the position information represented in the form of a variable-length numeric string; a data element of structured data is then generated from the data inputted thereafter, e.g., from the data following "@" shown in FIG. 11(a), and the generated data element is attached at the determined position in the structured data built up so far.
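A minimal sketch of this input operation is given below. It rests on several assumptions that are not stated in the text: the sequential data is a text string of bracketed records, everything after "@" is treated as one payload string, and records arrive parent-before-child with siblings in ascending order. The names `Node`, `RECORD`, and `build_tree` are hypothetical, and the element names other than "GRANDCHILD1" in the sample are illustrative only.

```python
import re
from dataclasses import dataclass, field

# One record looks like "[0/2/1@GRANDCHILD1]" (assumed textual layout).
RECORD = re.compile(r"\[([0-9/]+)@([^\]]*)\]")


@dataclass
class Node:
    payload: str
    children: list["Node"] = field(default_factory=list)


def build_tree(sequential: str) -> Node:
    """Rebuild a tree from records listed parent-first, siblings in ascending order."""
    root = None
    for position, payload in RECORD.findall(sequential):
        indices = [int(numeral) for numeral in position.split("/")]
        node = Node(payload)
        if len(indices) == 1:          # a single numeral denotes the root
            root = node
            continue
        if root is None:
            raise ValueError("the root record must come first")
        parent = root
        for index in indices[1:-1]:    # walk down to the parent of the new element
            parent = parent.children[index]
        if indices[-1] != len(parent.children):
            raise ValueError(f"record {position} arrived out of order")
        parent.children.append(node)
    if root is None:
        raise ValueError("no records found")
    return root


sample = (
    "[0@ROOT0][0/0@CHILD0][0/0/0@GRANDCHILD0][0/0/1@GRANDCHILD1]"
    "[0/1@CHILD1][0/2@CHILD2][0/2/0@GRANDCHILD0][0/2/1@GRANDCHILD1]"
    "[0/2/2@GRANDCHILD2]"
)
tree = build_tree(sample)
print(tree.children[2].children[1].payload)  # GRANDCHILD1
```

The sample mirrors the shape described for FIG. 11(b): three children below the root, two grandchildren below the leftmost child, and three below the rightmost.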
In the data treating operation, editing, printing, and the like are effected taking advantage of any unit incorporated into the structured data treating unit 3 other than the input and output units. One or more units may be additionally provided to implement a special treatment as the case may so require. For example, in editing, subject data displayed on the display unit 6 is selected by the input unit 7, and subjected to move, deletion, copy, or the like at the editing unit 4. Editing comes in two operations: a structural operation for structured data and a content operation for structured data contents. A structural editing unit is required for structural editing, and a content editing unit is required for structured data content editing. For example, to retrieve a data element within a structure, the retrieving unit 5 is required.
In the data output operation, structured data is outputted in the form of sequential data to the external device 10 that cannot handle structured data as it is, such as the data transmission device and the data storage device. That is, the data output unit 8 converts position information included in the data into a variable-length numeric string, gives such converted data to a data element, converts the data element of the structured data into sequential data that can be outputted in the order of depth, and outputs such sequential data to the external device 10.
A procedure for outputting data elements in the order of depth is generally known. First, the position information of the data element that constitutes the root of the tree structure is formed into a numeric string consisting of the single numeral "0". Once the position information and attribute information of the root have been outputted, the position information of a child, if any, is prepared and outputted. If another child is present, the above operation is repeated recursively. Since a child is one level deeper than its parent, the position information of the first child is formed by appending a numeral "0" to the position information of the parent. If there are a plurality of children, the appended last numeral is sequentially incremented, so that the children are numbered from "0" in ascending order. For example, the children of a data element "0/2" are "0/2/0", "0/2/1", and so on.
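The output procedure above can be sketched as a recursive traversal. The sketch assumes that "order of depth" means a parent-first (pre-order) walk, which is one reading of the description; the names `Node` and `to_sequential` are hypothetical, and the payload after "@" is treated as a single string.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    payload: str
    children: list["Node"] = field(default_factory=list)


def to_sequential(node: Node, position: tuple[int, ...] = (0,)) -> list[str]:
    """Emit one record for the node, then recurse into its children in order."""
    numeric_string = "/".join(str(numeral) for numeral in position)
    records = [f"[{numeric_string}@{node.payload}]"]
    # Each child gets the parent's string plus one appended numeral, starting at 0.
    for index, child in enumerate(node.children):
        records.extend(to_sequential(child, position + (index,)))
    return records


tree = Node("ROOT0", [
    Node("CHILD0"),
    Node("CHILD1"),
    Node("CHILD2", [Node("GRANDCHILD0"), Node("GRANDCHILD1")]),
])
records = to_sequential(tree)
print("".join(records))
# [0@ROOT0][0/0@CHILD0][0/1@CHILD1][0/2@CHILD2][0/2/0@GRANDCHILD0][0/2/1@GRANDCHILD1]
```

The length of each emitted numeric string grows by one numeral per level, which is the source of the data-volume concern with this representation.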
When position information of the kind described above is represented as variable-length data whose length depends on the structural depth of the data element, data indicating the number of numerals or data indicating a partition between numeric strings is necessary. Further, the data volume increases with the structural depth. This disadvantage is encountered in any data processing, including transmission, comparison, storage, and the like. Generally, data transmission to and from an external device takes more time than data transmission within the system. In addition, the data transmission time is substantially proportional to the data volume, so that a large data volume entails a long transmission time. Further, a large data volume requires a large memory, and as a result the volume of data that can be processed at a time becomes small. Still further, the data volume becomes large out of proportion to the information volume, so that not much information can be stored in the data storage device, and an expensive, large-capacity data storage device must be used for data storage. These are disadvantages associated with the conventional system.
Let us consider the data volume with a simple example. Let a bifurcated tree structure of depth 10, with the depth of the root being 0, be represented. It is supposed that the size of a single numeral is equal to 1 byte and that a data end is counted as 1 byte for convenience. The data volume necessary for the position information of the root is 2 bytes including the data end. The position information of a bottommost data element, i.e., a data element whose depth is 10, has 11 numerals, thus requiring 12 bytes including the data end. Since the number of bottommost data elements is 2^10 = 1024 = 1 K data elements, the capacity necessary for all the bottommost data elements is 12 bytes × 1 K data elements = 12 kilobytes. Similarly, a calculation of the position information necessary for the data elements at the other depths indicates that 22 kilobytes is necessary to represent a structure having 2 K data elements in a bifurcated tree whose depth is 10.
Similarly, a bifurcated tree of depth 20 requires 42 megabytes to represent a structure having 2 M data elements. Since the position information of every data element starts with "0", this leading numeral "0" carries no information and may be omitted. Even with such omission, a bifurcated tree of depth 10 requires about 20 kilobytes, and a bifurcated tree of depth 20, about 40 megabytes.
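The figures above can be checked with a short calculation. The sketch below sums, over every depth d of a complete bifurcated tree, the number of data elements at that depth (2^d) times the bytes of position information per element; the function name `position_bytes` and its parameters are hypothetical, and 1 kilobyte is taken as 1024 bytes.

```python
def position_bytes(depth: int, omit_root_numeral: bool = False,
                   numeral_size: int = 1, end_size: int = 1) -> int:
    """Total bytes of position information in a complete bifurcated tree."""
    total = 0
    for d in range(depth + 1):
        numerals = d + 1                  # one numeral per level, root included
        if omit_root_numeral:
            numerals -= 1                 # drop the leading "0" shared by all elements
        elements = 2 ** d                 # data elements at depth d
        total += elements * (numerals * numeral_size + end_size)
    return total


KB = 1024
MB = 1024 * 1024

print(2 ** 10 * 12 // KB)                 # 12  (bottommost level of the depth-10 tree)
print(position_bytes(10) // KB)           # 22
print(position_bytes(20) // MB)           # 42
print(position_bytes(10, omit_root_numeral=True) // KB)  # 20
print(position_bytes(20, omit_root_numeral=True) // MB)  # 40
```

With the leading numeral kept, the totals are exactly 22 × 1024 and 42 × 1024 × 1024 bytes; with it omitted, the totals are 20 kilobytes plus 1 byte and 40 megabytes plus 1 byte, which the integer division rounds down.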
A representation of a single numeral using 1 byte can accommodate only 255 sibling data elements. This is a severe limitation in dealing with a large volume of data of the same format, such as a hierarchical database. If a single numeral is represented with 2 bytes, then a bifurcated tree of depth 10 requires 40 kilobytes, and a bifurcated tree of depth 20 requires 80 megabytes.
Recently developed data compression technology has achieved a compression ratio of about 1/2, or about 1/10 at best. Even with such data compression technology, a bifurcated tree of depth 10 still requires 4 kilobytes, and a bifurcated tree of depth 20 still requires 8 megabytes. In addition, incorporation of such a data compression/decompression device is costly in terms of the system as a whole.