A structured document is a document composed of document elements and the logical relationships (document logical structure) between those elements. Examples of structured documents include SGML (Standard Generalized Markup Language) documents and XML (eXtensible Markup Language) documents. SGML is a standard specified by ISO (International Organization for Standardization), and XML is a standard specified by W3C (World Wide Web Consortium). XML was established to inherit some functions of SGML while addressing operational problems of HTML (HyperText Markup Language), which is the de facto standard document format for the Internet.
Document conversion and processing are important for such structured documents. For example, PDAs (Personal Digital Assistants) and mobile phones have recently come to be equipped with web browsers, but these small mobile terminals generally have a screen with a limited display area and low-speed communication means. Accordingly, HTML documents written for desktop PCs are processed by means of an XML-compliant annotation language so that only the contents suitable for a small display are extracted. Furthermore, there is a strong demand to display contents described in XML on an HTML-compliant browser and to easily convert differently formatted data exchanged among companies. Accordingly, XSLT (XSL Transformations) is used to convert contents described in XML into an HTML- or PDF-compliant display format or to transform XML documents. XSL (eXtensible Stylesheet Language), a stylesheet language for XML documents, consists of two parts: structural conversion of the XML document to be formatted, and a vocabulary for specifying the formatting semantics. XSLT is the technology that realizes the structural conversion part.
In such conversion/processing of a structured document, an element to be converted/processed is specified by a structure pattern. A structure pattern is an expression pointing to an element in the document logical structure of a structured document. The term “structure pattern” used herein means an expression formed by a string of hierarchy specifying items, each of which is composed of a hierarchy specifier, which specifies the hierarchy or hierarchy group in the targeted structure, and an element pattern, which specifies the element or element set to be selected in that hierarchy. The hierarchy specifying item is hereinafter referred to simply as an “item”. XML structure patterns include, for example, patterns in XPath (XML Path Language) specified by W3C. In XPath, an item is referred to as a location step. XPath will be described later in more detail as a particular example of a structure pattern.
Specifying a target to be processed by means of a structure pattern, however, involves the following problem. When the structured document in question is changed, the structure pattern may no longer point to the original element. Accordingly, the structure pattern must be changed whenever the original structured document is changed. Such maintenance work must be performed manually and thus requires a lot of labor. If the structured document in question is an HTML document accessed via the Internet, the contents of which change day by day, the problem becomes particularly serious.
One existing technology is a method for giving a unique identifier to each element to be specified. For example, in Amaya, a Web authoring tool developed by W3C (World Wide Web Consortium), an element is specified by means of an ID attribute (see I. Vatton et al., “Annotations in Amaya,” December 2002). In this specification of an element by means of an ID attribute, change made in the document does not influence the element unless the element itself is deleted. However, specifying an element by means of an ID requires an editing cost and is not a practical solution.
Another existing technology is a method of adapting a structure pattern to a change after the change has been made in the document (see T. A. Phelps et al., “Robust Intra-document Locations,” 9th World Wide Web Conference, 2000). In this method, the original element is searched for along the hierarchical structure of the document, using its name as a clue and following a predetermined policy. However, the method leaves open the problem of how to determine that policy, that is, the policy specifying the search range or the search order.
There may be multiple structure patterns pointing to a particular element in a structured document. This will now be described with an example. FIG. 15(a) shows the hierarchical structure of a structured document in tree form. The document order, that is, the order of appearance in the document, is element R130, element A131, element B132, element C133 and element D134. One of the structure patterns pointing to the element D134 shown in FIG. 15(a) searches sequentially from a parent to a child, then to a child of that child, and so on; that is, it is a structure pattern expressed as a “child element named D of a child element named B of a child element named A of an element named R”. In the XPath notation to be described later, this can be written as “/child::R[1]/child::A[1]/child::B[1]/child::D[1]” (first structure pattern). Another structure pattern pointing to the element D134 specifies it directly from the element R130. In this case, the element D134 is a descendant of the element R130 and can be expressed as a “descendant element named D of an element named R”. In the XPath notation, this can be written as “/child::R[1]/descendant::D[1]” (second structure pattern). Similarly, when only the element B132 is skipped, the element D134 can be expressed as a “descendant element named D of a child element named A of an element named R”. In the XPath notation, this can be written as “/child::R[1]/child::A[1]/descendant::D[1]” (third structure pattern).
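As a rough illustration of how the three structure patterns resolve to the same element, the following Python sketch evaluates them against the tree of FIG. 15(a). The `evaluate` helper, the `id` attributes, and the XML serialization of the tree are assumptions made only for this illustration. The helper handles just the `child` and `descendant` axes with an element name and a position, not the full XPath language.

```python
import re
import xml.etree.ElementTree as ET

# One item (location step): axis, element name, 1-based position.
ITEM = re.compile(r"(child|descendant)::(\w+)\[(\d+)\]")

def evaluate(root, pattern):
    """Return the element a structure pattern points to, or None."""
    # Hypothetical document node, so the first item can select the root element.
    context = ET.Element("document")
    context.append(root)
    for item in pattern.strip("/").split("/"):
        match = ITEM.fullmatch(item)
        if match is None:
            raise ValueError(f"unsupported item: {item}")
        axis, name = match.group(1), match.group(2)
        position = int(match.group(3))
        if axis == "child":
            candidates = [e for e in context if e.tag == name]
        else:
            # iter() walks in document order and starts with the context itself.
            candidates = [e for e in context.iter(name) if e is not context]
        if not 1 <= position <= len(candidates):
            return None  # the pattern points to an element that does not exist
        context = candidates[position - 1]
    return context

# Assumed serialization of FIG. 15(a); ids mirror the reference numerals.
FIG_15A = ('<R id="R130"><A id="A131"><B id="B132">'
           '<C id="C133"/><D id="D134"/></B></A></R>')

PATTERNS = [
    "/child::R[1]/child::A[1]/child::B[1]/child::D[1]",  # first pattern
    "/child::R[1]/descendant::D[1]",                     # second pattern
    "/child::R[1]/child::A[1]/descendant::D[1]",         # third pattern
]

root = ET.fromstring(FIG_15A)
results = [evaluate(root, p).get("id") for p in PATTERNS]
print(results)  # ['D134', 'D134', 'D134']
```

Each loop iteration consumes one item: the axis plays the role of the hierarchy specifier, and the name with its position plays the role of the element pattern.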
Though the three structure patterns described above all point to the element D134, the latter two are durable against the kind of change in the document described above. Suppose, for example, that the element B132 is deleted from the document and, as a result, the element C133 and the element D134 become child elements of the element A131 (see FIG. 15(b)). In this case, the first structure pattern points to an element that does not exist in the document. In contrast, the latter two structure patterns continue to point to the element D134. Furthermore, the latter two structure patterns differ in their durability. Suppose, for example, that an element D135 having the same name as the element D134 is added just below the element R130 so that it appears prior to the element A131 (see FIG. 15(c)). In this case, the second structure pattern, which points to the first descendant element named D of the element R130, points to the newly added element D135. In contrast, the third structure pattern, which points to the first descendant element named D of the element A131 under the element R130, continues to point to the original element D134 correctly.
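The difference in durability can be checked with a short Python sketch using the limited XPath subset of the standard `xml.etree.ElementTree` module. The relative paths `A/B/D`, `.//D` and `A//D`, evaluated from the element R, are assumed stand-ins for the first, second and third structure patterns. The XML serializations and `id` attributes are likewise assumptions. In particular, FIG. 15(c) is assumed here to be the tree of FIG. 15(a) with D135 added, which the text does not state explicitly.

```python
import xml.etree.ElementTree as ET

# Relative stand-ins (evaluated from element R) for the three patterns.
PATHS = {
    "first": "A/B/D",   # child of child of child
    "second": ".//D",   # first descendant D of R
    "third": "A//D",    # first descendant D of child A
}

# Assumed serializations of the three trees.
DOCS = {
    "15(a)": '<R><A><B><C/><D id="D134"/></B></A></R>',
    "15(b)": '<R><A><C/><D id="D134"/></A></R>',
    # Assumption: B132 is still present when D135 is added.
    "15(c)": '<R><D id="D135"/><A><B><C/><D id="D134"/></B></A></R>',
}

def resolve(xml_text, path):
    """Return the id of the first element the path selects, or None."""
    element = ET.fromstring(xml_text).find(path)
    return None if element is None else element.get("id")

table = {
    fig: {name: resolve(xml_text, path) for name, path in PATHS.items()}
    for fig, xml_text in DOCS.items()
}
for fig, row in table.items():
    print(fig, row)
# 15(a) {'first': 'D134', 'second': 'D134', 'third': 'D134'}
# 15(b) {'first': None, 'second': 'D134', 'third': 'D134'}
# 15(c) {'first': 'D134', 'second': 'D135', 'third': 'D134'}
```

Only the third stand-in selects D134 in all three trees, which matches the observation that the third structure pattern is the most durable of the three for these particular changes.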
In this way, the above-mentioned problem caused by change made in a document can be solved by using a durable structure pattern. A durable structure pattern, however, is not so simple as a structure pattern searching sequentially from a parent to a child (hereinafter referred to as a “fixed path”) and is difficult to create. Furthermore, there are many kinds of durable structure patterns, and thus it is difficult to select a structure pattern most suitable for possible future changes made in a document.
In spite of the situation described above, there is no editing environment for creating a durable structure pattern provided by the existing technology. There are XSLT editing systems including, for example, “eXcelon Stylus” by eXcelon Corporation, “XML Spy” by Altova Corporation, “IBM XSL Editor” by IBM Corporation, etc. Though these editing environments provide a function of automatically generating an XPath, the generated XPath is limited only to a simple, fixed path searching sequentially from a parent to a child. Accordingly, in order to generate a durable structure pattern, a user must edit the fixed path by directly inputting character strings or by utilizing an auxiliary tool selected through a menu. It is thus difficult to generate many kinds of complicated structure patterns. Furthermore, the user is required to have detailed knowledge about structure patterns.
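For reference, generating a fixed path of this kind takes only a few lines of code. The following Python sketch is a hypothetical illustration and is not the algorithm of any of the editors named above. The `fixed_path` function and the sample tree are assumptions. The function walks from the target element up to the root and emits one `child` item per level, with the position counted among same-named siblings.

```python
import xml.etree.ElementTree as ET

def fixed_path(root, target):
    """Build the parent-to-child pattern from the root element down to target."""
    # ElementTree elements keep no parent pointer, so build a lookup table.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    if target is not root and target not in parent_of:
        raise ValueError("target is not part of the tree")
    items = []
    node = target
    while node is not None:
        parent = parent_of.get(node)
        if parent is None:
            siblings = [root]  # the root element is the only child of the document
        else:
            siblings = [e for e in parent if e.tag == node.tag]
        # 1-based position among same-named siblings.
        position = next(i for i, e in enumerate(siblings, 1) if e is node)
        items.append(f"child::{node.tag}[{position}]")
        node = parent
    return "/" + "/".join(reversed(items))

# Assumed serialization of the tree of FIG. 15(a).
root = ET.fromstring('<R><A><B><C/><D id="D134"/></B></A></R>')
target = root.find("A/B/D")
path = fixed_path(root, target)
print(path)  # /child::R[1]/child::A[1]/child::B[1]/child::D[1]
```

Because every item is tied to an immediate parent, deleting or inserting any intermediate element invalidates the generated pattern.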
In one existing technology, a user gives an example of a search result and then a structure pattern is automatically created which includes the partial structure in a structured document obtained from the user's example, as the search result (see Published Unexamined Japanese Patent Application No. 7-225771). The technology, however, determines whether or not the structure pattern is correct only based on whether or not the partial structure of a structured document obtained from the user's example is included therein, and it does not positively support creation of a structure pattern with durability. This is because the technology aims only at enabling an intended structure pattern to be easily obtained even by a user without knowledge of the internal structure of structured documents or the grammar of structure patterns. Consequently, the automatically created structure pattern is not always a structure pattern with durability. Furthermore, the user is still required to have detailed knowledge about structure patterns in order to know whether or not the automatically created structure pattern has durability.
Depending on the contents of a structured document, it may be possible to predict a part which may be changed in the future to some extent. Thus, if it is possible to specify the part of the structure pattern that is predicted to be changed in the future as an item desired to be edited and complement the part with a durable expression, then a user will be able to quickly obtain only a structure pattern that is suitable for his purpose from many kinds of structure patterns.
In this respect, a user interface (such as a shell) handling a UNIX® file system, which has a tree structure, provides a function of complementing a file path. In a UNIX® file path, the file hierarchies are separated by “/”, and each directory and file is shown as a character string. In an environment using bash (Bourne Again SHell), pressing the Tab key in succession after inputting “ls /home/user”, for example, shows the directories and files below “/home/” as “/home/user1”, “/home/user2”, and so on, with the file paths complemented automatically. This function, however, complements a file path from the top toward the end, and the user must search through all the hierarchies from the hierarchy serving as the base point of the path to the hierarchy where the desired information exists. It is impossible to specify a particular hierarchy in the path and automatically complement only that hierarchy.
Thus, there is a demand for realization of a system by which many kinds of complicated structure patterns are automatically generated. Especially, there is a demand for realization of an automatic structure pattern generating system capable of easily selecting an optimum structure pattern among generated structure patterns. In such a system, a user would not be required to have detailed knowledge on structure patterns, thereby preventing errors such as editing mistakes and input mistakes. If many kinds of structure patterns are automatically generated and the user can easily select an optimum structure pattern from them, then the user can deal with various changes made in the structured document, that is, the user is provided with a system with high flexibility. Furthermore, there is a demand for realization of a function of specifying any item in a structure pattern and automatically editing only the item into a durable expression. With such a function, the user could quickly obtain a structure pattern suitable for his purpose without having the trouble of searching all the items from the item to be edited to the item having the element to be pointed to.