The invention relates to architectures for computer-readable structures, such as markup language that represents pages for display on the World Wide Web, and to apparatus and methods for parsing such structures. Specifically, the invention relates to processing architectures that are capable of self-modification based on the flow of data within the architecture. The invention also relates to parsers or execution engines for executing structures constructed according to such architectures and to caching systems and authoring tools for such architectures.
Web sites, especially those employed in electronic commerce, typically include user interfaces for permitting end users to view and enter information. Importantly, such web sites are frequently based on data, e.g., price listings, product listings, and account status, that changes on a regular basis.
Various markup languages for generating Web content are known in the art. Generally, such languages provide a method of formatting text by adding information, typically in the form of tags, to the text of the computer-displayed document to indicate the logical components of the document, the layout of text, or other information that can be interpreted by a computer. One markup language in widespread use is Hypertext Markup Language (HTML). Another known markup language is Extensible Markup Language (XML), which is currently being standardized and will likely be widely adopted.
Both HTML and XML are really just reflections of a broader markup language known as Standard Generalized Markup Language (SGML). Both XML and HTML contain markup symbols to describe the contents of a page or file. HTML describes content, i.e., text and graphic images, only in terms of how content is to be displayed and interacted with. For example, the tag <P> starts a new paragraph. On the other hand, XML describes the content in terms of what kind of data it is. For example, a tag such as <ADDRESS> might indicate that the data that follows it is an address. As XML tags proliferate across the Internet, the ability to search Web documents and to manipulate data contained in Web documents will be enhanced.
Thus, an XML file can be processed purely as data by a program, or it can be displayed. For example, depending on how the application in a receiving computer processes the data tagged by <ADDRESS>, the actual data could be stored, displayed or retrieved by a search routine searching for address data. XML is considered an “extensible” language because, unlike HTML, the markup symbols are unlimited and self-defining. XML is actually a simpler and easier-to-use subset of SGML, yet, because of its self-describing features, it is much more powerful than HTML. It is expected that HTML and XML will be used together in many Web applications.
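By way of illustration only, the following Python sketch shows how a program might treat an XML file purely as data, here by acting as the search routine for address data mentioned above. The document contents, the element names other than ADDRESS, and the function name find_addresses are assumptions made for this example and do not come from the specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; only the ADDRESS tag is taken from the text above.
DOC = """<CUSTOMERS>
  <CUSTOMER><NAME>Ada</NAME><ADDRESS>12 Elm Street</ADDRESS></CUSTOMER>
  <CUSTOMER><NAME>Bob</NAME><ADDRESS>9 Oak Avenue</ADDRESS></CUSTOMER>
</CUSTOMERS>"""


def find_addresses(xml_text):
    """Return the text of every ADDRESS element, in document order."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.iter("ADDRESS")]


addresses = find_addresses(DOC)
```

Because the tag names the kind of data rather than its appearance, the same routine works regardless of how, or whether, the document is ever displayed.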
Current solutions for creating layout for Web pages are code-intensive. When the data changes, web page authors must typically modify the coding that represents the layout. There have been efforts to provide web-authoring tools which reduce the effort involved in creating layout. Conventional web-authoring tools seek to provide a way for web developers to create layout quickly by utilizing a series of templates for portions of a page. For example, “FRONTPAGE,” developed by Microsoft Corporation of Redmond, Wash., provides wizards to construct layout to map to data. However, this is an author-time solution and requires that the wizard examine the data and determine the layout when the web page is authored. Another tool, known as “VISUAL INTERDEV,” also developed by Microsoft Corporation of Redmond, Wash., uses design-time controls which cause other controls to be instantiated at runtime, the latter controls being able to respond to changing data. However, these controls are very procedurally focused and the controls are instantiated by code that executes at a particular time in a control sequence. The procedural focus of such controls permits only limited flexibility in responding to changing data.
It would therefore be advantageous to provide an execution architecture for computer-readable structures, such as markup language, which provides for dynamic mapping of data to layout such that a web developer may create layout which is self-modifying based on the data source. Such an execution architecture would eliminate the need for manual modification of layout when data changes.
Current architectures also suffer from the disadvantage of inefficient caching. Typically, pages are divided into regions, with each region providing interaction with a user via mouse clicks, data entry, etc. When a page region or the data associated with a particular region changes, the entire page is typically regenerated even though other regions typically remain unchanged. Under current architectures, it is difficult or impossible to perform dependency analysis on the page layout to determine which regions are unaffected by a change in a given page region or data. Thus, regeneration of the entire page is a measure to ensure that all regions are properly updated. However, regeneration of the entire page often results in unnecessarily increased use of resources and bandwidth. Thus, it would be advantageous to provide an architecture which lends itself to dependency analysis and therefore provides for efficient caching.
Another disadvantage of known architectures is that they do not allow designers of user interfaces to create display processes which automatically map to data that is passed to them. For example, when a designer creates a multicolumn table in HTML, with different shading for alternate rows, the coding for this layout must be done manually and must be modified if the data to be displayed in the table changes. Moreover, coding for layout involving changes in the number of columns in a table tends to be much more involved than coding involving changes in the number of rows of a table. It would therefore be advantageous to provide an architecture that includes generic processing elements that are capable of automatically mapping data passed to them. Current authoring tools typically provide users with property lists for the display components. For example, “VISUAL BASIC,” developed by Microsoft Corporation of Redmond, Wash., provides a property list for any control. Most HTML editors provide property lists for the low-level user interface intrinsics. However, to date, such property lists have not been created for generic segments of XML. It would therefore be advantageous to provide an architecture that includes generic processing elements for which a property set may be exposed to permit the use of a familiar tool approach for web page authoring.
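To make the table example above concrete, the following Python sketch shows one way a generic element could map whatever rows it is given onto a shaded table, so that a change in the number of rows or columns requires no change to the layout code. The function name render_table and the class names "even" and "odd" are hypothetical and are used only for illustration.

```python
from html import escape


def render_table(rows):
    """Build an HTML table from a list of rows, whatever their width."""
    lines = ["<table>"]
    for index, row in enumerate(rows):
        # Alternate the (hypothetical) shading class from row to row.
        shade = "even" if index % 2 == 0 else "odd"
        cells = "".join(f"<td>{escape(str(cell))}</td>" for cell in row)
        lines.append(f'  <tr class="{shade}">{cells}</tr>')
    lines.append("</table>")
    return "\n".join(lines)


two_columns = render_table([["pen", 1.5], ["ink", 4]])
three_columns = render_table([["pen", 1.5, "in stock"]])
```

The layout here is derived from the data at evaluation time, which is the property the paragraph above identifies as missing from hand-coded tables.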
The aforementioned problems are addressed by the invention, which provides an execution architecture for computer-readable structures, such as markup language, which is based on data flow rather than execution control flow. The invention contemplates a data-centric architecture in which an overall process for generating markup language is modeled as a network of interconnected processing elements, each having a data input and a transformation input. The transformation input represents a transformation that is applied to the data input. Each processing element generates output by applying the transformation input to the data input. According to the invention, the output of one processing element may be provided as either a data input or a transformation input to another processing element. The resulting architecture thus provides a network of interconnected processing elements which are modified dynamically depending on data flow. Regions of a web page are represented by top-level processing elements, which may have defined within them other processing elements wired to obtain a particular self-modifying process based on the data flow.
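The data-flow model described above may be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation of the invention: transformations are modeled as plain Python callables, and the names ProcessingElement, resolve, choose_layout, and make_region are invented for the example.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ProcessingElement:
    data: Any            # a literal value, or another ProcessingElement
    transformation: Any  # a callable, or a ProcessingElement that yields one

    def evaluate(self):
        # Each input may itself be the output of another element.
        data = resolve(self.data)
        transform = resolve(self.transformation)
        return transform(data)


def resolve(source):
    """Evaluate upstream elements; pass literal inputs through unchanged."""
    if isinstance(source, ProcessingElement):
        return source.evaluate()
    return source


def as_list(items):
    return ", ".join(items)


def as_count(items):
    return f"{len(items)} items"


def choose_layout(items):
    # Returns a transformation, so the layout is selected by the data.
    return as_list if len(items) <= 3 else as_count


def make_region(items):
    chooser = ProcessingElement(data=items, transformation=choose_layout)
    # The chooser's output is wired into the transformation input.
    return ProcessingElement(data=items, transformation=chooser)


short_view = make_region(["pen", "ink"]).evaluate()
long_view = make_region(["a", "b", "c", "d"]).evaluate()
```

In this sketch the region modifies its own layout when the data grows past three items, without any change to the wiring, because the transformation input is itself computed from the data.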
An exemplary implementation of the architecture according to the invention includes a network of processing elements defined by an XML tree. An XML input tree structure is used to define the data flow relationships between processing elements. Within the input tree, processing elements are defined by appropriate tags, recognized by an execution engine according to another aspect of the invention. Transformation input elements, according to this exemplary implementation, are provided in the form of XSL trees, and data input elements are provided in the form of XML trees. Each processing element therefore generates an output tree, in the form of an XML tree, which is constructed by applying the XSL transformation input to the XML data input. Nesting of processing elements within the input tree provides for the use of an output tree of one processing element as an input to another processing element.
According to another aspect of the invention, a message bus is implemented within the architecture. Each processing element within the input tree may be provided with one or more queries to the message bus, which specifies services to the processing elements in the network. When a particular message query finds a matching message on the message bus, the matching message tree replaces the input node in that particular processing element. In this way, a loose handshake is provided between the services specified on the message bus and the processing elements. Messaging thus provides a way to dynamically alter the characteristics of one or more processing elements based on specific services conveyed to the processing element network.
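The loose handshake described above may be illustrated with the following Python sketch. It assumes, purely for the example, that a query is carried in a "query" attribute on an input node and that the message bus can be modeled as a mapping from message names to message trees; neither convention is stated in the specification.

```python
import xml.etree.ElementTree as ET


def apply_hookup(input_node, bus):
    """Return the matching message tree, or the original node if none matches."""
    query = input_node.get("query")  # hypothetical attribute name
    if query is not None and query in bus:
        return bus[query]
    return input_node


default_input = ET.fromstring('<data query="prices"><item>none</item></data>')
bus = {"prices": ET.fromstring("<data><item>9.99</item></data>")}

hooked = apply_hookup(default_input, bus)   # a message matches the query
unhooked = apply_hookup(default_input, {})  # no service offered
```

When no matching message is present, the element simply keeps its declared input, which is what makes the coupling between services and processing elements loose.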
An execution engine for parsing a structure defined according to the invention is provided with appropriate instructions for recognizing processing elements and message hookups within the input tree structure. The execution engine parses the input tree structure using depth-first traversal, first evaluating the deepest child nodes. Child nodes that are not processing elements are copied to the output tree. If a particular child node is deemed to be a processing element, the execution engine first determines if any message hookups exist on the data input node or transformation input node of the processing element. If message hookups do exist, the tree structure associated with the message being queried is used to replace the tree structure of the corresponding input node. The resulting tree structure is placed in the appropriate position in the output tree. After the deepest child nodes are traversed, the execution engine unwinds the recursion to the next level up, applying the transformation to the output tree on the unwind, and traverses the child nodes at that level, repeating the process of evaluating processing elements and message hookups. The process terminates when the top-level nodes have been unwound.
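The traversal described above may be sketched in Python as follows. The sketch rests on several assumptions that are not part of the specification: processing elements are marked by a hypothetical pe tag with data and transform children, message queries are carried in a hypothetical "query" attribute, and XSL transformations are replaced by named Python functions because the Python standard library has no XSLT processor.

```python
import xml.etree.ElementTree as ET


def count(node):
    """Stand-in transformation: report how many children the data has."""
    out = ET.Element("out")
    out.text = str(len(list(node)))
    return out


TRANSFORMS = {"count": count}  # hypothetical registry replacing XSL trees


def hookup(input_node, bus):
    """Replace an input node by a matching message tree, if there is one."""
    query = input_node.get("query")
    return bus[query] if query in bus else input_node


def evaluate(node, bus):
    """Depth-first evaluation; transformations are applied on the unwind."""
    if node.tag != "pe":
        # Ordinary nodes are copied to the output tree, children first.
        out = ET.Element(node.tag, dict(node.attrib))
        out.text = node.text
        for child in node:
            out.append(evaluate(child, bus))
        return out
    # Processing element: resolve hookups, then evaluate both inputs.
    data = evaluate(hookup(node.find("data"), bus), bus)
    transform = evaluate(hookup(node.find("transform"), bus), bus)
    name = "".join(transform.itertext()).strip()
    return TRANSFORMS[name](data)


PAGE = """<page>
  <title>Catalog</title>
  <pe>
    <data query="items"><item>none</item></data>
    <transform>count</transform>
  </pe>
</page>"""

bus = {"items": ET.fromstring("<data><item>a</item><item>b</item></data>")}
with_message = evaluate(ET.fromstring(PAGE), bus)
without_message = evaluate(ET.fromstring(PAGE), {})
```

Because each input is evaluated recursively before the transformation is applied, a processing element nested inside a data or transform node is replaced by its output before the enclosing element runs.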
Another aspect of the invention utilizes XML schemas as an additional tree associated with each processing element. These XML schemas provide descriptions of the expected format of the input and transformation trees for a particular processing element. They also provide descriptions of the expected output of the processing element and may describe particular methods available with respect to the processing element. Because these schemas are themselves described in XML, they may be examined by the execution engine and by authoring tools for intelligently creating relationships between processing elements.
Yet another advantage of the invention is efficient caching of data or page regions. The declarative nature of the data tree, transformation tree, message tree and processing element output trees provides for efficient dependency analysis on the architecture to determine which processing elements must be reevaluated upon a change in data, transformation input or messages. Specifically, after an architecture is evaluated for the first time, a cache may be provided for storing the evaluation of each individual processing element. Upon a change in data, messages, or transformation input, the reevaluation of particular processing elements can be controlled based on the dependency analysis, thus avoiding the unnecessary reevaluation of processing elements whose inputs have not changed.
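The caching approach described above may be sketched as follows. This Python illustration assumes that each processing element declares the names of the inputs it depends on; the class name ElementCache, its methods, and the sample elements are hypothetical and are not drawn from the specification.

```python
class ElementCache:
    def __init__(self, elements):
        # elements: name -> (list of dependency names, function of their values)
        self.elements = elements
        self.values = {}
        self.evaluations = []  # log of elements actually recomputed

    def get(self, name, inputs):
        if name in inputs:  # a raw input such as data or a message
            return inputs[name]
        if name not in self.values:
            deps, fn = self.elements[name]
            args = [self.get(dep, inputs) for dep in deps]
            self.values[name] = fn(*args)
            self.evaluations.append(name)
        return self.values[name]

    def invalidate(self, changed):
        """Drop cached outputs that depend, even indirectly, on `changed`."""
        stale = {changed}
        grew = True
        while grew:
            grew = False
            for name, (deps, _) in self.elements.items():
                if name not in stale and stale.intersection(deps):
                    stale.add(name)
                    grew = True
        for name in stale:
            self.values.pop(name, None)


cache = ElementCache({
    "price_table": (["prices"], lambda prices: f"{len(prices)} rows"),
    "banner": (["promo"], lambda promo: promo.upper()),
    "page": (["price_table", "banner"], lambda t, b: f"{b} | {t}"),
})
inputs = {"prices": [1, 2], "promo": "Sale"}
first_page = cache.get("page", inputs)
first_run = list(cache.evaluations)

inputs["prices"] = [1, 2, 3]
cache.invalidate("prices")
cache.evaluations.clear()
second_page = cache.get("page", inputs)
second_run = list(cache.evaluations)
```

After the price data changes, only the price table and the page that contains it are reevaluated; the banner region is served from the cache because nothing it depends on has changed.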