Enterprise software tools are used by many organizations as common platforms for linking the different applications and data structures of the organization. For example, SAP AG (Walldorf, Germany) offers the “SAP Enterprise Portal,” a product enabling various knowledge-management and collaboration functionalities for managing the different IT resources of the organization. Additional information regarding this product is available at www.sap.com/solutions/netweaver/enterpriseportal.
Some of the functions provided by enterprise software tools require the processing of XML documents. XML (Extensible Markup Language) is a simplified subset of the Standard Generalized Markup Language (SGML), designed initially for Web documents. XML allows designers to create their own customized markup languages, enabling the definition, transmission, validation, and interpretation of data between applications and between organizations. XML is a formal recommendation of the World Wide Web Consortium (W3C). Additional information regarding XML in general, and particularly the XML 1.0 standard, is available at www.w3.org/TR/2004/REC-xml-20040204/.
Processing an XML document typically comprises parsing it using an XML parser. Several XML parsing methods are known in the art, and several commercial XML parsers are available in the market. The XML parser typically produces a DOM (Document Object Model), which is a logical representation of the document in a hierarchical tree form. The DOM is a platform-independent and language-independent interface that allows programs and scripts to dynamically access and update the content, structure and style of the document. The DOM programming interface standards are defined by the World Wide Web Consortium (W3C). Additional information regarding the DOM standards is available at www.w3.org/DOM.
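By way of illustration only, the following minimal sketch shows an XML document being parsed into a DOM tree, which is then read and updated through the DOM interface. It assumes Python's standard xml.dom.minidom module as the parser; the sample document and the function names parse_order and set_order_id are hypothetical and are not taken from any particular product.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document used only for this illustration.
XML_TEXT = '<order id="42"><item>pen</item><item>ink</item></order>'


def parse_order(xml_text):
    """Parse XML text into a DOM tree and read values from it.

    Returns the root element name, its "id" attribute, and the text
    content of every <item> element, in document order.
    """
    dom = parseString(xml_text)          # build the hierarchical DOM tree
    root = dom.documentElement           # top-level element node
    items = [node.firstChild.data for node in root.getElementsByTagName("item")]
    return root.tagName, root.getAttribute("id"), items


def set_order_id(xml_text, new_id):
    """Update the DOM in place and serialize the root element back to text."""
    dom = parseString(xml_text)
    dom.documentElement.setAttribute("id", new_id)
    return dom.documentElement.toxml()
```

In this sketch, parse_order(XML_TEXT) walks the tree produced by the parser, while set_order_id shows that the same interface may be used to modify the document content before it is serialized again.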
The above-mentioned XML standard makes extensive use of regular expressions (REs). A regular expression is a template, or pattern, that can match various text strings. The pattern is expressed in terms of characters and meta-characters. The meta-characters operate as “wildcards,” allowing different groups of characters to match a single template. Regular expressions may be nested, i.e., a regular expression may reference or include other regular expressions. For example, the XML 1.0 standard comprises 84 regular expression definitions, of which 67 are nested.
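The notion of nesting can be sketched as follows, assuming Python's standard re module. The definitions below are a deliberately simplified, ASCII-only approximation of how an XML name may be built up from smaller definitions; they are not the actual definitions of the XML 1.0 standard, and the identifiers NAME_START_CHAR, NAME_CHAR, NAME, EMPTY_TAG and match_empty_tag are hypothetical.

```python
import re

# Simplified, ASCII-only approximations (an assumption made for brevity;
# the XML 1.0 standard permits many additional Unicode ranges).
NAME_START_CHAR = r"[A-Za-z_:]"

# NAME_CHAR nests NAME_START_CHAR inside its own definition.
NAME_CHAR = rf"(?:{NAME_START_CHAR}|[0-9.\-])"

# NAME nests both of the definitions above.
NAME = rf"{NAME_START_CHAR}{NAME_CHAR}*"

# EMPTY_TAG nests NAME, giving three levels of nesting in total.
EMPTY_TAG = rf"<(?P<name>{NAME})\s*/>"


def match_empty_tag(text):
    """Return the element name if text is an attribute-free empty tag, else None."""
    match = re.fullmatch(EMPTY_TAG, text)
    return match.group("name") if match else None
```

Here each nested reference is resolved by textual substitution before the pattern is compiled, so the final expression used for matching contains the fully expanded form of every definition it depends on.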