In today's business environment, many applications need to use data that are warehoused in diverse data sources and repositories. These data are typically expressed in different formats and languages, and are retrieved using different access methods and delivery vehicles. Intermittently, new data sources may be added to the corpus that the application must handle, while existing data sources may be removed or changed. These problems are particularly prevalent in organizations in which databases and applications have developed gradually over the course of years in response to growing Information Systems (IS) demands.
There is therefore a need for tools that enable database information from diverse sources to be integrated into an accessible whole. Such tools are required, for example, in application areas including customer relationship management, personnel data warehouses, and performance analysis of computer systems and networks. The conventional approach to meeting this need is to create a new data warehouse and to copy into it the required data from the original sources. For example, IBM Corporation (Armonk, N.Y.) offers a product known as “DB2 DataJoiner” that is based on this sort of approach. DataJoiner is described at www-4.ibm.com/software/data/datajoiner. A solution of this type is also described in U.S. Pat. No. 5,884,310, whose disclosure is incorporated herein by reference.
Another method for manipulating heterogeneous data is described in U.S. Pat. No. 5,345,586, whose disclosure is likewise incorporated herein by reference. In that method, a global data directory is provided, which maps the location of data, along with specific data entry attributes and data source parameters. Various tables are used for dealing with the diverse data properties, including an attribute table, a domain table, a routing table and a cross-reference table. The tables are used in accessing the data, in order to provide a system user with a consistent interface to multiple distributed, heterogeneous data sources.
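By way of illustration only, a table-driven directory of this general kind might be sketched in Python as follows. This is not the implementation of U.S. Pat. No. 5,345,586: the roles given here to the routing, domain and cross-reference tables, the attribute and source names, the sample records and the `lookup` function are all assumptions made for the example, and the attribute table is omitted for brevity.

```python
# Hypothetical sketch: table names follow the text, but every entry,
# source name and helper below is invented for illustration.
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    source: str      # which data source holds the attribute
    local_name: str  # name of the field inside that source


# Routing table (assumed role): logical attribute -> where it is stored.
ROUTING = {
    "customer_name": Route("crm_db", "CUST_NM"),
    "customer_phone": Route("billing_files", "phone"),
}

# Domain table (assumed role): logical attribute -> value normalizer.
DOMAIN = {
    "customer_name": str.strip,
    "customer_phone": lambda raw: "".join(ch for ch in raw if ch.isdigit()),
}

# Cross-reference table (assumed role): (source, global key) -> local key.
XREF = {
    ("crm_db", "C-1"): 1001,
    ("billing_files", "C-1"): "A77",
}

# In-memory stand-ins for two heterogeneous data sources.
SOURCES = {
    "crm_db": {1001: {"CUST_NM": "  Ada Lovelace "}},
    "billing_files": {"A77": {"phone": "(555) 010-0199"}},
}


def lookup(global_key, attribute):
    """Resolve one logical attribute through the directory tables."""
    route = ROUTING[attribute]
    local_key = XREF[(route.source, global_key)]
    raw = SOURCES[route.source][local_key][route.local_name]
    return DOMAIN[attribute](raw)
```

In this sketch a caller asks for `lookup("C-1", "customer_phone")` without knowing which source holds the value, what the field is called there, or how the record is keyed, which is the kind of consistent interface the paragraph above describes.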
Markup languages are well known in the programming art. The most popular markup language is the Hypertext Markup Language (HTML), which is commonly used on World Wide Web pages and in other document applications. HTML is derived from the Standard Generalized Markup Language (SGML), and uses tags to identify certain data elements and attributes. HTML, however, is not extensible, in the sense that it uses a closed set of tags, and it has little or no semantic structure. In order to address these and other shortcomings, Extensible Markup Language (XML) has more recently been introduced by the World Wide Web Consortium (W3C). XML is defined by a standard available at www.w3.org/XML.
XML allows users to define their own sets of tags, depending on their application needs. An XML document is typically associated with a Document Type Definition (DTD), which specifies the elements that can exist in the document and the attributes and hierarchy of the elements. Many different DTDs have already been developed for different domains, such as “performanceML” for the computer system performance evaluation domain; “CPEX” for the customer relationship management domain; “Health Level-Seven (HL7) XML” for the healthcare domain; and “Common Telecom DTD” for the telecommunications domain. XML.ORG maintains a registry of available DTDs at xml.org/xmlorg_registry/index.shtml. XML Schema is under development as an alternative to the DTD, as described at www.w3.org/TR/xmlschema-0.
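As a rough illustration of what such a definition constrains, the following Python sketch parses a small document with user-defined tags and checks it against a hand-written rule table. The tag names, the `RULES` table and the `violations` function are hypothetical and are not taken from any of the DTDs named above. The table is only a stand-in for a DTD: a real DTD also governs element order and cardinality, and would be enforced by a validating parser rather than by code like this.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules in the spirit of a DTD:
# element name -> (allowed child elements, allowed attributes).
RULES = {
    "report": ({"host"}, set()),
    "host": ({"cpu", "memory"}, {"name"}),
    "cpu": (set(), {"unit"}),
    "memory": (set(), {"unit"}),
}

# A document using tags defined for this (invented) application.
DOC = """<report>
  <host name="alpha">
    <cpu unit="percent">42</cpu>
    <memory unit="MB">512</memory>
  </host>
</report>"""


def violations(element, rules=RULES):
    """Return a list of rule violations found at or below `element`."""
    if element.tag not in rules:
        return [f"unknown element <{element.tag}>"]
    children, attributes = rules[element.tag]
    problems = []
    for name in element.attrib:
        if name not in attributes:
            problems.append(
                f"<{element.tag}> has unexpected attribute '{name}'"
            )
    for child in element:
        if child.tag not in children:
            problems.append(
                f"<{child.tag}> not allowed inside <{element.tag}>"
            )
        else:
            problems.extend(violations(child, rules))
    return problems
```

Calling `violations(ET.fromstring(DOC))` returns an empty list for the sample document, while a document that places `<cpu>` directly under `<report>` is reported as breaking the hierarchy.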
Style languages are used to control how the data contained in a markup language document are structured, formatted and presented. For example, W3C has introduced the Extensible Stylesheet Language (XSL) for use in defining style sheets for XML documents. An XSL style sheet is a collection of rules, known as templates. When the rules are applied to an input XML file by a processor running an XSL engine, they generate as output some or all of the content of the XML file in a form that is specified by the rules. (In fact, an XSL style sheet is itself a type of XML document.) XSL includes a transformation language, XSLT, which is defined by a standard available at www.w3.org/TR/xslt. Rules written in XSLT specify how one XML document is to be transformed into another XML document. The transformed document may use the same markup tags and DTD as the original document, or it may have a different set of tags, such as HTML tags. Other style languages are also known in the art, such as the Document Style Semantics and Specification Language (DSSSL), which is commonly used in conjunction with SGML.
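The rule-driven character of such a transformation can be sketched in Python as follows. This is not an XSLT engine and uses no XSLT syntax: each "template" is an ordinary function keyed by element name, the element names and output layout are invented for the example, and, as a simplification, elements with no matching template are dropped, which differs from the built-in rules of XSLT.

```python
import xml.etree.ElementTree as ET


def render_children(node):
    """Apply templates to each child and collect the output elements."""
    out = []
    for child in node:
        out.extend(apply_templates(child))
    return out


def report_template(node):
    table = ET.Element("table")          # <report> becomes an HTML table
    table.extend(render_children(node))
    return [table]


def host_template(node):
    row = ET.Element("tr")               # each <host> becomes a table row
    name_cell = ET.SubElement(row, "td")
    name_cell.text = node.get("name")
    row.extend(render_children(node))
    return [row]


def cpu_template(node):
    cell = ET.Element("td")              # <cpu> becomes a formatted cell
    cell.text = f"{node.text}%"
    return [cell]


# The "style sheet": a collection of rules keyed by input element name.
TEMPLATES = {
    "report": report_template,
    "host": host_template,
    "cpu": cpu_template,
}


def apply_templates(node):
    """Run the matching template; unmatched elements yield no output."""
    template = TEMPLATES.get(node.tag)
    return template(node) if template else []


def transform(xml_text):
    """Transform an input XML string into an HTML-tagged string."""
    results = apply_templates(ET.fromstring(xml_text))
    return ET.tostring(results[0], encoding="unicode")
```

Given the input `<report><host name="alpha"><cpu>42</cpu><memory>512</memory></host></report>`, `transform` produces `<table><tr><td>alpha</td><td>42%</td></tr></table>`. The `<memory>` element has no template and so is omitted, showing how the rules select some of the content and emit it under a different set of tags.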