For decades, the software industry has encouraged the development of applications that can define unique data structures for storing and passing information. The format of these structures is known only to the applications creating them. Little or no information is contained within the structures to identify the data. While efficient for the applications, this practice has made the integration of, and communication between, applications a difficult and tedious task. With the rise of the Internet as a business-to-business communication medium, the need to simplify this integration and communication has become critical.
The Internet is a global network of computers and computer networks that are linked with one another and communicate by virtue of the so-called Internet Protocol (IP), which is well known in the networking arts. IP is a packet-switched communications protocol. In such protocols, the information to be transmitted is broken up into a series of packets (i.e., sets of data). Each packet acts as a type of electronic envelope: it includes a portion called a header, which contains fields identifying the source of the transmission, the destination, and other information about the data to be delivered to the destination, together with the data itself, which is often referred to as the payload.
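The header-and-payload arrangement described above can be illustrated with a minimal Python sketch. The header layout used here (two 4-byte addresses and a 2-byte payload length) is a hypothetical simplification chosen for illustration and is not the actual IP header format; the function names and the 8-byte payload limit are likewise assumptions.

```python
import socket
import struct

# Hypothetical, simplified header (NOT the real IP header):
# 4-byte source address, 4-byte destination address, 2-byte payload length,
# packed in network byte order with no padding.
HEADER_FORMAT = "!4s4sH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def make_packet(src, dst, payload):
    """Wrap a payload in an 'envelope' by prepending the header."""
    header = struct.pack(
        HEADER_FORMAT,
        socket.inet_aton(src),
        socket.inet_aton(dst),
        len(payload),
    )
    return header + payload


def parse_packet(packet):
    """Split a packet back into (source, destination, payload)."""
    src, dst, length = struct.unpack(HEADER_FORMAT, packet[:HEADER_SIZE])
    payload = packet[HEADER_SIZE:HEADER_SIZE + length]
    return socket.inet_ntoa(src), socket.inet_ntoa(dst), payload


def packetize(src, dst, data, max_payload=8):
    """Break the data into a series of packets of at most max_payload bytes."""
    return [
        make_packet(src, dst, data[i:i + max_payload])
        for i in range(0, len(data), max_payload)
    ]
```

A receiver would call parse_packet on each packet and concatenate the payloads in order to recover the original data. Real networks must also handle loss and reordering, which this sketch ignores.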
A popular application for the Internet is to access the so-called World Wide Web (or simply the “Web” or “web”), which uses a protocol called HTTP (HyperText Transfer Protocol) by which client units connect to servers associated with the Web. A client unit (e.g., a microcomputer unit with a communication subsystem connected to the Internet) can invoke HTTP by simply typing an “http://” prefix followed by the desired Web address. Once the connection is made to the desired Web site, the user (or client) can access any document stored on that site that is available to that user. The interface used by the client is an application program called a Web browser (e.g., the Netscape and Explorer browsers are popular examples). The browser establishes hypertext links to the subject server, enabling the user to view graphical and textual representations of information provided by the server.
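By way of illustration only, the following Python sketch shows how a client might turn an “http://” address into the text of an HTTP request. The function name build_get_request, the choice of HTTP/1.1, and the example address are assumptions made for this sketch; an actual browser sends additional headers and then transmits the request over a network connection, which is omitted here.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Build the text of a minimal HTTP GET request for the given address."""
    parts = urlsplit(url)
    # An address with no path refers to the root document of the site.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.netloc}",
        "Connection: close",
        "",  # blank line terminates the header section
        "",
    ]
    return "\r\n".join(lines)


request_text = build_get_request("http://www.example.com/docs/index.html")
```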
The Web generally relies on a language called HTML (HyperText Markup Language), which Web-compliant browsers use to render text, graphics, images, audio, real-time video, etc. HTML is independent of client operating systems, so the same content can be rendered across a wide variety of software and hardware operating platforms. Software platforms include Windows 3.1, Windows NT, Apple's Copland and Macintosh, IBM's AIX and OS/2, HP Unix, etc. Popular compliant Web browsers include Microsoft's Internet Explorer and Netscape Navigator. The browser interprets hypertext links to files, images, sound clips, etc. Upon user invocation of a hypertext link to a Web page, the browser initiates a network request to receive the desired Web page.
Internet users are faced with an ever-increasing number of sites, each containing varied information, which makes it difficult to find the desired information. Among commonly used tools for locating information are the so-called search engines or portals to the Internet. These sites provide various indexes to other sites. Search engines use crawlers or spiders, programs having their own sets of rules, to index pages on the Web. Some of these follow every link on every page they find. Others follow only particular types of links.
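The link-following rules mentioned above can be sketched in Python as follows. This is a minimal illustration and not the method of any particular search engine: the class LinkCollector, the function extract_links, and the same_host_only rule are hypothetical names and policies chosen to contrast a crawler that follows every link with one that follows only a particular type of link.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the targets of <a href="..."> links found in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page address.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html, same_host_only=False):
    """Return the links a crawler would follow from one page.

    With same_host_only=False every link is followed; with
    same_host_only=True only links to the same host are kept.
    """
    collector = LinkCollector(base_url)
    collector.feed(html)
    if not same_host_only:
        return collector.links
    host = urlsplit(base_url).netloc
    return [link for link in collector.links if urlsplit(link).netloc == host]
```

A complete crawler would additionally fetch each returned address, keep track of pages already visited, and pass page content to an indexer.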
A common problem with general Internet searches is that too many result pages are often returned, many of which have low relevance to the search request issued by the end user. Search engines used in corporate sites are typically less powerful than Internet search engines and provide less information than is desirable.
Borrowing from the remarkable success of HTML in rendering documents universally to users on a computer display, the industry has developed XML (Extensible Markup Language) to render documents universally to applications (as well as WXML, for “wireless” devices). XML is a well-known, widely adopted standard for encoding both text and data so that content can be processed with relatively little human intervention and exchanged across diverse software, hardware, operating systems, networks, and applications. Information formatted via XML can also be utilized with a wide range of development tools and utilities.
FIG. 1 depicts a block diagram illustrating a prior art XML configuration 100. FIG. 1 specifically illustrates a business-to-business application of XML. As illustrated in system 100 of FIG. 1, XML communications 102 can enable communications with business applications 104, content and/or documents 106, and a web browser displayable on a computer 108 in communication with a computer network, such as the Internet 110 and/or additionally, computer networks such as an Intranet 114 or other internal organizational computer network. XML communications 102 enables communications between a Web server 112 and other hardware or computer devices, such as a mainframe computer 120. Additionally, data can be retrieved from a repository 116 that is formed from a database 118 and associated content or documents 120 thereof.
XML communications 102 generally utilizes the hierarchical markup structure of HTML to store data in a document, such as, for example, one of documents 106, and is extensible in that the markup tags can be defined as required by the creators of the document. While this does not solve the problem of understanding the data contents, at least the description of the data structure is carried with the document.
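The point that the description of the data structure travels with the document can be illustrated with a short Python sketch using the standard xml.etree.ElementTree module. The purchase-order document, its tag names, and the helper function describe are hypothetical examples invented for this illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical document whose tags were defined by its creator.
DOCUMENT = """<purchaseOrder id="PO-1001">
  <buyer>Example Corp</buyer>
  <item sku="A-42">
    <description>Widget</description>
    <quantity>3</quantity>
  </item>
</purchaseOrder>"""


def describe(element, depth=0):
    """Return the tag hierarchy of the document as indented lines."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(DOCUMENT)
structure = describe(root)                        # hierarchy read from the document itself
quantity = int(root.find("item/quantity").text)   # value located by its tag path
```

A receiving application can discover the hierarchy without prior knowledge of the format, but it still cannot know what "quantity" means to the sender, which is the remaining problem noted above.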
To lessen the problem of understanding the data contents, many associations, forums, and consortia have formed to define normalized tags and hierarchies, typically along vertical industry lines. Because tags can be nearly anything, without such normalization efforts the software problem of associating the data with software methods is an infinite-to-infinite search problem. With normalization, associating data with software methods becomes a (very) many-to-many search problem, which reduces the problem from impossible to merely extremely difficult.
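The association of data with software methods can be sketched, under simplifying assumptions, as a lookup from a tag to a handler. In the Python sketch below, the tag vocabulary, the handler functions, and the function dispatch are all hypothetical; the sketch shows only that a lookup can succeed when sender and receiver share a normalized tag set and has nothing to match against when they do not.

```python
import xml.etree.ElementTree as ET


def handle_invoice(element):
    return f"invoice for {element.findtext('total')}"


def handle_order(element):
    return f"order of {element.findtext('quantity')} units"


# Hypothetical normalized vocabulary agreed upon by an industry group.
HANDLERS = {
    "invoice": handle_invoice,
    "purchaseOrder": handle_order,
}


def dispatch(xml_text):
    """Associate a document with a software method by its root tag."""
    root = ET.fromstring(xml_text)
    handler = HANDLERS.get(root.tag)
    if handler is None:
        # Unrecognized tag: the receiver has no basis for choosing a method.
        return None
    return handler(root)
```

Even with a shared vocabulary, a real system faces many tags and many candidate methods, so the lookup table here understates the many-to-many difficulty described above.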