Enterprises clearly want to leverage the vast amount of electronic data they process in conducting their businesses to understand the nature of these businesses. A purpose of data warehousing is to take operational data and turn it into analyzable data. There are three primary problems with this approach. First, the remote procedure call model used in client-server systems and the normalized data model used in relational databases tend to strip out much of the semantic information that would be useful in linking data elements together for analysis. Second, operational data lies in so many different data stores that it is difficult to marshal all the relevant data in a single location. Third, because operational data migrates to data warehouses over time, the resulting analysis cannot detect important events as they are occurring.
The rise of Extensible Markup Language (XML) messaging as a primary means for business-to-business (B2B) commerce offers an alternative solution. With B2B XML messaging, enterprises may conduct their businesses electronically by sending XML business messages over the Internet to their business partners. These messages tend to be semantically meaningful and self-describing, addressing the first problem with data warehousing. While many different applications may process these messages for a given enterprise, the messages all have to pass through the boundary between the public Internet and the enterprise's private network, yielding a potential single point of data collection that would address the second problem with data warehousing. Moreover, enterprises can also perform real-time analysis of incoming operational messages at this same point, overcoming the third problem with data warehousing.
The barriers to performing this type of analysis on the operational XML message stream are significant and include:
- Detecting XML messages of interest among all network traffic without impacting other network components.
- Extracting XML data from a variety of underlying transports (e.g., HTTP, JMS, MQSeries), packaging approaches (e.g., MIME), and XML application protocols (e.g., BizTalk, ebXML, RosettaNet).
- Maintaining the semantic relationships among elements in the same messages and among different messages.
- Applying a variety of different statistical analysis techniques to the same data under different conditions and for different purposes.
- Providing sufficient throughput under high message loads.
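To make the extraction barrier concrete, the following is a minimal sketch of pulling XML documents out of a MIME-packaged message once its bytes have been captured. It assumes the transport layer has already been handled and uses only the Python standard library. The function name extract_xml_parts, the set XML_TYPES, and the PurchaseOrder sample message are hypothetical and chosen for illustration; they are not taken from any of the protocols named above.

```python
from email import message_from_bytes
from email.policy import default
import xml.etree.ElementTree as ET

# Assumption: XML parts are identified by these two content types only.
XML_TYPES = {"text/xml", "application/xml"}


def extract_xml_parts(raw: bytes):
    """Return the parsed root element of every XML part in a MIME message."""
    msg = message_from_bytes(raw, policy=default)
    roots = []
    for part in msg.walk():  # visits the container and each nested part
        if part.get_content_type() in XML_TYPES:
            # decode=True undoes any content-transfer-encoding and yields bytes
            payload = part.get_payload(decode=True)
            roots.append(ET.fromstring(payload))
    return roots


# Hypothetical two-part message: one XML business document, one attachment.
SAMPLE = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; boundary="b1"\r\n'
    b"\r\n"
    b"--b1\r\n"
    b"Content-Type: text/xml; charset=utf-8\r\n"
    b"\r\n"
    b'<PurchaseOrder><Item sku="A1" qty="3"/></PurchaseOrder>\r\n'
    b"--b1\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b"\r\n"
    b"binary attachment\r\n"
    b"--b1--\r\n"
)

if __name__ == "__main__":
    for root in extract_xml_parts(SAMPLE):
        print(root.tag, [child.attrib for child in root])
```

Because the element tree is kept intact rather than flattened into rows, the parent-child relationships within each message remain available for later analysis. A production collector would also need to handle the other transports and application protocols listed above, which this sketch does not attempt.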