It is becoming increasingly common for server systems to interact with each other over networks, such as the Internet and local area networks, to provide distributed and flexible computing solutions. Such server systems use a variety of messaging languages and formats: some use a low-level message format (e.g., a binary representation), while others use high-level, more complex message formats (e.g., SOAP and other XML-based message formats).
While high-level languages can convey message content and structure more flexibly, their formats tend to require lengthy and complex messages (especially in the case of XML-based standards). As message length and complexity increase, so does the processing time needed to handle such messages, which includes, inter alia, the time needed to convert each message into a format suitable for the intended application. Accordingly, complex and lengthy message formats increase the constraints (e.g., costs and bandwidth and traffic requirements) associated with wide-scale deployments of architectures based on such formats. These constraints are particularly problematic in large Internet-based deployments where millions of messages must be processed within short periods of time.
Currently, some message-handling software packages can process a limited number of special message types optimally, provided that these special cases are identified before processing begins. Most message-handling programs include a library of functions that convert incoming messages from their message format into the internal representations used by an application. These libraries contain handlers that are optimized to analyze certain types of messages and to build the internal representation in a form useful to the application. With such conventional techniques, the libraries are prepared in advance and therefore typically cover only a limited number of special cases among the possible message formats. When a message arrives for which there is no corresponding optimized handler (as is often the case with an application server that acts as a generic platform for different types of applications sharing common functionality), a generic handler is used instead, and it is typically less efficient than an optimized handler. As a result, such systems rely on a set of optimized handlers chosen according to estimates of the expected message traffic, and these estimates do not always reflect the types of messages actually received most often. Consequently, conventional techniques cannot maximize the number of messages processed by optimized rather than generic handlers, which results in inefficient processing.
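A minimal sketch of the conventional arrangement just described, assuming a simple key=value payload format: a library maps message types to optimized handlers prepared in advance and falls back to a generic handler for every other type. The names HandlerLibrary, generic_handler and ping_handler, and the payload format, are hypothetical. Because the handlers were prepared for "ping" messages while most of the actual traffic consists of "order" messages, two of the three messages take the generic path.

```python
from collections import Counter
from typing import Callable, Dict

# Hypothetical handler signature: raw payload in, application representation out.
Handler = Callable[[str], dict]

class HandlerLibrary:
    """Optimized handlers prepared in advance for specific message types,
    with a generic handler used for every other type."""

    def __init__(self, generic: Handler) -> None:
        self._optimized: Dict[str, Handler] = {}
        self._generic = generic
        self.stats: Counter = Counter()  # how often each kind of handler ran

    def register(self, msg_type: str, handler: Handler) -> None:
        self._optimized[msg_type] = handler

    def handle(self, msg_type: str, payload: str) -> dict:
        handler = self._optimized.get(msg_type)
        if handler is None:
            self.stats["generic"] += 1  # no optimized handler was prepared for this type
            return self._generic(payload)
        self.stats["optimized"] += 1
        return handler(payload)

def generic_handler(payload: str) -> dict:
    # Makes no assumptions about which fields appear, so it must inspect every one.
    return dict(item.split("=", 1) for item in payload.split(";") if item)

def ping_handler(payload: str) -> dict:
    # Assumes the fixed layout "id=<value>" known in advance for "ping" messages.
    return {"id": payload[len("id="):]}

lib = HandlerLibrary(generic_handler)
lib.register("ping", ping_handler)
traffic = [("ping", "id=1"), ("order", "sku=42;qty=3"), ("order", "sku=7;qty=1")]
results = [lib.handle(msg_type, payload) for msg_type, payload in traffic]
print(lib.stats["optimized"], lib.stats["generic"])  # 1 2
```

The fixed registry is the weakness the passage identifies: nothing in this design adapts the set of optimized handlers to the traffic actually observed.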
With XML Web Services protocols, generic handlers, or parsers, can process any kind of XML document and typically obtain and interpret the XML Schema of a message at run-time. More specifically, libraries often incorporate two main types of generic parsers. Event-driven generic handlers (for example, handlers based on the Simple API for XML (SAX), such as those used to configure Java beans) process the document and call functions in the application each time they encounter a new element, a new attribute, or another identifying feature within a message. The application is responsible for interpreting each event and deciding how to process it. In this arrangement, most of the processing and analysis of incoming messages falls to the application, because the generic handler does little more than lexical analysis. For this arrangement to be efficient, knowledge about the message format must be built into the design of the application itself. This requirement greatly limits the types of messages the application can accept and adds programming cost.
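The event-driven model can be illustrated with Python's standard xml.sax module, a SAX implementation. This sketch assumes a hypothetical <order>/<item> message format. All knowledge of that format lives in the application's callback class (OrderEvents, a hypothetical name), while the parser only reports start tags, character data, and end tags.

```python
import xml.sax

class OrderEvents(xml.sax.ContentHandler):
    """Application-side callbacks: the parser reports only lexical events, so the
    (hypothetical) <order> format must be understood here, in the application."""

    def __init__(self):
        super().__init__()
        self.items = []        # (sku, qty) pairs collected from <item> elements
        self._text = []        # character data of the <item> currently open
        self._current = None   # attributes of the open <item>, if any

    def startElement(self, name, attrs):
        if name == "item":
            self._current = {"sku": attrs.get("sku")}
            self._text = []

    def characters(self, content):
        # SAX may deliver one text node in several chunks, so accumulate them.
        if self._current is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "item":
            qty = int("".join(self._text).strip())
            self.items.append((self._current["sku"], qty))
            self._current = None

doc = b'<order id="7"><item sku="A1">2</item><item sku="B2">5</item></order>'
handler = OrderEvents()
xml.sax.parseString(doc, handler)
print(handler.items)  # [('A1', 2), ('B2', 5)]
```

Changing the message format (for example, moving the quantity into an attribute) would require rewriting OrderEvents, which is the loss of flexibility described above.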
Other generic handlers used with XML documents build an internal representation of the XML document as a tree structure, such as a Document Object Model (DOM). The application receiving the message content can then access this representation, either by querying for specific elements or by traversing the tree through each node's list of children.
In this case, the application does not need to be compiled with knowledge of the full document structure; it accesses only the elements of interest.
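Both access styles can be sketched with Python's xml.dom.minidom, a minimal DOM implementation, using the same hypothetical <order> document: a query by tag name, and an explicit walk over each node's childNodes. The helper element_names is illustrative only.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<order id="7"><item sku="A1">2</item><item sku="B2">5</item></order>'
)

# Query style: search the whole tree for the elements of interest by tag name.
skus = [el.getAttribute("sku") for el in doc.getElementsByTagName("item")]

# Traversal style: walk each node's list of children explicitly (depth-first).
def element_names(node, out):
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:  # skip text nodes
            out.append(child.tagName)
            element_names(child, out)
    return out

names = element_names(doc, [])
print(skus)   # ['A1', 'B2']
print(names)  # ['order', 'item', 'item']
```

In both styles the entire tree is built before the application inspects it, even though the query touches only the <item> elements.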
In addition to generic handlers, a typical library for XML documents also incorporates specific parsers, each designed for a particular XML Schema and typically generated at or around the compile time of the underlying application (and therefore presumably optimized for the types of messages expected). These handlers can parse and validate only documents that conform to their specific XML Schema. Like generic handlers, an optimized handler may or may not build an internal representation such as a DOM tree; such a representation is often hidden from the application programmer, because the application is expected to use schema-specific functions, generated together with the parser, to access parts of the data. An approach that uses a DOM tree can accelerate the processing of certain types of documents, because the application only has to query the tree for the parts of the document it needs. While this simplifies the application programmer's work, it is relatively inefficient, because the XML Schema must be interpreted at run-time. This is especially time-consuming if the handler must validate the DOM tree against the schema. In addition, the internal representation provided by such a DOM tree is not necessarily allocated efficiently, because most handlers allocate the children of each node dynamically, as they would for any generic document.
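The following sketch suggests what a schema-specific parser and its accessor functions might look like for the same hypothetical order format. The schema is stated informally in a comment rather than as a real XML Schema. For brevity the sketch reuses xml.etree.ElementTree to tokenize the input, whereas a generated parser would more typically work directly on the character stream. It rejects non-conforming documents and returns a typed record rather than a generic tree. The names parse_order, Order, Item and order_total_quantity are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple

# Hypothetical schema, stated informally: a root <order id="..."> containing one or
# more <item sku="...">INTEGER</item> children and no other elements.

class Item(NamedTuple):
    sku: str
    qty: int

class Order(NamedTuple):
    order_id: str
    items: List[Item]

def parse_order(data: str) -> Order:
    """Accept only documents that conform to the order schema above and build
    a typed record instead of a generic tree."""
    root = ET.fromstring(data)
    if root.tag != "order" or "id" not in root.attrib:
        raise ValueError("not a valid <order> document")
    items = []
    for child in root:
        if child.tag != "item" or "sku" not in child.attrib:
            raise ValueError(f"invalid child element <{child.tag}>")
        # int() raises ValueError for missing or non-integer content.
        items.append(Item(child.attrib["sku"], int((child.text or "").strip())))
    if not items:
        raise ValueError("<order> requires at least one <item>")
    return Order(root.attrib["id"], items)

def order_total_quantity(order: Order) -> int:
    # Schema-specific accessor of the kind generated alongside such a parser.
    return sum(item.qty for item in order.items)

order = parse_order('<order id="7"><item sku="A1">2</item><item sku="B2">5</item></order>')
print(order_total_quantity(order))  # 7
```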
A more efficient use of the XML Schema would allocate the internal data structures according to the schema's constraints on the number and type of children of each node. Moreover, if the internal representation is not optimal, the functions that query the tree are likely to be suboptimal as well. Such inefficiencies could be avoided or reduced by generating, at run-time, a new optimized handler for the specific schema used by the documents: the constraints that the schema imposes on the document could then be compiled directly into the code of the specific parser instead of being interpreted every time, and the internal representation could be optimized for the document structure that the schema defines. However, conventional technologies do not provide such on-the-fly generation of optimized handlers.
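One possible illustration of run-time generation follows. The sketch turns a toy schema description into Python source, compiles it once with compile and exec, and returns the resulting parser. The schema description is a dict mapping required attribute names to converters, a hypothetical stand-in for an XML Schema rather than the real schema language. Element and attribute names are baked into the generated code as literals, and each parsed child is stored in a fixed-layout namedtuple whose fields follow the schema. The function generate_handler is hypothetical; it sketches the idea and does not describe any particular product.

```python
import xml.etree.ElementTree as ET
from collections import namedtuple

def generate_handler(root_tag, child_tag, fields):
    """Generate, at run-time, a parser specialized for one toy schema.

    `fields` maps each required attribute of `child_tag` to a converter such as
    int or str. This dict is a hypothetical stand-in for an XML Schema: it is
    turned into Python source once, so per-message work does not consult it."""
    names = list(fields)
    # Fixed-layout record whose slots follow the schema, instead of a generic tree.
    record = namedtuple("Record", names)
    env = {"ET": ET, "Record": record}
    args = []
    for i, name in enumerate(names):
        env[f"conv{i}"] = fields[name]        # converter bound by name in the source
        args.append(f"conv{i}(a[{name!r}])")  # attribute name baked in as a literal
    source = "\n".join([
        "def handler(data):",
        "    root = ET.fromstring(data)",
        f"    if root.tag != {root_tag!r}:",
        "        raise ValueError('unexpected root element: ' + root.tag)",
        "    out = []",
        "    for child in root:",
        f"        if child.tag != {child_tag!r}:",
        "            raise ValueError('unexpected child element: ' + child.tag)",
        "        a = child.attrib",
        f"        out.append(Record({', '.join(args)}))",
        "    return out",
    ])
    exec(compile(source, f"<handler:{root_tag}>", "exec"), env)
    return env["handler"]

order_handler = generate_handler("order", "item", {"sku": str, "qty": int})
rows = order_handler('<order><item sku="A1" qty="2"/><item sku="B2" qty="5"/></order>')
print(rows)  # [Record(sku='A1', qty=2), Record(sku='B2', qty=5)]
```

A missing required attribute surfaces as a KeyError from the generated code; a fuller generator would also emit cardinality and type checks derived from the schema, and might target bytecode or native code instead of Python source.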
Based on the foregoing, it can be appreciated that there remains a need for an arrangement where message processing handlers are optimized for the current type of traffic within a dynamically changing message flow.