Nowadays, many business processes rely on service-oriented architectures (SOAs). Such architectures facilitate the cooperation of computers over a network such as the Internet without requiring an agreed communication standard, e.g. a common message format, between those computers. This is achieved by software programs that offer application functionality as services to other applications. Such services are typically independent of vendor, product or technology.
As is known per se, a service is a self-contained unit of functionality that other software applications can combine to provide the complete functionality of a large software application. Every computer hosting part of the SOA can run an arbitrary number of services, and each service is built so that it can exchange information with any other service in the network with minimal human interaction and without changes to the underlying program itself.
Services are typically requested, and deliverables returned, in the form of messages exchanged between computers or, more precisely, between software applications running on these computers. Such messages typically comprise a plurality of data fields, each containing an information element such as a definition of another data field, a user-specified parameter or variable, and so on. To maximize reusability, such messages are frequently created from a template in which the format of the data fields is predefined, so that many different messages can be generated from the same template. Because such a template typically includes a large number of different types of data fields, so that the message can serve many different purposes, a message generated by a consumer requesting a service typically contains both relevant and redundant information: only part of the message template carries information relevant to the requested service.
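To make the template idea concrete, here is a minimal sketch, assuming a hypothetical order template whose field names (`customer_id`, `order_id`, and so on) and the helpers `build_message` and `relevant_part` are purely illustrative. A message generated from the template carries every template field, although the requested service reads only a few of them.

```python
# Hypothetical message template: the field names are illustrative only.
TEMPLATE_FIELDS = [
    "customer_id", "order_id", "shipping_address", "billing_address",
    "currency", "amount", "loyalty_tier", "marketing_opt_in",
]

# Hypothetical set of fields that a payment service actually reads.
PAYMENT_FIELDS = {"order_id", "currency", "amount"}


def build_message(**values: str) -> dict:
    """Generate a message from the template; unused fields are left empty."""
    unknown = set(values) - set(TEMPLATE_FIELDS)
    if unknown:
        raise ValueError(f"fields not defined in template: {sorted(unknown)}")
    return {name: values.get(name, "") for name in TEMPLATE_FIELDS}


def relevant_part(message: dict, needed: set) -> dict:
    """Keep only the data fields that the requested service uses."""
    return {name: value for name, value in message.items() if name in needed}


msg = build_message(order_id="A17", currency="EUR", amount="12.50")
print(len(msg), len(relevant_part(msg, PAYMENT_FIELDS)))  # 8 3
```

In this toy case five of the eight fields are redundant for the payment service, yet they still travel with the message.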
Moreover, messages may be generated in many different formats, which may depend, for instance, on the platform and/or the programming language used to generate the message. Therefore, in order to forward a message to its intended destination, e.g. a computer offering a particular service, the incoming message typically needs to be converted into a format that the destination can understand, or at least the relevant information needs to be extracted from it.
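As an illustration of such a conversion, the following sketch assumes that a message arrives as flat XML while the destination expects JSON; both formats and the helper `xml_to_json` are assumptions chosen for the example. The optional `keep` argument shows the alternative of extracting only the relevant fields.

```python
import json
import xml.etree.ElementTree as ET
from typing import Optional, Set


def xml_to_json(xml_text: str, keep: Optional[Set[str]] = None) -> str:
    """Convert a flat XML message into JSON, optionally keeping only some fields."""
    root = ET.fromstring(xml_text)
    # Each direct child element of the root is treated as one data field.
    fields = {child.tag: child.text or "" for child in root}
    if keep is not None:
        fields = {tag: text for tag, text in fields.items() if tag in keep}
    return json.dumps({root.tag: fields}, sort_keys=True)


incoming = "<order><id>A17</id><amount>12.50</amount><promo>SPRING</promo></order>"
print(xml_to_json(incoming, keep={"id", "amount"}))
# {"order": {"amount": "12.50", "id": "A17"}}
```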
To this end, the SOA typically comprises a message broker, which is a software module that performs the required message conversion or data extraction. For this purpose, the message broker typically has access to so-called message schemas, which define the structure of a message and the type of content that each data field within it can contain. In other words, a message schema describes the formats available in the message template from which the message has been generated. As will be apparent, such a message schema is specific to a particular template, e.g. a template defined in a specific language such as XML.
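A message schema can be pictured as an ordered list of field names and content types. The sketch below is a simplified illustration, not any broker's real schema language: `ORDER_SCHEMA`, the pipe-delimited input and the helpers `apply_schema` and `extract` are hypothetical. It uses the schema to interpret a raw message and then extracts only the fields a destination needs.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical message schema: the ordered data fields of one template and the
# type of content each field may contain.
ORDER_SCHEMA: List[Tuple[str, Callable[[str], object]]] = [
    ("order_id", str),
    ("quantity", int),
    ("amount", float),
    ("note", str),
]


def apply_schema(raw: str, schema=ORDER_SCHEMA, sep: str = "|") -> Dict[str, object]:
    """Interpret a delimited message using the schema; reject malformed input."""
    values = raw.split(sep)
    if len(values) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(values)}")
    # int() or float() raises ValueError if a field holds the wrong type of content.
    return {name: field_type(value) for (name, field_type), value in zip(schema, values)}


def extract(message: Dict[str, object], wanted: Set[str]) -> Dict[str, object]:
    """Data extraction: pass on only the fields the destination needs."""
    return {name: value for name, value in message.items() if name in wanted}


order = apply_schema("A17|3|12.5|gift wrap")
print(extract(order, {"order_id", "quantity"}))  # {'order_id': 'A17', 'quantity': 3}
```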
The message broker typically further comprises flow logic, i.e. program code, which may for instance include routing information for routing the relevant contents of the message to the intended destination. Such routing is needed because multiple service providers may offer a similar service, in which case the flow logic selects the appropriate service provider on the basis of specific information contained in the message.
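A minimal sketch of such content-based routing follows, assuming that the currency field identifies the appropriate payment provider. The routing table, the endpoint names and `select_provider` are placeholders invented for the example.

```python
from typing import Dict, Optional

# Hypothetical routing table used by the flow logic: value of a chosen data
# field -> provider of the (similar) service. Endpoints are placeholders.
PROVIDERS_BY_CURRENCY: Dict[str, str] = {
    "EUR": "payments-eu.example.internal",
    "USD": "payments-us.example.internal",
}
DEFAULT_PROVIDER = "payments-global.example.internal"


def select_provider(message: Dict[str, str], key_field: str = "currency") -> str:
    """Pick the service provider from the value of one field in the message."""
    key: Optional[str] = message.get(key_field)
    # Unknown or missing values fall back to a default provider.
    return PROVIDERS_BY_CURRENCY.get(key, DEFAULT_PROVIDER)


print(select_provider({"order_id": "A17", "currency": "USD"}))  # payments-us.example.internal
```

A fallback provider is used here so that a message without a recognised key is still delivered somewhere; a real flow might instead reject such a message.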
For the message broker to pass the message, or the relevant contents thereof, on to an intended destination, it typically requires one or more parsers that parse the incoming message based on the information provided by the message schema of that message. A parser is invoked when the bit stream that represents an input message is converted into the internal form that the broker can handle. Parsers are invoked as and when the message flow requires them.
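The step from bit stream to internal form can be sketched with Python's `struct` module, assuming a hypothetical fixed-layout binary message whose schema is given as field names plus `struct` format codes. `SCHEMA` and `parse` are illustrative only; real broker parsers handle far richer formats.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical schema for a fixed-layout binary message: each data field is
# described by its name and a struct format code ("8s" = 8-byte string,
# "I" = unsigned 32-bit int, "d" = 64-bit float).
SCHEMA: List[Tuple[str, str]] = [("order_id", "8s"), ("quantity", "I"), ("amount", "d")]


def parse(bitstream: bytes, schema: List[Tuple[str, str]] = SCHEMA) -> Dict[str, object]:
    """Convert the raw bit stream into an internal, field-addressable form."""
    fmt = "<" + "".join(code for _, code in schema)  # little-endian, no padding
    values = struct.unpack(fmt, bitstream)  # raises struct.error on a length mismatch
    internal: Dict[str, object] = {}
    for (name, _), value in zip(schema, values):
        # Fixed-width strings arrive NUL-padded; strip the padding and decode.
        internal[name] = value.rstrip(b"\0").decode("ascii") if isinstance(value, bytes) else value
    return internal


wire = struct.pack("<8sId", b"A17", 3, 12.5)
print(parse(wire))  # {'order_id': 'A17', 'quantity': 3, 'amount': 12.5}
```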
Parsing is time-consuming and therefore costly. It becomes particularly costly when the whole message has to be parsed, because, as explained above, the message typically comprises a large number of data fields. For this reason, techniques have been proposed in which only parts of a message are explicitly parsed in order to save costs. Such techniques typically rely on the sequential nature of the parsing process, in which the data fields are parsed one at a time, in sequence. An example of such a technique is eager parsing, in which all data fields up to and including the relevant data fields are parsed, and the irrelevant subsequent data fields are discarded or simply copied into an output message without being parsed.
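The sequential approach just described can be sketched as follows, assuming a hypothetical length-prefixed wire format in which a field can only be located by walking all preceding fields. `encode` and `parse_until` are illustrative names, not part of any product. The sketch parses fields in order until every wanted field has been seen, then copies the remaining bytes into the output unparsed.

```python
import struct
from typing import Dict, Iterable, Set, Tuple

# Hypothetical wire format: each data field is encoded as
# <1-byte name length><name><2-byte value length><value>, so a parser can only
# reach field n by walking fields 1..n-1 first.


def encode(fields: Iterable[Tuple[str, str]]) -> bytes:
    out = bytearray()
    for name, value in fields:
        n, v = name.encode(), value.encode()
        out += struct.pack("<B", len(n)) + n + struct.pack("<H", len(v)) + v
    return bytes(out)


def parse_until(data: bytes, wanted: Set[str]) -> Tuple[Dict[str, str], bytes]:
    """Parse fields in sequence until every wanted field has been seen.

    Fields before the last wanted one are parsed too (sequential access);
    everything after it is returned as an unparsed byte tail.
    """
    parsed: Dict[str, str] = {}
    missing = set(wanted)
    offset = 0
    while offset < len(data) and missing:
        (n_len,) = struct.unpack_from("<B", data, offset)
        offset += 1
        name = data[offset:offset + n_len].decode()
        offset += n_len
        (v_len,) = struct.unpack_from("<H", data, offset)
        offset += 2
        parsed[name] = data[offset:offset + v_len].decode()
        offset += v_len
        missing.discard(name)
    return parsed, data[offset:]


message = encode([("channel", "web"), ("order_id", "A17"), ("notes", "x" * 500)])
fields, tail = parse_until(message, {"order_id"})
output = encode(fields.items()) + tail  # the tail is copied, never parsed
print(fields)     # {'channel': 'web', 'order_id': 'A17'}
print(len(tail))  # 508
```

The irrelevant `channel` field is still parsed because it precedes `order_id`, which is inherent to the sequential approach; only the trailing `notes` field escapes parsing.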
Another example of such a selective parsing technique is found in the IBM Integration Bus™ products of the IBM Corporation. In these products, a user can identify certain data fields in a message that are never referenced by the message flow and request that these data fields be parsed opaquely. This means that these elements are simply copied across the message flow rather than parsed. This reduces the cost of parsing and writing the message, and may improve performance in other parts of the message flow. Compared with, for instance, eager parsing, opaque parsing has the benefit that irrelevant data fields preceding the data fields of interest can also be skipped in the parsing process.
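The opaque-parsing idea can be illustrated with a minimal sketch. It does not use or reproduce the IBM Integration Bus API; the length-prefixed format and the functions `encode_field`, `parse_opaque` and `serialize` are hypothetical. Fields that the user designates as opaque are kept as raw bytes whose content is never decoded, and are written back verbatim, even when they precede the fields of interest.

```python
import struct
from typing import List, Set, Tuple, Union

# Hypothetical length-prefixed format standing in for a real message:
# <1-byte name length><name><2-byte value length><value> per data field.
Element = Tuple[str, Union[str, bytes]]


def encode_field(name: str, value: str) -> bytes:
    n, v = name.encode(), value.encode()
    return struct.pack("<B", len(n)) + n + struct.pack("<H", len(v)) + v


def parse_opaque(data: bytes, opaque: Set[str]) -> List[Element]:
    """Parse every field, but keep user-designated opaque fields as raw bytes."""
    elements: List[Element] = []
    offset = 0
    while offset < len(data):
        start = offset
        (n_len,) = struct.unpack_from("<B", data, offset)
        name = data[offset + 1:offset + 1 + n_len].decode()
        offset += 1 + n_len
        (v_len,) = struct.unpack_from("<H", data, offset)
        offset += 2
        end = offset + v_len
        if name in opaque:
            # Only the field header is read to find the end; the content is not decoded.
            elements.append((name, data[start:end]))
        else:
            elements.append((name, data[offset:end].decode()))
        offset = end
    return elements


def serialize(elements: List[Element]) -> bytes:
    """Write the output message; opaque elements are copied across verbatim."""
    return b"".join(
        value if isinstance(value, bytes) else encode_field(name, value)
        for name, value in elements
    )


message = encode_field("audit_trail", "x" * 600) + encode_field("order_id", "A17")
elements = parse_opaque(message, {"audit_trail"})
print(elements[1])  # ('order_id', 'A17')
```

In this simple format the saving is small, because decoding a string is cheap; the benefit grows when an opaque element would otherwise have to be parsed into a large structure.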
However, this technique relies on a user identifying suitable candidates for opaque parsing in an input message. This requires detailed design-time knowledge, including an understanding of the message contents and of the requirements of any message mediation flow processing, and such information may not be available to the user. In addition, the actual contents of a message at runtime can have a significant impact on its processing costs, and these contents may not be well known to a message flow designer.