Efficient parsing of non-XML messages is a requirement in many enterprises. Typically, non-XML messages from a legacy application are received on a queue, parsed into a structure that the receiving system can understand, processed, and forwarded to the next application. Parsing is performed by walking a data structure which describes the message format (hereafter called the ‘message model’) and extracting, from the bitstream, the markup and/or data for each model element. Because the model is walked afresh for every message, repeated parsing of successive messages can be extremely processor-intensive.
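By way of illustration only, model-driven parsing of this kind might be sketched as follows. The sketch assumes a very simple message model consisting of an ordered list of fixed-width ASCII fields; the names `Field`, `ORDER_MODEL` and `parse`, and the sample message, are hypothetical and are not taken from any particular product.

```python
from dataclasses import dataclass


@dataclass
class Field:
    """One element of a hypothetical message model."""
    name: str
    length: int  # fixed width of the field in bytes (assumption)


# Hypothetical message model: an ordered list of fixed-width fields.
ORDER_MODEL = [
    Field("order_id", 6),
    Field("currency", 3),
    Field("amount", 8),
]


def parse(bitstream: bytes, model: list) -> dict:
    """Walk the model in order, extracting each element's data from the bitstream."""
    result = {}
    offset = 0
    for field in model:
        chunk = bitstream[offset:offset + field.length]
        if len(chunk) < field.length:
            raise ValueError(f"message too short for field {field.name!r}")
        result[field.name] = chunk.decode("ascii").strip()
        offset += field.length
    return result


message = b"000042USD00012.50"
parsed = parse(message, ORDER_MODEL)
```

Every call to `parse` walks the whole model again, which suggests why the cost accumulates when many messages of the same format arrive in succession.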
There is therefore a need for a solution which improves parsing performance and thereby message throughput.
Such a technique has already been provided for self-defining XML messages. This is described at: www2005.org/cdrom/docs/p692.pdf
Co-pending U.S. patent application Ser. No. 11/426,655 provides another solution. It describes the generation of a parsing template. The parsing template comprises a set of structural elements for a particular type of input message, for example substrings representing parts of an XML message that are expected to be repeated within other requests from the same requester type for the same service. The template also includes inserts to indicate places in the messages where variation can be expected between one message and the next. That application, however, retrieves a complete parsing template based on a received service request and expects only small variations between messages.
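The general idea of a template made up of fixed structural elements and variable inserts might be sketched as follows. This is an illustration of the concept only and is not the implementation described in the co-pending application; the template layout, the name `apply_template` and the sample message are all assumptions.

```python
# Hypothetical template: literal structural substrings alternate with
# named inserts (written as one-element tuples) where variation is expected.
TEMPLATE = [
    "<order><id>", ("id",), "</id><qty>", ("qty",), "</qty></order>",
]


def apply_template(message: str, template: list):
    """Return the insert values if the message fits the template, else None."""
    values = {}
    pos = 0
    pending = None  # name of an insert waiting for the next literal
    for part in template:
        if isinstance(part, tuple):
            pending = part[0]
            continue
        idx = message.find(part, pos)
        # A literal must be present; with no insert pending it must
        # start exactly at the current position.
        if idx < 0 or (pending is None and idx != pos):
            return None
        if pending is not None:
            values[pending] = message[pos:idx]
            pending = None
        pos = idx + len(part)
    if pending is not None:
        # A trailing insert takes the remainder of the message.
        values[pending] = message[pos:]
        pos = len(message)
    return values if pos == len(message) else None


result = apply_template("<order><id>17</id><qty>3</qty></order>", TEMPLATE)
mismatch = apply_template("<invoice><id>17</id></invoice>", TEMPLATE)
```

In this sketch a message that departs from the fixed structural elements is simply rejected, which reflects the expectation of only small variations from one message to the next.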
A more flexible mechanism is required where a received message is non-self-defining. Such messages (e.g. non-XML data) can be presented in a huge variety of formats and styles, so that the XML techniques referenced above, including complete parsing templates, are not feasible in this environment.