The present invention relates in general to utilities for transforming messages from an external form to an internal representation for use in a given application/computing environment and vice versa. More particularly, the present invention relates to such a utility that can handle messages relative to a variety of external systems or variety of different formats without recompiling and that provides significant flexibility in message parsing/processing.
Many software applications require transformation of data from the form in which data is received by the application to a form upon which the application can act. Likewise, an application's internal representation of data must often be transformed to an external representation that the application's users (human or machine/software) can understand. Such transformation may address a variety of issues including, for example, different data formats, different syntax and different messaging protocols.
In order to address such issues, it is typically necessary to parse an incoming message into elements or "tokens" that the application can properly process. The size of these tokens or parsing resolution depends, inter alia, on the nature of the subject matter involved and the desired functions of the application.
Consider first the context of text based messages. If the application is formatting a long document, it may need to process large chunks of content corresponding to paragraphs, chapter headings, section headings and the like. By contrast, an application screening messages for security purposes may need to review individual words and authorization codes. By way of further example, an application for routing messages to particular network nodes may require access to particular header fields or other address information but may be otherwise uninterested in the message content.
Next consider the context of data based messages or mixed text and data. For example, one application may be designed for accessing business records in a company database based on various fields identifying customers, products, product codes, pricing, etc. Another application may be involved in distributing strategic information relating to the locations and movements of identified military assets. In each such case, a utility may be required to identify and separate for internal handling particular fields of data whose size, in terms of data bits or text characters, can vary depending, for example, on the source and format of the incoming message.
A variety of parsing utilities exist. Often, these utilities are pre-configured for use in a particular computing environment, e.g., for handling a particular type of message. For example, a utility for handling XML messages may parse messages based on tags included in the messages which can be managed based on a Document Type Definition. Other utilities may be configurable to handle different types of messages. Generally, such utilities are configured at compile time for a particular application. In many cases, these utilities parse messages in a single pass based on predefined rules applicable to the whole message.
Such utilities have certain limitations related to operation in a multi-system environment, i.e., an environment where messages are received from multiple sources and/or are transmitted to multiple destinations. In particular, utilities that are pre-configured to operate in a particular environment may be unable to properly handle all messages in a multi-system environment. Configurable utilities may be able to handle a variety of messages, but may require re-compiling, and sometimes re-certification to handle multiple message types. As a result, configurable utilities may entail unacceptable message transfer latency for some applications. Single-pass utilities as noted above may also be constrained by their parsing rules to provide a particular parsing resolution which may not fully meet the needs of a multi-system environment.
The present invention is directed to a utility for parsing and formatting that has improved processing flexibility. The utility can be reconfigured on-the-fly, i.e., after compile-time or during run-time, to handle different types of messages. It can also parse incoming messages recursively to provide a desired or selectable level of parsing resolution. The invention thus has certain processing advantages for a variety of environments including multi-system environments.
According to one aspect of the present invention, a method and corresponding apparatus (collectively, "utility") are provided for transforming a message between a first internal form associated with a first internal system (such as an application embodied in software, hardware and/or firmware) and a second external form, different than the first internal form, associated with a second external system. The utility may receive the message in the second external form of the second external system and transform it to the first internal form for use by the first internal system or vice versa. The methodology involves establishing a generic processing engine for performing a transformation process relative to an information stream associated with the second external system and operating the processing engine to receive the message in one of the first and second forms. The generic processing engine is adaptable to handle messages in multiple (two or more) forms associated with multiple external systems. The method further involves operating the processing engine to access storage including specification information for multiple external systems including the second external system, configuring the generic processing engine based on external specification information regarding the second external system, and operating the configured processing engine to transform the message between the first internal form and the second external form. In this manner, the utility is operative for efficiently transforming messages in a multi-system environment.
In order to facilitate such multi-system operation, the step of configuring can be implemented after compile time or during run time. The associated process involves compiling the generic processing engine and then operating the compiled engine to access the external specification information. In this regard, the processing engine may be table-driven such that specifications for various external systems are stored in separate tables of a database, e.g., a relational database. Each such table may store a list of parameters defining an external format indexed to an identifier for that format. Then, upon receiving a message or otherwise being prompted, the engine can identify the external system, access the associated specification and use the parameters for configuration. Many different message formats may be supported in this regard, including text and image formats. The engine can thus convert an external message to an internal form and/or internal messages to the external form. By virtue of this architecture, the engine can handle messages in multiple formats without recompiling or re-certification.
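The table-driven configuration described above can be illustrated with a minimal Python sketch. It is not the implementation of the invention: the table name `format_spec`, the parameter names `delimiter` and `fields`, the format identifiers `SYS_A` and `SYS_B`, and the class `GenericEngine` are all hypothetical, and the sketch assumes simple delimited text messages. The point it shows is that one already-built engine is reconfigured at run time by looking up a specification indexed to a format identifier.

```python
import sqlite3


def build_spec_store():
    """Create an in-memory table of format parameters (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE format_spec (format_id TEXT, param TEXT, value TEXT)"
    )
    db.executemany(
        "INSERT INTO format_spec VALUES (?, ?, ?)",
        [
            ("SYS_A", "delimiter", "|"),
            ("SYS_A", "fields", "id,name,price"),
            ("SYS_B", "delimiter", ";"),
            ("SYS_B", "fields", "name,id,price"),
        ],
    )
    return db


class GenericEngine:
    """One engine for all formats; behavior comes from the stored spec."""

    def __init__(self, db):
        self.db = db
        self.delimiter = None
        self.fields = []

    def configure(self, format_id):
        # Look up the parameters for this external system at run time.
        rows = self.db.execute(
            "SELECT param, value FROM format_spec WHERE format_id = ?",
            (format_id,),
        ).fetchall()
        if not rows:
            raise KeyError(f"no specification stored for {format_id!r}")
        spec = dict(rows)
        self.delimiter = spec["delimiter"]
        self.fields = spec["fields"].split(",")

    def to_internal(self, message):
        # External form -> internal form (a dict keyed by field name).
        values = message.split(self.delimiter)
        if len(values) != len(self.fields):
            raise ValueError("message does not match the configured format")
        return dict(zip(self.fields, values))

    def to_external(self, record):
        # Internal form -> external form of the configured system.
        return self.delimiter.join(record[name] for name in self.fields)
```

In this sketch, supporting a further external system amounts to inserting rows into the table; the engine code itself is unchanged.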
In accordance with another aspect of the present invention, a utility is operative for recursively parsing an input to provide a desired or selectable level of parsing resolution. The associated methodology involves: establishing a module for processing an information stream, the module including a parsing engine and a processing engine; first operating the parsing engine to select a portion from the information stream (e.g., the full text of a message or a portion thereof) and define the portion as a parent object; second operating the parsing engine to parse the parent object into multiple child objects, where each child object has a child content that is a subset of a parent content of the parent; third operating the processing engine to perform a predefined process (e.g., performing a security "dirty word" screening process) on at least one of the child objects; redefining at least a second one of the child objects (the same as or different from the first one) as a parent object; and repeating the steps of second operating and third operating with respect to the redefined object.
The utility is thus operative for recursively processing the input information stream to provide a desired or selectable level of processing resolution. In this regard, the process of redefining a child object as a parent object and repeating the noted steps with respect to the redefined object may be conducted iteratively until sufficient parsing is achieved. Different portions of the input, e.g., a message, may be parsed to different resolutions if desired for a particular application. Similarly, sibling objects may undergo a different number of iterations to achieve a common parsing resolution. For example, a parsing process may be conducted on a text based document. The desired resolution for the process may be word-by-word parsing. An initial step of the process may parse the document into a number of headings and a corresponding number of sections. Each such initially parsed token, referred to below as a "Mag", is a sibling object. The headings may be directly parsed into words whereas the text sections may require further recursive parsing into paragraphs, sentences and the like. Thus, the parsing process, by virtue of its recursive functionality, is highly adaptive to various applications and types of content.
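The recursive parent/child parsing described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the parsing engine of the invention: the three levels in `LEVELS`, their regular expressions, the word list `DIRTY_WORDS`, and the function names are all hypothetical. As a simplification, the sketch walks every sibling through the same sequence of levels rather than letting headings skip directly to words.

```python
import re

# Hypothetical splitting rules, ordered from coarse to fine resolution.
LEVELS = [
    ("section", re.compile(r"\n\s*\n")),       # blank line separates sections
    ("sentence", re.compile(r"(?<=[.!?])\s+")),  # whitespace after end mark
    ("word", re.compile(r"\s+")),
]

DIRTY_WORDS = {"secret"}  # hypothetical screening list


def parse(parent, depth=0, target="word", process=None):
    """Parse a parent object into child objects, recursing until `target`."""
    level, pattern = LEVELS[depth]
    nodes = []
    for child in pattern.split(parent.strip()):
        if not child:
            continue
        if process is not None:
            process(level, child)  # predefined process applied to the child
        node = {"level": level, "content": child, "children": []}
        if level != target and depth + 1 < len(LEVELS):
            # Redefine the child as a parent and repeat one level finer.
            node["children"] = parse(child, depth + 1, target, process)
        nodes.append(node)
    return nodes


def leaves(nodes):
    """Collect the finest-resolution contents in document order."""
    out = []
    for node in nodes:
        if node["children"]:
            out.extend(leaves(node["children"]))
        else:
            out.append(node["content"])
    return out


def make_screener(flagged):
    """Return a process that records word-level objects found in DIRTY_WORDS."""
    def screen(level, content):
        if level == "word" and content.strip(".,!?").lower() in DIRTY_WORDS:
            flagged.append(content)
    return screen
```

Passing `target="sentence"` stops the recursion one level earlier, which corresponds to selecting a coarser parsing resolution for the same input.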
In accordance with a further aspect of the present invention, a utility is operative for transliterating a message from one input format to multiple output formats. As used herein, "transliterate" refers to a process by which a message is transformed from one format to another on an object-by-object basis, where each object has a content suitable for transformation. The associated process involves accessing a message reflecting a first input format; making at least one copy of the message; accessing a first external format specification relating to a first one of the multiple output formats; configuring a processing engine based on the first external format specification; first operating the configured processing engine to access a first instance of the message (the original or a copy) and transform the message based on the first specification to provide a first transformed message; accessing a second external format specification relating to a second one of the multiple output formats; configuring the processing engine based on the second external format specification; second operating the configured processing engine to access a second instance of the message (the original or a copy) and transform the message based on the second specification to provide a second transformed message; and transmitting the first and second transformed messages.
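The transliteration sequence described above (copy the message, configure the engine per output format, transform object by object, transmit) can be sketched in Python. The format identifiers `CSV` and `PIPE`, the specification fields, and the names `Engine` and `transliterate` are hypothetical and chosen only for illustration; the message is assumed to be a list of string objects, and transmission is represented by an optional callback.

```python
import copy

# Hypothetical external format specifications, indexed by format identifier.
SPECS = {
    "CSV": {"delimiter": ",", "transform": str.upper},
    "PIPE": {"delimiter": "|", "transform": str.lower},
}


class Engine:
    """A single engine that is reconfigured for each output format."""

    def __init__(self):
        self.spec = None

    def configure(self, spec):
        self.spec = spec

    def transform(self, objects):
        # Object-by-object transformation, then assembly in the output format.
        convert = self.spec["transform"]
        return self.spec["delimiter"].join(convert(obj) for obj in objects)


def transliterate(message_objects, format_ids, specs=SPECS, transmit=None):
    """Produce one transformed message per requested output format."""
    engine = Engine()
    results = {}
    for index, format_id in enumerate(format_ids):
        # The first format works on the original; later ones on a copy.
        instance = (
            message_objects if index == 0 else copy.deepcopy(message_objects)
        )
        engine.configure(specs[format_id])
        results[format_id] = engine.transform(instance)
        if transmit is not None:
            transmit(format_id, results[format_id])
    return results
```

Working on separate instances keeps each transformation independent of the others, so a transformation that altered its input would not affect the message seen by the next output format.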
The utility has particular advantages with respect to certain multi-system applications. For example, the utility may be useful in connection with distribution modules for distributing internal messages to multiple end users. Relatedly, this process may be implemented in connection with a centralized processing module for receiving input from any of multiple sources and distributing output to many of multiple end users or other target systems. The invention thus greatly facilitates certain processes within multi-system environments.