When multi-step data transfer transactions take place among computing systems, errors can occur at any step in a transaction. When an error does occur in a step, it is often desirable to reverse any data changes that were made in that step and in previous steps. Many commercial, off-the-shelf (COTS) applications do not have internal mechanisms that provide for transaction processing for the data stores. That is, there is typically no automated method for rolling back a change that has been made to the data in a COTS application. As used herein, the term “data store” can refer to various computer-based storage systems, protocols, and applications such as relational databases, directories, and spreadsheets.
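The rollback behavior described above can be sketched as a list of steps, each paired with an action that reverses it. This is a minimal illustration only: `run_transaction`, the in-memory `store`, and the sample steps are hypothetical names, and the sketch assumes every undo action is safe to call even if its step only partly completed.

```python
from typing import Callable, Dict, List, Tuple

Action = Callable[[], object]


def run_transaction(steps: List[Tuple[Action, Action]]) -> bool:
    """Run (do, undo) pairs in order; on any error, reverse all changes."""
    undo_stack: List[Action] = []
    for do, undo in steps:
        # Register the undo first so a half-finished step is also reversed.
        undo_stack.append(undo)
        try:
            do()
        except Exception:
            while undo_stack:
                undo_stack.pop()()  # most recent change is reversed first
            return False
    return True


# Hypothetical data store: a plain dictionary stands in for an application.
store: Dict[str, int] = {}


def failing_step() -> None:
    raise RuntimeError("simulated failure in step 3")


committed = run_transaction([
    (lambda: store.update(a=1), lambda: store.pop("a", None)),
    (lambda: store.update(b=2), lambda: store.pop("b", None)),
    (failing_step, lambda: None),
])
```

After the third step fails, the changes made by the first two steps are undone in reverse order and the store is left as it was before the transaction began.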
Transactions with the data stores used in an enterprise's day-to-day operations are sometimes limited to the updating of critical data. It is typically undesirable to interrupt these critical updates with routine requests for the retrieval of data. Such queries can instead be made to another type of data storage system known as a data warehouse. Data warehouses typically contain copies of the data contained in multiple data stores. Copies of all of the data in the operational data stores are typically sent to a data warehouse in a periodic, batch process. Routine queries can then be made to the static data in the data warehouse and the operational data stores can be left free for more critical activities. However, due to the batch nature of the copying of data from the operational data stores to the data warehouses, the data warehouses may not always contain the most current version of the data in the data stores.
As further background, messaging has emerged as a popular form of asynchronous communication between heterogeneous computing systems and applications. Several middleware and enterprise architecture tool vendors offer messaging solutions based on proprietary technology that can converge middleware and application services by combining application servers with messaging and business process management solutions. Other trends include end-to-end integration across enterprises and the emergence of new web-based service standards such as XML, SOAP, and UDDI. In addition, JMS provides a standard interface for incorporating messaging within Java applications. It acts as a wrapper around the messaging technology of other vendors.
Several types of topology can support messaging. These include publish/subscribe, point-to-point, hub and spoke, bus, and distributed hub. Publish/subscribe messaging is organized around topics. A publisher sends a message to a topic and any interested subscriber can receive the message from the topic. Publish/subscribe messaging is typically used when multiple subscribers might be interested in the same message. It is appropriate for notification messages for which no response is required upon consumption. It is also useful for enterprise-level messages such as account creation, account termination, and subscription suspension. For example, a message server could publish an “account created” event after an account has been created and subscribers could consume the message.
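As a rough sketch of the publish/subscribe pattern just described, the following in-memory example delivers one "account created" message to every subscriber of a topic. The `TopicBroker` class, the topic name, and the inbox lists are assumptions made for illustration; they do not correspond to any particular messaging product.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, str]], None]


class TopicBroker:
    """Hypothetical in-memory stand-in for a message server."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Dict[str, str]) -> int:
        # Every subscriber registered on the topic receives the same message.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = TopicBroker()
billing_inbox: List[Dict[str, str]] = []
provisioning_inbox: List[Dict[str, str]] = []
broker.subscribe("account.created", billing_inbox.append)
broker.subscribe("account.created", provisioning_inbox.append)

# The publisher does not know who the subscribers are and expects no reply.
delivered = broker.publish("account.created", {"account_id": "A-1"})
```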
Point-to-point messaging is based on message queues. A producer sends a message to a specified queue and a consumer receives messages from the queue. Multiple senders and receivers are possible for a queue but an individual message can be delivered to only one receiver. Point-to-point messaging is typically used when only one consumer exists and the message is targeted for a known application. It is also useful when successful consumption by the target system is a requirement since messages stay in the queue until the receiver picks them up. As an example, point-to-point messaging would be appropriate within a telecommunications company when a message to reserve a mobile telephone number is transmitted. Such a message would typically be transmitted to only one consumer.
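The queue semantics described above can be sketched as follows, assuming a simple in-memory queue. `PointToPointQueue` and the message fields are hypothetical; the point of the sketch is that a message waits in the queue until it is picked up and is then delivered to exactly one receiver.

```python
from collections import deque
from typing import Deque, Dict, Optional

Message = Dict[str, str]


class PointToPointQueue:
    """Hypothetical in-memory queue: each message goes to one receiver."""

    def __init__(self) -> None:
        self._messages: Deque[Message] = deque()

    def send(self, message: Message) -> None:
        # The message stays queued until some receiver picks it up.
        self._messages.append(message)

    def receive(self) -> Optional[Message]:
        # Removing the message ensures no second receiver can get it.
        return self._messages.popleft() if self._messages else None


reservations = PointToPointQueue()
reservations.send({"action": "reserve_number", "request_id": "r-1"})

first_receiver_got = reservations.receive()
second_receiver_got = reservations.receive()
```

Here the first receiver obtains the reservation request and the second receiver finds the queue empty.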
In hub and spoke messaging, all applications are connected to a central message server. The message server is responsible for routing, authentication, access control, and transformation between data types. An application can act as a publisher and the message server can act as a subscriber or the message server can act as a publisher and an application can act as a subscriber. Hub and spoke messaging is typically used when greater control is required outside the applications. For example, because of workflow and timing issues, business process integration is typically tied to a message hub. Hub and spoke messaging is also used when there is a need to keep client applications simple. An intelligent message hub allows the use of simpler clients such as JMS APIs. Since hub and spoke messaging is centralized, it is typically implemented in a clustered environment for fault tolerance. A drawback of hub and spoke messaging is that the message server can become a bottleneck for messages between applications.
Under bus architecture messaging, applications publish and subscribe to messages on a bus. Any application can be a publisher or subscriber. Integration logic and intelligence are distributed in application adapters that handle routing and data transformation. Intelligence is thereby implemented in multiple locations. Messaging over a bus is useful for simple message sharing and broadcasting where complex rules, transformations, and workflows are not required. It is particularly suitable for applications that use the same data representation. It is possible to connect a message server/broker to a bus to centralize processing and rules.
Another messaging approach is the deployment of a distributed hub architecture. In this approach, multiple hubs can be present for different domains or organizations. Each hub could have its own localized rules and use a different messaging vendor or technology. Global rules could be propagated among the hubs. An architecture such as this can alleviate performance bottlenecks.
When applications that use disparate data formats need to communicate with one another, a transformation from one format to the other typically occurs. Two models for accomplishing a data transformation are distributed transformation and centralized transformation. In the distributed model, an adapter is present between each application and a common message server. The adapters can transform data between an application-specific format and a common format. When an application publishes a message it sends the message to its adapter. The adapter transforms the message from the application's native data format to the common data format and sends the message to the message server. When another application wishes to subscribe to the message, that application's adapter receives the message from the message server in the common data format. The adapter then transforms the message into the native data format of the second application and sends the message to the second application. A well-defined, stable protocol such as XML is preferable for the common data format. Messaging systems that use the publish/subscribe protocol are good candidates for this approach.
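The distributed model can be sketched with one adapter per application, each converting between that application's native format and a shared common format. Everything named here is an assumption for illustration: the `Adapter` class, a pipe-delimited native format for a billing application, a keyed record for a CRM application, and a plain dictionary standing in for the common format (which the text suggests would preferably be something like XML).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Common = Dict[str, str]


@dataclass
class Adapter:
    """Hypothetical adapter sitting between one application and the server."""
    to_common: Callable[[Any], Common]
    from_common: Callable[[Common], Any]


# Assumed native format of a billing application: "account_id|name".
billing_adapter = Adapter(
    to_common=lambda s: dict(zip(("account_id", "name"), s.split("|"))),
    from_common=lambda c: f'{c["account_id"]}|{c["name"]}',
)

# Assumed native format of a CRM application: a record with its own keys.
crm_adapter = Adapter(
    to_common=lambda d: {"account_id": d["AcctNo"], "name": d["CustName"]},
    from_common=lambda c: {"AcctNo": c["account_id"], "CustName": c["name"]},
)


def relay(native_message: Any, source: Adapter, target: Adapter) -> Any:
    common = source.to_common(native_message)  # done by the publisher's adapter
    return target.from_common(common)          # done by the subscriber's adapter


crm_record = relay("42|Ada", billing_adapter, crm_adapter)
```

Each application only needs a mapping to and from the common format, so adding a new application does not require changes to the existing adapters.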
In the centralized transformation model, all data transformation is done in a central location. A first application that wishes to send a message to a second application can publish the message in its native format. The message is then sent to a centralized message server where a transformation takes place between the data format of the first application and the data format of the second application. The second application can then receive the message in its native data format. The centralized transformation model thus uses a pair-wise mapping approach between the source and destination systems. This approach is more applicable than the distributed transformation model to communication between commercial, off-the-shelf packages. Centralized, pair-wise transformation is also appropriate for systems that use point-to-point communication and for non-enterprise events such as the transfer of data that is specific to only a few applications.
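A minimal sketch of the pair-wise approach, assuming a central lookup table keyed by source and destination, is shown below. The application names, field names, and the `route` function are hypothetical and only illustrate that the message server holds one mapping per source and destination pair.

```python
from typing import Callable, Dict, Tuple

Transform = Callable[[dict], dict]


def billing_to_crm(message: dict) -> dict:
    return {"AcctNo": message["account_id"], "CustName": message["name"]}


def crm_to_billing(message: dict) -> dict:
    return {"account_id": message["AcctNo"], "name": message["CustName"]}


# All mappings live in one place: the central message server.
PAIRWISE: Dict[Tuple[str, str], Transform] = {
    ("billing", "crm"): billing_to_crm,
    ("crm", "billing"): crm_to_billing,
}


def route(message: dict, source: str, target: str) -> dict:
    """Transform a message from the source's native format to the target's."""
    transform = PAIRWISE.get((source, target))
    if transform is None:
        raise LookupError(f"no mapping from {source} to {target}")
    return transform(message)


crm_view = route({"account_id": "42", "name": "Ada"}, "billing", "crm")
```

With n applications, a fully connected pair-wise table needs up to n(n-1) directed mappings, compared with 2n transformations when each application has an adapter to a common format. This is one reason the pair-wise approach fits data that is specific to only a few applications.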
Two types of messaging can be described: data messaging and notification messaging. In data messaging, all of the data that one application wishes to transfer to another application is packaged in a single published event. The sending application publishes the data to a message server and the message server transfers the data to the receiving application when the receiving application subscribes to the published event. The receiving application receives all of the relevant data as part of the message; it does not need to perform any extra queries. Data messaging places a heavy load on the message bus. This type of messaging is suitable for communication between commercial, off-the-shelf applications since all the data to be transferred between two such applications typically must be contained within a single published event. Data messaging is also appropriate for communication across domains within an enterprise.
In notification messaging, an application sends its data to an information broker, which places the data in a data store. The application then publishes a notification message to a message server informing a receiving application that the data is available. Upon receiving the message from the message server, the receiving application can query the information broker, which then retrieves the data from the data store and transfers it to the receiving application. Since the notification message that is published from the sending application to the message server contains only a small amount of data, a lighter load is placed on the message bus compared to data messaging. Notification messaging is appropriate for distribution of data to custom-developed applications since these applications can be modified as needed to make queries to the information broker for the desired data.
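The notification flow can be sketched as follows, assuming an in-memory information broker and a plain list standing in for the message server. The class, function, and key names are hypothetical. The sketch shows that only a small notification travels over the messaging path, while the full data is fetched from the information broker on request.

```python
from typing import Dict, List


class InformationBroker:
    """Hypothetical broker that fronts a data store (here, a dictionary)."""

    def __init__(self) -> None:
        self._data_store: Dict[str, dict] = {}

    def put(self, key: str, data: dict) -> None:
        self._data_store[key] = data

    def get(self, key: str) -> dict:
        return self._data_store[key]


info_broker = InformationBroker()
message_server: List[Dict[str, str]] = []  # stand-in for the message server


def send(key: str, data: dict) -> None:
    info_broker.put(key, data)  # the full payload goes to the data store
    # Only a small pointer to the data is published.
    message_server.append({"event": "data_available", "key": key})


def receive() -> dict:
    notification = message_server.pop(0)
    # The receiver queries the information broker for the actual data.
    return info_broker.get(notification["key"])


profile = {"account_id": "A-1", "plan": "basic", "history": list(range(100))}
send("profile:A-1", profile)
notification_fields = sorted(message_server[0].keys())
received = receive()
```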
The queues or channels through which applications and a message server communicate can be application-specific or shared. In the application-specific queue architecture, a separate request and reply queue is present for each application. An application always places its messages on and receives its messages from the same queue. This architecture promotes easy identification of the source of a message and allows the message server to control when applications receive messages. Application-specific queues are useful in the hub-and-spoke and point-to-point messaging systems.
In the shared queue architecture, queues and messages are shared between multiple applications. This allows multicasting of messages to multiple applications. Queues can be grouped by functions or domains such as account information, profile information, security information, or services information. This promotes the implementation of common processes but can require that filtering be implemented in each application. Shared queue architecture is appropriate for the publish/subscribe and bus messaging systems and other situations where the timing of event consumption is not an issue.
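The filtering requirement mentioned above can be sketched with one shared queue grouped by domain, from which each application keeps only the events it cares about. The queue contents, event names, and `consume` function are assumptions for illustration.

```python
from typing import Dict, List, Set

Message = Dict[str, str]

# Hypothetical shared queue for the "account information" domain.
account_queue: List[Message] = [
    {"event": "account_created", "account_id": "A-1"},
    {"event": "subscription_suspended", "account_id": "A-2"},
    {"event": "account_terminated", "account_id": "A-3"},
]


def consume(queue: List[Message], wanted: Set[str]) -> List[Message]:
    """Each application sees every message and filters out what it ignores."""
    return [message for message in queue if message["event"] in wanted]


billing_events = consume(account_queue, {"account_created", "account_terminated"})
support_events = consume(account_queue, {"subscription_suspended"})
```

Because reading does not remove messages in this sketch, the same queue serves several applications, which corresponds to the multicasting behavior described above.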
Numerous criteria can be used in the selection of a messaging technology. One factor is the level of support for multiple communication paradigms such as publish/subscribe, point-to-point, hub and spoke, and bus-based topology. Another factor is quality of service issues such as guaranteed delivery of messages and priority of messages. The level of security support, including items such as Secure Socket Layer, authorization, authentication, and firewalls, can also be taken into consideration. Massive scalability and high throughput without appreciable performance degradation are also desirable. Another factor is the use of a standards-based implementation such as Java or XML rather than reliance on product-specific knowledge. Connectivity to other messaging systems such as JMS and Tuxedo is also desirable. Coexistence with an application server, while not typically required, is often desirable.
Messaging implementations often fail due to poor implementation of the messaging tool as opposed to the inadequacy of the tool itself. An enterprise architecture group can provide guidance on key areas as part of the architecture definition for event-based messaging. For example, the enterprise architecture group could assist in choosing among the various messaging models. Whether a common enterprise model will be used for event specification as opposed to application-specific event specification can also be decided. A decision can be made on whether to use centralized or distributed data transformation. Guidelines can be established for queue architecture such as whether there will be a single input and a single output queue per application, whether there will be shared queues, and whether multiple messages or business events can be placed on the same queue or if a separate queue will be used for each business event. It can also be decided whether JMS wrapping will be used for messaging technology and whether a combination of event-based messaging and an information service layer will be used for complete data and transaction management.
FIG. 1 depicts an enterprise architecture in which front-end clients are integrated with back-end systems through an Enterprise Integration (EI) layer. The front-end clients, such as Premiere 110, IVR 120, CTI 130, and SPCS 140, can use the EI layer 100 to access back-end systems where data is hidden from the client applications. The EI layer 100 provides access to data located in two types of stores, back-end systems such as P2K 150, NMS 160, and Renaissance 170, and an integrated database of operational data that can be referred to as the Operational Data Store (ODS) 180. The ODS 180 can be largely replicated from other back-office applications but can also serve as the System of Record for some data. The EI layer 100 can access the appropriate data store using system-specific protocols such as SQL, CORBA 155, MQ 165, or RMI 175. Front-end applications can use IBM's asynchronous MQSeries messaging 185 to interact with the EI layer 100. A queue can be set up for each function or business transaction and multiple front-end applications can use the same queue when calling the same business transaction. The EI layer 100 can support create, read, update, and delete (CRUD) functions, particularly the translation of back-end Application Programming Interface (API) calls.
The EI layer 100 can be implemented as a set of C++ programs called Business Logic Modules or BLMs, each of which provides a separate transaction. Each BLM can read messages from and write messages to an individual queue. Transformation logic and source data rules can be hard-coded in the BLMs. Each BLM that requires access to a particular back-end system can contain API logic to communicate with that system. A reusable framework for logging, exception management, and connection management could be present within the BLMs.
An enterprise integration architecture such as this has several limitations. First, there may be no infrastructure support for a business event notification mechanism such as publish/subscribe messaging if communication between front-end applications and the EI layer is done through point-to-point messaging or if communication with back-end systems is done with APIs or with MQSeries. There may be no adapters or connectors to isolate the enterprise from application-specific requirements and there may be no transformation tools for translating an enterprise-specific data format into application-specific formats. If data transformation is hard coded, changes to data mapping can be expensive. Also, since client applications may be required to use a message-oriented protocol for data access, development complexity and a performance bottleneck can result. In addition, no rules repository may exist to define System of Record rules based on subject area. This means that the reorganization of data could lead to code changes. Another limitation may be the use of fixed-length field structures as opposed to self-describing message formats such as XML. Such structures require that code changes be made as new data elements are introduced. A lack of support for transaction management could be another limitation. There may also be no use of metadata. Other limitations could include an architecture that is not component-based, an inadequate object model, an infrastructure that is unable to achieve a desired uptime and response time, and a reduced speed in deploying new interfaces because of the lack of appropriate tools and technologies within the integration layer. These limitations can lead to difficulty in making changes, an inadequate reuse of business-related services, a higher cost of maintenance and debugging, a longer time to market for new capabilities and changes, and the necessity for increased testing due to the lack of isolation of functionality.
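The difference between fixed-length field structures and self-describing formats can be illustrated with a small sketch. The record layout (a 4-character identifier followed by a 10-character name), the element names, and the added `plan` field are all assumptions chosen for the example. The fixed-length parser misreads the record once a new field is inserted, while the XML parser, which looks fields up by name, is unaffected.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def parse_fixed(record: str) -> Dict[str, str]:
    # Assumed layout: id in columns 0-3, name in columns 4-13.
    return {"id": record[0:4].strip(), "name": record[4:14].strip()}


def parse_xml(document: str) -> Dict[str, Optional[str]]:
    root = ET.fromstring(document)
    # Fields are located by tag name, so extra elements are simply ignored.
    return {"id": root.findtext("id"), "name": root.findtext("name")}


old_fixed = "0042" + "Ada".ljust(10)
# A new 6-character "plan" field is introduced ahead of the name.
new_fixed = "0042" + "gold".ljust(6) + "Ada".ljust(10)

old_xml = "<account><id>0042</id><name>Ada</name></account>"
new_xml = "<account><id>0042</id><plan>gold</plan><name>Ada</name></account>"

fixed_before = parse_fixed(old_fixed)
fixed_after = parse_fixed(new_fixed)   # name is now read from the wrong columns
xml_before = parse_xml(old_xml)
xml_after = parse_xml(new_xml)         # unchanged despite the new element
```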
Another limitation is that development of complex client applications may be necessary in order for clients to access back-end data using MQSeries messaging via the EI layer. For example, the use of asynchronous messaging for synchronous data may be required; synchronous methods may not be available to access back-office data and services. The use of MQSeries messaging via the EI layer could also entail the use of inflexible message formats that may not shield applications from a ripple effect generated by changes to other systems. A degradation of performance could also result. Another issue is the lack of transmission management in the integration layer. This can result in data being out of synchronization.
Another limitation is that business process steps might be hard-coded in the applications in multiple systems. The same processing step may be duplicated in multiple systems. Modifications to the business processes can require coordinated changes to multiple applications, typically entailing costly code changes. Encoding and changing processes in multiple applications in a timely and cost-effective manner can be difficult and the limited reuse of code lends itself to custom development. Thus, the embedding of business process steps in multiple systems can hinder the ability to roll out new products and services and to make changes to existing ones. For a process requiring multi-step interaction with a back-end application, the steps typically must be encoded in the client and hence may be difficult to modify. Also, there may be no end-to-end visibility of business processes, no tracking capability, and no ability to optimize processes and gain operational efficiency.
Various point-to-point interfaces may exist for key business transactions that are not brokered. These interfaces would bypass the EI layer and directly perform CRUD functions in the target system. Since each target system typically has its own API set, each application that needs to communicate with a particular target system typically needs its own code developed specifically for that target. Thus, the point-to-point interfaces create a tight coupling between applications that can be costly to change. Also, business transactions performed across point-to-point interfaces such as these are not visible to other applications. Applications requiring knowledge of these transactions typically must use data replication to make assumptions about the transactions.
Replication processes can introduce data integrity problems that cause decreased times to failure, increased times to repair, and inflated costs due to rework and data inconsistencies. Replication lag times can cause stale data that can lead to poor customer experience and potential loss of revenue. As replication of data progresses, syntactic and semantic errors can increase with increased distance from the source of the data. Replication processes consume additional resources such as personnel, hardware, and software with each replication. Integration of replication and batch processes is typically point-to-point and based on the structure of the underlying data models. This can cause tight coupling and inflexibility between systems and create a potential bottleneck as the volume of data grows with a rapidly increasing subscriber base. Also, numerous replication logic rules within the target applications may need to be redeveloped so that they can deduce business events from replicated data.
In an enterprise integration architecture such as that just described, each front-office application would typically need to be aware of the locations of data and services in the enterprise and would typically need to use the access mechanism supported by the back-office application to which it is connected. Each application would typically need to understand the format and structure of the data in the back-office system and map it to its own internal representation. Reorganization or relocation of data and services could lead to costly and time-consuming changes to multiple applications.