The present invention relates generally to methods and systems for monitoring processes in computer systems. More particularly, the present invention relates to identifying a set of correlated event or process data among heterogeneous computer systems or applications.
Computer systems that support today's globally distributed, rapidly changing and agile businesses are steadily growing in size as well as complexity. They are becoming increasingly federated, loosely coupled, distributed and at the same time generating huge numbers of data artifacts, processes, or events (hereinafter “events”) ranging from record entries representing business or organizational activities to more technical events at various levels of granularity. Industries such as healthcare and insurance have witnessed an explosion in the growth of semi-structured organizational processes that has been fuelled by the advent of such systems. These organizational or scientific processes depart from the traditional kind of structured processes in that their lifecycle is not fully driven by a formal process model. While an informal description of the process may be available, the execution of a semi-structured process is not completely controlled by a central entity (such as a workflow engine).
Monitoring such semi-structured processes enables a variety of applications such as process discovery, analytics, verification and process improvement. Monitoring them effectively remains an important research challenge.
Correlating events generated by heterogeneous, distributed systems allows for the isolation and tracking of end-to-end instances of a given semi-structured business process computer application. Correlating events has been addressed in the area of integrating large and complex data sources. In this area, the task of matching schemas (relational database schemas, for instance) for the purpose of tracking an end-to-end process instance has been identified as very time-consuming and labor-intensive.
Consequently, a significant amount of research effort has been devoted to model management, such as information retrieval, knowledge representation, schema mapping and translation, as well as integration. Extensive work has been conducted in the domain of data integration and exchange, as required, for instance, for Extract, Transform, Load (ETL) processes in data warehousing. In data warehousing, an ETL process requires the extraction of data from various sources and the transformation of the data to match a corresponding target schema. In the field of e-commerce, data exchange scenarios require extensive knowledge about the semantics of data structures in order to convert messages from a source schema to a target schema.
Existing work devoted to deriving relationships between data elements has a strong focus on foreign-key relationships and assumes relational (i.e., normalized) data. The process of finding and defining relationships (correlations) in an arbitrary, potentially redundant and non-normalized data space has thus far received little attention, although it has the potential for tremendous impact.
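By way of illustration only, the following minimal Python sketch shows one way candidate correlations might be suggested in non-normalized event data: it compares the sets of values observed for attributes of different event types and reports pairs whose values overlap strongly. The event layout, the function name `candidate_correlations`, the `min_overlap` threshold, and the sample event and attribute names are all assumptions made for this example and are not taken from any existing system.

```python
from itertools import combinations


def candidate_correlations(events, min_overlap=0.5):
    """Suggest attribute pairs (across event types) with overlapping values.

    Each event is assumed to be a dict with a "type" and an "attrs" mapping.
    The overlap score is the share of the smaller value set that is shared.
    """
    # Collect the set of values observed for each (event type, attribute).
    values = {}
    for event in events:
        for attr, value in event["attrs"].items():
            values.setdefault((event["type"], attr), set()).add(value)

    candidates = []
    for left, right in combinations(sorted(values), 2):
        if left[0] == right[0]:
            continue  # only compare attributes of different event types
        shared = values[left] & values[right]
        score = len(shared) / min(len(values[left]), len(values[right]))
        if score >= min_overlap:
            candidates.append((left, right, score))
    return candidates


# Hypothetical sample events; no foreign key is declared anywhere.
events = [
    {"type": "OrderReceived", "attrs": {"OrderId": "A1", "Customer": "acme"}},
    {"type": "OrderReceived", "attrs": {"OrderId": "A2", "Customer": "globex"}},
    {"type": "ShipmentCreated", "attrs": {"OrderRef": "A1", "ShipmentId": "S7"}},
    {"type": "ShipmentCreated", "attrs": {"OrderRef": "A2", "ShipmentId": "S8"}},
]

result = candidate_correlations(events)
for left, right, score in result:
    print(left, "<->", right, score)
```

In this sample, only the `OrderId` and `OrderRef` attributes share values, so they are the only pair reported. A value-overlap heuristic of this kind can produce false positives on real data (for example, small integer codes that coincide by chance), so its output would be a list of candidates for review rather than confirmed relationships.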
Processes are frequently executed across different, possibly independent computer systems. On such platforms, particularly event-driven architectures, where no component recognizes another component and the interactions are driven by events in an asynchronous fashion, it is difficult to create a unified view of processes (also known as composite business applications). Not every event or artifact contains a unified process instance identifier for creating an end-to-end view of processes. In certain scenarios, events are also transformed or aggregated during execution steps, so that identifiers that relate events to process instances or to each other become extremely hard to track. This difficulty arises in particular when tracking process instances across various system and application layers. In fast-changing environments where business application processes are executed across a wide range of distributed computer systems, it is difficult to trace process instances because the relationships between events must be explicitly known and defined. Furthermore, supposedly isolated process instances, a transport coordination process for example, can be related to other processes such as the order management and invoicing processes. However, in the latter case, the attributes that bridge those distinct processes can only be found in the events of the isolated process instances.
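To make the difficulty concrete, the following minimal Python sketch groups events into end-to-end instances when no single event type carries a common process instance identifier. It assumes that pairwise correlation rules (which attribute of one event type equals which attribute of another) are already known, and it merges events transitively with a union-find structure. The event layout, the rule format, the function name `correlate`, and the sample event and attribute names are hypothetical and serve only as an illustration.

```python
def correlate(events, rules):
    """Group events into instances using attribute-equality rules.

    rules: list of ((type_a, attr_a), (type_b, attr_b)) pairs, assumed to be
    supplied in advance (hypothetical format). Returns lists of event indices.
    """
    parent = list(range(len(events)))

    def find(i):
        # Follow parent links to the representative, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    for (type_a, attr_a), (type_b, attr_b) in rules:
        # Index events of type_a by the value of attr_a.
        index = {}
        for i, event in enumerate(events):
            if event["type"] == type_a and attr_a in event["attrs"]:
                index.setdefault(event["attrs"][attr_a], []).append(i)
        # Merge each event of type_b with the type_a events it matches.
        for j, event in enumerate(events):
            if event["type"] == type_b and attr_b in event["attrs"]:
                for i in index.get(event["attrs"][attr_b], []):
                    union(i, j)

    groups = {}
    for i in range(len(events)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical events: the invoice carries no order identifier at all.
events = [
    {"type": "OrderReceived", "attrs": {"OrderId": "A1"}},
    {"type": "ShipmentCreated", "attrs": {"OrderRef": "A1", "ShipmentId": "S7"}},
    {"type": "InvoiceIssued", "attrs": {"Shipment": "S7"}},
    {"type": "OrderReceived", "attrs": {"OrderId": "A2"}},
    {"type": "ShipmentCreated", "attrs": {"OrderRef": "A2", "ShipmentId": "S8"}},
]
rules = [
    (("OrderReceived", "OrderId"), ("ShipmentCreated", "OrderRef")),
    (("ShipmentCreated", "ShipmentId"), ("InvoiceIssued", "Shipment")),
]

instances = correlate(events, rules)
print(instances)
```

In this sample, the invoice event is linked to its order only through the shipment event, which mirrors the situation in which the bridging attribute exists solely in the events of an otherwise isolated process. The sketch presupposes that the rules are given; obtaining them is exactly the part that requires explicit knowledge of the event relationships.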
Some existing work addresses how to correlate events to create a historic view in order to explore and discover different aspects of business processes in computer applications. Process mining partly addresses this by analyzing logged execution data of process instances and generating a representation of a process model. Current work in the area of process mining and discovery requires clean, pre-processed, chronologically ordered and correlated process instance traces. The correlation specification in such work is done by a user with expert knowledge of the domain, the data sources and the involved applications. As such, existing work does not efficiently address the problem of correlating events in the first place.