Most organizations need timely information for business monitoring, and providing that information completely often requires integrating many disparate applications. Many of these applications come from different vendors, each of which uses its own nomenclature for application metadata, which makes integrating data from the different applications very difficult.
For example, an insurance company may use different databases, with different applications purchased from different vendors, to provide underwriting, rate quotes, and motor vehicle information, each of which may have its own lexical and semantic conventions for handling data. One database may use the term “customer_number,” while another database refers to the same data as “customer_no” and a third as “customer_id,” making it hard to match this information efficiently. Similarly, one database may use “employee_number” and another “employee_id” for the same data. The conventions may differ not only lexically but also semantically. For instance, one database may define the term “employee” to include full-time employees as well as part-time employees and contractors, while another database excludes contractors from the category “employee.” This can make it very difficult to monitor and evaluate the category “employee” across multiple databases. One approach to normalizing transactions with semantic differences is to describe those transactions with ontologies, where each ontology provides a description logic and a taxonomy. By applying a source ontology to a transaction, the transaction may be interpreted according to the semantics that the ontology provides.
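The lexical and semantic differences described above can be illustrated with a minimal sketch. It is not an ontology engine; it only shows the two kinds of mismatch. The source names hr_db and payroll_db, the worker-type labels, and the choice of “customer_id” and “employee_id” as canonical names are assumptions made for illustration.

```python
# Hypothetical mapping from source-specific field names to one canonical name.
FIELD_ALIASES = {
    "customer_number": "customer_id",
    "customer_no": "customer_id",
    "employee_number": "employee_id",
}

# Hypothetical per-source definitions of who counts as an "employee".
EMPLOYEE_DEFINITIONS = {
    "hr_db": {"full_time", "part_time", "contractor"},
    "payroll_db": {"full_time", "part_time"},  # contractors excluded
}


def normalize_fields(record):
    """Resolve lexical differences by renaming known aliases."""
    return {FIELD_ALIASES.get(name, name): value for name, value in record.items()}


def is_employee(source, worker_type):
    """Resolve a semantic difference by consulting the source's own definition."""
    return worker_type in EMPLOYEE_DEFINITIONS[source]


row = normalize_fields({"customer_no": 42, "name": "A. Smith"})
contractor_in_hr = is_employee("hr_db", "contractor")
contractor_in_payroll = is_employee("payroll_db", "contractor")
```

Here the same contractor is an “employee” in one source and not in the other, so a count of employees across both sources is meaningful only after the definitions are reconciled.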
Two different methods have typically been used to solve this problem. One solution is called Enterprise Application Integration (EAI), which uses a real-time message monitor to obtain information about different databases through real-time messages from those databases. Transformation of data from one system to the next is also accomplished in real time. The advantage of this real-time transformation in terms of alerting and reporting is that new information can be acted on by the monitor immediately. Notifications and alerts can be triggered by the arrival of new messages. The disadvantage of this approach for alerting and notification purposes is that not all application functions result in a message being generated to another system. Often, these transactions are isolated to the application and its data store and are therefore invisible to the monitor.
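The push model described above might be sketched as follows. The class MessageMonitor, its methods, the message fields, and the alert threshold are all hypothetical names chosen for illustration; no particular EAI product is implied.

```python
class MessageMonitor:
    """Evaluates alert rules each time a message arrives (push model)."""

    def __init__(self):
        self._rules = []  # list of (predicate, action) pairs

    def add_rule(self, predicate, action):
        self._rules.append((predicate, action))

    def on_message(self, message):
        """Called by the messaging layer; returns how many rules fired."""
        fired = 0
        for predicate, action in self._rules:
            if predicate(message):
                action(message)
                fired += 1
        return fired


alerts = []
monitor = MessageMonitor()
# Hypothetical rule: alert on any claim above an assumed threshold.
monitor.add_rule(
    lambda m: m.get("type") == "claim" and m.get("amount", 0) > 10000,
    alerts.append,
)

monitor.on_message({"type": "claim", "amount": 25000})  # triggers the alert
monitor.on_message({"type": "claim", "amount": 500})    # does not
```

The sketch also shows the limitation noted above: a change that an application writes only to its own data store never reaches on_message, so no rule can fire for it.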
Another solution, called Extraction, Transformation, and Loading (ETL), is focused on persistent data and is usually run in batch mode. These products are based on a pull model, meaning that they pull their data from the disparate databases without waiting for messages from those databases. Once this data is pulled, mapping programs are run on the data to normalize the metadata differences into a standardized, more usable form. The normalized data is then usually sent to a data mart or data warehouse, where it can be accessed for historical analytics or used to create an off-line reporting system in which large queries will not affect the response times of an active user's normal transactions, such as create, update, or delete operations.
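A minimal sketch of one such batch run, using in-memory SQLite databases to stand in for a source database and a warehouse, is shown here. The table names, column names, and the COLUMN_MAP mapping are assumptions made for illustration.

```python
import sqlite3

# Hypothetical mapping program: source column names to warehouse column names.
COLUMN_MAP = {"customer_no": "customer_id", "customer_number": "customer_id"}


def extract(conn, table):
    """Pull every row from a source table without waiting for any message."""
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def transform(rows):
    """Normalize metadata differences by renaming columns."""
    return [{COLUMN_MAP.get(k, k): v for k, v in row.items()} for row in rows]


def load(warehouse, rows):
    """Write the normalized rows to the warehouse table."""
    warehouse.executemany(
        "INSERT INTO customers (customer_id, name) VALUES (:customer_id, :name)",
        rows,
    )
    warehouse.commit()


source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE policy_customers (customer_no INTEGER, name TEXT)")
source.execute("INSERT INTO policy_customers VALUES (1, 'Ada')")

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")

load(warehouse, transform(extract(source, "policy_customers")))
loaded = warehouse.execute("SELECT customer_id, name FROM customers").fetchall()
```

Note that extract reads the whole source table in one query; on a production database, a full pull of this kind is the step that places load on the source.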
However, one disadvantage of this process is that special programs must be written to convert the data from each of the different databases, which is time-consuming and expensive. Moreover, the traffic load of this process strains the resources of the databases while the data is being pulled from them. Even when run at night, the ETL process significantly degrades transaction response times for end users of the databases. This is a major disadvantage because, with the advent of the Internet, databases typically must function at a high level twenty-four hours a day, every day. In addition, once this process has been put into place, it can be disrupted or broken if one or more of the databases involved are updated. Another problem with this approach is that the data it provides is not available in real time, although real-time monitoring of databases can be crucial for a business.
Therefore, there is a need for an automated system and method to normalize persistent data sources for real-time monitoring without straining the resources of the data sources.