The present invention relates to systems and methods for automatic data integration among multiple heterogeneous data sources, and more particularly to the Service-Oriented Architecture (SOA) and Enterprise Information Integration (EII) domains combined with Business Process Management (BPM) tools. The present invention focuses on automating integration, and on centralizing the “user view” using business-level terms.
Data processing and publishing is still one of the most complicated issues information technology (IT) organizations face. Current integration solutions are expensive, time-consuming, and rigid. Such solutions are based on static and IT-dependent services (provided by IT developers using customized technology). Such solutions miss the huge potential of metadata to automate the EII. Typically, knowledge of business processes is recorded only in technical papers by IT developers, while in a few cases such knowledge is recorded electronically in metadata repositories for documentation and management purposes.
Organizations typically hire highly-skilled IT developers to provide customized services (following the SOA concept). Developers use available documentation on the data for developing such services. If available, data documentation includes location, format, relational parameters, quality, and priorities of the data. Such services are typically software programs, sensitive to technology changes and to client needs (e.g. required static schema content and format, and security). Such software programs typically map static data schemas and orchestrations, and access specific IT sources. The drawbacks of such software implementations include the following.
(1) Scattered knowledge: knowledge of services and software components is scattered among different systems and among multiple developers.
(2) Limited solutions: solutions depend on human decisions, lacking the full view of the available IT information and clients' future uses.
(3) Poor code reuse: a huge number of services and software components are, in many cases, not fully reused.
(4) Inflexibility: static services require extensive maintenance and change-management handling.
(5) IT-dependent solutions: changes in the IT layer typically require redeployment and a new application programming interface (API) for the client.
(6) Ineffective product deployment: introducing services into the market is time-consuming and expensive (due to extended time to reach market).
In the prior art, there are many platforms that deal with some of the aspects mentioned above. A prior-art example by Morgenstern, U.S. Pat. No. 5,970,490 (hereinafter referred to as Morgenstern '490), hereby incorporated by reference as if fully set forth herein, teaches an integration platform for heterogeneous databases. However, Morgenstern '490 teaches a generic method for mapping data, focusing on databases. Morgenstern '490 does not teach methods for web-service integration or on-line integration. Furthermore, Morgenstern '490 does not teach methods for automated flow generation and execution based on priority rules, data quality, data availability, and other criteria.
A prior-art example by Amirisetty et al., U.S. Pat. No. 7,152,090 B2 (hereinafter referred to as Amirisetty '090), hereby incorporated by reference as if fully set forth herein, teaches a metadata-aware enterprise application integration framework for application server environments. However, Amirisetty '090 teaches tools for connector and adapter generation, not on-line and dynamic integration (both important features in advancing the art of business integration methods). Amirisetty '090 primarily teaches Java-platform tools in which users make high-level function calls. Metadata is used to describe the high-level and low-level function calls. The approach is not data-oriented integration.
A prior-art example by Michaelides, International Patent No. WO 2004/082179 A2 (hereinafter referred to as Michaelides '179), hereby incorporated by reference as if fully set forth herein, teaches a generic software adapter. However, Michaelides '179 teaches templates for software adapters and stream mapping using metadata, not data integration.
A prior-art example by Stanley et al., U.S. Pat. No. 6,988,109 B2 (hereinafter referred to as Stanley '109) hereby incorporated by reference as if fully set forth herein, teaches methods for an intelligent object-based information-technology platform. However, Stanley '109 primarily teaches data-mining tools, using direct mapping of objects to data for search purposes.
A prior-art example by Ainsbury et al., U.S. Pat. No. 6,078,924 (hereinafter referred to as Ainsbury '924) hereby incorporated by reference as if fully set forth herein, teaches methods for performing data collection, interpretation, and analysis in an information platform. According to the method of Ainsbury '924, data is replicated, not integrated, using a kind of “on-line cache” approach.
A prior-art example by Statchuk, U.S. Patent Publication No. 2007/0055691 (hereinafter referred to as Statchuk '691), hereby incorporated by reference as if fully set forth herein, teaches a method for managing an exemplar-terms database for business-oriented metadata content. However, Statchuk '691 primarily teaches methods for reporting and searching metadata, not data integration.
A prior-art example by Ghatate, U.S. Pat. No. 6,317,749 B1 (hereinafter referred to as Ghatate '749) hereby incorporated by reference as if fully set forth herein, teaches methods for providing relationship objects. However, Ghatate '749 primarily teaches a relationship model, not an integration model.
A prior-art example by Walsh et al., U.S. Pat. No. 6,810,429 B1 (hereinafter referred to as Walsh '429) hereby incorporated by reference as if fully set forth herein, teaches an enterprise integration system. However, Walsh '429 primarily teaches tools for converting data to and from XML format, not data-integration tools.
A prior-art example by Brumme et al., U.S. Pat. No. 6,134,559 (hereinafter referred to as Brumme '559) hereby incorporated by reference as if fully set forth herein, teaches methods for integrating objects defined by different foreign object-type systems into a single system. Brumme '559 primarily teaches an “object-oriented integration” using tags for uniform objects that connect metadata objects to data within a data source. However, Brumme '559 does not teach methods for automated flow generation and execution based on priority rules, data quality, data availability, and other criteria.
Current approaches to business integration still present a difficult, time-consuming, and expensive process to an organization. The most urgent issues in the integration field needing to be addressed can be summarized as including:
(1) accessing heterogeneous data sources;
(2) making services “dynamic”, as opposed to the current “static” aspect of services having fixed APIs and dependence on customized solutions, requiring full development and testing cycles for future API changes (i.e. poor flexibility);
(3) designating security at the “data level”, as opposed to at the current “service level” (as in SOA), which requires changes in the services (or adding new services) in order to change the security of the data;
(4) making the services and the IT layer less interdependent, as opposed to the current situation in which the services and the IT layer are strongly coupled, requiring changes in the services (and new development and testing cycles) in order to replace or change the IT layer (e.g. changes in the legacy system require changes in the integration area and typically in the client application as well; new technologies like web services have made a vast change in the authorization level; for example, previously, when accessing databases directly, one could set access privileges at the “table-columns level”, whereas, using web-services technology, such a security level has vanished);
(5) improving data-rate quality to periodically and frequently provide up-to-date data;
(6) reducing required data transformations between the different services participating in the data-service-execution solution flow, freeing developers from having to understand the specific data types and formats of different heterogeneous data sources in order to develop appropriate transformation routines;
(7) simplifying the mapping of data between different services participating in different solution flows (i.e. mapping between the outputs of one service and the inputs of the next service in the flow);
(8) reducing time to market due to tedious and costly development and testing cycles associated with publishing services/products;
(9) streamlining the deployment process by: adding new modules declaratively, modifying mappings without new software development and testing cycles and without shutting down the server in order to replace the old services with the new modified services, and managing service versioning;
(10) simplifying data synchronization by freeing developers from having to continually be aware of the different data sources and services handling the same data items;
(11) automating business-process integration to overcome weaknesses in static integration tools that provide developers static APIs for integration, requiring a full development team to be involved in integration development;
(12) automating “failovers” that are currently handled by developers (by redirecting requests to alternative available services) due to failover behavior being statically defined (i.e. hard-coded) in customized services, reflecting the inflexibility of the failover algorithm (i.e. modifying failover behavior in any service requires a new development cycle);
(13) automating service auditing that is currently handled by developers in a customized and manual fashion (i.e. any required auditing modification requires a new development cycle); and
(14) automating service monitoring that is currently handled by developers in a customized and manual fashion (i.e. any required monitoring modification requires a new development cycle).
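The metadata-driven flow selection and automated failover discussed in items (11) and (12) above can be sketched as follows. This is a minimal illustrative sketch only, not the claimed method: the `DataService` record, the priority/quality ranking, and the `execute_with_failover` helper are hypothetical names introduced here, standing in for metadata held in a repository and for the service calls themselves.

```python
# Hypothetical sketch: choosing among redundant services exposing the same
# data item, ranked by priority rules and data quality, skipping unavailable
# services, and failing over automatically when a chosen service fails.
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class DataService:
    name: str
    priority: int              # lower value = preferred by business priority rules
    quality: float             # 0.0-1.0 data-quality score taken from metadata
    available: bool            # current availability, e.g. from monitoring
    fetch: Callable[[], Any]   # the actual service invocation


def execute_with_failover(candidates: list[DataService]) -> Any:
    """Rank available candidates by (priority, quality); on failure of the
    best candidate, redirect the request to the next-best one."""
    ranked = sorted(
        (s for s in candidates if s.available),
        key=lambda s: (s.priority, -s.quality),
    )
    last_error = None
    for service in ranked:
        try:
            return service.fetch()       # first successful service wins
        except Exception as err:         # failed call: fail over to the next
            last_error = err
    raise RuntimeError("no available service satisfied the request") from last_error


def failing_fetch():
    raise IOError("simulated legacy-system outage")


# Usage: two heterogeneous sources expose the same "customer" data item.
primary = DataService("legacy_db", priority=1, quality=0.9, available=True,
                      fetch=failing_fetch)
backup = DataService("web_service", priority=2, quality=0.8, available=True,
                     fetch=lambda: {"customer_id": 42})

result = execute_with_failover([primary, backup])  # falls back to web_service
```

Because the ranking and availability data come from metadata rather than being hard-coded into each service, changing failover behavior or source priorities would require only a metadata update, not a new development cycle, which is the contrast drawn in items (11) and (12).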
It would be desirable to have systems and methods for automatic data integration among multiple heterogeneous data sources that address the issues described above.